Can Autosklearn handle Multi-Class/Multi-Label Classification and which classifiers will it use? #1429
Comments
Hey again @asmgx,

Just as a note, the first example you give is multi-label, as there are multiple label columns rather than just one.

Method 2 will not work, as we do not natively support multi-class multi-label classification. This is because sklearn models usually don't support it natively and require adapters, similar to the ones you show in option 1. However, option 1 will not work either; read its description carefully: https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html#sklearn-multiclass-onevsrestclassifier
It supports one or the other, but not both simultaneously.

In general, I don't think support for multi-class multi-label is very widespread, and I would advise reframing the problem as you suggest in option 3. One option, as you suggest, is to fit one classifier per multi-class target column and combine their results at the end. Another option is to one-hot encode each multi-class target column into multiple binary ones. In the same way that you can one-hot encode categorical feature columns, you can one-hot encode target columns that contain multiple classes, repeating this for each column in your output. This can increase the number of target columns dramatically, depending on the number of classes, and it also makes translating between your original targets and the one-hot encoded variant more difficult to implement.

But to reiterate, we don't support it natively, and the implementation is left to the user.

Best,
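The one-hot reframing described above could look roughly like the following. This is only a sketch using NumPy; `encode_targets` and `decode_targets` are hypothetical helper names, not part of auto-sklearn or scikit-learn, and the decoding step assumes that taking the highest value within each block of binary columns is an acceptable way to recover a single class per original column.

```python
import numpy as np


def encode_targets(y):
    """One-hot encode every multi-class target column into binary columns.

    Returns the binary (multi-label style) matrix and, for each original
    column, the sorted class values needed to decode it again.
    """
    y = np.asarray(y)
    classes_per_column = [np.unique(y[:, j]) for j in range(y.shape[1])]
    # Comparing an (n, 1) column against its (k,) class values gives an (n, k) block.
    blocks = [
        (y[:, [j]] == classes).astype(int)
        for j, classes in enumerate(classes_per_column)
    ]
    return np.hstack(blocks), classes_per_column


def decode_targets(y_binary, classes_per_column):
    """Map a binary (or score) matrix back to the original multi-class columns."""
    y_binary = np.asarray(y_binary)
    columns, start = [], 0
    for classes in classes_per_column:
        block = y_binary[:, start:start + len(classes)]
        # Assumption: the largest entry in a block marks the chosen class.
        columns.append(classes[block.argmax(axis=1)])
        start += len(classes)
    return np.column_stack(columns)


y = [[1, 2], [0, 2], [0, 0], [2, 1]]
y_binary, classes_per_column = encode_targets(y)  # shape (4, 6)
y_restored = decode_targets(y_binary, classes_per_column)
```

Note that a multi-label model trained on `y_binary` is free to predict zero or several positives inside one block, so the decoding rule is a design choice the user has to make.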
Hello all, For my undergraduate thesis, I am trying to benchmark some AutoML tools. Specifically, I am trying to plot ROC curves and calculate the area under the ROC curve for multiclass (not multilabel) classification on some datasets from OpenML-CC18 using Autosklearn. Basically, I am trying to implement this using AutoSklearnClassifier. As eddiebergman already correctly pointed out, the following cannot be used directly:
clf = OneVsRestClassifier(automl, n_jobs=-1)
clf.fit(X_train, y_train)
Could you please provide an example of how this can be done? Thanks in advance!
Hi @vgargan2, We support regular multi-class classification out of the box. I realize we don't have an example to show this, but we regularly test on the benchmark openml/s/218, which is similar in spirit to OpenML-CC18. In case this thread begins to confuse other readers, I'm going to make the 4 distinctions and clarify which we support.
Best,
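For the multiclass ROC question above, one possible approach is to skip `OneVsRestClassifier` entirely and compute one-vs-rest curves from the predicted probabilities of a single fitted classifier. The sketch below uses scikit-learn's `label_binarize`, `roc_curve` and `auc`; `one_vs_rest_roc` is a hypothetical helper name, and it assumes three or more classes and that the columns of `predict_proba` follow the sorted class labels.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize


def one_vs_rest_roc(y_true, y_proba, classes):
    """Return {class: (fpr, tpr, auc)} treating each class as 'positive' in turn.

    Assumes len(classes) >= 3 and that column k of y_proba is the
    predicted probability of classes[k].
    """
    y_onehot = label_binarize(y_true, classes=classes)
    y_proba = np.asarray(y_proba)
    curves = {}
    for k, label in enumerate(classes):
        fpr, tpr, _ = roc_curve(y_onehot[:, k], y_proba[:, k])
        curves[label] = (fpr, tpr, auc(fpr, tpr))
    return curves


# Hypothetical usage with auto-sklearn (not executed here):
#   automl = AutoSklearnClassifier()
#   automl.fit(X_train, y_train)
#   proba = automl.predict_proba(X_test)
#   curves = one_vs_rest_roc(y_test, proba, classes=np.unique(y_train))
#   for label, (fpr, tpr, area) in curves.items():
#       plt.plot(fpr, tpr, label=f"class {label} (AUC = {area:.2f})")
```

A single summary number can also be obtained with `sklearn.metrics.roc_auc_score(y_test, proba, multi_class="ovr")`.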
@eddiebergman this is confusing. Do you mean that a data set with target values like the following is supported?
Can you advise how we can work with this example?
@asmgx, I apologise, I misread your example in the very first section. Yes, we support that example, which is multi-label. I read the column headers as being non-binary and assumed you meant multi-class multi-label. This whole issue seems to illuminate that we should have a clear section about this. I also sometimes mix up which is which.
For those scrolling to the bottom of the issue:
# Nothing has to be done for multi-label OR multi-class
import numpy as np
from autosklearn.classification import AutoSklearnClassifier
X = np.random.rand(4, 2) # 4 examples, 2 features
# For binary
binary_y = [1, 0, 1, 1]
automl = AutoSklearnClassifier()
automl.fit(X, binary_y)
# For multiclass
multiclass_y = [1, 2, 0, 2]
automl = AutoSklearnClassifier()
automl.fit(X, multiclass_y)
# For multilabel
multilabel_y = [[1, 0], [0, 0], [1, 1], [1, 0]]
automl = AutoSklearnClassifier()
automl.fit(X, multilabel_y)
# For multiclass-multilabel y
# NOT SUPPORTED
multiclass_multilabel_y = [[1, 2], [0, 2], [0, 0], [2, 1]]
@eddiebergman Thanks. Is there any documentation that covers this?
Nothing special is done. When doing multi-label classification, we only consider models that natively support multi-label classification.
There's no document covering this, but there probably should be one that describes all of it.
We document the supported tasks here, but we should potentially rename this to "supported target types" and link to scikit-learn's glossary; for example, for multi-label we should link to https://scikit-learn.org/stable/glossary.html#term-multilabel. Indeed, we have no documentation on which classifier is used for which target type, and it would be great to have that.
I have been trying to use AutoSklearn with Multi-class classification
so my labels are like this
0 1 2 3 4 ... 200
1 0 1 1 1 ... 1
0 1 0 0 1 ... 0
1 0 0 1 0 ... 0
1 1 0 1 0 ... 1
0 1 1 0 1 ... 0
1 1 1 0 0 ... 1
1 0 1 0 1 ... 0
I used this code
but now I want to train Autosklearn on multi-class multi-label classification
Which of these methods should I use?
1-
2-
3-
I have to loop one class at a time and use
so it will be like
so I get a different model for each label?
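The "one model per target column" idea from option 3 can be sketched as below. `fit_per_column` and `predict_per_column` are hypothetical helper names, and `DecisionTreeClassifier` is only a stand-in so the snippet runs without auto-sklearn; the assumption is that any estimator with scikit-learn style `fit` and `predict` methods, such as `AutoSklearnClassifier`, could be passed in its place.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # stand-in estimator for illustration


def fit_per_column(make_classifier, X, Y):
    """Fit one independent multi-class classifier per target column."""
    Y = np.asarray(Y)
    models = []
    for j in range(Y.shape[1]):
        model = make_classifier()  # fresh, unfitted estimator for column j
        model.fit(X, Y[:, j])
        models.append(model)
    return models


def predict_per_column(models, X):
    """Stack each model's predictions into an (n_samples, n_targets) array."""
    return np.column_stack([model.predict(X) for model in models])


rng = np.random.default_rng(0)
X = rng.random((4, 2))                # 4 examples, 2 features
Y = [[1, 2], [0, 2], [0, 0], [2, 1]]  # multi-class multi-label targets

models = fit_per_column(DecisionTreeClassifier, X, Y)
predictions = predict_per_column(models, X)  # one column per original target
```

So yes, this yields a different model for each target column, and each model is searched and fitted independently, which multiplies the total fitting time by the number of columns.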