Handle "multicategorical" columns #227
Comments
Hey @davidfstein, I can look into this, but in the meantime you could treat a column that can take multiple categorical values as text and use this library as it is. Or, if possible, turn the multicategorical columns into multiple columns and proceed as usual. But I will look into this :)
Thanks @jrzaurin! Right now I am following your first suggestion and processing them as text. My only concern is that this might become inefficient with many features, since a separate RNN would need to be trained for each one. As for splitting into multiple columns, I was thinking you might lose information if each column doesn't contain the full set of possible categories, though I'm not sure whether that would lead to a substantive performance decrease.
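For reference, one way to "turn the multicategorical columns into multiple columns" without losing information is a multi-hot expansion: one binary column per category, so every row can express any subset of the categories. The sketch below is just an illustration using scikit-learn and a hypothetical column name `multicat`; it is not part of pytorch-widedeep.

```python
# Minimal sketch of a multi-hot expansion of a multicategorical column.
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

df = pd.DataFrame({"multicat": [["a", "b", "c"], ["a"]]})

mlb = MultiLabelBinarizer()
multi_hot = pd.DataFrame(
    mlb.fit_transform(df["multicat"]),          # shape (n_rows, n_categories)
    columns=[f"multicat_{c}" for c in mlb.classes_],
    index=df.index,
)

# Replace the list-valued column with one 0/1 column per category
df = pd.concat([df.drop(columns="multicat"), multi_hot], axis=1)
# df now has columns multicat_a, multicat_b, multicat_c
```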
Let's see if I can put together some functioning code tomorrow :)
That would be awesome! Thanks for the great library!
The pytorch_frame library natively handles categorical variables where the variable may take on multiple categories simultaneously, e.g. row1 = [1, .5, ['a', 'b', 'c']], row2 = [2, .3, ['a']] ...
It would be a nice quality of life enhancement to have this sort of functionality added to the widedeep library.
I believe, though I need to look more carefully, that they do something along the lines of: 1) label encode the categories, and 2) convert to tensors such that a multicategorical feature is replaced with an "embedding" of shape n_rows x max_categories_in_a_single_row. Rows whose variable takes fewer than that maximum number of categories get -1 in the "missing" columns. I imagine there are other options for handling this as well.
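For concreteness, here is a minimal sketch of that encoding (not pytorch_frame's actual implementation): label encode every category seen in the column, then pad each row's list of indices with -1 up to the maximum number of categories in any single row. The helper name `encode_multicategorical` is just illustrative.

```python
# Sketch: label-encode a column of category lists and pad rows with -1.
import pandas as pd
import torch

def encode_multicategorical(series: pd.Series) -> torch.Tensor:
    """Turn a column of lists of categories into a (n_rows, max_cats) LongTensor."""
    # 1) label encode: map every distinct category to an integer index
    categories = sorted({cat for row in series for cat in row})
    cat2idx = {cat: i for i, cat in enumerate(categories)}

    # 2) pad each row's indices with -1 up to the longest row
    max_cats = max(len(row) for row in series)
    encoded = [
        [cat2idx[cat] for cat in row] + [-1] * (max_cats - len(row))
        for row in series
    ]
    return torch.tensor(encoded, dtype=torch.long)

df = pd.DataFrame({
    "num1": [1, 2],
    "num2": [0.5, 0.3],
    "multicat": [["a", "b", "c"], ["a"]],
})
print(encode_multicategorical(df["multicat"]))
# tensor([[ 0,  1,  2],
#         [ 0, -1, -1]])
```

At embedding time the -1 entries would typically be mapped to a padding index and the per-category embedding vectors pooled (e.g. summed or averaged) to produce a single vector per row.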