Support for FastText word embeddings #2154
Replies: 6 comments
-
We do support changing the It wouldn't let us take advantage of the character features when using the vectors as features in models --- but then, neither would changing the
-
Yes it would be possible to use Changing the
-
The ELMo vectors would naturally use the Perhaps we could support back-off logic within the Vocab's
-
Indeed for ELMo vectors the If we implement the switch in Vocab's What I thought of was to split the
so we can in each case (word2vec-like or fasttext) use the We can possibly add a new method to return vectors in context (something like
-
I think this kind of flexibility would be a massive boon, given the rich variety of embedding types that now exist. Flair embeddings come to mind, alongside BERT and ELMo.
-
@samhardyhey the more word embedding models the better
-
Currently, Vectors.__getitem__ returns the vector associated with the given key (https://github.com/explosion/spaCy/blob/master/spacy/vectors.pyx#L88).
This is expected for most word vectors, but it makes the integration of FastText word embeddings difficult. Indeed, the vector of a word is generated by summing the vectors associated with its char n-grams (see the gensim implementation for more details: https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/keyedvectors.py#L1657).
This is convenient as we can generate an embedding for an OOV word if we have a vector for at least one of its char n-grams.
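To make the OOV case concrete, here is a minimal sketch of the n-gram composition described above. The function names `char_ngrams` and `oov_vector` and the toy lookup table are hypothetical, not part of spaCy or gensim. The `<` and `>` boundary markers and the 3 to 6 n-gram range follow fastText's defaults, and the sketch sums the n-gram vectors as described here (an implementation may normalise the result instead).

```python
import numpy as np


def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, padded with '<' and '>'."""
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(padded) - n + 1)
    ]


def oov_vector(word, ngram_vectors):
    """Sum the vectors of every known n-gram of `word`.

    `ngram_vectors` is a hypothetical dict mapping n-gram -> np.ndarray.
    Raises KeyError when none of the word's n-grams has a vector.
    """
    known = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    if not known:
        raise KeyError(word)
    return np.sum(known, axis=0)


# Toy table: only two of the n-grams of "cat" have vectors.
toy_ngrams = {
    "<ca": np.array([1.0, 0.0]),
    "at>": np.array([0.0, 2.0]),
}
cat_vector = oov_vector("cat", toy_ngrams)
```

A single known n-gram is enough to produce a vector, which is the property that makes the approach attractive for rare or misspelled words.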
One way to support FastText word embeddings would be to let `__getitem__` behave differently depending on a `backend` attribute of the Vectors (which in this case would be `fasttext`).
If you think this feature is worth adding to spaCy, I can participate in the implementation.
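A rough sketch of what the backend switch could look like follows. `ToyVectors` is a hypothetical stand-in and not spaCy's `Vectors` class; the backend names, the `ngram_fn` parameter, and the choice to return the stored vector for known keys before falling back to n-grams are all assumptions made for illustration.

```python
import numpy as np


def trigrams(word):
    """Hypothetical sub-word function: character trigrams with '<' '>' padding."""
    padded = f"<{word}>"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class ToyVectors:
    """Hypothetical vectors table whose lookup depends on `backend`."""

    def __init__(self, table, backend="default", ngram_fn=trigrams):
        self.table = table        # maps key (word or n-gram) -> np.ndarray
        self.backend = backend    # "default" or "fasttext" (assumed names)
        self.ngram_fn = ngram_fn  # callable: word -> list of n-grams

    def __getitem__(self, key):
        if key in self.table:
            # Known key: plain lookup, identical for both backends.
            return self.table[key]
        if self.backend == "fasttext":
            # Unknown key: compose a vector from the n-grams that are known.
            parts = [self.table[g] for g in self.ngram_fn(key) if g in self.table]
            if parts:
                return np.sum(parts, axis=0)
        raise KeyError(key)


table = {
    "cat": np.array([9.0, 9.0]),
    "<do": np.array([1.0, 0.0]),
    "og>": np.array([0.0, 1.0]),
}
plain = ToyVectors(table)
subword = ToyVectors(table, backend="fasttext")
dog_vector = subword["dog"]  # built from "<do" and "og>"
```

With the default backend, `plain["dog"]` raises `KeyError`, so existing word2vec-style tables would keep their current behaviour.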