What does it mean to have CNN as both model and tok2vec? #12260
-
Apologies if this has been asked previously, but diving deeper into my configurations, I realized I'm not sure I understand what it means for both the model component and the model's tok2vec sublayer to specify a CNN. The component section looks like this:
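It's essentially the standard TextCatCNN setup from the docs, something like this (with the usual default values):

```ini
[components.textcat]
factory = "textcat"

[components.textcat.model]
@architectures = "spacy.TextCatCNN.v2"
exclusive_classes = true
nO = null

[components.textcat.model.tok2vec]
@architectures = "spacy.HashEmbedCNN.v2"
pretrained_vectors = null
width = 96
depth = 4
embed_size = 2000
window_size = 1
maxout_pieces = 3
subword_features = true
```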
In my understanding, the HashEmbedCNN embeds each word and encodes it as a vector containing its context. That would mean for each document we have a T×E matrix (where T is the number of tokens and E is the encoding size).

The docs for TextCatCNN say: "A neural network model where token vectors are calculated using a CNN. The vectors are mean pooled and used as features in a feed-forward network." If I understand correctly, the tok2vec component is creating the token vectors. Does the TextCatCNN in the config above just implement the mean pooling and feed-forward network, or is it doing another set of convolutions on the encoded text?

I read through this post, but was a bit confused by what it said about sentence splitting. Is that going on under the hood? Hope this question makes sense; I realized I've been using this for a while without really understanding it.
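To make the shapes concrete, here's a little sketch I used to check what the embedding layer produces on its own (building just the HashEmbedCNN from a config fragment; the example sentence and values are mine):

```python
import spacy
from thinc.api import Config
from spacy.util import registry

# Build only the HashEmbedCNN tok2vec layer from a config fragment.
cfg = Config().from_str("""
[model]
@architectures = "spacy.HashEmbedCNN.v2"
pretrained_vectors = null
width = 96
depth = 4
embed_size = 2000
window_size = 1
maxout_pieces = 3
subword_features = true
""")
tok2vec = registry.resolve(cfg)["model"]

nlp = spacy.blank("en")
doc = nlp("spaCy encodes every token in context.")
tok2vec.initialize(X=[doc])
vectors = tok2vec.predict([doc])  # List[Floats2d]: one T x E array per Doc
print(vectors[0].shape)           # (7, 96): T tokens, E = width
```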
-
This comes down to the details of the "listener" mechanism. Rest assured that we don't love this part either --- I tried really hard to find a better solution, and unfortunately I still don't have one.

The listener is a way for multiple components to share weights. So you can have a textcat and a POS tagger, and they both get the same token vectors, and those token vectors will be updated by gradients from both components.

Consider the following two layer definitions:
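Abbreviated sketch of the two (signatures paraphrased from the HashEmbedCNN and Tok2VecListener source linked at the bottom of this thread; bodies elided):

```python
from typing import List, Optional
from spacy.tokens import Doc
from thinc.api import Model
from thinc.types import Floats2d

def build_hash_embed_cnn_tok2vec(
    *, width: int, depth: int, embed_size: int, window_size: int,
    maxout_pieces: int, subword_features: bool,
    pretrained_vectors: Optional[bool],
) -> Model[List[Doc], List[Floats2d]]:
    """Registered as "spacy.HashEmbedCNN.v2": embeds each token with hashed
    (sub)word features, then contextualizes the sequence with a CNN that is
    `depth` layers deep."""
    ...

def tok2vec_listener_v1(
    width: int, upstream: str = "*"
) -> Model[List[Doc], List[Floats2d]]:
    """Registered as "spacy.Tok2VecListener.v1": computes nothing itself; it
    forwards the output of an upstream tok2vec component and relays the
    gradients from its own consumer back to that component."""
    ...
```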
Both of these functions return a `Model[List[Doc], List[Floats2d]]`. To the component that owns the sublayer, the listener is indistinguishable from a real tok2vec layer, which is what lets several components share one set of token vectors.
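In config terms, that sharing looks roughly like this minimal (hypothetical) example, with a tagger as the listening component; a textcat would plug in its own listener the same way:

```ini
# A shared tok2vec component that does the real work:
[components.tok2vec]
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.HashEmbedCNN.v2"
width = 96
depth = 4
embed_size = 2000
window_size = 1
maxout_pieces = 3
subword_features = true
pretrained_vectors = null

# A tagger whose tok2vec sublayer is just a listener:
[components.tagger]
factory = "tagger"

[components.tagger.model]
@architectures = "spacy.Tagger.v2"
nO = null

[components.tagger.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.width}
upstream = "tok2vec"
```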
-
The CNN in the textcat ensemble is this chain:

```python
cnn_model = (
    tok2vec
    >> list2ragged()
    >> attention_layer
    >> reduce_sum()
    >> residual(maxout_layer >> norm_layer >> Dropout(0.0))
)
```

This code represents a chain of functions. In the usual notation it would be something like `residual_block(reduce_sum(attention_layer(list2ragged(tok2vec(X)))))`. This is going to be combined with the bag-of-words model:

spacy/pipeline/textcat.py, line 38 at 2d4fb94: `"spacy.TextCatBOW.v2"`
The architectures get combined here. So in conclusion, one member of the textcat ensemble is the neural network that starts with a tok2vec layer (the `cnn_model` above), and the other is the linear bag-of-words model.
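If the operator notation is unfamiliar: `>>` and `|` are Thinc combinators (`chain` and `concatenate`). Here is a toy, self-contained sketch of the same combination pattern, with plain `Linear` layers standing in for the two ensemble members (not spaCy's actual code):

```python
import numpy
from thinc.api import Model, chain, concatenate, Linear, Softmax

# `>>` composes layers into a pipeline; `|` runs layers on the same input
# and concatenates their outputs, as in (linear_model | cnn_model).
with Model.define_operators({">>": chain, "|": concatenate}):
    member_a = Linear(nO=4, nI=8)  # stand-in for the bag-of-words member
    member_b = Linear(nO=4, nI=8)  # stand-in for the CNN member
    model = (member_a | member_b) >> Softmax(nO=3, nI=8)

X = numpy.zeros((2, 8), dtype="f")
model.initialize(X=X)
print(model.predict(X).shape)  # (2, 3): batch of 2 inputs, 3 classes
```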
HashEmbedCNN: https://github.com/explosion/spaCy/blob/v3.5.0/spacy/ml/models/tok2vec.py#L34
Tok2VecListener: https://github.com/explosion/spaCy/blob/v3.5.0/spacy/ml/models/tok2vec.py#L17, https://github.com/explosion/spaCy…