annotating_components don't seem to annotate #10654
Replies: 2 comments
-
Sorry you're having trouble with this.
If you need From the info above, it looks like you might also need to add Developing a new tok2vec embed architecture and feature extractor that handle features outside of the |
-
@marcuscollins Did you find a solution to this? I also have a custom |
-
I am playing around with building a custom parser, to which I want to add some custom features derived from dependencies and lemmas. I'm using spaCy 3.2 with Python 3.8.11. Here's the top of the config:
The "extra_features" part is borrowed from #6527, and the "ner" part is based on the rel_components project/tutorial from Explosion, but is heavily modified. That part works, but previously it only used features that came from tok2vec. The first four parts of my pipeline are all taken from `en_core_web_lg`, like so:

While debugging the "extra_features" pipe, I discovered that
Here, for instance, are the first four rows of the usual features (these happen to be NORM, PREFIX, SUFFIX, POS):
and the extra_features, which are just the default value of `np.ndarray(size=(10))`:

If the system were producing the extra features correctly, it should be a 1 x 17 array of ones and zeros. I chose the default so that the system would break if it didn't produce the right output on test sentences of any other length. But then I noticed that all the tags were 0, and started digging around, only to find that none of .tag, .dep, and .lemma were actually set.
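For anyone wanting to reproduce the "nothing is set" check described above, here is a minimal sketch. It relies only on the `tag_`, `dep_` and `lemma_` attributes that spaCy tokens expose; the helper name `missing_annotations` is hypothetical, and the stand-in tokens are made up so the snippet runs without a trained pipeline.

```python
from types import SimpleNamespace

# String-valued annotation attributes as exposed on spaCy Token objects.
ANNOTATIONS = ("tag_", "dep_", "lemma_")


def missing_annotations(doc, attrs=ANNOTATIONS):
    """Return the attributes that are empty on every token of `doc`.

    `doc` can be any iterable of token-like objects (hypothetical helper,
    not part of spaCy).
    """
    tokens = list(doc)
    return [
        attr
        for attr in attrs
        if not any(getattr(tok, attr, "") for tok in tokens)
    ]


# Stand-in tokens (made-up values) instead of a real Doc.
unannotated = [
    SimpleNamespace(text=word, tag_="", dep_="", lemma_="")
    for word in ("I", "run")
]
tagged_only = [
    SimpleNamespace(text="I", tag_="PRP", dep_="", lemma_=""),
    SimpleNamespace(text="run", tag_="VBP", dep_="", lemma_=""),
]

print(missing_annotations(unannotated))  # ['tag_', 'dep_', 'lemma_']
print(missing_annotations(tagged_only))  # ['dep_', 'lemma_']
```

Calling the same helper on the Doc objects that reach the custom pipe should show whether the upstream annotations are missing at that point.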
It was my understanding that adding `tagger, parser, lemmatizer` to the `annotating_components` would cause them to produce the required values, but that does not seem to be happening. Here is the training section of my config:

And finally, the pipe definition for extra_features:
Any idea what I'm doing incorrectly? It's really quite frustrating that this doesn't seem to be easy, and that there isn't better documentation. I have considered the possibility that by changing the FeatureExtractor and MultiHashEmbed that I'm using, maybe I'm messing up the original Tok2Vec pipeline, but mine are registered under different names and the functions have different names too, so I don't see how that would happen. It would be nice to figure out where I'm at in the pipeline when this happens.
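On the annotating_components question, one quick sanity check is whether every listed component exists in the pipeline and runs before the component that consumes its annotations. The sketch below is plain Python and does not call spaCy; the function name `check_annotating_components` and the pipeline order are hypothetical, since the config itself is not shown here. It will not catch every cause of unset annotations.

```python
def check_annotating_components(pipeline, annotating, consumer):
    """Return a list of human-readable problems; an empty list means none found.

    pipeline:   component names in pipeline order
    annotating: names listed under annotating_components
    consumer:   the component that reads the upstream annotations
    """
    if consumer not in pipeline:
        return [f"{consumer!r} is not in the pipeline"]

    problems = []
    consumer_idx = pipeline.index(consumer)
    for name in annotating:
        if name not in pipeline:
            problems.append(
                f"{name!r} is listed as annotating but is not in the pipeline"
            )
        elif pipeline.index(name) > consumer_idx:
            problems.append(
                f"{name!r} runs after {consumer!r}, so its annotations arrive too late"
            )
    return problems


# Hypothetical pipeline order, assumed for illustration only.
pipeline = ["tok2vec", "tagger", "parser", "lemmatizer", "extra_features", "ner"]
annotating = ["tagger", "parser", "lemmatizer"]
print(check_annotating_components(pipeline, annotating, "extra_features"))  # []

# Same components, but the custom pipe is placed too early.
bad_order = ["tok2vec", "extra_features", "tagger", "parser", "lemmatizer", "ner"]
for problem in check_annotating_components(bad_order, annotating, "extra_features"):
    print(problem)
```

If this check passes and the attributes are still empty, the cause is more likely in how the sourced components get their own inputs during training than in the ordering itself.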