spaCy NER model extracts the entire sentence instead of stopping at the correct end index #12627
Replies: 2 comments 10 replies
-
Hello @YasmineMh, that's strange. Would you have a minimal example to share (if it's not confidential), so we can explore this in more depth? One reason that comes to mind and could explain this behavior is the interaction between spaCy and the Hugging Face tokenizers (see here for a discussion of the problem). Is a warning issued about the size of the tokenized example?
-
The reason might be a casing mismatch: the model is uncased. In my experience building NER models, lack of casing is a big problem in general. Do you know of a Transformer-based NER model built on top of this one? How about trying a cased model?
-
Hey, I have been using spaCy version 3.4.1 to train multiple NER models on different types of data, such as dates, text, and amounts. However, I have noticed that the model sometimes does not stop at the correct end index and instead extracts all of the remaining text.
This is an example:
- paragraph: `<some text> date <some text>`
- annotation: `date`
- prediction: `date <some text>`
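To make the symptom concrete, here is a small helper (pure Python; the function name, the toy paragraph, and the offsets are my own, not from the original post) that flags exactly this failure mode: a predicted character span that starts at the gold start but runs past the gold end.

```python
def overshoots(gold_span, pred_span):
    """True if the prediction starts at the gold start but extends past the gold end.

    Spans are (start_char, end_char) offsets into the paragraph.
    """
    g_start, g_end = gold_span
    p_start, p_end = pred_span
    return p_start == g_start and p_end > g_end

# Toy paragraph mirroring the example above: the gold annotation covers
# only the date, but the hypothetical prediction runs to the end of the text.
paragraph = "Invoice issued on 2023-01-01 and payable within thirty days"
gold = (18, 28)                  # "2023-01-01"
pred = (18, len(paragraph))      # "2023-01-01 and payable within thirty days"
print(overshoots(gold, pred))    # True
```

Comparing the gold and predicted offsets this way on a held-out set makes it easy to count how often the overshoot happens.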
I'm using "spacy-transformers.TransformerModel.v3" with "nlpaueb/legal-bert-small-uncased" from Hugging Face in my config file.
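For reference, the relevant part of such a config would look roughly like this (a sketch of the spacy-transformers config block; the component name and any settings beyond the two quoted strings are assumptions, not taken from the original post):

```ini
[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "nlpaueb/legal-bert-small-uncased"

[components.transformer.model.tokenizer_config]
use_fast = true
```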
I checked the annotated data to make sure I don't have long spans annotated like this one.
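One way to automate that check is a quick scan over the training examples (a sketch; I'm assuming data in spaCy's common character-offset format `(text, {"entities": [(start, end, label)]})`, and the 50-character threshold is an arbitrary choice):

```python
def long_annotations(examples, max_chars=50):
    """Return (label, span_text) pairs for annotated spans longer than max_chars.

    `examples` uses the offset format: (text, {"entities": [(start, end, label)]}).
    """
    flagged = []
    for text, ann in examples:
        for start, end, label in ann["entities"]:
            if end - start > max_chars:
                flagged.append((label, text[start:end]))
    return flagged

# Hypothetical training examples: the second one has a suspiciously long DATE span.
examples = [
    ("Signed on 2021-06-30 by both parties.",
     {"entities": [(10, 20, "DATE")]}),
    ("Signed on 2021-06-30 by both parties under the terms set out in Schedule A.",
     {"entities": [(10, 75, "DATE")]}),
]
print(long_annotations(examples))
```

Any span the scan flags is worth re-inspecting by hand before retraining.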
I'm not sure if this behavior is normal, or if there are any hyperparameters that could fix it.
Thank you!