zshot spaCy ignoring email and phone entities? #13419
Hi All, We are using spaCy for entity and keyword extraction and noticed that it generally ignores phone numbers and emails within our text data. I have included a basic, redacted example below where I would have expected the phone number to be extracted. Example: "Our Air Conditioning Installation service in XXXX offers top-quality solutions to keep your home or office cool and comfortable. Call us ###-###-#### or email at [email protected]" Anything I'm missing?
Replies: 1 comment
Hi!
The pretrained NER model from the English models doesn't have specific labels for `Email` or `Phone_number`. It does however recognize `Cardinal` and `Ordinal` entities, which might be relevant for your use-case. They might not capture exactly what you want though.
For this type of information extraction, I would recommend using builtin attributes that spaCy defines on the token-level, such as `like_num` and `like_email`. These can then be used to design custom matcher rules, see here.
So for instance, you can write something like: