ValueError: [E949] Unable to align tokens for the predicted and reference docs. #13529
Unanswered
ykyogoku asked this question in Help: Coding & Implementations
Replies: 0 comments
Hello!
I tried to train a POS tagger for Tibetan using a custom tokenizer named Botok, but repeatedly encountered the following error:
Another instance of the error looks like this:
It is clear to me what this error message means: The tokenization of the prediction does not correspond to that of the reference.
In the first example, ཤ is repeated in the prediction, while in the second example, འི་ is missing from the prediction. There are many more errors like these; I picked these two because the mismatch is easy to see. My question is where exactly the prediction is made in the training code, and how to fix it. The reference contains neither repeated nor missing characters.
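A diagnostic that might help narrow this down: as far as I understand the E949 message, alignment is only possible when the predicted and reference texts are identical apart from whitespace and capitalization. The sketch below checks whether a tokenizer's output, joined back together, still reproduces the input text under that rule, and reports the first position where it diverges. The helper names (`normalize`, `first_mismatch`, `check_tokenizer`) are hypothetical and not part of spaCy or Botok; the tokenizer is assumed to be any callable that maps a string to a list of token strings.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def normalize(s: str) -> str:
    # Remove all whitespace and lowercase the text. These are the only
    # differences assumed to be tolerated during alignment.
    return "".join(s.split()).lower()


def first_mismatch(text: str, tokens: List[str]) -> Optional[Tuple[int, str, str]]:
    """Return None if the tokens reproduce the text, otherwise
    (index, expected_context, actual_context) at the first divergence."""
    expected = normalize(text)
    actual = normalize("".join(tokens))
    if expected == actual:
        return None
    i = 0
    limit = min(len(expected), len(actual))
    while i < limit and expected[i] == actual[i]:
        i += 1
    # Show up to 10 characters of context from each side.
    return i, expected[i:i + 10], actual[i:i + 10]


def check_tokenizer(
    tokenize: Callable[[str], List[str]], texts: Iterable[str]
) -> List[Tuple[str, Tuple[int, str, str]]]:
    """Collect every text whose tokenization adds, drops, or changes characters."""
    problems = []
    for text in texts:
        result = first_mismatch(text, tokenize(text))
        if result is not None:
            problems.append((text, result))
    return problems


if __name__ == "__main__":
    # Toy tokenizer that duplicates the second character, mimicking a repeated sign.
    def lossy(text: str) -> List[str]:
        return [text[:2], text[1:]]

    for text, (index, expected, actual) in check_tokenizer(lossy, ["abc"]):
        print(f"{text!r}: diverges at {index}: expected {expected!r}, got {actual!r}")
```

With a spaCy pipeline, the token list could presumably be obtained as `[t.text for t in nlp.make_doc(text)]` and run over the training sentences; any sentence reported by `check_tokenizer` is one where the tokenizer itself, rather than the reference data, changes the text.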
The following is the code for the custom tokenizer.
[What I have tried so far to fix this error]