CustomStreamer sample can print replacement character � #1381

Open
e87tn95h opened this issue Dec 13, 2024 · 2 comments

Comments

@e87tn95h

The following comparison in the Python code of the multinomial_causal_lm sample (the CustomStreamer sample) does not make sense. It never matches, because the two strings (text[-3:] and chr(65533)) have different lengths.

elif len(text) >= 3 and text[-3:] == chr(65533):
    # Don't print incomplete text.
    pass

It looks like elif text[-1] == chr(65533): is enough to detect � (U+FFFD, the replacement character) in this case in Python.
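To make the difference concrete, here is a minimal sketch that compares the two checks on the same input. The helper names (buggy_check, fixed_check) are hypothetical and are not taken from the sample; the sketch also assumes the streamer's text comes from decoding an incomplete UTF-8 sequence with replacement enabled.

```python
# Hypothetical helpers for illustration only; not the sample's actual code.
REPLACEMENT_CHAR = chr(65533)  # U+FFFD, a single code point in a Python str


def buggy_check(text: str) -> bool:
    # Compares a 3-character slice with a 1-character string,
    # so the result is False for every input.
    return len(text) >= 3 and text[-3:] == REPLACEMENT_CHAR


def fixed_check(text: str) -> bool:
    # Looks only at the last character; the length guard avoids
    # an IndexError on an empty string.
    return len(text) >= 1 and text[-1] == REPLACEMENT_CHAR


# Simulate an incomplete multi-byte character: keep only the first
# byte of a 2-byte UTF-8 sequence and decode with replacement.
partial = "é".encode("utf-8")[:1].decode("utf-8", errors="replace")
text = "abc" + partial

print(buggy_check(text))  # the incomplete text is not detected
print(fixed_check(text))  # the incomplete text is detected
```

Whether checking only the last character covers every case the sample needs is a judgment for the maintainers; the sketch only shows that the three-character comparison can never succeed.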

@e87tn95h changed the title from "CustomStreamer sample will print replacement character �" to "CustomStreamer sample can print replacement character �" on Dec 13, 2024
@pavel-esir
Contributor

@e87tn95h you are right. We copied the logic from the C++ code without much consideration, and it was wrong. � is 3 bytes long in UTF-8, but in Python len('�') == 1. Thanks for noticing this!
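The byte-versus-character mismatch can be checked directly in Python. This is a small illustrative snippet, not part of the sample; the variable names are made up for the example.

```python
# U+FFFD as a Python str is one code point, but its UTF-8 encoding is three bytes.
replacement = chr(65533)
encoded = replacement.encode("utf-8")

char_count = len(replacement)  # counts code points
byte_count = len(encoded)      # counts bytes, which is what a byte-oriented C++ string measures

print(char_count, byte_count, encoded)
```

A three-byte comparison is therefore meaningful on a UTF-8 byte string, while a Python str needs a one-character comparison.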

Will you open a PR yourself, or should we fix it?

@e87tn95h
Author

I don't have a PR, and I won't be able to make one for this project as quickly as you can.
If this idea is useful to you and your project, please feel free to take it (and then close this issue).

Thank you for reading and regards,
