-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tokenizer suggestions #63
Comments
davidkoski
added a commit
to ml-explore/mlx-swift-examples
that referenced
this issue
Mar 14, 2024
- swift-tokenizers is getting a lot of updates and fixes, let's track main for now - remove some workarounds that are no longer needed - huggingface/swift-transformers#63
|
davidkoski
added a commit
to ml-explore/mlx-swift-examples
that referenced
this issue
Mar 14, 2024
* switch swift-tokenizers to main, remove some workarounds - swift-tokenizers is getting a lot of updates and fixes, let's track main for now - remove some workarounds that are no longer needed - huggingface/swift-transformers#63
Opened #70 and ml-explore/mlx-swift-examples#28 to address the first topic. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I have a couple of suggestions for the tokenizer API -- things that I have needed to work around here: https://github.com/ml-explore/mlx-swift-examples/blob/main/Libraries/LLM/Tokenizer.swift
add
eosToken
/eosTokenId
to theTokenizer
protocolTokenizer
already hasunknownToken
bosToken
have a way to add to
TokenizerModel/knownTokenizers
or otherwise handle unknown tokenizers"PreTrainedTokenizer": BPETokenizer.self
TokenizerModel
is internal as are the various classes likeBPETokenizer
"Qwen2Tokenizer": "PreTrainedTokenizer"
, which is perhaps the right level -- not exposing too much of the implementationIf these fit in with the vision for the tokenizer API, please consider them!
Thanks
The text was updated successfully, but these errors were encountered: