
Ability to use token embeddings? #434

Open
aliozts opened this issue Nov 8, 2024 · 2 comments
@aliozts

aliozts commented Nov 8, 2024

Feature request

Hello, first of all, thank you for the great work on this library and for making it open source. I wanted to ask whether it is possible to get the token embeddings directly for an input; I was not able to find that option. If it's possible, it would be great to imitate the sentence-transformers behaviour:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")
model.encode("This is a test sentence", output_value="token_embeddings")

An output_value option could be added to the API parameters if possible. Moreover, other types of pooling could then be applied at runtime (I don't know whether this is possible in the library).
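
To illustrate the runtime-pooling idea, here is a minimal sketch doing the pooling client-side with sentence-transformers; the mean/max pooling choice and variable names are illustrative, not an existing API of this library:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")
# Per-token embeddings for a single sentence: tensor of shape (num_tokens, hidden_dim)
token_embs = model.encode("This is a test sentence", output_value="token_embeddings")

# Pooling applied at runtime, after the model call
mean_pooled = token_embs.mean(dim=0)        # mean pooling
max_pooled = token_embs.max(dim=0).values   # max pooling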

Motivation

I want to use late chunking, and for that the token embeddings are required.

Your contribution

I can try to help.

@aliozts
Author

aliozts commented Dec 2, 2024

Hi, are there any updates on this?

@LLukas22

@aliozts If you just need the token embeddings before pooling, you can use the embed_all endpoint.
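
For anyone finding this later, a minimal sketch of calling that endpoint over HTTP, assuming a text-embeddings-inference server is running locally on port 8080 (the host and port are placeholders; check the server docs for the exact request and response schema):

import requests

# The response contains one embedding vector per input token (no pooling)
resp = requests.post(
    "http://localhost:8080/embed_all",
    json={"inputs": "This is a test sentence"},
)
resp.raise_for_status()
token_embeddings = resp.json()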
