
Ability to use token embeddings? #434

Open
aliozts opened this issue Nov 8, 2024 · 2 comments
@aliozts

aliozts commented Nov 8, 2024

Feature request

Hello, first of all, thank you for the great work on this library and for making it open source. I wanted to ask whether it is possible to get the token embeddings directly for an input; I was not able to find that option. If it's possible, it would be great to imitate the sentence-transformers behaviour:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")
model.encode("This is a test sentence", output_value="token_embeddings")

An output_value option could be added to the API parameters if possible. Moreover, other types of pooling could then be applied at runtime (I don't know whether this is possible in the library).
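
To illustrate the runtime-pooling idea, here is a minimal sketch doing the pooling client-side with sentence-transformers; the mean/max pooling choice and variable names are illustrative, not an existing API of this library:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")
# Per-token embeddings for a single sentence: tensor of shape (num_tokens, hidden_dim)
token_embs = model.encode("This is a test sentence", output_value="token_embeddings")

# Pooling applied at runtime, after the model call
mean_pooled = token_embs.mean(dim=0)        # mean pooling
max_pooled = token_embs.max(dim=0).values   # max pooling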

Motivation

I want to use late chunking, and for that the token embeddings are required.

Your contribution

I can try to help.

@aliozts
Author

aliozts commented Dec 2, 2024

Hi, are there any updates on this?

@LLukas22

@aliozts If you just need the token embeddings before pooling, you can use the embed_all endpoint.
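
For anyone finding this later, a minimal sketch of calling that endpoint over HTTP, assuming a text-embeddings-inference server is running locally on port 8080 (the host and port are placeholders; check the server docs for the exact request and response schema):

import requests

# The response contains one embedding vector per input token (no pooling)
resp = requests.post(
    "http://localhost:8080/embed_all",
    json={"inputs": "This is a test sentence"},
)
resp.raise_for_status()
token_embeddings = resp.json()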
