EIS Unified ChatCompletions Integration #118871
Conversation
Fortunately this worked mostly out of the box. I had to change EIS a bit to reflect the SSE format:
https://github.com/elastic/eis-gateway/pull/207
It sends the response with a `data` prefix.
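For context, server-sent events frame each message as a `data:` line followed by a blank line, and OpenAI-compatible streams typically end with a `[DONE]` sentinel, so a streamed chunk on the wire looks roughly like this (payload abbreviated):

```
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}

data: [DONE]
```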
Did we want to implement more tests?
```java
return new URI(elasticInferenceServiceComponents().elasticInferenceServiceUrl() + "/api/v1/chat/completions");
}

// TODO create the Configuration class?
```
Can you explain why you had this TODO? I'm not sure what it brings.
Jason and I talked about this offline. Basically we need to add a Configuration
class like this: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/openai/completion/OpenAiChatCompletionModel.java#L127
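For anyone following along, the shape under discussion is the static inner `Configuration` class pattern the other services use. This is a rough sketch only; the builder calls and fields below are assumptions, and the linked OpenAI class is the real reference:

```java
// Lazily built, cached description of the service's settings, surfaced via the
// inference services API. All wiring below is illustrative, not the actual code.
public static class Configuration {
    public static InferenceServiceConfiguration get() {
        return configuration.getOrCompute();
    }

    private static final LazyInitializable<InferenceServiceConfiguration, RuntimeException> configuration =
        new LazyInitializable<>(
            () -> new InferenceServiceConfiguration.Builder().setService(NAME) // assumed field wiring
                .setTaskTypes(EnumSet.of(TaskType.CHAT_COMPLETION))
                .setConfigurations(new HashMap<>())
                .build()
        );
}
```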
Just a follow-up: I think we can address this after we merge this PR. Maybe create an issue so we don't forget it.
```java
public static final String NAME = "elastic_inference_service_completion_service_settings";

// TODO what value do we put here?
```
@timgrein, do you have any suggestions? I'm not up to speed on the state of rate limiting.
Good question, I guess we could use the default from bedrock for now?
It depends on the environment and the quota set... We should leave it as is for now unless there are any objections. Is it ok to leave the TODO? I'll drop a note in the ES integration issue.
For the other bedrock service settings for chat completion it's like 240: https://github.com/elastic/elasticsearch/blob/32ddbb3449d19a0970b96eefe960d4ab006357fc/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/amazonbedrock/AmazonBedrockServiceSettings.java
Maybe we lower it to something closer to that 🤷‍♂️
I put it to 240 for now, but a customer's quota and our shared quota can be different. In any case, rate limiting is mildly opaque to me; this is a good enough number for now.
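For concreteness, the resolution presumably lands as a Bedrock-style default in the service settings (the constant name is illustrative):

```java
// 240 requests per minute, mirroring the Bedrock chat completion default discussed
// above; revisit once EIS quota behavior is better understood.
private static final RateLimitSettings DEFAULT_RATE_LIMIT_SETTINGS = new RateLimitSettings(240);
```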
```java
public static ElasticInferenceServiceCompletionServiceSettings fromMap(Map<String, Object> map, ConfigurationParseContext context) {
    ValidationException validationException = new ValidationException();

// TODO does EIS have this?
```
@timgrein, same thing: do we want a limit per model at all?
Do you mean rate limit grouping per model? Not yet, I think we'll group on project ids first. When ELSER is available on EIS we can additionally group by model.
I was not clear. I meant in the context of ES. Or did you mean we should rate limit on project id within ES?
```java
private static final String ROLE = "user";
private static final String USER = "a_user";

// TODO remove if EIS doesn't use the model and user fields
```
@maxhniebergall, we need the model. The user field is a bit ambiguous. Do we set it and ignore it or should we stop sending it?
Let's discuss at the inference sync tomorrow
Looks like we'll get rid of it for now. It's available for some Bedrock models, but it has to be passed in an odd way. I'll remove the references to it in the code as well.
As for its usage, I don't think we use it in a meaningful way. My brief Googling shows that it's useful for the provider to identify which of your users is "jailbreaking" the LLM should you get suspended.
Pinging @elastic/search-inference-team (Team:Search - Inference)
Pinging @elastic/search-eng (Team:SearchOrg)
Nice 👏
Left some comments. Requested changes because of the usage of `TransportVersions.V_8_16_0` in `getMinimalSupportedVersion`; that's something we must change, I think.
```java
@Override
public TransportVersion getMinimalSupportedVersion() {
    return TransportVersions.V_8_16_0;
```
I think we need a new `TransportVersion` here, right? (Requesting changes, I guess this could break things otherwise.)
What is the correct version? 8.18?
I think you need to add a specific one to `TransportVersions`. Collapsing transport versions happens after a release AFAIU.
Something like:
```java
public static final TransportVersion ELASTIC_INFERENCE_SERVICE_UNIFIED_CHAT_COMPLETIONS_INTEGRATION = def(8_XXX_00_0);
```
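The override discussed above would then presumably return the new constant instead of `V_8_16_0`:

```java
@Override
public TransportVersion getMinimalSupportedVersion() {
    return TransportVersions.ELASTIC_INFERENCE_SERVICE_UNIFIED_CHAT_COMPLETIONS_INTEGRATION;
}
```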
Removing the requested changes to unblock this PR while I'm on PTO. We chatted about the `TransportVersion` change, which needs to be addressed.
Co-authored-by: Tim Grein <[email protected]>
…inference/external/unified/UnifiedChatCompletionRequestEntity.java Co-authored-by: Tim Grein <[email protected]>
Looking good, I left a few questions/suggestions
```java
@Override
public InferenceServiceResults parseResult(Request request, Flow.Publisher<HttpResult> flow) {
    var serverSentEventProcessor = new ServerSentEventProcessor(new ServerSentEventParser());
    var openAiProcessor = new OpenAiUnifiedStreamingProcessor(); // EIS uses the unified API spec
```
Note to myself to move and rename that class: #119085
```diff
@@ -97,7 +127,7 @@ protected void doInfer(
     TimeValue timeout,
     ActionListener<InferenceServiceResults> listener
 ) {
-    if (model instanceof ElasticInferenceServiceModel == false) {
+    if (model instanceof ElasticInferenceServiceExecutableActionModel == false) {
```
Just a note for reviewers, the reason I did this is because the completion model doesn't adhere to the visitor pattern like the sparse embedding model does. This could get weird if we eventually support the completion task type in the non-unified API. If that happens I suppose we could either create a new model class or undo this class hierarchy and add the visitor pattern.
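For readers unfamiliar with the trade-off described here, a minimal sketch of the visitor alternative (all names are illustrative, not the actual classes):

```java
// Each model picks the right overload itself, so doInfer would not need an
// instanceof check against a shared executable-action supertype.
interface ElasticInferenceServiceActionVisitor {
    ExecutableAction create(ElasticInferenceServiceSparseEmbeddingsModel model);
    ExecutableAction create(ElasticInferenceServiceCompletionModel model);
}

abstract class ElasticInferenceServiceModel {
    abstract ExecutableAction accept(ElasticInferenceServiceActionVisitor visitor);
}
```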
```java
) {
    this.unifiedChatInput = Objects.requireNonNull(unifiedChatInput);
    this.model = Objects.requireNonNull(model);
    this.uri = model.uri();
```
nit: Since we're keeping a reference to the model we can probably just do `this.model.uri()` when we need it.
Nice catch ;)
```java
builder.field(MAX_COMPLETION_TOKENS_FIELD, unifiedRequest.maxCompletionTokens());
}

// Underlying providers except OpenAI only return 1 possible choice.
```
Suggested change:
```diff
-// Underlying providers except OpenAI only return 1 possible choice.
+// Underlying providers expect OpenAI to only return 1 possible choice.
```
```java
try {
    this.uri = createUri();
} catch (URISyntaxException e) {
    throw new RuntimeException(e);
```
nit: I know right now this is a setting that we set, but long term are we going to hard code this or is it injected? I'm just thinking maybe we should go ahead and return an `ElasticsearchStatusException` since I wouldn't expect to see this error very often 🤔
Hmm... You bring up an interesting point. I would argue that the minute the setting is loaded, if it's not a valid URI, then it should error. Our hardcoded suffix should never be wrong.
The way it currently works (I think) is we take the raw string verbatim and then pass the buck, so the error would be caught here. I'll make it return the error you suggested and leave a TODO to revisit this. I think it can be improved, but I don't want to creep out of the scope of this PR.
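A minimal sketch of the suggested change (the message text and status code are assumptions):

```java
try {
    this.uri = createUri();
} catch (URISyntaxException e) {
    // TODO: validate the URL when the setting is loaded instead of failing here.
    throw new ElasticsearchStatusException(
        "Failed to create URI for the Elastic Inference Service: " + e.getMessage(),
        RestStatus.BAD_REQUEST,
        e
    );
}
```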
```java
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
    builder.startObject();
    unifiedRequestEntity.toXContent(builder, params);
    builder.field(MODEL_FIELD, modelId);
```
Reminder for myself that we can probably merge this and the openai class back together since they're sending the same stuff.
```java
import org.apache.http.client.methods.HttpPost;
import org.elasticsearch.tasks.Task;

public interface TraceContextAware {
```
What do you think about making this an abstract class?
Sure, I prefer not to have inheritance and this was simpler. I will use composition and just make it a concrete member. Lmk if that's ok 😄.
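A rough sketch of the composition approach (the class and record names are illustrative; the header constants are the ones on `org.elasticsearch.tasks.Task`):

```java
// Concrete member instead of an inherited interface: the request owns a handler
// that copies the W3C trace headers onto the outgoing HttpPost.
public class TraceContextHandler {
    private final TraceContext traceContext;

    public TraceContextHandler(TraceContext traceContext) {
        this.traceContext = traceContext;
    }

    public void propagateTraceContext(HttpPost httpPost) {
        if (traceContext == null) {
            return;
        }
        if (traceContext.traceParent() != null) {
            httpPost.setHeader(Task.TRACE_PARENT_HTTP_HEADER, traceContext.traceParent());
        }
        if (traceContext.traceState() != null) {
            httpPost.setHeader(Task.TRACE_STATE, traceContext.traceState());
        }
    }
}
```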
"stream_options": { | ||
"include_usage": true | ||
}, | ||
"user": "a_user" |
I think we can remove this since we're not sending `user` anymore.
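The fixture would then presumably end with just the stream options:

```json
"stream_options": {
    "include_usage": true
}
```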
# Conflicts:
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/external/request/openai/OpenAiUnifiedChatCompletionRequestEntity.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/external/request/openai/OpenAiUnifiedChatCompletionRequestEntityTests.java
Parent PR: #118301
We need to call EIS via Elasticsearch. This PR implements the functionality.
Testing
Run via
It returns
Then we perform inference via
Returns
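For illustration, creating the endpoint and performing inference through the unified API might look something like the following; the endpoint id, model id, and request bodies are assumptions, not the elided commands:

```
PUT _inference/chat_completion/test-eis-completion
{
  "service": "elastic",
  "service_settings": {
    "model_id": "<eis-model>"
  }
}

POST _inference/chat_completion/test-eis-completion/_stream
{
  "messages": [
    { "role": "user", "content": "Hello, world" }
  ]
}
```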