EIS Unified ChatCompletions Integration #118871

Open · wants to merge 48 commits into base: main
Conversation

@jaybcee (Member) commented Dec 17, 2024:

Parent PR: #118301

We need to call EIS via Elasticsearch. This PR implements the functionality.

Testing

Run via:

1. `./gradlew localDistro`
2. `cd build/distribution/local/elasticsearch-9.0.0-SNAPSHOT`
3. `./bin/elasticsearch -E xpack.inference.elastic.url=https://localhost:8443 -E xpack.inference.elastic.http.ssl.verification_mode=none -E xpack.security.enabled=false -E xpack.security.enrollment.enabled=false`
4. Create an endpoint via:

curl --location --request PUT 'http://localhost:9200/_inference/completion/test' \
--header 'Content-Type: application/json' \
--data '{
    "service": "elastic",
    "service_settings": {
        "model_id": "claude-3.5-sonnet"
    }
}' -k

Notes:

1. We eventually expect to have a default endpoint.
2. The model name is a bit of a placeholder; for now it's unclear to me what we expose. In any case it's trivial: we have an external-to-internal mapping.

It returns

{
    "inference_id": "test",
    "task_type": "completion",
    "service": "elastic",
    "service_settings": {
        "model_id": "claude-3.5-sonnet",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    }
}

Then we perform inference via

curl --location 'http://localhost:9200/_inference/completion/test/_unified' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {
            "role": "user",
            "content": "In only two digits and nothing else, what is the meaning of life?"
        }
    ],
    "model" : "claude-3.5-sonnet",
    "temperature": 0.7,
    "max_completion_tokens": 300
}' -k 

Returns

event: message
data: {"id":"unified-a52c5569-6fca-48dd-9a03-cf6b2d999995","choices":[{"delta":{"role":"assistant"},"index":0}],"model":"claude-3.5-sonnet","object":"chat.completion.chunk"}

event: message
data: {"id":"unified-a52c5569-6fca-48dd-9a03-cf6b2d999995","choices":[{"delta":{"content":"42"},"index":0}],"model":"claude-3.5-sonnet","object":"chat.completion.chunk"}

event: message
data: {"id":"unified-a52c5569-6fca-48dd-9a03-cf6b2d999995","choices":[{"delta":{},"index":0}],"model":"claude-3.5-sonnet","object":"chat.completion.chunk"}

event: message
data: {"id":"unified-a52c5569-6fca-48dd-9a03-cf6b2d999995","choices":[{"delta":{},"finish_reason":"stop","index":0}],"model":"claude-3.5-sonnet","object":"chat.completion.chunk"}

event: message
data: {"id":"unified-a52c5569-6fca-48dd-9a03-cf6b2d999995","choices":[{"delta":{},"index":0}],"model":"claude-3.5-sonnet","object":"chat.completion.chunk","usage":{"completion_tokens":4,"prompt_tokens":22,"total_tokens":26}}

event: message
data: [DONE]

@jaybcee (Member Author) left a comment:

Fortunately this worked mostly out of the box. I had to change EIS a bit to reflect the SSE format.

https://github.com/elastic/eis-gateway/pull/207

It sends the response with a `data:` prefix.

Do we want to implement more tests?

return new URI(elasticInferenceServiceComponents().elasticInferenceServiceUrl() + "/api/v1/chat/completions");
}

// TODO create the Configuration class?
@jaybcee (Member Author):

@jonathan-buttner

Can you explain why you had this TODO? I'm not sure what it brings.

Contributor:

Just a follow up, I think we can address this after we merge this PR. Maybe create an issue so we don't forget it.


public static final String NAME = "elastic_inference_service_completion_service_settings";

// TODO what value do we put here?
@jaybcee (Member Author):

@timgrein, do you have any suggestions? I'm not up to speed on the state of rate limiting.

Contributor:

Good question, I guess we could use the default from bedrock for now?

@jaybcee (Member Author) commented Dec 19, 2024:

It depends on the environment and the quota set... We should leave it as is for now unless there are any objections. Is it OK to leave the TODO? I'll drop a note in the ES integration issue.

@jaybcee (Member Author):

I put it at 240 for now, but a customer's quota and our shared quota can be different. In any case, rate limiting is mildly opaque to me; this is a good enough number for now.
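For context, a minimal sketch of what that default could look like in the service settings class (a sketch only: RateLimitSettings is the inference plugin's existing type, but the constant name is an assumption):

// Hypothetical default of 240 requests per minute, per the discussion above.
private static final RateLimitSettings DEFAULT_RATE_LIMIT_SETTINGS = new RateLimitSettings(240);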

public static ElasticInferenceServiceCompletionServiceSettings fromMap(Map<String, Object> map, ConfigurationParseContext context) {
    ValidationException validationException = new ValidationException();

// TODO does EIS have this?
@jaybcee (Member Author):

@timgrein, same thing: do we want a limit per model at all?

Contributor:

Do you mean rate limit grouping per model? Not yet, I think we'll group on project ids first. When ELSER is available on EIS we can additionally group by model.

@jaybcee (Member Author):

I was not clear. I meant in the context of ES. Or did you mean we should rate limit on project id within ES?

private static final String ROLE = "user";
private static final String USER = "a_user";

// TODO remove if EIS doesn't use the model and user fields
@jaybcee (Member Author):

@maxhniebergall, we need the model. The user field is a bit ambiguous. Do we set it and ignore it or should we stop sending it?

Member:

Let's discuss at the inference sync tomorrow

@jaybcee (Member Author):

Looks like we'll get rid of it for now. It's available for some Bedrock models, but it has to be passed in an odd way. I'll remove the references to it in the code as well.

As for its usage, I don't think we use it in a meaningful way. My brief Googling shows that it's useful for the provider to identify one of your users who is "jailbreaking" the LLM, should you get suspended.

@jaybcee jaybcee marked this pull request as ready for review December 19, 2024 02:26
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Dec 19, 2024
@jaybcee jaybcee added the :SearchOrg/Inference Label for the Search Inference team label Dec 19, 2024
@elasticsearchmachine elasticsearchmachine added Team:SearchOrg Meta label for the Search Org (Enterprise Search) Team:Search - Inference labels Dec 19, 2024
@elasticsearchmachine (Collaborator): Pinging @elastic/search-inference-team (Team:Search - Inference)

@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Dec 19, 2024
@elasticsearchmachine (Collaborator): Pinging @elastic/search-eng (Team:SearchOrg)

@jaybcee jaybcee changed the title [WIP] EIS Unified ChatCompletions Integration EIS Unified ChatCompletions Integration Dec 19, 2024
@timgrein (Contributor) left a comment:
Nice 👏

Left some comments. Requested changes because of the usage of TransportVersions.V_8_16_0 in getMinimalSupportedVersion; that's something we must change, I think.



@Override
public TransportVersion getMinimalSupportedVersion() {
    return TransportVersions.V_8_16_0;
Contributor:

I think we need a new TransportVersion here, right? (requesting changes, I guess this could break things otherwise)

@jaybcee (Member Author):

What is the correct version? 8.18?

@timgrein (Contributor) commented Dec 19, 2024:

I think you need to add a specific one to TransportVersions; collapsing transport versions happens after a release, AFAIU.

Something like:

    public static final TransportVersion ELASTIC_INFERENCE_SERVICE_UNIFIED_CHAT_COMPLETIONS_INTEGRATION = def(8_XXX_00_0);
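For completeness, a sketch of how the new constant would then be referenced (assuming the definition above is added to TransportVersions; the exact id stays as the elided 8_XXX_00_0):

@Override
public TransportVersion getMinimalSupportedVersion() {
    return TransportVersions.ELASTIC_INFERENCE_SERVICE_UNIFIED_CHAT_COMPLETIONS_INTEGRATION;
}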

@timgrein (Contributor) left a comment:

Removing my requested changes to unblock this PR while I'm on PTO. We chatted about the TransportVersion change, which needs to be addressed.

@timgrein timgrein dismissed their stale review December 19, 2024 17:24

Unblocking the PR while I'm on PTO; we've discussed the changes, which need to be addressed.

@jonathan-buttner (Contributor) left a comment:

Looking good, I left a few questions/suggestions

@Override
public InferenceServiceResults parseResult(Request request, Flow.Publisher<HttpResult> flow) {
    var serverSentEventProcessor = new ServerSentEventProcessor(new ServerSentEventParser());
    var openAiProcessor = new OpenAiUnifiedStreamingProcessor(); // EIS uses the unified API spec
Contributor:

Note to myself to move and rename that class: #119085
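For readers, a minimal sketch of how this kind of parseResult pipeline is typically completed (the subscribe chaining and the results wrapper below are assumptions based on the snippet above, not necessarily the PR's exact code):

@Override
public InferenceServiceResults parseResult(Request request, Flow.Publisher<HttpResult> flow) {
    // Raw HTTP chunks -> server-sent events -> unified chat completion chunks.
    var serverSentEventProcessor = new ServerSentEventProcessor(new ServerSentEventParser());
    var openAiProcessor = new OpenAiUnifiedStreamingProcessor();
    flow.subscribe(serverSentEventProcessor);
    serverSentEventProcessor.subscribe(openAiProcessor);
    return new StreamingUnifiedChatCompletionResults(openAiProcessor);
}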

@@ -97,7 +127,7 @@ protected void doInfer(
        TimeValue timeout,
        ActionListener<InferenceServiceResults> listener
    ) {
-       if (model instanceof ElasticInferenceServiceModel == false) {
+       if (model instanceof ElasticInferenceServiceExecutableActionModel == false) {
Contributor:

Just a note for reviewers: the reason I did this is that the completion model doesn't adhere to the visitor pattern like the sparse embedding model does. This could get weird if we eventually support the completion task type in the non-unified API. If that happens, I suppose we could either create a new model class or undo this class hierarchy and add the visitor pattern.

) {
    this.unifiedChatInput = Objects.requireNonNull(unifiedChatInput);
    this.model = Objects.requireNonNull(model);
    this.uri = model.uri();
Contributor:

nit: Since we're keeping a reference to the model we can probably just do this.model.uri() when we need it.

@jaybcee (Member Author):

Nice catch ;)
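(Illustratively, the nit amounts to reading the URI off the retained model reference when building the request; the HttpPost usage below is an assumption:)

// Drop the cached uri field and read it from the model reference instead.
var httpPost = new HttpPost(model.uri());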

builder.field(MAX_COMPLETION_TOKENS_FIELD, unifiedRequest.maxCompletionTokens());
}

// Underlying providers except OpenAI only return 1 possible choice.
Contributor:

Suggested change:
- // Underlying providers except OpenAI only return 1 possible choice.
+ // Underlying providers expect OpenAI to only return 1 possible choice.

try {
    this.uri = createUri();
} catch (URISyntaxException e) {
    throw new RuntimeException(e);
Contributor:

nit: I know right now this is a setting that we set, but long term are we going to hard code this or is it injected? I'm just thinking maybe we should go ahead and return an ElasticsearchStatusException since I wouldn't expect to see this error very often 🤔

@jaybcee (Member Author) commented Dec 23, 2024:

Hmm... You bring up an interesting point. I would argue that the minute the setting is loaded, if it's not a valid URI, then it should error. Our hardcoded suffix should never be wrong.

The way it currently works (I think) is that we take the raw string verbatim and then pass the buck, so the error would be caught here. I'll make it return the error you suggested and leave a TODO to revisit this. I think it can be improved, but I don't want to creep out of the scope of this PR.
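A minimal sketch of that change (the exact message and status code are assumptions):

try {
    this.uri = createUri();
} catch (URISyntaxException e) {
    // TODO: validate the URL when the setting is loaded instead of failing here.
    throw new ElasticsearchStatusException(
        "Failed to create URI for the Elastic Inference Service: " + e.getMessage(),
        RestStatus.BAD_REQUEST,
        e
    );
}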

public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
    builder.startObject();
    unifiedRequestEntity.toXContent(builder, params);
    builder.field(MODEL_FIELD, modelId);
Contributor:

Reminder for myself that we can probably merge this and the openai class back together since they're sending the same stuff.


import org.apache.http.client.methods.HttpPost;
import org.elasticsearch.tasks.Task;

public interface TraceContextAware {
Contributor:

What do you think about making this an abstract class?

@jaybcee (Member Author) commented Dec 23, 2024:

Sure, I prefer not to have inheritance, and this was simpler. I will use composition and just make it a concrete member. Lmk if that's OK 😄.
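A rough sketch of the composition approach described here (class and field names are illustrative assumptions, not the PR's exact code):

// Composition over inheritance: instead of implementing a TraceContextAware
// interface, the request simply holds the trace context as a member.
public record TraceContext(String traceparent, String tracestate) {}

public class ElasticInferenceServiceUnifiedChatCompletionRequest {
    private final TraceContext traceContext;

    public ElasticInferenceServiceUnifiedChatCompletionRequest(TraceContext traceContext) {
        this.traceContext = traceContext;
        // ... other fields elided ...
    }
}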

"stream_options": {
"include_usage": true
},
"user": "a_user"
Contributor:

I think we can remove this since we're not sending user anymore.

maxhniebergall and others added 7 commits December 23, 2024 13:15
# Conflicts:
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/external/request/openai/OpenAiUnifiedChatCompletionRequestEntity.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/external/request/openai/OpenAiUnifiedChatCompletionRequestEntityTests.java
Labels
>enhancement
Feature:GenAI (Features around GenAI)
:SearchOrg/Inference (Label for the Search Inference team)
Team:Search - Inference
Team:SearchOrg (Meta label for the Search Org (Enterprise Search))
v9.0.0