
Improvements on Exposed ORT support #976

Open
wants to merge 3 commits into main

Conversation

kallebysantos
Contributor

Hey there 🙏,
Coming from #947, I tested the new Exposed ORT feature in the alpha-v20 and alpha-v21 releases. I noticed that to make it work we must specify 'auto' as the inference device:

Current behaviour:

Without specifying a device:

import { pipeline } from '@huggingface/transformers';

let pipe = await pipeline('sentiment-analysis');
// The exposed runtime is loaded,
// but when trying to run inference:
let out = await pipe('I love transformers!');
//Error: Unsupported device: "wasm". Should be one of: 

Explicitly using 'auto' as the device:

import { pipeline } from '@huggingface/transformers';

let pipe = await pipeline('sentiment-analysis', null, { device: 'auto' });

let out = await pipe('I love transformers!');
// [{'label': 'POSITIVE', 'score': 0.999817686}]

To avoid this extra configuration step, this PR improves the environment support for Exposed ORT by implicitly selecting 'auto' as the default fallback device instead of 'wasm'.

import { pipeline } from '@huggingface/transformers';

let pipe = await pipeline('sentiment-analysis');

let out = await pipe('I love transformers!');
// [{'label': 'POSITIVE', 'score': 0.999817686}]
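
Conceptually, the change boils down to something like the following (a minimal sketch of the intended fallback, not the actual diff; the global symbol is the one shown further below):

// Sketch only: prefer 'auto' when a custom ORT runtime is exposed on globalThis,
// otherwise keep the previous 'wasm' fallback.
const HAS_EXPOSED_RUNTIME = globalThis[Symbol.for('onnxruntime')] !== undefined;
const DEFAULT_DEVICE = HAS_EXPOSED_RUNTIME ? 'auto' : 'wasm';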

@xenova
Collaborator

xenova commented Nov 26, 2024

Hi again 👋 Apologies for the late reply.

For this PR, could you explain the meaning/origin of the "auto" device? Is this an execution provider defined by the custom ORT runner?

@kallebysantos
Contributor Author

kallebysantos commented Nov 26, 2024

Hi again 👋 Apologies for the late reply.

That's OK, I saw that you had a lot of work to do!
Btw, congratulations, v3 is finally launched 🎉

For this PR, could you explain the meaning/origin of the "auto" device? Is this an execution provider defined by the custom ORT runner?

Currently the runner doesn't look at the device to decide how to execute; it decides based on the environment. So the device property should do nothing in this case.


But the problem is that transformers.js tries to explicitly use some device, like "wasm", so it's not possible to execute without specifying "auto".

For example, if we don't specify a device, or specify anything other than "auto", it will throw:

const pipe = await pipeline('feature-extraction', 'supabase/gte-small')
// Error: Unsupported device "wasm", Should be one of:    .

const pipe = await pipeline('feature-extraction', 'supabase/gte-small', { device: 'cpu' })
// Error: Unsupported device "cpu", Should be one of:    .

const pipe = await pipeline('feature-extraction', 'supabase/gte-small', { device: 'webgpu' })
// Error: Unsupported device "webgpu", Should be one of:    .

But if I do "auto" it works:

const pipe = await pipeline('feature-extraction', 'supabase/gte-small',  { device: 'auto' })

// [.... embeddings ]
Custom ORT exposed:
console.log(globalThis[Symbol.for("onnxruntime")])
/*
{
  Tensor: [class Tensor],
  env: {},
  InferenceSession: { create: [AsyncFunction: fromBuffer] }
}
*/
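
For context, the host runtime exposes that object under a well-known global symbol. A minimal sketch of what that registration could look like (the `HostTensor` class and `createSessionFromBuffer` function are hypothetical stand-ins for the host's real implementations):

// Hypothetical stand-ins for the host runtime's native bindings.
class HostTensor { /* tensor backed by the host runtime */ }
async function createSessionFromBuffer(modelBuffer) { /* host-native session creation */ }

// Expose them under the symbol that transformers.js looks up
// (the shape matches the log output above).
globalThis[Symbol.for('onnxruntime')] = {
    Tensor: HostTensor,
    env: {},
    InferenceSession: { create: createSessionFromBuffer },
};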

So the purpose of this PR is to automatically set "auto" when running from a custom ORT, to avoid this extra configuration.
I also added some global flags to detect whether or not it's running from a custom ORT.


I invite you to try running transformers.js from the Supa stack; the custom ORT Rust backend is available from [email protected]^.
You can use it from the supabase cli:

npx supabase functions new "ort-test"

npx supabase functions serve
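
Once the function is scaffolded, the handler can be as simple as the following sketch (assuming the standard Deno.serve entry point that the CLI generates and that the import specifier resolves in that environment; the request/response shape here is illustrative):

import { pipeline } from '@huggingface/transformers';

// Build the pipeline once and reuse it across requests;
// execution is handled by the exposed ORT backend.
const pipe = await pipeline('feature-extraction', 'supabase/gte-small');

Deno.serve(async (req) => {
    const { text } = await req.json();
    const output = await pipe(text, { pooling: 'mean', normalize: true });
    return Response.json({ embedding: Array.from(output.data) });
});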

@xenova
Collaborator

xenova commented Nov 26, 2024

Thanks for the additional context! 👍 Just on that note, what if an exposed ORT library does allow for different devices to be specified, and the user could choose the version they would like?

For Supabase's runner, specifically, I imagine it is running on CPU, right? So "cpu" could be mapped to the default device on their side perhaps?

@kallebysantos
Contributor Author

kallebysantos commented Nov 26, 2024

Thanks for the additional context! 👍 Just on that note, what if an exposed ORT library does allow for different devices to be specified, and the user could choose the version they would like?

For Supabase's runner, specifically, I imagine it is running on CPU, right? So "cpu" could be mapped to the default device on their side perhaps?

Sure, that makes sense. But this way, wouldn't they still need to manually specify a device?

I need to study more about how transformers.js handles the available devices.

At the moment, could you give me any suggestion about how we can achieve the following without it throwing an error?

const pipe = await pipeline('feature-extraction', 'supabase/gte-small')

Do I need to export some property that says "Hey transformers.js, I'm using "cpu" as default, so please map to it"?


EDIT: I looked into both transformers.js and onnxruntime-common and I think I got your point, but I still think that custom providers should fall back to "auto" by default.

@xenova
Collaborator

xenova commented Dec 2, 2024

Note that there is no "auto" execution provider in onnxruntime-web/onnxruntime-node (and this is a layer added on top by Transformers.js).

That said, I think I see what the problem is: for the custom runtime, we don't specify supportedDevices or defaultDevices, which are what get sent to the executionProviders option in createInferenceSession:

supportedDevices.push('cpu');
defaultDevices = ['cpu'];

device -> execution provider mapping:

export function deviceToExecutionProviders(device = null) {
    // Use the default execution providers if the user hasn't specified anything
    if (!device) return defaultDevices;

    // Handle overloaded cases
    switch (device) {
        case "auto":
            return supportedDevices;
        case "gpu":
            return supportedDevices.filter(x =>
                ["webgpu", "cuda", "dml", "webnn-gpu"].includes(x),
            );
    }

    if (supportedDevices.includes(device)) {
        return [DEVICE_TO_EXECUTION_PROVIDER_MAPPING[device] ?? device];
    }

    throw new Error(`Unsupported device: "${device}". Should be one of: ${supportedDevices.join(', ')}.`)
}


So, I think the solution here would be to set them both to undefined by default (instead of [] as it is currently), since then the custom ORT environment will use its defaults: https://onnxruntime.ai/docs/api/js/interfaces/InferenceSession.SessionOptions.html#executionProviders
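
For illustration, a sketch of the intended effect on session creation (using `deviceToExecutionProviders` from the snippet above; the surrounding names are placeholders, not the actual code):

// If no execution providers were resolved (the custom ORT case), omit the option
// entirely so the exposed runtime falls back to its own defaults.
const executionProviders = deviceToExecutionProviders(device); // may now be undefined
const sessionOptions = {
    ...(executionProviders ? { executionProviders } : {}),
};
const session = await InferenceSession.create(modelBuffer, sessionOptions);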

@kallebysantos
Contributor Author

Hey Joshua, thanks for your suggestion

So, I think the solution here would be to set them both to undefined by default (instead of [] as it is currently), since then the custom ORT environment will use its defaults..

Yes, it worked: 42ae1fe

Collaborator

@xenova xenova left a comment


Thanks for iterating! ✅ Looks good!

src/env.js (resolved)
src/backends/onnx.js (outdated, resolved)
- Add a global variable `IS_EXPOSED_RUNTIME_ENV` -> true if the JS host exposes its own custom runtime.
- Apply the 'auto' device as the default for the exposed runtime environment.
- Add checks for the 'Tensor' and 'InferenceSession' members of the exposed custom ORT.
Collaborator

@xenova xenova left a comment


Thanks again for the modifications! I was talking to a colleague about this PR, and one thing that was brought up was the possibility of accidental namespace clashes (if onnxruntime is detected globally).

For that reason, I think this should be "opt-in" behaviour based on the presence or absence of an environment variable, and should not be the default for all users.

In the Supabase runtime, are you able to set default environment variables, e.g., HF_TRANSFORMERS_USE_EXPOSED_RUNTIME (name subject to change, of course)?

We can then use this to detect when to use the exposed runtime or not.
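
A rough sketch of what that opt-in could look like (the variable name follows the suggestion above, and the environment lookups are placeholders, since the final mechanism is still to be decided):

// Only pick up the exposed runtime when explicitly enabled, avoiding accidental
// clashes if something else happens to define `onnxruntime` globally.
const useExposedRuntime =
    globalThis.process?.env?.HF_TRANSFORMERS_USE_EXPOSED_RUNTIME === 'true' ||
    globalThis.Deno?.env?.get('HF_TRANSFORMERS_USE_EXPOSED_RUNTIME') === 'true';

const exposedOrt = useExposedRuntime
    ? globalThis[Symbol.for('onnxruntime')]
    : undefined;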

Comment on lines +62 to +75
// Ensure that the runtime implements the necessary functions.
// Consider using an array map if more required members need to be checked.
if (!Object.hasOwn(onnxruntime, 'Tensor')) {
    throw new Error(`Invalid "globalThis[${String(apis.EXPOSED_RUNTIME_SYMBOL)}]" definition. Missing required exported member "Tensor".`)
}

if (!Object.hasOwn(onnxruntime, 'InferenceSession')) {
    throw new Error(`Invalid "globalThis[${String(apis.EXPOSED_RUNTIME_SYMBOL)}]" definition. Missing required exported member "InferenceSession".`)
}

if (!Object.hasOwn(onnxruntime?.InferenceSession, 'create')) {
    throw new Error(`Invalid "globalThis[${String(apis.EXPOSED_RUNTIME_SYMBOL)}].InferenceSession" definition. Missing required exported member "InferenceSession.create".`)
}

ONNX = onnxruntime;
Collaborator


Using an array map would be great here! (minimal code duplication).
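
For reference, one possible shape of that refactor (a sketch only, reusing `onnxruntime` and `apis.EXPOSED_RUNTIME_SYMBOL` from the snippet above):

// Validate all required members of the exposed runtime in a single loop.
const REQUIRED_MEMBERS = [
    { name: 'Tensor', get: (ort) => ort?.Tensor },
    { name: 'InferenceSession', get: (ort) => ort?.InferenceSession },
    { name: 'InferenceSession.create', get: (ort) => ort?.InferenceSession?.create },
];

for (const { name, get } of REQUIRED_MEMBERS) {
    if (get(onnxruntime) === undefined) {
        throw new Error(`Invalid "globalThis[${String(apis.EXPOSED_RUNTIME_SYMBOL)}]" definition. Missing required exported member "${name}".`);
    }
}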
