Enable intel devices CPU/XPU/HPU for python backend #245
base: main
Conversation
Force-pushed from 4c09b22 to 4d285bd.
Signed-off-by: yuanwu <[email protected]>
@OlivierDehaene @Narsil Please help review this PR.
add python backend support for xlm-roberta type model
Signed-off-by: Liu, Kaixuan <[email protected]>
Signed-off-by: yuanwu <[email protected]>
add XPU and HPU support
Signed-off-by: kaixuanliu <[email protected]>
add import ipex
Shouldn't it be the same as https://github.com/huggingface/tei-gaudi/blob/habana-main/backends/python/server/requirements.txt ?
Yes, they are the same, except that I deleted some unused Python packages here.
I was asking because they seem outdated (e.g. optimum-habana == 1.12.0), probably because this PR was opened before the release of optimum-habana 1.13. Can you update this file?
Oh, yes. I will change it.
I have updated it. By the way, we just updated the HPU model forward implementation (using FlashBert); can you take another look?
LGTM
backends/src/lib.rs (outdated)
#[instrument(skip(self))]
pub async fn warmup_hpu(
    &self,
    mut max_input_length: usize,
    max_token: usize,
    max_bs: Option<usize>
) -> Result<(), BackendError> {
    let read_env_var = |key: &str, default: usize| -> usize {
        env::var(key).ok().map_or(default, |value| value.parse::<usize>().unwrap())
    };
    let seq_bucket_size: usize = read_env_var("PAD_SEQUENCE_TO_MULTIPLE_OF", 128);
    let max_warmup_length: usize = read_env_var("MAX_WARMUP_SEQUENCE_LENGTH", 1024);

    let max_batch_size = match max_bs {
        Some(value) => value as usize,
        None => read_env_var("MAX_WARMUP_BATCH_SIZE", 8),
    };

    let mut batch_sizes: Vec<usize> = powers_of_two(max_batch_size);
    if let Some(&last) = batch_sizes.last() {
        if last < max_batch_size {
            batch_sizes.push(max_batch_size);
        }
    }
    if max_warmup_length > max_input_length {
        return Err(BackendError::Start(
            format!("max_warmup_length ({max_warmup_length}) exceeds model's max_input_length ({max_input_length}), you can modify this value adding `-e MAX_WARMUP_SEQUENCE_LENGTH=<new_warmup_length>` to your Docker run command")
        ));
    }
    if seq_bucket_size > max_warmup_length {
        return Err(BackendError::Start(
            format!("PAD_SEQUENCE_TO_MULTIPLE_OF ({seq_bucket_size}) exceeds model's max warmup length ({max_warmup_length}), you can modify these values adding `-e PAD_SEQUENCE_TO_MULTIPLE_OF=<new_value>` or `-e MAX_WARMUP_SEQUENCE_LENGTH=<new_value> to your Docker run command`")
        ));
    }

    max_input_length = std::cmp::min(max_input_length, max_warmup_length);
    let mut seq_lengths: Vec<usize> = (seq_bucket_size..max_input_length + 1).step_by(seq_bucket_size as usize).collect();
    if let Some(&last) = seq_lengths.last() {
        if last < max_input_length {
            seq_lengths.push(max_input_length);
        }
    }

    let mut shapes: Vec<(u32, u32)> = Vec::with_capacity(batch_sizes.len() * seq_lengths.len());
    for batch_size in &batch_sizes {
        for seq_length in &seq_lengths {
            shapes.push((*batch_size as u32, *seq_length as u32));
        }
    }
    for shape in shapes.iter() {
        let batch = self.create_warmup_batch(*shape, max_token as u32);
        match &self.model_type {
            ModelType::Classifier => self.predict(batch).await.map(|_| ()),
            ModelType::Embedding(_) => self.embed(batch).await.map(|_| ()),
        }?;
        tracing::info!("finish warmup for batch: {}, length: {}", shape.0, shape.1);
    }
    Ok(())
}

#[instrument(skip_all)]
pub fn create_warmup_batch(
    &self,
    shape: (u32, u32),
    max_token: u32,
) -> Batch {
    let (batch_size, length) = shape;
    let mut batched_input_ids = Vec::new();
    let mut batched_token_type_ids = Vec::new();
    let mut batched_position_ids = Vec::new();
    let mut cumulative_seq_lengths = Vec::with_capacity(batch_size as usize + 1);
    let mut pooled_indices = Vec::with_capacity(batch_size as usize);
    cumulative_seq_lengths.push(0);
    let input_ids: Vec<u32> = (0..length).map(|_| rand::thread_rng().gen_range(0..max_token)).collect();
    let token_type_ids: Vec<u32> = vec![0; length as usize];
    let position_ids: Vec<u32> = (0..length).collect();
    let mut current_length = 0;
    for batch_id in 0..batch_size {
        batched_input_ids.extend(input_ids.iter().cloned());
        batched_token_type_ids.extend(token_type_ids.iter().cloned());
        batched_position_ids.extend(position_ids.iter().cloned());
        current_length += input_ids.len();
        cumulative_seq_lengths.push(current_length as u32);
        pooled_indices.push(batch_id);
    }
    Batch {
        input_ids: batched_input_ids,
        token_type_ids: batched_token_type_ids,
        position_ids: batched_position_ids,
        cumulative_seq_lengths,
        max_length: length,
        pooled_indices,
        raw_indices: vec![],
    }
}
This is the same as in https://github.com/huggingface/tei-gaudi/blob/habana-main/backends/src/lib.rs, right?
Yes
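For intuition only, here is a rough Python sketch (not part of the PR) of the warmup shape grid that the Rust code above derives from PAD_SEQUENCE_TO_MULTIPLE_OF, MAX_WARMUP_SEQUENCE_LENGTH, and MAX_WARMUP_BATCH_SIZE. It assumes powers_of_two returns the powers of two up to its argument, and it skips the error checks and the clamp against the model's max_input_length.

import os

def powers_of_two(limit):
    # assumption: mirrors the Rust helper, yielding 1, 2, 4, ... up to limit
    out, n = [], 1
    while n <= limit:
        out.append(n)
        n *= 2
    return out

seq_bucket = int(os.getenv("PAD_SEQUENCE_TO_MULTIPLE_OF", "128"))
max_warmup_len = int(os.getenv("MAX_WARMUP_SEQUENCE_LENGTH", "1024"))
max_batch = int(os.getenv("MAX_WARMUP_BATCH_SIZE", "8"))

batch_sizes = powers_of_two(max_batch)
if batch_sizes[-1] < max_batch:
    batch_sizes.append(max_batch)

seq_lengths = list(range(seq_bucket, max_warmup_len + 1, seq_bucket))
if seq_lengths[-1] < max_warmup_len:
    seq_lengths.append(max_warmup_len)

# Every (batch_size, sequence_length) pair gets one warmup forward pass.
shapes = [(b, s) for b in batch_sizes for s in seq_lengths]
print(shapes)  # with the defaults: batch sizes 1, 2, 4, 8 x lengths 128, 256, ..., 1024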
Signed-off-by: kaixuanliu <[email protected]>
add hpu flashBert support
The HPU-specific changes look good to me; I didn't check the rest.
Signed-off-by: kaixuanliu <[email protected]>
nice code
Dockerfile-intel (outdated)
Maybe better to call this file Dockerfile-hpu, since it is for Gaudi only, right? To stay consistent with requirements-hpu.txt and requirements-intel.txt.
Well, Dockerfile-intel is for all Intel platforms (CPU, XPU, and HPU); we use build-args to separate them. requirements-intel.txt is for CPU and XPU, while requirements-hpu.txt is for HPU only.
Ah okay, I didn't see the build-args. Thanks!
@OlivierDehaene, could you help review? Thanks!
Cargo fmt and ruff format
@OlivierDehaene, hi, can you help review? Thanks!
@OlivierDehaene, could you help review? Thanks!
Unused imports and better imports
Signed-off-by: kaixuanliu <[email protected]>
Signed-off-by: kaixuanliu <[email protected]>
@OlivierDehaene @Narsil, can you help review? Thanks!
upgrade xpu-ipex to 2.3.110
- cpu_results = embedding.view(-1).tolist()
+ cpu_results = embedding.reshape(-1).tolist()
A view costs less than a reshape; why change it?
Here, if we send batched requests like:
curl 127.0.0.1:8080/embed -X POST -d '{"inputs":["What is Deep Learning?", "It is a lovely day"]}' -H 'Content-Type: application/json'
it will return the error: RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
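For illustration, a minimal standalone repro of that failure mode (hypothetical shapes, not the backend's actual tensors): once a tensor is non-contiguous, .view() raises exactly this error, while .reshape() falls back to a copy.

import torch

# transpose() makes the tensor non-contiguous, standing in for whatever
# leaves the batched embedding tensor non-contiguous in the backend
embedding = torch.randn(2, 768).transpose(0, 1)
try:
    embedding.view(-1).tolist()
except RuntimeError as err:
    print(err)  # view size is not compatible with input tensor's size and stride ...
flat = embedding.reshape(-1).tolist()  # reshape copies when a zero-copy view is impossible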
def hpu_add_layer_norm(
    add: torch.Tensor,
    x: torch.Tensor,
    weight: torch.Tensor,
    bias: torch.Tensor,
    epsilon: float,
    add_back: bool,
):
    if add is not None:
        added_tensor = torch.add(add, x, alpha=1.0)
        output = F.layer_norm(added_tensor, [x.size(-1)], weight, bias, epsilon)
        if add_back:
            add.add_(x)
        return output
    else:
        return F.layer_norm(x, [x.size(-1)], weight=weight, bias=bias, eps=epsilon)
It might make sense to move this to a file and import it only for the HPU device, same as https://github.com/huggingface/text-embeddings-inference/pull/245/files#diff-0974ea7d63e0618f6efe7ab5bdfd6ff7102d5858d241b01448214588dd0bc1cdR49
Can I create a Python file hpu_op.py under the path backends/python/server/text_embeddings_server/utils/ and put the hpu_add_layer_norm function in this file?
Sorry for the late response, I was out of office. Looking back at it, this method only depends on torch and is generic enough, so there is no need for an HPU ops file.
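For reference, a small self-contained sketch (illustrative shapes, not taken from the PR) of the two branches hpu_add_layer_norm implements:

import torch
import torch.nn.functional as F

hidden = 768
x = torch.randn(4, hidden)         # sub-layer output
residual = torch.randn(4, hidden)  # the "add" argument (residual stream)
weight, bias, eps = torch.ones(hidden), torch.zeros(hidden), 1e-12

# With a residual: layer norm over (residual + x); with add_back=True the
# residual tensor is also updated in place so the next layer sees the sum.
out = F.layer_norm(residual + x, [hidden], weight, bias, eps)
residual.add_(x)

# Without a residual (add is None): a plain layer norm over x.
out_plain = F.layer_norm(x, [hidden], weight=weight, bias=bias, eps=eps)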
@IlyasMoutawwakil, hi, do you have other comments on this PR?
Signed-off-by: Liu, Kaixuan <[email protected]>
fix conflict env
What does this PR do?
Enable Intel CPU/XPU/HPU devices for the Python backend.
Fixes # (issue)
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@OlivierDehaene OR @Narsil