Predict for Python-Backend #450

Open · wants to merge 2 commits into base: main
Conversation

michaelfeil (Contributor) commented Dec 13, 2024

What does this PR do?

This PR enables classifier models to be run with the Python backend.

At a high level, the ModelType is passed through to the Python API.
The gRPC protocol gets an extension for the Predict (repeated Score) interface.
The Python server then runs either AutoModel / FlashBert or AutoModelForSequenceClassification.

This is particularly useful because it enables models with architectures such as DebertaV2, e.g. mixedbread-ai/mxbai-rerank-xsmall-v1.
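The dispatch described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code; the function name and return values are hypothetical, but the idea matches the description: classifier architectures go to AutoModelForSequenceClassification, everything else to the embedding path.

```python
# Hedged sketch (names are illustrative, not the PR's identifiers):
# choosing a model class from a Hugging Face config.json.
def select_model_class(config: dict) -> str:
    """Return the transformers Auto class name to load for this model."""
    archs = config.get("architectures", [])
    if any(a.endswith("ForSequenceClassification") for a in archs):
        # Classifier / reranker path served via the Predict interface.
        return "AutoModelForSequenceClassification"
    # Default embedding path (FlashBert fast paths omitted for brevity).
    return "AutoModel"

# mxbai-rerank-xsmall-v1's config.json declares DebertaV2ForSequenceClassification,
# so it would take the classifier path.
print(select_model_class({"architectures": ["DebertaV2ForSequenceClassification"]}))
```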

Closes #386, #357.
Closes #449. @kaixuanliu I partially picked up your stale PR; let me know if you want to be a commit co-author.
Fixed: Makefile command issue.
This PR has been formatted with cargo fmt and Python black.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

I added a make run-reranker-dev target. Example run:

make run-reranker-dev
2024-12-13T02:39:30.494103Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "mix*******-**/*****-******-*****l-v1", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "michaelfeil-dev-pod-h100-0", port: 3000, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-12-13T02:39:30.494244Z  INFO hf_hub: /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
2024-12-13T02:39:30.578633Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2024-12-13T02:39:30.578653Z  INFO download_artifacts:download_pool_config: text_embeddings_core::download: core/src/download.rs:53: Downloading `1_Pooling/config.json`
2024-12-13T02:39:30.975772Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/mixedbread-ai/mxbai-rerank-xsmall-v1/resolve/main/1_Pooling/config.json)
2024-12-13T02:39:32.063030Z  INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
2024-12-13T02:39:32.222611Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:36: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/mixedbread-ai/mxbai-rerank-xsmall-v1/resolve/main/config_sentence_transformers.json)
2024-12-13T02:39:32.222630Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
2024-12-13T02:39:32.222665Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
2024-12-13T02:39:32.222677Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 1.644047184s
2024-12-13T02:39:32.377947Z  WARN tokenizers::tokenizer::serialization: /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '[MASK]' was expected to have ID '128000' but was given ID 'None'    
2024-12-13T02:39:32.378421Z  WARN text_embeddings_router: router/src/lib.rs:184: Could not find a Sentence Transformers config
2024-12-13T02:39:32.378431Z  INFO text_embeddings_router: router/src/lib.rs:188: Maximum number of tokens per request: 512
2024-12-13T02:39:32.378440Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 208 tokenization workers
2024-12-13T02:39:45.335673Z  INFO text_embeddings_router: router/src/lib.rs:230: Starting model backend
2024-12-13T02:39:45.335724Z  INFO text_embeddings_backend: backends/src/lib.rs:360: Downloading `model.safetensors`
2024-12-13T02:39:45.335788Z  INFO text_embeddings_backend: backends/src/lib.rs:244: Model weights downloaded in 64.629µs
2024-12-13T02:39:45.335855Z ERROR text_embeddings_backend: backends/src/lib.rs:255: Could not start Candle backend: Could not start backend: Model is not supported

Caused by:
    unknown variant `deberta-v2`, expected one of `bert`, `xlm-roberta`, `camembert`, `roberta`, `distilbert`, `nomic_bert`, `mistral`, `new`, `qwen2`, `mpnet` at line 21 column 28
2024-12-13T02:39:45.336213Z  INFO text_embeddings_backend_python::management: backends/python/src/management.rs:79: Starting Python backend
2024-12-13T02:39:48.582641Z  WARN python-backend: text_embeddings_backend_python::logging: backends/python/src/logging.rs:39: Could not import Flash Attention enabled models: No module named 'dropout_layer_norm'

2024-12-13T02:39:50.145457Z  INFO python-backend: text_embeddings_backend_python::logging: backends/python/src/logging.rs:37: Server started at unix:///tmp/text-embeddings-inference-server

2024-12-13T02:39:50.145903Z  INFO text_embeddings_backend_python::management: backends/python/src/management.rs:140: Python backend ready in 4.376466659s
2024-12-13T02:39:50.573917Z  INFO text_embeddings_router: router/src/lib.rs:248: Warming up model
2024-12-13T02:39:50.824605Z  WARN text_embeddings_router: router/src/lib.rs:310: Invalid hostname, defaulting to 0.0.0.0
2024-12-13T02:39:50.825655Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1812: Starting HTTP server: 0.0.0.0:3000
2024-12-13T02:39:50.825667Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1813: Ready
2024-12-13T02:41:33.254410Z  INFO rerank{total_time="13.560855ms" tokenization_time="802.831µs" queue_time="769.966µs" inference_time="11.89108ms"}: text_embeddings_router::http::server: router/src/http/server.rs:459: Success
2024-12-13T02:41:40.415004Z  INFO rerank{total_time="12.697461ms" tokenization_time="664.931µs" queue_time="745.23µs" inference_time="11.198792ms"}: text_embeddings_router::http::server: router/src/http/server.rs:459: Success
Example request body:
{
  "query": "What is Deep Learning?",
  "raw_scores": false,
  "return_text": true,
  "texts": [
    "Deep learning is..", "Deep Learning is part of ML ", "Paris is the capital of France"
  ],
  "truncate": false,
  "truncation_direction": "Right"
}
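A request body like the one above can be posted to the router's rerank route (the endpoint path is inferred from the rerank span in the log output, and the host/port from the Args line; treat both as assumptions). A minimal client sketch using only the standard library:

```python
# Hedged client sketch: build a POST request for the router's /rerank
# endpoint. Endpoint path and port are assumptions taken from the logs.
import json
import urllib.request

def build_rerank_request(query, texts, host="http://localhost:3000"):
    """Construct an HTTP request mirroring the example body above."""
    body = json.dumps({
        "query": query,
        "texts": texts,
        "raw_scores": False,
        "return_text": True,
        "truncate": False,
        "truncation_direction": "Right",
    }).encode("utf-8")
    return urllib.request.Request(
        f"{host}/rerank",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = build_rerank_request(
    "What is Deep Learning?",
    ["Deep learning is..", "Deep Learning is part of ML ",
     "Paris is the capital of France"],
)
# Requires the server from the log above to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```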

200 | Response body:

[
  {
    "index": 1,
    "text": "Deep Learning is part of ML ",
    "score": 0.7718435
  },
  {
    "index": 0,
    "text": "Deep learning is..",
    "score": 0.68825895
  },
  {
    "index": 2,
    "text": "Paris is the capital of France",
    "score": 0.034815196
  }
]
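The response rows are sorted descending by score. As a rough sketch of what the server does per pair (an assumption: the exact normalization depends on the model head; a sigmoid over a single per-pair logit is a common choice for rerankers when raw_scores is false):

```python
# Hedged sketch: turn per-pair logits into sorted index/text/score rows
# like the response above. The sigmoid normalization is an assumption.
import math

def rank_scores(logits, texts):
    """Sigmoid each logit, pair with its input index/text, sort descending."""
    rows = [
        {"index": i, "text": t, "score": 1.0 / (1.0 + math.exp(-s))}
        for i, (t, s) in enumerate(zip(texts, logits))
    ]
    return sorted(rows, key=lambda r: r["score"], reverse=True)

ranked = rank_scores(
    [0.8, 1.2, -3.3],  # hypothetical logits, chosen to mimic the example
    ["Deep learning is..", "Deep Learning is part of ML ",
     "Paris is the capital of France"],
)
```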

Who can review?

@OlivierDehaene @Narsil

Successfully merging this pull request may close these issues.

Feature addition: Python backend / grcp backend for ClassifierEngine