System Info
model=meta-llama/Llama-3.3-70B-Instruct
# share a volume with the Docker container to avoid downloading weights every run
volume=/srv/ai/data/tgi
docker run --gpus "1,2,3,4" --shm-size 1g -e HF_TOKEN=[TOKEN] -p 8080:80 -v $volume:/data \
ghcr.io/huggingface/text-generation-inference:3.0.0 \
--model-id $model \
--quantize eetq \
--cuda-memory-fraction 0.95
4x RTX 3090 Ti, EPYC CPU, 256 GB RAM
Information
Docker
The CLI directly
Tasks
An officially supported command
My own modifications
Reproduction
Run the Docker command above.
2024-12-17T17:23:53.961980Z INFO text_generation_launcher: Using prefill chunking = True
2024-12-17T17:23:54.547663Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T17:23:54.547663Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T17:23:54.558361Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T17:23:54.572348Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T17:23:54.821433Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-1
2024-12-17T17:23:54.821492Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-3
2024-12-17T17:23:54.821530Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-2
2024-12-17T17:23:54.847944Z INFO shard-manager: text_generation_launcher: Shard ready in 150.41845764s rank=3
2024-12-17T17:23:54.858639Z INFO shard-manager: text_generation_launcher: Shard ready in 150.432820265s rank=1
2024-12-17T17:23:54.872643Z INFO shard-manager: text_generation_launcher: Shard ready in 150.439607673s rank=2
2024-12-17T17:23:55.047221Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-12-17T17:23:55.048286Z INFO shard-manager: text_generation_launcher: Shard ready in 150.622573521s rank=0
2024-12-17T17:23:55.115403Z INFO text_generation_launcher: Starting Webserver
2024-12-17T17:23:55.210971Z INFO text_generation_router_v3: backends/v3/src/lib.rs:125: Warming up model
2024-12-17T17:23:55.231460Z INFO text_generation_launcher: Using optimized Triton indexing kernels.
After this the server dies and I have to manually power cycle it.
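Because the whole machine goes down, the last lines of launcher output are easy to lose. Below is a minimal sketch, assuming a POSIX shell, that runs a command, appends each output line to a file, and flushes to disk right away. The function name `run_and_persist` and the log path in the usage note are hypothetical and not part of TGI or Docker.

```shell
# Run a command, echo its combined stdout/stderr, and append every line
# to a log file, flushing to disk after each line so the tail of the
# log has a better chance of surviving a hard reset.
# Usage: run_and_persist LOGFILE COMMAND [ARGS...]
run_and_persist() {
    logfile=$1
    shift
    "$@" 2>&1 | while IFS= read -r line; do
        printf '%s\n' "$line"               # still show the line on the terminal
        printf '%s\n' "$line" >> "$logfile" # append to the persistent log
        sync                                # flush filesystem buffers to disk
    done
}
```

For example, `run_and_persist /srv/ai/data/tgi-crash.log docker run ...` with the same arguments as the command above (the log path is only an illustration). Calling `sync` after every line is slow, but it makes it more likely that the final lines before the crash reach the disk.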
Full logs from trying a smaller model with CUDA graphs disabled:
2024-12-17T18:03:27.087401Z INFO text_generation_launcher: Args {
    model_id: "Qwen/Qwen2.5-32B-Instruct",
    revision: None,
    validation_workers: 2,
    sharded: None,
    num_shard: None,
    quantize: Some(
        Eetq,
    ),
    speculate: None,
    dtype: None,
    kv_cache_dtype: None,
    trust_remote_code: false,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: None,
    max_total_tokens: None,
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: None,
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: Some(
        [
            0,
        ],
    ),
    hostname: "4eee9dca0df9",
    port: 80,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: None,
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 0.95,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    api_key: None,
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: false,
    max_client_batch_size: 4,
    lora_adapters: None,
    usage_stats: On,
    payload_limit: 2000000,
    enable_prefill_logprobs: false,
}
2024-12-17T18:03:27.088023Z INFO hf_hub: Token file not found "/data/token"
2024-12-17T18:03:28.994330Z INFO text_generation_launcher: Using attention flashinfer - Prefix caching true
2024-12-17T18:03:28.994349Z INFO text_generation_launcher: Sharding model on 4 processes
2024-12-17T18:03:29.030950Z WARN text_generation_launcher: Unkown compute for card nvidia-geforce-rtx-3090-ti
2024-12-17T18:03:29.064926Z INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4096
2024-12-17T18:03:29.065078Z INFO download: text_generation_launcher: Starting check and download process for Qwen/Qwen2.5-32B-Instruct
2024-12-17T18:03:32.104130Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-12-17T18:03:32.680081Z INFO download: text_generation_launcher: Successfully downloaded weights for Qwen/Qwen2.5-32B-Instruct
2024-12-17T18:03:32.680348Z INFO shard-manager: text_generation_launcher: Starting shard rank=1
2024-12-17T18:03:32.680364Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-12-17T18:03:32.680439Z INFO shard-manager: text_generation_launcher: Starting shard rank=3
2024-12-17T18:03:32.686107Z INFO shard-manager: text_generation_launcher: Starting shard rank=2
2024-12-17T18:03:35.215815Z INFO text_generation_launcher: Using prefix caching = True
2024-12-17T18:03:35.215842Z INFO text_generation_launcher: Using Attention = flashinfer
2024-12-17T18:03:42.713034Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:03:42.714007Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:03:42.714678Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:03:42.721143Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:03:52.722256Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:03:52.723416Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:03:52.723960Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:03:52.730231Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:04:02.731685Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:04:02.733008Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:04:02.733511Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:04:02.739340Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:04:12.740983Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:04:12.742778Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:04:12.743260Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:04:12.748509Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:04:22.750201Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:04:22.752482Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:04:22.753057Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:04:22.757785Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:04:32.759340Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:04:32.762067Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:04:32.762852Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:04:32.767034Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:04:42.768492Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:04:42.771758Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:04:42.772535Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:04:42.776268Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:04:52.777706Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:04:52.781362Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:04:52.782289Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:04:52.785605Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:05:02.786995Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:05:02.790997Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:05:02.792054Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:05:02.794933Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:05:12.796209Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:05:12.800615Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:05:12.802012Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:05:12.804257Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:05:22.805536Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:05:22.810307Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:05:22.811833Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:05:22.813416Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:05:32.814759Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:05:32.819792Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:05:32.821590Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:05:32.821834Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:05:42.824027Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:05:42.829566Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:05:42.830560Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:05:42.831422Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:05:52.833387Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:05:52.839573Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:05:52.840175Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:05:52.841278Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:06:01.763800Z INFO text_generation_launcher: Using prefill chunking = True
2024-12-17T18:06:02.627022Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-1
2024-12-17T18:06:02.627076Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-3
2024-12-17T18:06:02.627110Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-12-17T18:06:02.642621Z INFO shard-manager: text_generation_launcher: Shard ready in 149.940971364s rank=3
2024-12-17T18:06:02.649583Z INFO shard-manager: text_generation_launcher: Shard ready in 149.948711278s rank=0
2024-12-17T18:06:02.650706Z INFO shard-manager: text_generation_launcher: Shard ready in 149.949875248s rank=1
2024-12-17T18:06:02.848613Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-2
2024-12-17T18:06:02.849891Z INFO shard-manager: text_generation_launcher: Shard ready in 150.143446295s rank=2
2024-12-17T18:06:02.909856Z INFO text_generation_launcher: Starting Webserver
2024-12-17T18:06:03.001599Z INFO text_generation_router_v3: backends/v3/src/lib.rs:125: Warming up model
2024-12-17T18:06:03.023245Z INFO text_generation_launcher: Using optimized Triton indexing kernels.
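In both runs every shard reports ready after roughly 150 s, and the last line logged before the crash comes right after `Warming up model`. Below is a minimal sketch for pulling the per-rank load times out of a saved log, assuming the launcher output was written to a file. The function name `shard_ready_times` and the file name `tgi.log` are hypothetical.

```shell
# Read launcher log lines on stdin and print "RANK SECONDS" for every
# "Shard ready in <seconds>s rank=<n>" line, sorted by rank.
shard_ready_times() {
    sed -n 's/.*Shard ready in \([0-9.]*\)s rank=\([0-9]*\).*/\2 \1/p' | sort -n
}
```

Usage: `shard_ready_times < tgi.log`. Lines that do not match the pattern, such as the repeated `Waiting for shard to be ready...` messages, are skipped.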
Expected behavior
No system crash