System Info
(text-generation-inference) scin@krakatoa:~$ text-generation-launcher --env
2024-12-11T18:48:31.147398Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.80.1
Commit sha: 82c24f7
Docker label: N/A
nvidia-smi:
Wed Dec 11 18:48:30 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05 Driver Version: 560.35.05 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 On | 00000000:01:00.0 Off | N/A |
| 0% 42C P8 13W / 275W | 2MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 On | 00000000:02:00.0 Off | N/A |
| 0% 44C P8 25W / 275W | 2MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA GeForce RTX 3090 On | 00000000:03:00.0 Off | N/A |
| 0% 42C P8 24W / 275W | 2MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA GeForce RTX 3090 On | 00000000:04:00.0 Off | N/A |
| 35% 31C P8 31W / 275W | 2MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA GeForce RTX 3090 On | 00000000:05:00.0 Off | N/A |
| 0% 42C P8 42W / 275W | 2MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
xpu-smi:
N/A
2024-12-11T18:48:31.147473Z INFO text_generation_launcher: Args {
model_id: "bigscience/bloom-560m",
revision: None,
validation_workers: 2,
sharded: None,
num_shard: None,
quantize: None,
speculate: None,
dtype: None,
kv_cache_dtype: None,
trust_remote_code: false,
max_concurrent_requests: 128,
max_best_of: 2,
max_stop_sequences: 4,
max_top_n_tokens: 5,
max_input_tokens: None,
max_input_length: None,
max_total_tokens: None,
waiting_served_ratio: 0.3,
max_batch_prefill_tokens: None,
max_batch_total_tokens: None,
max_waiting_tokens: 20,
max_batch_size: None,
cuda_graphs: None,
hostname: "0.0.0.0",
port: 3000,
shard_uds_path: "/tmp/text-generation-server",
master_addr: "localhost",
master_port: 29500,
huggingface_hub_cache: None,
weights_cache_override: None,
disable_custom_kernels: false,
cuda_memory_fraction: 1.0,
rope_scaling: None,
rope_factor: None,
json_output: false,
otlp_endpoint: None,
otlp_service_name: "text-generation-inference.router",
cors_allow_origin: [],
api_key: None,
watermark_gamma: None,
watermark_delta: None,
ngrok: false,
ngrok_authtoken: None,
ngrok_edge: None,
tokenizer_config_path: None,
disable_grammar_support: false,
env: true,
max_client_batch_size: 4,
lora_adapters: None,
usage_stats: On,
payload_limit: 2000000,
enable_prefill_logprobs: false,
}
config.json [00:00:00] [████████████████████████████████] 693 B/693 B 4.39 KiB/s (0s)
2024-12-11T18:48:34.581849Z INFO text_generation_launcher: Forcing attention to 'flashdecoding' because head dim is not supported by flashinfer, also disabling prefix caching
2024-12-11T18:48:34.581893Z INFO text_generation_launcher: Using attention flashdecoding - Prefix caching 0
2024-12-11T18:48:34.646542Z WARN text_generation_launcher: Unkown compute for card nvidia-geforce-rtx-3090
2024-12-11T18:48:34.712901Z INFO text_generation_launcher: Default max_batch_prefill_tokens to 4096
2024-12-11T18:48:34.712926Z INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-12-11T18:48:34.713142Z INFO download: text_generation_launcher: Starting check and download process for bigscience/bloom-560m
2024-12-11T18:48:38.229198Z INFO text_generation_launcher: Download file: model.safetensors
Information
Docker
The CLI directly
Tasks
An officially supported command
My own modifications
Reproduction
(text-generation-inference) scin@krakatoa:~$ CUDA_VISIBLE_DEVICES=0,1,2,3 text-generation-launcher \
    --model-id /mnt/models/meta-llama/Llama-3.3-70B-Instruct/ \
    --sharded true \
    --num-shard 4 \
    --quantize bitsandbytes \
    --max-input-tokens 2048 \
    --max-total-tokens 4096 \
    --max-batch-size 8
2024-12-11T18:25:32.807525Z INFO text_generation_launcher: Args {
model_id: "/mnt/models/meta-llama/Llama-3.3-70B-Instruct/",
revision: None,
validation_workers: 2,
sharded: Some(
true,
),
num_shard: Some(
4,
),
quantize: Some(
Bitsandbytes,
),
speculate: None,
dtype: None,
kv_cache_dtype: None,
trust_remote_code: false,
max_concurrent_requests: 128,
max_best_of: 2,
max_stop_sequences: 4,
max_top_n_tokens: 5,
max_input_tokens: Some(
2048,
),
max_input_length: None,
max_total_tokens: Some(
4096,
),
waiting_served_ratio: 0.3,
max_batch_prefill_tokens: None,
max_batch_total_tokens: None,
max_waiting_tokens: 20,
max_batch_size: Some(
8,
),
cuda_graphs: None,
hostname: "0.0.0.0",
port: 3000,
shard_uds_path: "/tmp/text-generation-server",
master_addr: "localhost",
master_port: 29500,
huggingface_hub_cache: None,
weights_cache_override: None,
disable_custom_kernels: false,
cuda_memory_fraction: 1.0,
rope_scaling: None,
rope_factor: None,
json_output: false,
otlp_endpoint: None,
otlp_service_name: "text-generation-inference.router",
cors_allow_origin: [],
api_key: None,
watermark_gamma: None,
watermark_delta: None,
ngrok: false,
ngrok_authtoken: None,
ngrok_edge: None,
tokenizer_config_path: None,
disable_grammar_support: false,
env: false,
max_client_batch_size: 4,
lora_adapters: None,
usage_stats: On,
payload_limit: 2000000,
enable_prefill_logprobs: false,
}
2024-12-11T18:25:36.061592Z INFO text_generation_launcher: Using attention flashinfer - Prefix caching true
2024-12-11T18:25:36.061634Z INFO text_generation_launcher: Sharding model on 4 processes
2024-12-11T18:25:36.254821Z WARN text_generation_launcher: Unkown compute for card nvidia-geforce-rtx-3090
2024-12-11T18:25:36.337546Z WARN text_generation_launcher: Not enough VRAM to run the model: Available: 97.93GB - Model 127.68GB.
2024-12-11T18:25:36.337590Z INFO text_generation_launcher: Default max_batch_prefill_tokens to 4096
2024-12-11T18:25:36.337607Z WARN text_generation_launcher: Bitsandbytes is deprecated, use eetq instead, which provides better latencies overall and is drop-in in most cases.
2024-12-11T18:25:36.337622Z WARN text_generation_launcher: Bitsandbytes doesn't work with cuda graphs, deactivating them
2024-12-11T18:25:36.337867Z INFO download: text_generation_launcher: Starting check and download process for /mnt/models/meta-llama/Llama-3.3-70B-Instruct/
2024-12-11T18:25:40.220528Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-12-11T18:25:40.971370Z INFO download: text_generation_launcher: Successfully downloaded weights for /mnt/models/meta-llama/Llama-3.3-70B-Instruct/
2024-12-11T18:25:40.971794Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-12-11T18:25:40.979679Z INFO shard-manager: text_generation_launcher: Starting shard rank=1
2024-12-11T18:25:40.979746Z INFO shard-manager: text_generation_launcher: Starting shard rank=3
2024-12-11T18:25:40.979746Z INFO shard-manager: text_generation_launcher: Starting shard rank=2
2024-12-11T18:25:44.579432Z INFO text_generation_launcher: Using prefix caching = True
2024-12-11T18:25:44.579523Z INFO text_generation_launcher: Using Attention = flashinfer
2024-12-11T18:25:45.106837Z WARN text_generation_launcher: exllamav2_kernels not installed.
2024-12-11T18:25:45.116706Z WARN text_generation_launcher: Could not import Mamba: No module named 'mamba_ssm'
2024-12-11T18:25:51.045493Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-11T18:25:51.075031Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-11T18:25:51.079025Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-11T18:25:51.085019Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-11T18:26:01.091932Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-11T18:26:01.128324Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-11T18:26:01.132268Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-11T18:26:01.146315Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-11T18:26:11.138573Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-11T18:26:11.169899Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-11T18:26:11.173043Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-11T18:26:11.213768Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-11T18:26:21.162127Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-11T18:26:21.219010Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-11T18:26:21.268627Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-11T18:26:21.306493Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-11T18:26:31.257146Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-11T18:26:31.300640Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-11T18:26:31.313775Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-11T18:26:31.397810Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-11T18:26:33.308202Z INFO text_generation_launcher: Using prefill chunking = True
2024-12-11T18:26:36.179734Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-1
2024-12-11T18:26:36.207682Z INFO shard-manager: text_generation_launcher: Shard ready in 55.201097835s rank=1
2024-12-11T18:26:36.818240Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-2
2024-12-11T18:26:36.818290Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-3
2024-12-11T18:26:36.818590Z INFO shard-manager: text_generation_launcher: Shard ready in 55.801796419s rank=3
2024-12-11T18:26:36.824017Z INFO shard-manager: text_generation_launcher: Shard ready in 55.815034637s rank=2
2024-12-11T18:26:37.306767Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-12-11T18:26:37.384325Z INFO shard-manager: text_generation_launcher: Shard ready in 56.386862739s rank=0
2024-12-11T18:26:37.426239Z INFO text_generation_launcher: Starting Webserver
2024-12-11T18:26:37.518635Z INFO text_generation_router_v3: backends/v3/src/lib.rs:125: Warming up model
2024-12-11T18:26:38.029590Z INFO text_generation_launcher: Using optimized Triton indexing kernels.
2024-12-11T18:26:38.572695Z ERROR text_generation_launcher: Method Warmup encountered an error.
Traceback (most recent call last):
File "/home/scin/miniconda3/envs/text-generation-inference/bin/text-generation-server", line 8, in
sys.exit(app())
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/typer/main.py", line 321, in call
return get_command(self)(*args, **kwargs)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/click/core.py", line 1157, in call
return self.main(*args, **kwargs)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/typer/core.py", line 728, in main
return _main(
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/typer/core.py", line 197, in _main
rv = self.invoke(ctx)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/typer/main.py", line 703, in wrapper
return callback(**use_params)
File "/home/scin/text-generation-inference/server/text_generation_server/cli.py", line 117, in serve
server.serve(
File "/home/scin/text-generation-inference/server/text_generation_server/server.py", line 315, in serve
asyncio.run(
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
self.run_forever()
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
self._run_once()
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
handle._run()
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/events.py", line 84, in _run
self._context.run(self._callback, *self._args)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
return await self.intercept(
File "/home/scin/text-generation-inference/server/text_generation_server/interceptor.py", line 24, in intercept
return await response
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 120, in _unary_interceptor
raise error
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 111, in _unary_interceptor
return await behavior(request_or_iterator, context)
File "/home/scin/text-generation-inference/server/text_generation_server/server.py", line 144, in Warmup
self.model.warmup(batch, max_input_tokens, max_total_tokens)
File "/home/scin/text-generation-inference/server/text_generation_server/models/flash_causal_lm.py", line 1577, in warmup
_, _batch, _ = self.generate_token(batch)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
File "/home/scin/text-generation-inference/server/text_generation_server/models/flash_causal_lm.py", line 1957, in generate_token
out, speculative_logits = self.forward(batch, adapter_data)
File "/home/scin/text-generation-inference/server/text_generation_server/models/flash_causal_lm.py", line 1839, in forward
with self._forward_context(
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/contextlib.py", line 137, in enter
return next(self.gen)
File "/home/scin/text-generation-inference/server/text_generation_server/layers/attention/flashinfer.py", line 86, in use_prefill_with_paged_kv_state
state.begin_forward(
File "/home/scin/flashinfer/flashinfer/prefill.py", line 1078, in plan
window_left >= 0, # use_sliding_window
TypeError: '>=' not supported between instances of 'NoneType' and 'int'
2024-12-11T18:26:38.573225Z ERROR warmup{max_input_length=Some(2048) max_prefill_tokens=4096 max_total_tokens=Some(4096) max_batch_size=Some(8)}:warmup: text_generation_router_v3::client: backends/v3/src/client/mod.rs:45: Server error: '>=' not supported between instances of 'NoneType' and 'int'
2024-12-11T18:26:38.576166Z ERROR text_generation_launcher: Method Warmup encountered an error.
Traceback (most recent call last):
File "/home/scin/miniconda3/envs/text-generation-inference/bin/text-generation-server", line 8, in
sys.exit(app())
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/typer/main.py", line 321, in call
return get_command(self)(*args, **kwargs)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/click/core.py", line 1157, in call
return self.main(*args, **kwargs)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/typer/core.py", line 728, in main
return _main(
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/typer/core.py", line 197, in _main
rv = self.invoke(ctx)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/typer/main.py", line 703, in wrapper
return callback(**use_params)
File "/home/scin/text-generation-inference/server/text_generation_server/cli.py", line 117, in serve
server.serve(
File "/home/scin/text-generation-inference/server/text_generation_server/server.py", line 315, in serve
asyncio.run(
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
self.run_forever()
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
self._run_once()
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
handle._run()
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/events.py", line 84, in _run
self._context.run(self._callback, *self._args)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
return await self.intercept(
File "/home/scin/text-generation-inference/server/text_generation_server/interceptor.py", line 24, in intercept
return await response
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 120, in _unary_interceptor
raise error
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 111, in _unary_interceptor
return await behavior(request_or_iterator, context)
File "/home/scin/text-generation-inference/server/text_generation_server/server.py", line 144, in Warmup
self.model.warmup(batch, max_input_tokens, max_total_tokens)
File "/home/scin/text-generation-inference/server/text_generation_server/models/flash_causal_lm.py", line 1577, in warmup
_, _batch, _ = self.generate_token(batch)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
File "/home/scin/text-generation-inference/server/text_generation_server/models/flash_causal_lm.py", line 1957, in generate_token
out, speculative_logits = self.forward(batch, adapter_data)
File "/home/scin/text-generation-inference/server/text_generation_server/models/flash_causal_lm.py", line 1839, in forward
with self._forward_context(
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/contextlib.py", line 137, in enter
return next(self.gen)
File "/home/scin/text-generation-inference/server/text_generation_server/layers/attention/flashinfer.py", line 86, in use_prefill_with_paged_kv_state
state.begin_forward(
File "/home/scin/flashinfer/flashinfer/prefill.py", line 1078, in plan
window_left >= 0, # use_sliding_window
TypeError: '>=' not supported between instances of 'NoneType' and 'int'
2024-12-11T18:26:38.576425Z ERROR warmup{max_input_length=Some(2048) max_prefill_tokens=4096 max_total_tokens=Some(4096) max_batch_size=Some(8)}:warmup: text_generation_router_v3::client: backends/v3/src/client/mod.rs:45: Server error: '>=' not supported between instances of 'NoneType' and 'int'
2024-12-11T18:26:38.576677Z ERROR text_generation_launcher: Method Warmup encountered an error.
Traceback (most recent call last):
File "/home/scin/miniconda3/envs/text-generation-inference/bin/text-generation-server", line 8, in
sys.exit(app())
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/typer/main.py", line 321, in call
return get_command(self)(*args, **kwargs)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/click/core.py", line 1157, in call
return self.main(*args, **kwargs)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/typer/core.py", line 728, in main
return _main(
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/typer/core.py", line 197, in _main
rv = self.invoke(ctx)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/typer/main.py", line 703, in wrapper
return callback(**use_params)
File "/home/scin/text-generation-inference/server/text_generation_server/cli.py", line 117, in serve
server.serve(
File "/home/scin/text-generation-inference/server/text_generation_server/server.py", line 315, in serve
asyncio.run(
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
self.run_forever()
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
self._run_once()
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
handle._run()
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/events.py", line 84, in _run
self._context.run(self._callback, *self._args)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
return await self.intercept(
File "/home/scin/text-generation-inference/server/text_generation_server/interceptor.py", line 24, in intercept
return await response
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 120, in _unary_interceptor
raise error
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 111, in _unary_interceptor
return await behavior(request_or_iterator, context)
File "/home/scin/text-generation-inference/server/text_generation_server/server.py", line 144, in Warmup
self.model.warmup(batch, max_input_tokens, max_total_tokens)
File "/home/scin/text-generation-inference/server/text_generation_server/models/flash_causal_lm.py", line 1577, in warmup
_, _batch, _ = self.generate_token(batch)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
File "/home/scin/text-generation-inference/server/text_generation_server/models/flash_causal_lm.py", line 1957, in generate_token
out, speculative_logits = self.forward(batch, adapter_data)
File "/home/scin/text-generation-inference/server/text_generation_server/models/flash_causal_lm.py", line 1839, in forward
with self._forward_context(
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/contextlib.py", line 137, in enter
return next(self.gen)
File "/home/scin/text-generation-inference/server/text_generation_server/layers/attention/flashinfer.py", line 86, in use_prefill_with_paged_kv_state
state.begin_forward(
File "/home/scin/flashinfer/flashinfer/prefill.py", line 1078, in plan
window_left >= 0, # use_sliding_window
TypeError: '>=' not supported between instances of 'NoneType' and 'int'
2024-12-11T18:26:38.577034Z ERROR warmup{max_input_length=Some(2048) max_prefill_tokens=4096 max_total_tokens=Some(4096) max_batch_size=Some(8)}:warmup: text_generation_router_v3::client: backends/v3/src/client/mod.rs:45: Server error: '>=' not supported between instances of 'NoneType' and 'int'
2024-12-11T18:26:38.591025Z ERROR text_generation_launcher: Method Warmup encountered an error.
Traceback (most recent call last):
File "/home/scin/miniconda3/envs/text-generation-inference/bin/text-generation-server", line 8, in
sys.exit(app())
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/typer/main.py", line 321, in call
return get_command(self)(*args, **kwargs)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/click/core.py", line 1157, in call
return self.main(*args, **kwargs)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/typer/core.py", line 728, in main
return _main(
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/typer/core.py", line 197, in _main
rv = self.invoke(ctx)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/typer/main.py", line 703, in wrapper
return callback(**use_params)
File "/home/scin/text-generation-inference/server/text_generation_server/cli.py", line 117, in serve
server.serve(
File "/home/scin/text-generation-inference/server/text_generation_server/server.py", line 315, in serve
asyncio.run(
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
self.run_forever()
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
self._run_once()
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
handle._run()
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/asyncio/events.py", line 84, in _run
self._context.run(self._callback, *self._args)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
return await self.intercept(
File "/home/scin/text-generation-inference/server/text_generation_server/interceptor.py", line 24, in intercept
return await response
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 120, in _unary_interceptor
raise error
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 111, in _unary_interceptor
return await behavior(request_or_iterator, context)
File "/home/scin/text-generation-inference/server/text_generation_server/server.py", line 144, in Warmup
self.model.warmup(batch, max_input_tokens, max_total_tokens)
File "/home/scin/text-generation-inference/server/text_generation_server/models/flash_causal_lm.py", line 1577, in warmup
_, _batch, _ = self.generate_token(batch)
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
File "/home/scin/text-generation-inference/server/text_generation_server/models/flash_causal_lm.py", line 1957, in generate_token
out, speculative_logits = self.forward(batch, adapter_data)
File "/home/scin/text-generation-inference/server/text_generation_server/models/flash_causal_lm.py", line 1839, in forward
with self._forward_context(
File "/home/scin/miniconda3/envs/text-generation-inference/lib/python3.11/contextlib.py", line 137, in enter
return next(self.gen)
File "/home/scin/text-generation-inference/server/text_generation_server/layers/attention/flashinfer.py", line 86, in use_prefill_with_paged_kv_state
state.begin_forward(
File "/home/scin/flashinfer/flashinfer/prefill.py", line 1078, in plan
window_left >= 0, # use_sliding_window
TypeError: '>=' not supported between instances of 'NoneType' and 'int'
2024-12-11T18:26:38.591400Z ERROR warmup{max_input_length=Some(2048) max_prefill_tokens=4096 max_total_tokens=Some(4096) max_batch_size=Some(8)}:warmup: text_generation_router_v3::client: backends/v3/src/client/mod.rs:45: Server error: '>=' not supported between instances of 'NoneType' and 'int'
Error: Backend(Warmup(Generation("'>=' not supported between instances of 'NoneType' and 'int'")))
2024-12-11T18:26:38.606855Z ERROR text_generation_launcher: Webserver Crashed
2024-12-11T18:26:38.606883Z INFO text_generation_launcher: Shutting down shards
2024-12-11T18:26:38.611513Z INFO shard-manager: text_generation_launcher: Terminating shard rank=1
2024-12-11T18:26:38.611567Z INFO shard-manager: text_generation_launcher: Waiting for shard to gracefully shutdown rank=1
2024-12-11T18:26:38.622656Z INFO shard-manager: text_generation_launcher: Terminating shard rank=3
2024-12-11T18:26:38.622711Z INFO shard-manager: text_generation_launcher: Waiting for shard to gracefully shutdown rank=3
2024-12-11T18:26:38.627376Z INFO shard-manager: text_generation_launcher: Terminating shard rank=2
2024-12-11T18:26:38.627442Z INFO shard-manager: text_generation_launcher: Waiting for shard to gracefully shutdown rank=2
2024-12-11T18:26:38.687423Z INFO shard-manager: text_generation_launcher: Terminating shard rank=0
2024-12-11T18:26:38.687508Z INFO shard-manager: text_generation_launcher: Waiting for shard to gracefully shutdown rank=0
2024-12-11T18:26:40.426587Z INFO shard-manager: text_generation_launcher: shard terminated rank=3
2024-12-11T18:26:40.793289Z INFO shard-manager: text_generation_launcher: shard terminated rank=0
2024-12-11T18:26:40.816993Z INFO shard-manager: text_generation_launcher: shard terminated rank=1
2024-12-11T18:26:40.832001Z INFO shard-manager: text_generation_launcher: shard terminated rank=2
Error: WebserverFailed
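The failure is identical on all four shards: during warmup, a None window_left reaches flashinfer's plan() (via state.begin_forward in TGI's use_prefill_with_paged_kv_state), and plan() evaluates window_left >= 0, which raises the TypeError above. Below is a minimal sketch of that failure mode and the kind of caller-side guard that would avoid it; the names mirror the traceback, but this is illustrative only, not the actual flashinfer or TGI code, and treating -1 as "no sliding window" is an assumption.

def plan(window_left):
    # flashinfer/prefill.py derives a flag roughly like this; comparing None
    # with an int is what raises the TypeError seen in the logs.
    use_sliding_window = window_left >= 0
    return use_sliding_window

def plan_with_guard(window_left):
    # Hypothetical normalization on the caller side before the value is handed
    # to plan(): map "no sliding window" to -1 instead of None.
    window_left = -1 if window_left is None else window_left
    return plan(window_left)

print(plan_with_guard(None))  # False: treated as "no sliding window"
# plan(None) would raise: TypeError: '>=' not supported between instances of 'NoneType' and 'int'

Whether the fix belongs in TGI (passing an integer sentinel instead of None) or in flashinfer (accepting None) is not clear from the logs alone.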
Expected behavior
The webserver starts successfully.