TensorRT-LLM 0.13.0 Release #2270
Shixiaowei02 announced in Announcements
Hi,
We are very pleased to announce the 0.13.0 version of TensorRT-LLM. This update includes:
Key Features and Enhancements
- Supported lookahead decoding (experimental), see docs/source/speculative_decoding.md.
- Added some enhancements to the ModelWeightsLoader (a unified checkpoint converter, see docs/source/architecture/model-weights-loader.md), including support for *.bin and *.pth checkpoint files.
- Added some enhancements to the LLM class, including trust_remote_code for customized models and tokenizers downloaded from Hugging Face Hub.
- Added curand and bfloat16 support for ReDrafter.
- Supported LoRA for the ModelRunnerCpp class.
- Supported head_size=48 cases for FMHA kernels.
- Added examples for DiT models, see examples/dit/README.md.
- Added enhancements to the executor API.
API Changes
- [BREAKING CHANGE] Set use_fused_mlp to True by default.
- [BREAKING CHANGE] Enabled multi_block_mode by default.
- [BREAKING CHANGE] Enabled strongly_typed by default in the builder API.
- [BREAKING CHANGE] Renamed maxNewTokens, randomSeed and minLength to maxTokens, seed and minTokens following OpenAI style.
- The LLM class:
  - [BREAKING CHANGE] Updated LLM.generate arguments to include PromptInputs and tqdm.
- The executor API:
  - [BREAKING CHANGE] Added LogitsPostProcessorConfig.
  - Added FinishReason to Result.
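For clients migrating to the OpenAI-style parameter names, the renames listed above can be captured in a small helper. This is an illustrative sketch, not part of TensorRT-LLM: the function `migrate_params` and the dict `RENAMED_PARAMS` are hypothetical names, while the old/new key pairs themselves come from the release notes.

```python
# Hypothetical migration helper: maps the pre-0.13.0 sampling-parameter
# names to their OpenAI-style replacements listed in these release notes.
RENAMED_PARAMS = {
    "maxNewTokens": "maxTokens",
    "randomSeed": "seed",
    "minLength": "minTokens",
}

def migrate_params(params: dict) -> dict:
    """Return a copy of `params` with deprecated keys renamed in place."""
    return {RENAMED_PARAMS.get(key, key): value for key, value in params.items()}

legacy = {"maxNewTokens": 128, "randomSeed": 42, "minLength": 1, "topK": 50}
print(migrate_params(legacy))
# {'maxTokens': 128, 'seed': 42, 'minTokens': 1, 'topK': 50}
```

Keys that were not renamed (such as `topK` above) pass through unchanged, so the helper can be applied to any request dict.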
Model Updates
- Updated Gemma support, see examples/gemma/README.md.
Fixed Issues
- Fixed an accuracy issue when remove_input_padding is enabled. (#1999)
- Fixed a failure when converting qwen2-0.5b-instruct using smoothquant. (#2087)
- Matched the exclude_modules pattern in convert_utils.py to the changes in quantize.py. (#2113)
- Fixed an engine build error when FORCE_NCCL_ALL_REDUCE_STRATEGY is set.
- Fixed unexpected truncation in the quant mode of gpt_attention.
- Fixed a ValueError ("mutable default ... is not allowed: use default_factory") raised by LoraConfig on Python 3.11. (#1323)
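The LoraConfig fix stems from a general Python dataclass rule: a mutable default value raises a ValueError at class-definition time, and the default must instead be supplied through `default_factory`. A minimal standalone illustration follows; `ExampleConfig` and its field are invented for the sketch and are not TensorRT-LLM's actual LoraConfig.

```python
from dataclasses import dataclass, field

# Illustrative only. Writing `target_modules: list = []` here would raise:
#   ValueError: mutable default <class 'list'> for field target_modules
#   is not allowed: use default_factory
# Supplying the default through default_factory avoids the error and gives
# every instance its own independent list.
@dataclass
class ExampleConfig:
    target_modules: list = field(default_factory=list)

a = ExampleConfig()
b = ExampleConfig()
a.target_modules.append("attn_q")
print(b.target_modules)  # [] -- b is unaffected by mutations of a
```

This is why the error message quoted in issue #1323 explicitly says "use default_factory": a shared mutable default would otherwise leak state between config instances.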
Infrastructure Changes
- The base Docker image for TensorRT-LLM is updated to nvcr.io/nvidia/pytorch:24.07-py3.
- The base Docker image for TensorRT-LLM Backend is updated to nvcr.io/nvidia/tritonserver:24.07-py3.

We are updating the main branch regularly with new features, bug fixes and performance optimizations. The rel branch will be updated less frequently, and the exact frequencies depend on your feedback.

Thanks,
The TensorRT-LLM Engineering Team
This discussion was created from the release TensorRT-LLM 0.13.0 Release.