Releases: NVIDIA/TensorRT

TensorRT OSS v10.7.0

05 Dec 21:16
17003e4

10.7.0 GA

For more information, see the TensorRT 10.7.0 release notes.

Key Features and Updates:

  • Demo Changes

    • demoDiffusion
      • Enabled low-vram for the Flux pipeline. Users can now run the pipelines on systems with 32GB VRAM.
      • Added support for FLUX.1-schnell pipeline.
      • Enabled weight streaming mode for Flux pipeline.
  • Plugin Changes

    • On Blackwell and later platforms, TensorRT will drop cuDNN support for the following categories of plugins:
      • User-written IPluginV2Ext, IPluginV2DynamicExt, and IPluginV2IOExt plugins that are dependent on cuDNN handles provided by TensorRT (via the attachToContext() API).
      • TensorRT standard plugins that use cuDNN, specifically:
        • InstanceNormalization_TRT (version: 1, 2, and 3) present in plugin/instanceNormalizationPlugin/.
        • GroupNormalizationPlugin (version: 1) present in plugin/groupNormalizationPlugin/.
        • Note: These normalization plugins are superseded by TensorRT’s native INormalizationLayer (C++, Python). TensorRT support for cuDNN-dependent plugins remains unchanged on pre-Blackwell platforms.
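When moving from the normalization plugins to the native INormalizationLayer, one detail to get right is the set of axes to normalize over, which the layer takes as a bitmask. The sketch below only computes that bitmask in plain Python; it does not call TensorRT. It assumes that bit i of the mask selects dimension i and that instance normalization reduces over every axis after batch and channel. Both are assumptions based on a reading of the TensorRT API documentation, and the helper names are hypothetical.

```python
def axes_mask(axes):
    """Return a bitmask with bit i set for every axis index i in `axes`."""
    mask = 0
    for axis in axes:
        if axis < 0:
            raise ValueError("axis indices must be non-negative")
        mask |= 1 << axis
    return mask


def instance_norm_axes_mask(rank):
    """Mask covering the spatial axes of an (N, C, spatial...) tensor.

    Assumption: instance normalization reduces over every axis after the
    batch axis (0) and the channel axis (1).
    """
    if rank < 3:
        raise ValueError("expected at least one spatial dimension")
    return axes_mask(range(2, rank))


# NCHW (rank 4): spatial axes are 2 and 3.
nchw_mask = instance_norm_axes_mask(4)
# NCDHW (rank 5): spatial axes are 2, 3 and 4.
ncdhw_mask = instance_norm_axes_mask(5)
print(bin(nchw_mask), bin(ncdhw_mask))  # prints: 0b1100 0b11100
```

In the TensorRT Python API, a mask like this would typically be passed as the axesMask argument of INetworkDefinition.add_normalization, together with scale and bias tensors; check the API reference for the exact shape requirements.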
  • Parser Changes

    • Now prioritizes using plugins over local functions when a corresponding plugin is available in the registry.
    • Added dynamic axes support for Squeeze and Unsqueeze operations.
    • Added support for parsing mixed-precision BatchNormalization nodes in strongly-typed mode.
  • Addressed Issues

TensorRT OSS v10.6.0

05 Nov 21:55
c468d67

10.6.0 GA

For more information, see the TensorRT 10.6.0 release notes.

Key Features and Updates:

  • Demo Changes

    • demoBERT: The use of fcPlugin in demoBERT has been removed.
    • demoBERT: All TensorRT plugins used in demoBERT (CustomEmbLayerNormDynamic, CustomSkipLayerNormDynamic, and CustomQKVToContextDynamic) now have versions that inherit from IPluginV3 interface classes. Users can opt in to these V3 plugins by passing --use-v3-plugins to the builder scripts.
      • Opting in to the V3 plugins does not affect performance, I/O, or plugin attributes.
      • There is a known issue in the V3 version (version 4) of the CustomQKVToContextDynamic plugin in TensorRT 10.6.0: an internal assertion error occurs if either the batch or sequence dimension at runtime differs from the one used to serialize the engine. See the “known issues” section of the TensorRT-10.6.0 release notes.
      • For smoother migration, the builder scripts still default to the deprecated IPluginV2DynamicExt-derived plugins when --use-v3-plugins isn't specified. The flag --use-deprecated-plugins was added as an explicit way to enforce the default behavior and is mutually exclusive with --use-v3-plugins.
    • demoDiffusion
      • Introduced BF16 and FP8 support for the Flux.1-dev pipeline.
      • Expanded FP8 support on Ada platforms.
      • Enabled LoRA adapter compatibility for SDv1.5, SDv2.1, and SDXL pipelines using Diffusers version 0.30.3.
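The demoBERT flag behavior described above (deprecated plugins by default, --use-v3-plugins to opt in, --use-deprecated-plugins as the explicit and mutually exclusive alternative) can be sketched with argparse. This is an illustrative stand-in, not the actual builder script: only the two flag names come from the release notes, and the function names and return values are hypothetical.

```python
import argparse


def build_arg_parser():
    """Parser with the two plugin-selection flags named in the notes."""
    parser = argparse.ArgumentParser(
        description="Hypothetical demoBERT-style builder options"
    )
    # argparse rejects the command line if both flags are given.
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "--use-v3-plugins",
        action="store_true",
        help="use the IPluginV3-based plugin versions",
    )
    group.add_argument(
        "--use-deprecated-plugins",
        action="store_true",
        help="explicitly keep the deprecated IPluginV2DynamicExt-derived plugins",
    )
    return parser


def select_plugin_family(argv):
    """Return 'v3' only when the user opted in; otherwise 'deprecated'."""
    args = build_arg_parser().parse_args(argv)
    return "v3" if args.use_v3_plugins else "deprecated"


print(select_plugin_family([]))                    # prints: deprecated
print(select_plugin_family(["--use-v3-plugins"]))  # prints: v3
```

Passing both flags makes argparse exit with a usage error, which mirrors the mutual exclusivity stated in the notes.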
  • Sample Changes

    • Added the Python sample quickly_deployable_plugins, which demonstrates quickly deployable Python-based plugin definitions (QDPs) in TensorRT. QDPs are a simple and intuitive decorator-based approach to defining TensorRT plugins, requiring drastically less code.
  • Plugin Changes

    • The fcPlugin has been deprecated. Its functionality has been superseded by the IMatrixMultiplyLayer that is natively provided by TensorRT.
    • Migrated IPluginV2-descendent version 1 of CustomEmbLayerNormDynamic to version 6, which implements IPluginV3.
      • The newer versions preserve the attributes and I/O of the corresponding older plugin version.
      • The older plugin versions are deprecated and will be removed in a future release.
  • Parser Changes

    • Updated ONNX submodule version to 1.17.0.
    • Fixed issue where conditional layers were incorrectly being added.
    • Updated local function metadata to contain more information.
    • Added support for parsing nodes with Quickly Deployable Plugins.
    • Fixed handling of optional outputs.
  • Tool Updates

    • ONNX-GraphSurgeon updated to version 0.5.3.
    • Polygraphy updated to 0.49.14.

TensorRT OSS v10.5.0

10 Oct 19:47
c8a5043

Release 10.5-GA

Key Features and Updates:

  • Demo changes
  • Sample changes
    • None
  • Plugin changes
    • Migrated IPluginV2-descendent versions of bertQKVToContextPlugin (1, 2, 3) to newer versions (4, 5, 6 respectively) which implement IPluginV3.
    • Note:
      • The newer versions preserve the attributes and I/O of the corresponding older plugin version
      • The older plugin versions are deprecated and will be removed in a future release
  • Quickstart guide
    • None
  • Parser changes
    • Added support for real-valued STFT operations
    • Improved error handling in IParser

Known issues:

  • Demos:
    • The TensorRT engine might not build successfully when using the --fp8 flag on H100 GPUs.

TensorRT OSS v10.4.0

12 Sep 00:59
866548c

10.4.0 GA - 2024-09-11

Key Features and Updates:

  • Demo changes

    • Added Stable Cascade pipeline.
    • Enabled INT8 and FP8 quantization for Stable Diffusion v1.5, v2.0 and v2.1 pipelines.
    • Enabled FP8 quantization for Stable Diffusion XL pipeline.
  • Sample changes

    • Added a new Python sample, aliased_io_plugin, which demonstrates how in-place updates to plugin inputs can be achieved through I/O aliasing.
  • Plugin changes

    • Migrated IPluginV2-descendent versions (a) of the following plugins to newer versions (b) which implement IPluginV3 (a->b):
      • scatterElementsPlugin (1->2)
      • skipLayerNormPlugin (1->5, 2->6, 3->7, 4->8)
      • embLayerNormPlugin (2->4, 3->5)
      • bertQKVToContextPlugin (1->4, 2->5, 3->6)
    • Note
      • The newer versions preserve the attributes and I/O of the corresponding older plugin version.
      • The older plugin versions are deprecated and will be removed in a future release.
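For code that pins plugin versions, the (a->b) migrations listed above can be captured in a small lookup table. This is a minimal sketch: the keys are the plugin names exactly as written in these notes, which may differ from the names the plugins register under, and the helper name is hypothetical.

```python
# Old IPluginV2-descendent version -> IPluginV3-based version, as listed
# in the 10.4.0 notes. Keys are the names used in the notes, which may
# not match the registered plugin names.
V3_MIGRATIONS = {
    "scatterElementsPlugin": {1: 2},
    "skipLayerNormPlugin": {1: 5, 2: 6, 3: 7, 4: 8},
    "embLayerNormPlugin": {2: 4, 3: 5},
    "bertQKVToContextPlugin": {1: 4, 2: 5, 3: 6},
}


def v3_replacement(plugin, old_version):
    """Return the IPluginV3-based version replacing `old_version`.

    Returns None when the notes list no migration for that pair.
    """
    return V3_MIGRATIONS.get(plugin, {}).get(old_version)


print(v3_replacement("skipLayerNormPlugin", 3))  # prints: 7
print(v3_replacement("embLayerNormPlugin", 1))   # prints: None
```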
  • Quickstart guide

  • Parser changes

    • Added support for tensor axes for Pad operations.
    • Added support for BlackmanWindow, HammingWindow, and HannWindow operations.
    • Improved error handling in IParserRefitter.
    • Fixed kernel shape inference in multi-input convolutions.
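For reference when checking parser output for the newly supported window operations, the three windows can be computed in plain Python. This sketch assumes the generalized cosine form and the coefficients from a reading of the ONNX operator specification (Hann 0.5/0.5, Hamming 25/46 and 21/46, Blackman 0.42/0.5/0.08), with periodic=True dividing by the window size and periodic=False by size minus one. Treat the coefficients as assumptions and confirm them against the ONNX specification before relying on them; the function names are hypothetical.

```python
import math


def _cosine_window(size, a0, a1, a2=0.0, periodic=True):
    """Generalized cosine window: a0 - a1*cos(2*pi*n/N) + a2*cos(4*pi*n/N)."""
    if size < 2:
        raise ValueError("this sketch requires size >= 2")
    # Periodic windows divide by size, symmetric ones by size - 1.
    denom = size if periodic else size - 1
    return [
        a0
        - a1 * math.cos(2.0 * math.pi * n / denom)
        + a2 * math.cos(4.0 * math.pi * n / denom)
        for n in range(size)
    ]


def hann_window(size, periodic=True):
    return _cosine_window(size, 0.5, 0.5, periodic=periodic)


def hamming_window(size, periodic=True):
    alpha = 25.0 / 46.0  # assumed ONNX coefficient
    return _cosine_window(size, alpha, 1.0 - alpha, periodic=periodic)


def blackman_window(size, periodic=True):
    return _cosine_window(size, 0.42, 0.5, 0.08, periodic=periodic)


for name, fn in [("hann", hann_window), ("hamming", hamming_window),
                 ("blackman", blackman_window)]:
    print(name, [round(v, 4) for v in fn(4)])
```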
  • Updated tooling

    • polygraphy-extension-trtexec v0.0.9

TensorRT OSS v10.3.0

08 Aug 23:23
c5b9de3

10.3.0 GA

Key Features and Updates:

  • Demo changes
  • Plugin changes
    • Deprecated Version 1 of ScatterElements plugin. It is superseded by Version 2, which implements the IPluginV3 interface.
  • Quickstart guide
  • Parser changes
    • Added support for tensor axes inputs for Slice node.
    • Updated ScatterElements importer to use Version 2 of ScatterElements plugin, which implements the IPluginV3 interface.
  • Updated tooling
    • Polygraphy v0.49.13

TensorRT OSS v10.2.0

15 Jul 16:16
2332a71

Key Features and Updates:

  • Demo changes
  • Plugin changes
    • Version 3 of the InstanceNormalization plugin (InstanceNormalization_TRT) has been added. This version is based on the IPluginV3 interface and is used by the TensorRT ONNX parser when native InstanceNormalization is disabled.
  • Tooling changes
    • PyTorch Quantization development has transitioned to TensorRT Model Optimizer. All developers are encouraged to use TensorRT Model Optimizer to benefit from the latest advancements in quantization and compression.
  • Build containers
    • Updated the default CUDA version to 12.5.0.

TensorRT OSS v10.1.0

18 Jun 00:26
9db1508

Key Features and Updates:

  • Parser changes
    • Added supportsModelV2 API
    • Added support for DeformConv operation
    • Added support for PluginV3 TensorRT Plugins
    • Marked all IParser and IParserRefitter APIs as noexcept
  • Plugin changes
    • Added version 2 of the ROIAlign_TRT plugin, which implements the IPluginV3 plugin interface. When importing an ONNX model with the RoiAlign op, this new version of the plugin is inserted into the TensorRT network.
  • Samples changes
  • Updated tooling
    • Polygraphy v0.49.12
    • ONNX-GraphSurgeon v0.5.3

TensorRT OSS v10.0.1

30 Apr 18:05
d2f4ef7

Key Features and Updates:

  • Parser changes
    • Added support for building with protobuf-lite.
    • Fixed issue when parsing and refitting models with nested BatchNormalization nodes.
    • Added support for empty inputs in custom plugin nodes.
  • Demo changes
    • The following demos have been removed: Jasper, Tacotron2, HuggingFace Diffusers notebook
  • Updated tooling
    • Polygraphy v0.49.10
    • ONNX-GraphSurgeon v0.5.2
  • Build Containers
    • Updated the default CUDA version to 12.4.0.
    • Added Rocky Linux 8 and Rocky Linux 9 build containers

TensorRT v10.0.0

03 Apr 21:45

Key Features and Updates:

  • Samples changes
    • Added a sample showcasing weight-stripped engines.
    • Added a sample demonstrating the use of custom tactics with IPluginV3.
    • Added a sample to showcase plugins with data-dependent output shapes, using IPluginV3.
  • Parser changes
    • Added a new class IParserRefitter that can be used to refit a TensorRT engine with the weights of an ONNX model.
    • kNATIVE_INSTANCENORM is now set to ON by default.
    • Added support for IPluginV3 interfaces from TensorRT.
    • Added support for INT4 quantization.
    • Added support for the reduction attribute in ScatterElements.
    • Added support for wrap padding mode in Pad
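The wrap padding mode mentioned above treats the tensor as if it repeated cyclically. A one-dimensional sketch in plain Python, assuming the usual wrap-around semantics (the same result as numpy.pad with mode="wrap" for this case); the function name is hypothetical and this is not the parser's implementation:

```python
def pad_wrap_1d(data, pad_begin, pad_end):
    """Pad a 1-D sequence by wrapping around its own contents."""
    if not data:
        raise ValueError("cannot wrap-pad an empty sequence")
    if pad_begin < 0 or pad_end < 0:
        raise ValueError("this sketch supports non-negative pads only")
    n = len(data)
    # Output position i reads input position (i - pad_begin) modulo n.
    return [data[(i - pad_begin) % n] for i in range(pad_begin + n + pad_end)]


print(pad_wrap_1d([1, 2, 3], 2, 1))  # prints: [2, 3, 1, 2, 3, 1]
```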
  • Plugin changes
    • A new plugin has been added in compliance with ONNX ScatterElements.
    • The TensorRT plugin library no longer has a load-time link dependency on cuBLAS or cuDNN libraries.
    • All plugins which relied on cuBLAS/cuDNN handles passed through IPluginV2Ext::attachToContext() have moved to use cuBLAS/cuDNN resources initialized by the plugin library itself. This works by dynamically loading the required cuBLAS/cuDNN library. Additionally, plugins which independently initialized their cuBLAS/cuDNN resources have also moved to dynamically loading the required library. If the respective library is not discoverable through the library path(s), these plugins will not work.
    • bertQKVToContextPlugin: Version 2 of this plugin now supports head sizes less than or equal to 32.
    • reorgPlugin: Added a version 2 which implements IPluginV2DynamicExt.
    • disentangledAttentionPlugin: Fixed a kernel bug.
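The ScatterElements behavior referenced above, including the reduction attribute, can be illustrated in one dimension. This sketch follows the ONNX operator definition as understood here (reduction values none, add, mul, max, min); which reductions the TensorRT plugin supports is not stated in these notes, and the function name is hypothetical.

```python
import operator

# How an update is combined with the existing element, per reduction mode.
_REDUCTIONS = {
    "none": lambda old, new: new,  # plain overwrite
    "add": operator.add,
    "mul": operator.mul,
    "max": max,
    "min": min,
}


def scatter_elements_1d(data, indices, updates, reduction="none"):
    """1-D ScatterElements: write updates[i] into a copy of data at indices[i]."""
    if len(indices) != len(updates):
        raise ValueError("indices and updates must have the same length")
    if reduction not in _REDUCTIONS:
        raise ValueError(f"unsupported reduction: {reduction!r}")
    combine = _REDUCTIONS[reduction]
    out = list(data)  # the input is left unmodified
    n = len(out)
    for idx, upd in zip(indices, updates):
        if not -n <= idx < n:
            raise IndexError(f"index {idx} out of range for size {n}")
        out[idx] = combine(out[idx], upd)
    return out


print(scatter_elements_1d([1, 2, 3, 4, 5], [1, 3], [10, 20]))
# prints: [1, 10, 3, 20, 5]
print(scatter_elements_1d([1, 2, 3, 4, 5], [1, 1], [10, 20], reduction="add"))
# prints: [1, 32, 3, 4, 5]
```

With a reduction other than none, duplicate indices accumulate into the same element, which is the case the reduction attribute exists to make well-defined.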
  • Demo changes
    • HuggingFace demos have been removed. For all users using TensorRT to accelerate Large Language Model inference, please use TensorRT-LLM.
  • Updated tooling
    • Polygraphy v0.49.9
    • ONNX-GraphSurgeon v0.5.1
    • TensorRT Engine Explorer v0.1.8
  • Build Containers
    • RedHat/CentOS 7.x are no longer officially supported starting with TensorRT 10.0. The corresponding container has been removed from TensorRT-OSS.

TensorRT OSS v9.3.0

09 Feb 22:30
6d1397e

TensorRT OSS release corresponding to TensorRT 9.3.0.1 release.

Updates since TensorRT 9.2.0 release.

Key Features and Updates:

  • Faster Text-to-image using SDXL & INT8 quantization using AMMO
  • Updated Polygraphy v0.49.7