# v1.1.0: ORTTrainer, Seq2SeqORTTrainer, ONNX Runtime optimization and quantization API improvements
## ORTTrainer and Seq2SeqORTTrainer
The `ORTTrainer` and `Seq2SeqORTTrainer` are two new experimental classes.
- Both `ORTTrainer` and `Seq2SeqORTTrainer` were created to have a user-facing API similar to the `Trainer` and `Seq2SeqTrainer` of the Transformers library.
- `ORTTrainer` allows using the ONNX Runtime backend to train a given PyTorch model in order to accelerate training. ONNX Runtime will run the forward and backward passes using an optimized, automatically exported ONNX computation graph, while the rest of the training loop is executed by native PyTorch.
- `ORTTrainer` allows using ONNX Runtime inferencing during both the evaluation and the prediction steps.
- For `Seq2SeqORTTrainer`, ONNX Runtime inferencing is incompatible with `--predict_with_generate`, as the generate method is not supported yet.
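To give a feel for the API, here is a minimal sketch of a training run with `ORTTrainer`, mirroring the usual Transformers `Trainer` workflow. The checkpoint and dataset are placeholders, and the `feature` argument (the task name driving the ONNX export) reflects our reading of the API at the time of this release, so treat the exact signature as an assumption rather than a guarantee.

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments
from optimum.onnxruntime import ORTTrainer

# Placeholder checkpoint and dataset for illustration.
model_id = "bert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

dataset = load_dataset("glue", "sst2")
encoded = dataset.map(
    lambda examples: tokenizer(examples["sentence"], padding="max_length", truncation=True),
    batched=True,
)

training_args = TrainingArguments(output_dir="ort_bert_sst2", num_train_epochs=1)

trainer = ORTTrainer(
    model=model,
    args=training_args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,
    feature="sequence-classification",  # assumption: task name used for the ONNX export
)

# Forward and backward passes run on the exported ONNX graph via ONNX Runtime;
# the rest of the loop (optimizer step, data loading, ...) stays in native PyTorch.
trainer.train()
trainer.evaluate()  # evaluation can also use ONNX Runtime inferencing
```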
## ONNX Runtime optimization and quantization API improvements
The `ORTQuantizer` and `ORTOptimizer` classes underwent a massive refactoring that should allow a simpler and more flexible user-facing API.
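As a rough illustration of the refactored flow, the sketch below runs graph optimization and, independently, dynamic quantization, each driven by its own configuration object. It is a minimal sketch assuming the v1.1-era API (a `from_pretrained` taking a `feature` argument and an `export` method that exports the checkpoint and writes the resulting model); the checkpoint is a placeholder and exact signatures may differ between releases.

```python
from optimum.onnxruntime import ORTOptimizer, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig, OptimizationConfig

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder checkpoint

# Graph optimization, parametrized by a dedicated OptimizationConfig.
optimizer = ORTOptimizer.from_pretrained(model_id, feature="sequence-classification")
optimization_config = OptimizationConfig(optimization_level=99)  # 99 enables all available optimizations
optimizer.export(
    onnx_model_path="model.onnx",
    onnx_optimized_model_output_path="model-optimized.onnx",
    optimization_config=optimization_config,
)

# Dynamic quantization, parametrized by a dedicated QuantizationConfig.
quantizer = ORTQuantizer.from_pretrained(model_id, feature="sequence-classification")
qconfig = AutoQuantizationConfig.avx512(is_static=False, per_channel=False)
quantizer.export(
    onnx_model_path="model.onnx",
    onnx_quantized_model_output_path="model-quantized.onnx",
    quantization_config=qconfig,
)
```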
- Added the possibility to iteratively compute the quantization activation ranges when applying static quantization, using the `ORTQuantizer` method `partial_fit` (see the sketch after this list). This is especially useful with memory-hungry calibration methods such as the Entropy and Percentile methods.
- When using the MinMax calibration method, it is now possible to compute the moving average of the minimum and maximum values representing the activation quantization ranges, instead of the global minimum and maximum (feature available with onnxruntime v1.11.0 or higher).
- The `OptimizationConfig`, `QuantizationConfig` and `CalibrationConfig` classes were added in order to better segment the different ONNX Runtime related parameters, instead of having a single `ORTConfig` configuration.
- The `QuantizationPreprocessor` class was added in order to find the nodes to include and/or exclude from quantization, by finding the nodes that follow a given pattern (such as the nodes forming LayerNorm, for example). This is particularly useful in the context of static quantization, where quantizing modules such as LayerNorm or GELU can be responsible for a significant drop in accuracy.
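The sketch below ties several of these additions together in a static quantization flow: calibration ranges computed iteratively with `partial_fit`, a MinMax calibration using a moving average, and a `QuantizationPreprocessor` keeping LayerNorm and GELU out of quantization. It is a hedged sketch assuming the v1.1-era API; in particular `compute_ranges`, the `ExcludeLayerNormNodes`/`ExcludeGeLUNodes` passes and the `preprocessor` argument of `export` are our reading of the examples shipped with this release, not a guaranteed signature.

```python
from functools import partial

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoCalibrationConfig, AutoQuantizationConfig
from optimum.onnxruntime.preprocessors import QuantizationPreprocessor
from optimum.onnxruntime.preprocessors.passes import ExcludeGeLUNodes, ExcludeLayerNormNodes

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
quantizer = ORTQuantizer.from_pretrained(model_id, feature="sequence-classification")
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=True, per_channel=False)

# Small calibration set (placeholder: a GLUE SST-2 sample).
def preprocess_fn(examples, tokenizer):
    return tokenizer(examples["sentence"], padding="max_length", truncation=True)

calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
    num_samples=64,
    dataset_split="train",
)

# MinMax calibration computing a moving average of the min/max activation values
# (requires onnxruntime >= 1.11.0).
calibration_config = AutoCalibrationConfig.minmax(
    calibration_dataset,
    moving_average=True,
    averaging_constant=0.01,
)

# Compute the activation quantization ranges shard by shard to keep the memory
# footprint low; this matters most with the Entropy and Percentile methods.
for i in range(4):
    shard = calibration_dataset.shard(num_shards=4, index=i)
    quantizer.partial_fit(
        dataset=shard,
        calibration_config=calibration_config,
        onnx_model_path="model.onnx",
        operators_to_quantize=qconfig.operators_to_quantize,
    )
ranges = quantizer.compute_ranges()

# Exclude accuracy-sensitive patterns (LayerNorm, GELU) from quantization.
preprocessor = QuantizationPreprocessor()
preprocessor.register_pass(ExcludeLayerNormNodes())
preprocessor.register_pass(ExcludeGeLUNodes())

quantizer.export(
    onnx_model_path="model.onnx",
    onnx_quantized_model_output_path="model-quantized.onnx",
    calibration_tensors_range=ranges,
    quantization_config=qconfig,
    preprocessor=preprocessor,
)
```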