Ported the NNCF OpenVINO backend to use the nGraph representation of OpenVINO models.
Changed the dependencies of the NNCF OpenVINO backend. It now depends on the openvino package rather than the openvino-dev package.
Added GRU/LSTM quantization support.
Added quantizer scales unification.
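To illustrate the idea behind scale unification: when several quantized branches feed a shared operation such as a concat, their quantizer scales must agree so the outputs share one integer representation. The following is a conceptual sketch, not NNCF's actual implementation; the function names `unify_scales` and `fake_quantize` are illustrative.

```python
# Conceptual sketch of quantizer scale unification (not NNCF's code):
# branches feeding one op (e.g. concat) are given a single common scale.

def unify_scales(scales):
    """Return one scale covering all branch ranges (here: the max)."""
    return max(scales)

def fake_quantize(x, scale, levels=256):
    """Symmetric 8-bit fake-quantization of a single value."""
    qmax = levels // 2 - 1              # 127 for int8
    step = scale / qmax
    q = max(-levels // 2, min(qmax, round(x / step)))
    return q * step

branch_scales = [0.5, 2.0]              # per-branch scales before unification
common = unify_scales(branch_scales)    # both branches now quantize with 2.0
```

With a unified scale, values from either branch map onto the same integer grid, so the downstream op can operate on them directly.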
Added support for models with 3D and 5D Depthwise convolution.
Added FP16 OpenVINO models support.
Added support for the "overflow_fix" parameter (for the quantize(...) and quantize_with_accuracy_control(...) methods). It improves the accuracy of optimized models on affected devices. More details in the Quantization section.
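The core idea of the overflow fix is to quantize weights into half of the signed int8 range (effectively 7 bits), leaving headroom so integer accumulation does not saturate on devices without VNNI-style saturating instructions. A minimal sketch of that range selection, with an illustrative function name (`weight_qrange` is not an NNCF API):

```python
# Conceptual sketch of the overflow fix (not NNCF's code): weights are
# quantized into half of the int8 range, which leaves headroom for the
# int32 accumulator on affected (pre-VNNI) devices.

def weight_qrange(overflow_fix: bool):
    """Return the (qmin, qmax) integer range used for weight quantization."""
    if overflow_fix:
        return (-64, 63)    # 7-bit range: avoids accumulator overflow
    return (-128, 127)      # full signed int8 range
```

The narrower range slightly coarsens weight quantization, which is why the fix is applied only where the target device actually needs it.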
(OpenVINO) Added support for in-place statistics collection (reduce memory footprint during optimization).
(OpenVINO) Added Quantization with accuracy control algorithm.
The quantize(...) method can generate inaccurate int8 results for models with DenseNet-like architectures. Use quantize_with_accuracy_control(...) in such cases.
The quantize(...) method can hang on models with transformer architectures when the optional fast_bias_correction parameter is set to False. Do not set it to False, or use quantize_with_accuracy_control(...) in such cases.
The quantize(...) method can generate inaccurate int8 results for models with MobileNet-like architectures on non-VNNI machines.
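To show the kind of control loop that quantization with accuracy control performs, here is a toy sketch in plain Python. It is not NNCF's implementation: quantize everything, then revert the layers whose quantization hurts accuracy most until the drop is within a tolerance. The function and parameter names (`accuracy_controlled_quantization`, `max_drop`) are illustrative.

```python
# Toy sketch of an accuracy-controlled quantization loop (not NNCF's code):
# start fully quantized, then greedily revert the most damaging layers
# until the accuracy drop versus the float baseline is within max_drop.

def accuracy_controlled_quantization(layers, validate, max_drop=0.01):
    """layers: layer names; validate(quantized_layer_set) -> accuracy."""
    baseline = validate(set())              # accuracy with nothing quantized
    quantized = set(layers)
    while quantized and baseline - validate(quantized) > max_drop:
        # revert the layer whose de-quantization recovers the most accuracy
        worst = max(quantized, key=lambda l: validate(quantized - {l}))
        quantized.discard(worst)
    return quantized
```

A toy validation function where only layer "a" is sensitive to quantization shows the loop reverting exactly that layer and keeping the rest quantized.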
Compression-aware training:
New Features:
Introduced an automated structured pruning algorithm for JPQD with support for BERT, Wav2Vec2, Swin, ViT, DistilBERT, CLIP, and MobileBERT models.
Added nncf.common.utils.patcher.Patcher. This class can be used to patch methods on live PyTorch model objects with wrappers such as nncf.torch.dynamic_graph.context.no_nncf_trace when patching the model code directly is not possible (e.g. when the model comes from an external library package).
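The mechanism behind patching a method on a live object can be sketched in plain Python; the real Patcher API may differ, and `patch_method` below is an illustrative helper, not an NNCF function:

```python
# Minimal sketch of the method-patching idea (not the actual Patcher API):
# wrap a method on a live object so extra behavior runs around the original
# call, without editing the library's source code.
import functools

def patch_method(obj, name, wrapper):
    """Replace obj.<name> so it calls wrapper(original_fn, *args, **kwargs)."""
    original = getattr(obj, name)

    @functools.wraps(original)
    def patched(*args, **kwargs):
        return wrapper(original, *args, **kwargs)

    setattr(obj, name, patched)

class Model:
    def forward(self, x):
        return x * 2

model = Model()
calls = []
# Record every input before delegating to the original forward().
patch_method(model, "forward", lambda fn, x: (calls.append(x), fn(x))[1])
```

Because the patch is applied to the instance rather than the class, it can target a single model object coming from an external package, which is the use case described above.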
Compression controllers of the nncf.api.compression.CompressionAlgorithmController class now have a .strip() method that returns the compressed model object with as many custom NNCF additions removed as possible, while preserving its functioning as a compressed model.
Fixes:
Fixed statistics computation for pruned layers.
(PyTorch) Fixed traced-tensor handling to support YOLOv8 from Ultralytics.
Improvements:
Extended attribute handling (transpose/permute/getitem) in the pruning node selector.
Refactored NNCFNetwork from a wrapper approach to a mixin-like approach.
Added 3D average-pooling-like ops to pruning mask propagation.
Added Conv3d support to the overflow fix.
nncf.set_log_file(...) can now be used to set the location of the NNCF log file.
(PyTorch) Added support for pruning of torch.nn.functional.pad operation.
(PyTorch) Added torch.baddbmm as an alias for the matmul metatype for quantization purposes.
(PyTorch) Added config file for ResNet18 accuracy-aware pruning + quantization on CIFAR10.
(PyTorch) Fixed internal patching to keep PyTorch models JIT-traceable.
(PyTorch) Added the __matmul__ magic function to the list of patched ops (to support Microsoft's SwinTransformer).