[TorchFX] Torch FX/PyTorch 2 Export Quantization #2766

alexsu52 · 2024-06-27T14:44:04Z

🚀 Feature request

Quantization is a widely used technique to accelerate models, particularly when using the torch.compile. For detailed tutorials and demonstrations on model quantization using PyTorch 2 Export Quantization, please refer to the following resources:

These guides show how to obtain a quantized model via the PyTorch 2 Export Quantization API and run it using torch.compile. However OpenVINO provide backend for torch.compile, but NNCF does not support quantization PyTorch 2 Export (torch.fx.GraphModule) models and users have to use X86InductorQuantizer to quantize models. Comparisons between PyTorch 2 Export INT8 models quantized by X86InductorQuantizer and OpenVINO INT8 models quantized by NNCF show that NNCF produces more accurate and efficient INT8 models.

Feature request is to support for torch.fx.GraphModule models in nncf.quantize to enable the creation of accurate and highly efficient models using torch.compile with the OpenVINO backend.

Feature Use Case

import torch
import nncf

# initialize a floating point model
float_model = M().eval()

# program capture
# NOTE: this API will be updated to torch.export API in the future, but the captured result should mostly stay the same
model = capture_pre_autograd_graph(float_model, *example_inputs)

# quantization
quantized_model = nncf.quantize(model, calibration_dataset)

# compile quantized model with OpenVINO backend
compiled_model = torch.compile(quantized_model, backend='openvino')

Are you going to submit a PR?

Yes I'd like to help by submitting a PR!

The text was updated successfully, but these errors were encountered:

alexsu52 · 2024-06-27T14:47:33Z

@daniil-lyakhov, please, analyze this feature request and open issues as sub-tasks of this feature request.

alexsu52 · 2024-07-01T11:23:02Z

I suugest to introduce the following API in NNCF, to support third-party quantizers and better alignment with PyTorch 2 Export Quantization API:

class OpenVINOQuantizer(Quantizer):
    # annotate nodes in the graph with observer or fake quant constructors
    # to convey the desired way of quantization
    def annotate(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        pass

    # validate the annotated graph is supported by the backend
    def validate(self, model: torch.fx.GraphModule) -> None:
        pass

    # annotate nodes in the graph with observer or fake quant constructors
    # to convey the desired way of quantization
    @classmethod
    def get_supported_operators(cls) -> List[OperatorConfig]:
        pass


# apply quantization pipeline for torch.export.ExportedProgram
def quantize_pt2e(
    model: torch.export.ExportedProgram, 
    calibration_dataset: Dataset, 
    quantizer: torch.ao.quantization.quantizer.Quantizer, 
    subset_size: int = 300,
    fast_bias_correction: Optional[bool] = True,
    smooth_quant: Optional[bool] = None,
    channel_alignment: Optional[bool] = None,
    bias_correction_params: Optional[AdvancedBiasCorrectionParameters] = None, 
    smooth_quant_alphas: Optional[AdvancedSmoothQuantParameters] = None,
)

### Changes Added a test in tests/torch/fx/test_models.py to include a test for quantized graph which compares the quantized graph with a reference quantized graph. ### Reason for changes To check if the graph was quantized correctly ### Ticket #2766 ### Tests test_quantized_model() was added in tests/torch/fx/test_models.py

…izers (#2854) ### Changes Quantizer merge logic updated to check that all output branches are quantized before quantizers merging and propagating up. ### Reason for changes To prevent merging of quantizers in case of ScaledDotProductAttention op, which should have quantizers on [0, 1] input ports and shouldn't have a quantizer on the 3 input port. ### Related tickets 148211 #2766 ### Tests * Common solver test for ScaleDotProductAttention branch merging and quantization initialization * Graph tests for torch/ov backends

### Changes Conformance test for resnet18 ### Reason for changes To extend testing scope for the TorchFX backend ### Related tickets #2766 ### Tests post_training_quantization/442 is successfull

### Changes Torch FX pre-hook insertion support ### Reason for changes To enable vit_b_16 quantization ### Related tickets #2766 ### Tests test_quantized_models is updated by vit_b_16 and swin_v2_s

### Changes Constant linear layers support ### Reason for changes To support swint_v2_s FBC ### Related tickets #2766 ### Tests Build post_training_quantization/444/ is finished successfully Unit test `test_model_transformer.test_model_extraction` is presented

### Changes TorchFX SmoothQuant backend implementation * module_insertion_transformation_builder is introduced * Transformation requires names for new modules and nodes * vit_b_16 is introduced in the conformance tests ### Reason for changes To improve metrics of quantized models: swin_v2_s and vit_b_16 * To insert SQ multiply nodes to the graph * To make node names human-readable and consistent * To check sq algorithm E2E ### Related tickets #2766 ### Tests * Smooth quant test template is implemented for TorchfX backed * Conformance test: post_training_quantization/446/ is successfull * Test models check SQ multiplies for swin_v2_s and vit_b_16 models

alexsu52 · 2024-10-09T06:26:59Z

### Changes Transformation for removing fake quantize nodes and saving all weights to disk in int8 format after quantization. It works as follows: 1. Reshape the scale if qdq operation is per-channel. 2. Pattern match the quantize-dequantize nodes. 3. Filter the matches to only include quantize-dequantize ops with constant input. 4. Replace with the multiplication of the scale and input. ### Reason for changes To compress the model after quantization ### Tests Add `test_post_quantization_compression()` in `tests/torch/fx/test_model_transformer.py` which checks the data type of all weights in the model after applying quantization and also checks the value after the decompression step (element-wise multiplication operation). ### Tickets #2766 --------- Co-authored-by: Daniil Lyakhov <[email protected]>

### Changes * Resnet18 TorchFX example ### Reason for changes * To showcase NNCF TorchFX quantization ### Related tickets #2766 ### Tests test_examples/544/ - Done

daniil-lyakhov · 2024-10-23T10:24:49Z

I suugest to introduce the following API in NNCF, to support third-party quantizers and better alignment with PyTorch 2 Export Quantization API:

class OpenVINOQuantizer(Quantizer):
    # annotate nodes in the graph with observer or fake quant constructors
    # to convey the desired way of quantization
    def annotate(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        pass

    # validate the annotated graph is supported by the backend
    def validate(self, model: torch.fx.GraphModule) -> None:
        pass

    # annotate nodes in the graph with observer or fake quant constructors
    # to convey the desired way of quantization
    @classmethod
    def get_supported_operators(cls) -> List[OperatorConfig]:
        pass


# apply quantization pipeline for torch.export.ExportedProgram
def quantize_pt2e(
    model: torch.export.ExportedProgram, 
    calibration_dataset: Dataset, 
    quantizer: torch.ao.quantization.quantizer.Quantizer, 
    subset_size: int = 300,
    fast_bias_correction: Optional[bool] = True,
    smooth_quant: Optional[bool] = None,
    channel_alignment: Optional[bool] = None,
    bias_correction_params: Optional[AdvancedBiasCorrectionParameters] = None, 
    smooth_quant_alphas: Optional[AdvancedSmoothQuantParameters] = None,
)

plus parameter range estimators

### Changes * ~~Constant folding is applied to all TorchFX models before the quantization~~ * Some torchvision models (swin_v2_s, vit_16_b) are exported by `torch.export.export` before ov conversation * Moc transformations are applied to openvino compressed models after the compression After the #2984 * Fixed `_compress_qdq_constant_transformation` for per tensor case ### Reason for changes * To align TorchFX/OV quantized models ### Related tickets #2766 ### Tests post_training_quantization/504/ is finished successfully

### Changes Constant folding is enabled by default in TorchFX backend ### Reason for changes To align quantizers placement between OV and TorchFX ### Related tickets #2766 ### Tests * test_constant_folding * test_constant_folding_with_constraints * test_models.py references are updated * post_training_quantization/535/ - finished successfully --------- Co-authored-by: Alexander Suslov <[email protected]> Co-authored-by: Aamir Nazir <[email protected]>

### Changes * TorchFX Unit tests are moved from `torch._export.capture_pre_autograd_graph` to `torch.export.export_for_training` ALL REFERENCE GRAPHS WERE VALIDATED MANUALLY * BC types for `fuse_bn_node` are updated * NNCFGraphBuilder is updated to support a batch-norm type with only one output node (instead of three) * Model extractor does not traverse down from constans to prevent redundant nodes in the extracted model when the constant is shared * `shared_constants_unification_transformation` is removed * Tests which require `capture_pre_autograd_graph` are removed ### Reason for changes * To migrate to the lates and recommended export method for TorchFX backend ### Related tickets #2766 ### Tests test_shared_constants_unification_not_connected_const post_training_quantization/540/ is finished successfully

…it#3075) ### Changes * TorchFX Unit tests are moved from `torch._export.capture_pre_autograd_graph` to `torch.export.export_for_training` ALL REFERENCE GRAPHS WERE VALIDATED MANUALLY * BC types for `fuse_bn_node` are updated * NNCFGraphBuilder is updated to support a batch-norm type with only one output node (instead of three) * Model extractor does not traverse down from constans to prevent redundant nodes in the extracted model when the constant is shared * `shared_constants_unification_transformation` is removed * Tests which require `capture_pre_autograd_graph` are removed ### Reason for changes * To migrate to the lates and recommended export method for TorchFX backend ### Related tickets openvinotoolkit#2766 ### Tests test_shared_constants_unification_not_connected_const post_training_quantization/540/ is finished successfully

PR #3075 to the release branch: ### Changes * TorchFX Unit tests are moved from `torch._export.capture_pre_autograd_graph` to `torch.export.export_for_training` ALL REFERENCE GRAPHS WERE VALIDATED MANUALLY * BC types for `fuse_bn_node` are updated * NNCFGraphBuilder is updated to support a batch-norm type with only one output node (instead of three) * Model extractor does not traverse down from constans to prevent redundant nodes in the extracted model when the constant is shared * `shared_constants_unification_transformation` is removed * Tests which require `capture_pre_autograd_graph` are removed ### Reason for changes * To migrate to the lates and recommended export method for TorchFX backend ### Related tickets #2766 ### Tests test_shared_constants_unification_not_connected_const post_training_quantization/540/ is finished successfully

### Changes * Main README.md, Usage.md and post training quantization docs are updated with info about the TorchFX ### Reason for changes * To reflect new experimental features of TorchFX in the docs ### Related tickets #2766

PR #2917 to the release branch ### Changes * Main README.md, Usage.md and post training quantization docs are updated with info about the TorchFX ### Reason for changes * To reflect new experimental features of TorchFX in the docs ### Related tickets #2766

### Changes * Torch SDPA pattern is updated * As the concat node has his input nodes in format `args=([inp_1, ..., inp_n], dim)`, thus it should be treated differently. Retrieving concat inputs by input port id was supported in each TorchFX transformation ### Reason for changes * To support quantization of ultralytics/yolo11n in TorchFX backend ### Related tickets #2766 157032 ### Tests * `tests/torch/fx/test_model_transformer.py` and `tests/torch/fx/test_compress_weights.py` are updated to check all cases with the concat node. All .`dot` / `.json` were checked manually. * `tests/torch/fx/test_models.py` is updated with `YOLO11N_SDPABlock` synthetic model to check the correctness of SDPA pattern matching

### Changes All `capture_pre_autograd_graph` calls in the conformance test were replaced by `torch.export.export_for_training`. ### Reason for changes To remove deprecated `capture_pre_autograd_graph` from the conformance test. ### Related tickets #2766 ### Tests post_training_quantization/555/ have finished succesfully

…notoolkit#3078) ### Changes All `capture_pre_autograd_graph` calls in the conformance test were replaced by `torch.export.export_for_training`. ### Reason for changes To remove deprecated `capture_pre_autograd_graph` from the conformance test. ### Related tickets openvinotoolkit#2766 ### Tests post_training_quantization/555/ have finished succesfully

### Changes * Bias fusing is removed from default transformations * `constant_folding` is updated to remove inplace operations without users * `extract_model` is updated to support original model output as a subgraph output ### Reason for changes To make it possible to apply quantization the same way it done by X86Quantizer ### Related tickets #2766 110985 ### Tests * All int8 references are updated and checked manually * `test_constant_folding` and `test_constant_folding_with_constraints` are updated with a constant subgraph which contains an inplace op (`relu_`) * `test_model_extraction_with_original_output` is introduced * conformance test post_training_quantization/557 have finished successfully

### Changes Folded constants do not require gradient ### Reason for changes * To unify all model constant/buffers * To make compressed model deepcopy-able ### Related tickets #2766 ### Tests `test_constant_folding` is updated

…it#3075) ### Changes * TorchFX Unit tests are moved from `torch._export.capture_pre_autograd_graph` to `torch.export.export_for_training` ALL REFERENCE GRAPHS WERE VALIDATED MANUALLY * BC types for `fuse_bn_node` are updated * NNCFGraphBuilder is updated to support a batch-norm type with only one output node (instead of three) * Model extractor does not traverse down from constans to prevent redundant nodes in the extracted model when the constant is shared * `shared_constants_unification_transformation` is removed * Tests which require `capture_pre_autograd_graph` are removed ### Reason for changes * To migrate to the lates and recommended export method for TorchFX backend ### Related tickets openvinotoolkit#2766 ### Tests test_shared_constants_unification_not_connected_const post_training_quantization/540/ is finished successfully

### Changes * Main README.md, Usage.md and post training quantization docs are updated with info about the TorchFX ### Reason for changes * To reflect new experimental features of TorchFX in the docs ### Related tickets openvinotoolkit#2766

alexsu52 added the enhancement New feature or request label Jun 27, 2024

alexsu52 assigned daniil-lyakhov Jun 27, 2024

alexsu52 changed the title ~~Torch FX Quantization~~ Torch FX/PyTorch 2 Export Quantization Jun 27, 2024

This was referenced Jul 1, 2024

[Good First Issue][NNCF][TorchFX]: Test statistic aggregator #2777

Closed

[Good First Issue][NNCF][TorchFX]: Test model transformer #2775

Closed

MaximProshin changed the title ~~Torch FX/PyTorch 2 Export Quantization~~ [TorchFX] Torch FX/PyTorch 2 Export Quantization Jul 5, 2024

daniil-lyakhov mentioned this issue Jul 13, 2024

[NNCF][TorchFX]: Test Fast Bias Correction algorithm #2809

Closed

1 task

This was referenced Jul 24, 2024

[TorchFX] Conformance test init #2841

Merged

Remove gather metatype from input quantizable #2840

Closed

[Experimental] TorchFX PTQ backend #2764

Merged

This was referenced Aug 6, 2024

[Good First Issue][NNCF]: [TorchFX] Test MinMax algorithm #2872

Closed

[Good First Issue][NNCF]: [TorchFX] Test PTQ MinMax parameters #2873

Closed

[Good First Issue][NNCF]: [TorchFX] Test PTQ quantizer config #2874

Closed

daniil-lyakhov mentioned this issue Aug 9, 2024

[TorchFX] SmoothQuant algorithm implementation #2875

Merged

This was referenced Aug 20, 2024

[TorchFX] Deptwise convolution support #2896

Merged

[TorchFX] Bias correction implementation #2882

Merged

[TorchFX] Performance check #2899

Draft

daniil-lyakhov mentioned this issue Oct 4, 2024

[Conformance] TorchFX/OV backends Alignment #2996

Merged

anzr299 mentioned this issue Oct 17, 2024

[Torch FX] Post Quantize Weights Compression #2984

Merged

AlexanderDokuchaev pushed a commit that referenced this issue Oct 22, 2024

[TorchFX] Rensen18 example (#2913)

7263096

### Changes * Resnet18 TorchFX example ### Reason for changes * To showcase NNCF TorchFX quantization ### Related tickets #2766 ### Tests test_examples/544/ - Done

daniil-lyakhov mentioned this issue Oct 30, 2024

[TorchFX] Constant folding #3047

Merged

daniil-lyakhov mentioned this issue Nov 12, 2024

[TorchFX] Export to torch.export.export_for_training #3075

Merged

daniil-lyakhov mentioned this issue Nov 14, 2024

[Release][TorchFX] Export to torch.export.export_for_training #3087

Merged

daniil-lyakhov mentioned this issue Nov 15, 2024

[Release][TorchFX] Documetation update #3090

Merged

This was referenced Nov 22, 2024

[TorchFX][Conformance] Move all models to export_for_training #3078

Merged

[TorchFX] YoloV11 support #3105

Merged

daniil-lyakhov mentioned this issue Nov 25, 2024

[TorchFX] Bias fusing is removed from default transformations #3027

Merged

daniil-lyakhov mentioned this issue Nov 28, 2024

[Experimental][TorchFX] quantize_pt2e + X86Quantizer introduction #3121

Open

daniil-lyakhov mentioned this issue Dec 3, 2024

[TorchFX][MicroFix] Folded constants do not require grad #3128

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[TorchFX] Torch FX/PyTorch 2 Export Quantization #2766

[TorchFX] Torch FX/PyTorch 2 Export Quantization #2766

alexsu52 commented Jun 27, 2024 •

edited by daniil-lyakhov

Loading

alexsu52 commented Jun 27, 2024

alexsu52 commented Jul 1, 2024

alexsu52 commented Oct 9, 2024 •

edited by daniil-lyakhov

Loading

daniil-lyakhov commented Oct 23, 2024 •

edited

Loading

[TorchFX] Torch FX/PyTorch 2 Export Quantization #2766

[TorchFX] Torch FX/PyTorch 2 Export Quantization #2766

Comments

alexsu52 commented Jun 27, 2024 • edited by daniil-lyakhov Loading

🚀 Feature request

Feature Use Case

Are you going to submit a PR?

alexsu52 commented Jun 27, 2024

alexsu52 commented Jul 1, 2024

alexsu52 commented Oct 9, 2024 • edited by daniil-lyakhov Loading

daniil-lyakhov commented Oct 23, 2024 • edited Loading

alexsu52 commented Jun 27, 2024 •

edited by daniil-lyakhov

Loading

alexsu52 commented Oct 9, 2024 •

edited by daniil-lyakhov

Loading

daniil-lyakhov commented Oct 23, 2024 •

edited

Loading