Add quantize/dequantize_per_channel_group support to QNN backend #19629
Open
Hyungkeun-Park-Nota wants to merge 1 commit into
quantized_decomposed.quantize_per_channel_group and dequantize_per_channel_group are used for LLM weight-only quantization (e.g. int4 group-wise) but were not recognized by the QNN backend, causing the ops to be decomposed or failing with a KeyError in InsertIOQDQ.

Five files are changed:

- builders/node_visitor.py: add both ops to q_ops/dq_ops and to the per_block_encoding set so get_quant_encoding_conf routes them through make_qnn_per_block_config, matching the existing torchao affine path.
- partition/utils.py: add both ops to get_skip_decomp_table so they are preserved as-is during torch.export and reach the backend.
- _passes/utils.py: remap "scales" -> QCOM_SCALE in get_quant_attrs, parallel to the existing "input_dtype" -> QCOM_DTYPE remap, so AnnotateQuantAttrs correctly propagates the scale tensor for per-channel-group nodes.
- _passes/qnn_pass_manager.py: dynamically register both ops into node_visitor.q_ops/dq_ops in get_to_edge_transform_passes, following the same pattern used for torchao ops.
- _passes/insert_io_qdq.py: add per-channel-group entries to q_dq_map to fix KeyError when a pre-quantized weight node with dequantize_per_channel_group encoding feeds the graph output.

A unit test is added to test_passes.py that injects per_channel_group quant attrs onto a node feeding the output and verifies InsertIOQDQ completes without KeyError.
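To make "int4 group-wise" concrete, here is a minimal pure-Python sketch of symmetric per-channel-group quantization for a single weight row. It is not the `quantized_decomposed` implementation: the helper names `quantize_groupwise` and `dequantize_groupwise`, the max-abs scale choice, and the omission of zero points are all assumptions made for illustration.

```python
from typing import List, Tuple


def quantize_groupwise(
    row: List[float], group_size: int, qmin: int = -8, qmax: int = 7
) -> Tuple[List[int], List[float]]:
    """Quantize one output channel (row) with one scale per group.

    Hypothetical helper: symmetric int4 range, max-abs scale, no zero point.
    """
    if group_size <= 0 or len(row) % group_size != 0:
        raise ValueError("row length must be a positive multiple of group_size")
    q_values: List[int] = []
    scales: List[float] = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        # Fall back to 1.0 for an all-zero group to avoid dividing by zero.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        for v in group:
            q_values.append(max(qmin, min(qmax, round(v / scale))))
    return q_values, scales


def dequantize_groupwise(
    q_values: List[int], scales: List[float], group_size: int
) -> List[float]:
    """Map each integer back to float using the scale of its group."""
    return [q * scales[i // group_size] for i, q in enumerate(q_values)]


row = [7.0, -3.0, 2.0, 0.0, 14.0, -6.0, 4.0, 2.0]
q, s = quantize_groupwise(row, group_size=4)
restored = dequantize_groupwise(q, s, group_size=4)
```

Each group of `group_size` consecutive values in a channel gets its own scale, which is why the backend has to carry a scale tensor rather than a single scalar for these ops.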
@pytorchbot label "release notes: qualcomm"
Motivation
`quantized_decomposed.quantize_per_channel_group` and `dequantize_per_channel_group` are used for LLM weight-only quantization (e.g. int4 group-wise / per-channel-group). The QNN backend did not recognize these ops, causing two distinct failures:

- The ops were decomposed during `torch.export` instead of being preserved for the backend.
- `InsertIOQDQ` raised a `KeyError` when a pre-quantized weight node annotated with `dequantize_per_channel_group` encoding fed the graph output, because the op was absent from `q_dq_map`.

This change adds the same treatment that `torchao.quantize_affine` / `dequantize_affine` already receive, applying it to the `quantized_decomposed` per-channel-group variants.

Changes
- `backends/qualcomm/builders/node_visitor.py`: add both ops to the `q_ops`/`dq_ops` sets, and to `per_block_encoding` so `get_quant_encoding_conf` routes them through `make_qnn_per_block_config`.
- `backends/qualcomm/partition/utils.py`: add both ops to `get_skip_decomp_table` so they are preserved during export and reach the backend unchanged.
- `backends/qualcomm/_passes/utils.py`: remap `"scales"` → `QCOM_SCALE` in `get_quant_attrs`, parallel to the existing `"input_dtype"` → `QCOM_DTYPE` remap for torchao ops, so `AnnotateQuantAttrs` correctly propagates the scale tensor for per-channel-group nodes.
- `backends/qualcomm/_passes/qnn_pass_manager.py`: dynamically register both ops into `node_visitor.q_ops`/`dq_ops` in `get_to_edge_transform_passes`, following the same pattern used for torchao ops.
- `backends/qualcomm/_passes/insert_io_qdq.py`: add per-channel-group entries to `q_dq_map` to fix the `KeyError`.

Testing
A unit test was added to `backends/qualcomm/tests/test_passes.py`, and the existing `InsertIOQDQ` tests also continue to pass.

The full QNN export pipeline (annotation → fold QDQ → insert IO QDQ → layout transform → partition) completes without errors for a quantized linear model. End-to-end execution on device was not verified in this environment.
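Two of the changes described in this PR, the new `q_dq_map` entries and the `"scales"` key remap, can be illustrated with a small self-contained sketch. Everything here is a stand-in: plain strings replace the real op overloads, the values assigned to `QCOM_SCALE` and `QCOM_DTYPE` are assumptions, and `build_q_dq_map`, `pick_io_dq_op`, and `remap_quant_attr_keys` are hypothetical helpers rather than functions from the QNN backend.

```python
from typing import Any, Dict

# Plain strings stand in for the real op overloads (assumption).
Q_TENSOR = "quantize_per_tensor"
DQ_TENSOR = "dequantize_per_tensor"
Q_GROUP = "quantize_per_channel_group"
DQ_GROUP = "dequantize_per_channel_group"

# Assumed constant values; the real ones live in the backend's constants.
QCOM_SCALE = "scale"
QCOM_DTYPE = "dtype"


def build_q_dq_map(include_group_ops: bool) -> Dict[str, str]:
    """Map the encoding recorded on a node to the dequantize op to insert."""
    mapping = {Q_TENSOR: DQ_TENSOR, DQ_TENSOR: DQ_TENSOR}
    if include_group_ops:
        # The kind of entries the fix adds for per-channel-group encodings.
        mapping[Q_GROUP] = DQ_GROUP
        mapping[DQ_GROUP] = DQ_GROUP
    return mapping


def pick_io_dq_op(q_dq_map: Dict[str, str], encoding: str) -> str:
    """Plain dict lookup: raises KeyError when the encoding is unknown."""
    return q_dq_map[encoding]


# Source argument name -> backend attribute key, as described in the PR.
KEY_REMAP = {"scales": QCOM_SCALE, "input_dtype": QCOM_DTYPE}


def remap_quant_attr_keys(attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Rename op argument names to backend keys; other keys pass through."""
    return {KEY_REMAP.get(key, key): value for key, value in attrs.items()}


try:
    pick_io_dq_op(build_q_dq_map(include_group_ops=False), DQ_GROUP)
    failed_before_fix = False
except KeyError:
    failed_before_fix = True

op_after_fix = pick_io_dq_op(build_q_dq_map(include_group_ops=True), DQ_GROUP)
remapped = remap_quant_attr_keys({"scales": [0.1, 0.2], "group_size": 32})
```

The sketch only shows why a missing dictionary entry surfaces as a `KeyError` and how a key rename lets downstream code find the scale under the name it expects; the actual pass logic is more involved.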