Add quantize/dequantize_per_channel_group support to QNN backend by Hyungkeun-Park-Nota · Pull Request #19629 · pytorch/executorch

Hyungkeun-Park-Nota · 2026-05-18T08:48:21Z

Motivation

quantized_decomposed.quantize_per_channel_group and dequantize_per_channel_group are used for LLM weight-only quantization (e.g. int4 group-wise / per-channel-group). The QNN backend did not recognize these ops, causing two distinct failures:

The ops were decomposed away during torch.export instead of being preserved for the backend.
InsertIOQDQ raised a KeyError when a pre-quantized weight node annotated with dequantize_per_channel_group encoding fed the graph output, because the op was absent from q_dq_map.

This change adds the same treatment that torchao.quantize_affine / dequantize_affine already receive, applying it to the quantized_decomposed per-channel-group variants.
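
As background for the group-wise scheme mentioned above, here is a minimal, dependency-free sketch of symmetric per-channel-group quantization. The names `groupwise_quantize` and `groupwise_dequantize` are hypothetical, and deriving the scale from the group's maximum absolute value is an illustrative choice. The real quantized_decomposed ops take precomputed scales and zero points as arguments; their signatures are not reproduced here.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def groupwise_quantize(
    weight: Matrix, group_size: int, qmin: int = -8, qmax: int = 7
) -> Tuple[List[List[int]], Matrix]:
    """Quantize each row (output channel) in groups of `group_size` columns.

    Returns the integer values and one scale per (row, group).
    The int4 range [-8, 7] is the default; zero points are fixed at 0.
    """
    q_rows, scale_rows = [], []
    for row in weight:
        if len(row) % group_size != 0:
            raise ValueError("row length must be a multiple of group_size")
        q_row, s_row = [], []
        for start in range(0, len(row), group_size):
            group = row[start:start + group_size]
            max_abs = max(abs(v) for v in group)
            # Assumption: symmetric scale chosen so max_abs maps to qmax.
            scale = max_abs / qmax if max_abs > 0 else 1.0
            s_row.append(scale)
            q_row.extend(max(qmin, min(qmax, round(v / scale))) for v in group)
        q_rows.append(q_row)
        scale_rows.append(s_row)
    return q_rows, scale_rows


def groupwise_dequantize(
    q: List[List[int]], scales: Matrix, group_size: int
) -> Matrix:
    """Multiply every integer by the scale of the group it belongs to."""
    return [
        [qv * s_row[i // group_size] for i, qv in enumerate(q_row)]
        for q_row, s_row in zip(q, scales)
    ]


weight = [[14.0, -6.0, 0.0, 3.5]]
q, scales = groupwise_quantize(weight, group_size=2)
restored = groupwise_dequantize(q, scales, group_size=2)
```

Each row carries one scale per group of columns rather than one scale for the whole row, which is why the backend needs a block-style encoding for these ops.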

Changes

backends/qualcomm/builders/node_visitor.py

Add both ops to q_ops / dq_ops sets.
Add both ops to per_block_encoding so get_quant_encoding_conf routes them through make_qnn_per_block_config.
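
The routing described above amounts to a set-membership dispatch. The following is a toy sketch only: the op names are plain strings, and `select_encoding_kind` is a hypothetical stand-in for the decision made inside get_quant_encoding_conf, whose actual body is not shown in this PR description.

```python
# Hypothetical string op names; the real code keys on edge-dialect op objects.
PER_TENSOR_OPS = {"quantize_per_tensor", "dequantize_per_tensor"}
PER_CHANNEL_OPS = {"quantize_per_channel", "dequantize_per_channel"}
PER_BLOCK_OPS = {
    "quantize_affine",
    "dequantize_affine",
    # The two entries this PR adds to the per-block set.
    "quantize_per_channel_group",
    "dequantize_per_channel_group",
}


def select_encoding_kind(op_name: str) -> str:
    """Return which kind of encoding config would be built for an op."""
    if op_name in PER_BLOCK_OPS:
        return "per_block"
    if op_name in PER_CHANNEL_OPS:
        return "per_channel"
    if op_name in PER_TENSOR_OPS:
        return "per_tensor"
    raise ValueError(f"unrecognized quantization op: {op_name}")


kind = select_encoding_kind("dequantize_per_channel_group")
```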

backends/qualcomm/partition/utils.py

Add both ops to get_skip_decomp_table so they are preserved during export and reach the backend unchanged.
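
To illustrate why a skip list matters, here is a toy decomposition pass over a list of op names. Everything in it (`decompose`, the table contents, the primitive op names) is hypothetical and only mimics the idea; it does not reflect how torch.export or get_skip_decomp_table is implemented.

```python
from typing import Dict, Iterable, List


def decompose(
    graph: Iterable[str],
    decomp_table: Dict[str, List[str]],
    skip_ops: Iterable[str] = (),
) -> List[str]:
    """Expand ops using decomp_table unless they appear in skip_ops."""
    skip = set(skip_ops)
    result: List[str] = []
    for op in graph:
        if op in skip or op not in decomp_table:
            result.append(op)  # preserved as-is for the backend
        else:
            result.extend(decomp_table[op])  # replaced by primitive ops
    return result


# Hypothetical decomposition of the group-wise dequantize op.
table = {"dequantize_per_channel_group": ["reshape", "sub", "mul", "reshape"]}
graph = ["dequantize_per_channel_group", "linear"]

without_skip = decompose(graph, table)
with_skip = decompose(graph, table, skip_ops=["dequantize_per_channel_group"])
```

Without the skip entry the backend only sees generic arithmetic ops and can no longer recover the quantization parameters from the graph.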

backends/qualcomm/_passes/utils.py

Remap "scales" → QCOM_SCALE in get_quant_attrs, parallel to the existing "input_dtype" → QCOM_DTYPE remap for torchao ops, so AnnotateQuantAttrs correctly propagates the scale tensor for per-channel-group nodes.
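
The remap can be pictured as a small key-translation step applied while collecting an op's arguments into a dictionary. In this sketch the constant values `"scale"` and `"dtype"` are assumptions about QCOM_SCALE and QCOM_DTYPE, and `collect_quant_attrs` is a hypothetical helper, not the real get_quant_attrs.

```python
from typing import Any, Dict, Sequence

# Assumed values; the actual constants live in the Qualcomm backend.
QCOM_SCALE = "scale"
QCOM_DTYPE = "dtype"

# Argument names that differ between op schemas and the backend's keys.
ARG_REMAP = {
    "input_dtype": QCOM_DTYPE,  # existing remap for torchao ops
    "scales": QCOM_SCALE,       # remap added for per-channel-group ops
}


def collect_quant_attrs(
    arg_names: Sequence[str], arg_values: Sequence[Any]
) -> Dict[str, Any]:
    """Pair schema argument names with values, translating known aliases."""
    return {
        ARG_REMAP.get(name, name): value
        for name, value in zip(arg_names, arg_values)
    }


attrs = collect_quant_attrs(
    ["scales", "zero_points", "group_size"],
    [[2.0, 0.5], [0, 0], 2],
)
```

Downstream code that looks up the scale under the backend's key then finds it regardless of which op family produced the node.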

backends/qualcomm/_passes/qnn_pass_manager.py

Dynamically register both ops into node_visitor.q_ops / dq_ops in get_to_edge_transform_passes, following the same pattern used for torchao ops.

backends/qualcomm/_passes/insert_io_qdq.py

Add per-channel-group entries to q_dq_map to fix the KeyError.
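
The failure mode and its fix can be reproduced with plain dictionaries. The map contents below are assumptions chosen for illustration (string op names, and a dequantize op mapping to itself); `pick_output_dequant` is a hypothetical stand-in for the lookup performed in InsertIOQDQ.

```python
from typing import Dict


def pick_output_dequant(encoding: str, q_dq_map: Dict[str, str]) -> str:
    """Look up which dequantize op to insert for a node's encoding."""
    return q_dq_map[encoding]  # raises KeyError for unknown encodings


# Map before the change: no per-channel-group entries.
OLD_MAP = {
    "quantize_per_tensor": "dequantize_per_tensor",
    "dequantize_per_tensor": "dequantize_per_tensor",
}

# Map after the change: per-channel-group entries added.
NEW_MAP = {
    **OLD_MAP,
    "quantize_per_channel_group": "dequantize_per_channel_group",
    "dequantize_per_channel_group": "dequantize_per_channel_group",
}

try:
    pick_output_dequant("dequantize_per_channel_group", OLD_MAP)
    raised_before_fix = False
except KeyError:
    raised_before_fix = True

resolved = pick_output_dequant("dequantize_per_channel_group", NEW_MAP)
```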

Testing

Unit test added to backends/qualcomm/tests/test_passes.py:

python -m pytest backends/qualcomm/tests/test_passes.py::TestPasses::test_insert_io_qdq_per_channel_group_no_key_error -v

The existing InsertIOQDQ tests also continue to pass:

python -m pytest backends/qualcomm/tests/test_passes.py::TestPasses::test_insert_io_qdq_handles_dequant_encoding backends/qualcomm/tests/test_passes.py::TestPasses::test_insert_io_qdq_no_revisit -v

The full QNN export pipeline (annotation → fold QDQ → insert IO QDQ → layout transform → partition) completes without errors for a quantized linear model. End-to-end execution on device was not verified in this environment.

quantized_decomposed.quantize_per_channel_group and dequantize_per_channel_group are used for LLM weight-only quantization (e.g. int4 group-wise) but were not recognized by the QNN backend, causing the ops to be decomposed or failing with a KeyError in InsertIOQDQ.

Five files are changed:

- builders/node_visitor.py: add both ops to q_ops/dq_ops and to the per_block_encoding set so get_quant_encoding_conf routes them through make_qnn_per_block_config, matching the existing torchao affine path.
- partition/utils.py: add both ops to get_skip_decomp_table so they are preserved as-is during torch.export and reach the backend.
- _passes/utils.py: remap "scales" -> QCOM_SCALE in get_quant_attrs, parallel to the existing "input_dtype" -> QCOM_DTYPE remap, so AnnotateQuantAttrs correctly propagates the scale tensor for per-channel-group nodes.
- _passes/qnn_pass_manager.py: dynamically register both ops into node_visitor.q_ops/dq_ops in get_to_edge_transform_passes, following the same pattern used for torchao ops.
- _passes/insert_io_qdq.py: add per-channel-group entries to q_dq_map to fix the KeyError when a pre-quantized weight node with dequantize_per_channel_group encoding feeds the graph output.

A unit test is added to test_passes.py that injects per_channel_group quant attrs onto a node feeding the output and verifies InsertIOQDQ completes without a KeyError.

pytorch-bot · 2026-05-18T08:48:25Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19629

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 2 Active SEVs

There are 2 currently active SEVs. If your PR is affected, please view them below:

⚠️ 11 Awaiting Approval

As of commit 79a29da with merge base 824cbff:

AWAITING APPROVAL - The following workflows need approval before CI can run:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Hyungkeun-Park-Nota · 2026-05-18T08:57:50Z

@pytorchbot label "release notes: qualcomm"

Hyungkeun-Park-Nota requested review from abhinaykukkadapu and psiddh as code owners May 18, 2026 08:48

meta-cla Bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) May 18, 2026

pytorch-bot Bot added the "release notes: qualcomm" label (changes to the Qualcomm backend delegate) May 18, 2026

Hyungkeun-Park-Nota wants to merge 1 commit into pytorch:main from Hyungkeun-Park-Nota:feat/qnn-mps-per-channel-group-quantization
