Add quantize/dequantize_per_channel_group support to QNN backend #19629

Open
Hyungkeun-Park-Nota wants to merge 1 commit into pytorch:main from Hyungkeun-Park-Nota:feat/qnn-mps-per-channel-group-quantization

@Hyungkeun-Park-Nota
Contributor

Motivation

quantized_decomposed.quantize_per_channel_group and dequantize_per_channel_group are used for LLM weight-only quantization (e.g. int4 group-wise / per-channel-group). The QNN backend did not recognize these ops, causing two distinct failures:

  1. The ops were decomposed away during torch.export instead of being preserved for the backend.
  2. InsertIOQDQ raised a KeyError when a pre-quantized weight node annotated with dequantize_per_channel_group encoding fed the graph output, because the op was absent from q_dq_map.

This change adds the same treatment that torchao.quantize_affine / dequantize_affine already receive, applying it to the quantized_decomposed per-channel-group variants.
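
For readers less familiar with the scheme, here is a minimal pure-Python sketch of symmetric group-wise quantization and its inverse. It is illustrative only: the helper names `quantize_groupwise` and `dequantize_groupwise` are hypothetical, the sketch derives its own scales and uses no zero points, and it does not reproduce the signatures of the `quantized_decomposed` ops.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_groupwise(
    weights: Matrix, group_size: int, qmin: int = -8, qmax: int = 7
) -> Tuple[List[List[int]], Matrix]:
    """Quantize each row in groups of `group_size`, one scale per group.

    Assumption: symmetric int4 range [-8, 7] and no zero points.
    """
    q_rows: List[List[int]] = []
    scale_rows: Matrix = []
    for row in weights:
        if len(row) % group_size != 0:
            raise ValueError("row length must be a multiple of group_size")
        q_row: List[int] = []
        scales: List[float] = []
        for start in range(0, len(row), group_size):
            group = row[start:start + group_size]
            max_abs = max(abs(v) for v in group)
            # Guard against an all-zero group to avoid dividing by zero.
            scale = max_abs / qmax if max_abs > 0 else 1.0
            scales.append(scale)
            q_row.extend(max(qmin, min(qmax, round(v / scale))) for v in group)
        q_rows.append(q_row)
        scale_rows.append(scales)
    return q_rows, scale_rows


def dequantize_groupwise(
    q: List[List[int]], scales: Matrix, group_size: int
) -> Matrix:
    """Multiply each quantized value by the scale of the group it belongs to."""
    return [
        [qv * s_row[i // group_size] for i, qv in enumerate(q_row)]
        for q_row, s_row in zip(q, scales)
    ]


weights = [[0.5, -1.0, 0.25, 0.0, 4.0, -2.0, 1.0, 3.0]]
q, scales = quantize_groupwise(weights, group_size=4)
restored = dequantize_groupwise(q, scales, group_size=4)
```

Each row of eight weights yields two scales here, so the scale tensor has one entry per (channel, group) pair rather than one per tensor or per channel.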

Changes

backends/qualcomm/builders/node_visitor.py

  • Add both ops to q_ops / dq_ops sets.
  • Add both ops to per_block_encoding so get_quant_encoding_conf routes them through make_qnn_per_block_config.
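
The routing described above can be pictured roughly as set-membership dispatch. This is a simplified sketch, not the code in node_visitor.py: the sets hold plain strings instead of op overloads, and `route_encoding`, `make_per_tensor_config`, and `make_per_block_config` are hypothetical stand-ins for the real helpers.

```python
# Simplified stand-ins: the real sets hold op overloads, not strings.
per_tensor_encoding = {"quantize_per_tensor", "dequantize_per_tensor"}
per_block_encoding = {
    "quantize_affine",
    "dequantize_affine",
    # Entries corresponding to the ops this PR adds.
    "quantize_per_channel_group",
    "dequantize_per_channel_group",
}


def make_per_tensor_config(attrs: dict) -> dict:
    # Hypothetical config shape, for illustration only.
    return {"kind": "per_tensor", "scale": attrs["scale"]}


def make_per_block_config(attrs: dict) -> dict:
    # Hypothetical config shape, for illustration only.
    return {
        "kind": "per_block",
        "scales": attrs["scale"],
        "group_size": attrs["group_size"],
    }


def route_encoding(attrs: dict) -> dict:
    """Pick a config builder based on which set the encoding belongs to."""
    encoding = attrs["encoding"]
    if encoding in per_block_encoding:
        return make_per_block_config(attrs)
    if encoding in per_tensor_encoding:
        return make_per_tensor_config(attrs)
    raise NotImplementedError(f"unsupported encoding: {encoding}")


conf = route_encoding(
    {"encoding": "dequantize_per_channel_group", "scale": [0.1, 0.2], "group_size": 32}
)
```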

backends/qualcomm/partition/utils.py

  • Add both ops to get_skip_decomp_table so they are preserved during export and reach the backend unchanged.

backends/qualcomm/_passes/utils.py

  • Remap "scales" → QCOM_SCALE in get_quant_attrs, parallel to the existing "input_dtype" → QCOM_DTYPE remap for torchao ops, so AnnotateQuantAttrs correctly propagates the scale tensor for per-channel-group nodes.
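
A rough sketch of this kind of key remap, assuming the attributes are collected into a plain dict. The string values assigned to `QCOM_SCALE` and `QCOM_DTYPE` below are placeholders (the actual constant values are not stated in this PR), and `normalize_quant_attrs` is a hypothetical helper, not the real get_quant_attrs.

```python
# Placeholder values: the real constants live in the Qualcomm backend and
# their string values are an assumption here.
QCOM_SCALE = "scale"
QCOM_DTYPE = "dtype"

# Op argument name -> key the downstream passes expect.
KEY_REMAP = {
    "input_dtype": QCOM_DTYPE,  # existing remap for torchao ops
    "scales": QCOM_SCALE,       # remap added for per-channel-group ops
}


def normalize_quant_attrs(raw_attrs: dict) -> dict:
    """Rename known keys and pass every other key through unchanged."""
    return {KEY_REMAP.get(name, name): value for name, value in raw_attrs.items()}


attrs = normalize_quant_attrs(
    {"scales": [0.1, 0.2], "group_size": 32, "input_dtype": "int8"}
)
```

Without the "scales" entry, a consumer that reads the scale through `QCOM_SCALE` would not find the tensor under the name it expects.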

backends/qualcomm/_passes/qnn_pass_manager.py

  • Dynamically register both ops into node_visitor.q_ops / dq_ops in get_to_edge_transform_passes, following the same pattern used for torchao ops.

backends/qualcomm/_passes/insert_io_qdq.py

  • Add per-channel-group entries to q_dq_map to fix the KeyError.
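
The failure mode can be reproduced in miniature with a string-keyed stand-in for q_dq_map. The keys, the values, and the `resolve_io_op` helper below are all hypothetical; in particular, which op each encoding maps to in the real table is an assumption of this sketch.

```python
from typing import Optional

# Hypothetical, string-keyed stand-in for q_dq_map; real keys are op overloads.
q_dq_map = {
    "quantize_per_tensor": "dequantize_per_tensor",
    "dequantize_per_tensor": "dequantize_per_tensor",
}


def resolve_io_op(encoding: str) -> str:
    """Look up the op to insert for a node's encoding; raises KeyError if absent."""
    return q_dq_map[encoding]


def try_resolve(encoding: str) -> Optional[str]:
    """Return the mapped op, or None when the encoding is missing from the map."""
    try:
        return resolve_io_op(encoding)
    except KeyError:
        return None


# Before the fix: the per-channel-group encoding is not in the map.
before = try_resolve("dequantize_per_channel_group")

# After the fix: both per-channel-group encodings have entries.
q_dq_map.update(
    {
        "quantize_per_channel_group": "dequantize_per_channel_group",
        "dequantize_per_channel_group": "dequantize_per_channel_group",
    }
)
after = try_resolve("dequantize_per_channel_group")
```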

Testing

Unit test added to backends/qualcomm/tests/test_passes.py:

python -m pytest backends/qualcomm/tests/test_passes.py::TestPasses::test_insert_io_qdq_per_channel_group_no_key_error -v

The existing InsertIOQDQ tests also continue to pass:

python -m pytest backends/qualcomm/tests/test_passes.py::TestPasses::test_insert_io_qdq_handles_dequant_encoding backends/qualcomm/tests/test_passes.py::TestPasses::test_insert_io_qdq_no_revisit -v

The full QNN export pipeline (annotation → fold QDQ → insert IO QDQ → layout transform → partition) completes without errors for a quantized linear model. End-to-end execution on device was not verified in this environment.

quantized_decomposed.quantize_per_channel_group and
dequantize_per_channel_group are used for LLM weight-only quantization
(e.g. int4 group-wise) but were not recognized by the QNN backend,
causing the ops to be decomposed or failing with a KeyError in
InsertIOQDQ.

Five files are changed:

- builders/node_visitor.py: add both ops to q_ops/dq_ops and to the
  per_block_encoding set so get_quant_encoding_conf routes them through
  make_qnn_per_block_config, matching the existing torchao affine path.

- partition/utils.py: add both ops to get_skip_decomp_table so they
  are preserved as-is during torch.export and reach the backend.

- _passes/utils.py: remap "scales" -> QCOM_SCALE in get_quant_attrs,
  parallel to the existing "input_dtype" -> QCOM_DTYPE remap, so
  AnnotateQuantAttrs correctly propagates the scale tensor for
  per-channel-group nodes.

- _passes/qnn_pass_manager.py: dynamically register both ops into
  node_visitor.q_ops/dq_ops in get_to_edge_transform_passes, following
  the same pattern used for torchao ops.

- _passes/insert_io_qdq.py: add per-channel-group entries to q_dq_map
  to fix KeyError when a pre-quantized weight node with
  dequantize_per_channel_group encoding feeds the graph output.

A unit test is added to test_passes.py that injects per_channel_group
quant attrs onto a node feeding the output and verifies InsertIOQDQ
completes without KeyError.
@pytorch-bot

pytorch-bot Bot commented May 18, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19629

Note: Links to docs will display an error until the docs builds have been completed.

❗ 2 Active SEVs

There are 2 currently active SEVs. If your PR is affected, please view them below:

⚠️ 11 Awaiting Approval

As of commit 79a29da with merge base 824cbff (image):

AWAITING APPROVAL - The following workflows need approval before CI can run:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 18, 2026
@Hyungkeun-Park-Nota
Contributor Author

@pytorchbot label "release notes: qualcomm"

@pytorch-bot pytorch-bot Bot added the release notes: qualcomm Changes to the Qualcomm backend delegate label May 18, 2026