
Fix A5 tmov treshape valid shape #687

Merged
zhangstevenunity merged 2 commits into main from codex/issue686-a5-tmov-treshape-valid-shape on May 21, 2026

zhangstevenunity (Collaborator) commented May 20, 2026

Summary

  • add pto.get_validshape to read runtime tile valid_row/valid_col metadata, with EmitC lowering to tile.GetValidRow() / tile.GetValidCol()
  • normalize risky A5 vec-to-vec col_major TMOVs by emitting a row-major treshape in the producer pass
  • set the auto-inserted treshape's valid shape from runtime get_validshape, swapping col/row instead of looking up valid dims at compile time
  • preserve treshape alias semantics through tile-handle materialization so generated code declares the row-major tile, calls TRESHAPE, then SetValidShape
  • preserve user-authored explicit treshape valid-shape semantics and cover dynamic, square, handwritten static-valid, and get_validshape cases
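The valid-shape handling above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not the ptoas implementation: `Tile`, `get_validshape` and `treshape_to_row_major` are hypothetical names, and the model assumes that viewing a col_major R×C tile as row_major yields a C×R tile, so the runtime valid dims must be swapped rather than copied.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Tile:
    """Hypothetical stand-in for a tile handle plus its runtime metadata."""
    rows: int        # static (allocated) row count
    cols: int        # static (allocated) column count
    layout: str      # "row_major" or "col_major"
    valid_row: int   # runtime valid rows
    valid_col: int   # runtime valid columns


def get_validshape(tile):
    """Model of pto.get_validshape: read (valid_row, valid_col) at runtime.

    In the PR this lowers to tile.GetValidRow() / tile.GetValidCol().
    """
    return tile.valid_row, tile.valid_col


def treshape_to_row_major(src):
    """Model of the auto-inserted col_major -> row_major treshape.

    Follows the order described in the PR: declare the row-major tile,
    reshape it (TRESHAPE), then call SetValidShape with row/col swapped.
    """
    if src.layout != "col_major":
        raise ValueError("expected a col_major source tile")
    valid_row, valid_col = get_validshape(src)
    # Same storage viewed row-major: each stored column becomes a row,
    # so the static dims swap. Assumption: before SetValidShape the new
    # view simply reports its full static shape as valid.
    dst = Tile(rows=src.cols, cols=src.rows, layout="row_major",
               valid_row=src.cols, valid_col=src.rows)
    # SetValidShape: use the swapped runtime values, not a static lookup.
    return replace(dst, valid_row=valid_col, valid_col=valid_row)
```

For a 16×32 col_major tile with a runtime valid region of 10×7, the model returns a 32×16 row_major tile whose valid region is 7×10. In a square case (e.g. 16×16) the static shape is unchanged by the reshape, so only the swapped valid dims show that anything happened.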

Testing

  • ninja -C build-wsl-pr567-verify tools/ptoas/ptoas
  • python3 /usr/lib/llvm-18/build/utils/lit/lit.py -a build-wsl-pr567-verify/test/lit/pto/get_validshape_emitc.pto build-wsl-pr567-verify/test/lit/pto/issue686_a5_tmov_treshape_dynamic_valid_shape.pto build-wsl-pr567-verify/test/lit/pto/issue686_a5_tmov_treshape_square_dynamic_valid_shape.pto build-wsl-pr567-verify/test/lit/pto/treshape_explicit_dynamic_valid_shape_preserved.pto build-wsl-pr567-verify/test/lit/pto/treshape_static_valid_shape_emitc.pto
  • ptoas/FileCheck for issue660_trowexpandmul_set_validshape_preserves_alloc_valid.pto
  • ptoas/FileCheck for tpush_tpop_dynamic_validshape_default_a5.pto --check-prefix=A5
  • git diff --check


gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a mechanism to correctly handle dynamic valid shapes during the normalization of TMOV operations for the A5 architecture. It implements a new internal attribute, __pto.a5_tmov_normalize_treshape, to track TReshapeOp instances and ensures that dynamic valid dimensions are properly materialized or swapped during layout transitions. These changes are integrated into the PTOA5NormalizeTMovPass, PTOMaterializeTileHandles, and PTOViewToMemref passes, supported by new test cases. I have no feedback to provide.

zhangstevenunity force-pushed the codex/issue686-a5-tmov-treshape-valid-shape branch from 353c46f to 9569f9b on May 20, 2026 02:32

reedhecre commented May 20, 2026

Codex Review

This comment is updated automatically by the review bot.

  • PR: Fix A5 tmov treshape valid shape #687
  • Author: zhangstevenunity
  • Base/Head: main / codex/issue686-a5-tmov-treshape-valid-shape
  • Head SHA: bb24702d2765
  • Trigger: new commits pushed to the PR
  • Generated At: 2026-05-21T06:35:21Z
  • Previous Head SHA: 2971d21f361a
  • Status: completed

Summary

PR #687 adds pto.get_validshape but does not update the ptobc opcode table/schema to match, which is a clear bytecode compatibility regression.

Findings

  1. P2 The newly introduced `pto.get_validshape` is not wired into ptobc, so `--emit-pto-ir` output can no longer be encoded or round-tripped (include/PTO/IR/PTOOps.td:373)

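This change introduces pto.get_validshape as a new public op, and A5 TMOV normalization also generates it automatically. The problem is that tools/ptobc/generated/ptobc_opcodes_v0.h still has no entry for this op, and tools/ptobc/src/mlir_encode.cpp reports an error for any op that is not in the v0 opcode table (unless PTOBC_ALLOW_GENERIC is explicitly enabled). As a result, the ptoas --emit-pto-ir output of affected kernels becomes IR that ptobc cannot encode or round-trip, which is a clear compatibility regression.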
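The failure mode in this finding can be illustrated with a toy encoder. In this sketch `OPCODES_V0`, its entries and opcode values, `EncodeError` and `encode_op` are all hypothetical; the real table is tools/ptobc/generated/ptobc_opcodes_v0.h and the real check lives in tools/ptobc/src/mlir_encode.cpp, neither of which is reproduced here. The `allow_generic` flag stands in for PTOBC_ALLOW_GENERIC, whose exact mechanism the review does not state.

```python
# Hypothetical excerpt of a v0 opcode table (names and values invented for
# illustration; not the contents of ptobc_opcodes_v0.h).
OPCODES_V0 = {
    "pto.tmov": 0x01,
    "pto.treshape": 0x02,
}


class EncodeError(Exception):
    """Raised when an op cannot be encoded with the given opcode table."""


def encode_op(op_name, opcode_table, allow_generic=False):
    """Return (used_generic_path, opcode) for one op.

    Ops found in the table get their fixed opcode. Unknown ops are rejected
    unless the generic escape hatch (standing in for PTOBC_ALLOW_GENERIC)
    is enabled, in which case no fixed opcode is assigned.
    """
    opcode = opcode_table.get(op_name)
    if opcode is not None:
        return False, opcode
    if allow_generic:
        return True, None
    raise EncodeError(f"op '{op_name}' is not in the opcode table")
```

Because A5 TMOV normalization inserts pto.get_validshape on its own, a kernel that never spells the op out can still reach the error path. Registering the op, for example `encode_op("pto.get_validshape", {**OPCODES_V0, "pto.get_validshape": 0x03})`, removes the error; the value 0x03 is again made up.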

zhangstevenunity force-pushed the codex/issue686-a5-tmov-treshape-valid-shape branch 5 times, most recently from 45618d5 to 875eac4 on May 20, 2026 07:50
zhangstevenunity force-pushed the codex/issue686-a5-tmov-treshape-valid-shape branch from 875eac4 to 2971d21 on May 20, 2026 08:34
zhangstevenunity marked this pull request as ready for review on May 20, 2026 08:49
zhangstevenunity (Collaborator, Author)

/run all

reedhecre

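Received /run all; the A3 board tester will process this request.

The page refreshes automatically, so you can check the current stage, queue status, and latest results directly.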

reedhecre

A3 board test completed (with skips)

  • Trigger: manual
  • Source commit: 9e2dbc5be4e7
  • Result summary: OK 212 / FAIL 0 / SKIP 2
  • Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260520_173705_manual_pr687.log
  • Result TSV: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260520_173705_manual_pr687.tsv
  • Manual command: /run all
  • Triggered by: zhangstevenunity
  • Triggering comment: Fix A5 tmov treshape valid shape #687 (comment)

zhangstevenunity merged commit e7e2de7 into main on May 21, 2026
9 of 10 checks passed
reedhecre

A5 board test passed

  • Trigger: merged
  • Source commit: e7e2de790650
  • Result summary: OK 14 / FAIL 0 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260521_142505_merged_pr687.log
  • Result TSV: /root/ptoas-board-monitor-a5/logs/20260521_142505_merged_pr687.tsv

reedhecre

A3 board test failed

  • Trigger: merged
  • Source commit: e7e2de790650
  • Result summary: OK 205 / FAIL 4 / SKIP 2
  • Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260521_150804_merged_pr687.log
  • Failed stage: board-validation / exit=1

Failed test cases

  • rope_kv_cache (run, exit=2)
  • down_proj_residual (run, exit=1)
  • out_proj_residual (run, exit=1)
  • qwen3_decode_incore_5 (run, exit=1)

reedhecre

A3 board test failure details: PR #687

rope_kv_cache

stage=run info=exit=2

[ERROR] Mismatch (bf16 golden_v1.bin vs v1.bin): max ulp diff=30463 at idx=26720 (golden_bits=48012, out_bits=15218, golden=-0.0042724609375, out=0.003692626953125)
[ERROR] compare failed
[2026-05-21 15:30:55] ERROR: testcase failed (exit 2): rope_kv_cache
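The bf16 numbers in the rope_kv_cache log can be checked by hand. The sketch below decodes the logged bit patterns and computes a ULP distance with a common monotonic-key mapping (negative patterns are bit-inverted, non-negative ones get the sign bit set). Whether the board comparator uses exactly this mapping is an assumption, and the helper names are hypothetical, but the mapping reproduces the reported `max ulp diff=30463` from `golden_bits=48012` and `out_bits=15218`.

```python
import struct


def bf16_to_float(bits):
    """Decode a 16-bit bf16 pattern: it is the top half of an IEEE-754 float32."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def bf16_ordered_key(bits):
    """Map a bf16 bit pattern to an integer that increases with its value."""
    bits &= 0xFFFF
    if bits & 0x8000:
        # Negative: invert all bits so larger magnitudes get smaller keys.
        return (~bits) & 0xFFFF
    # Non-negative: set the sign bit so these sort above every negative key.
    return bits | 0x8000


def ulp_diff(a_bits, b_bits):
    """Number of representable bf16 steps between two bit patterns."""
    return abs(bf16_ordered_key(a_bits) - bf16_ordered_key(b_bits))
```

Most of the distance comes from the sign flip: the golden value (-0.0042724609375) is negative and the output (0.003692626953125) is positive, so the count spans both sides of zero even though the two values have similar magnitude.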
down_proj_residual

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260521_150804_merged_pr687/npu_validation/Qwen3DecodeA3/down_proj_residual/./golden.py", line 14, in <module>
    run_case('down_proj_residual')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260521_150804_merged_pr687/npu_validation/Qwen3DecodeA3/down_proj_residual/qwen3_decode_golden_lib.py", line 568, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260521_150804_merged_pr687/npu_validation/Qwen3DecodeA3/down_proj_residual/qwen3_decode_golden_lib.py", line 530, in build_down_proj_residual
    resid = load_strided_2d(
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260521_150804_merged_pr687/npu_validation/Qwen3DecodeA3/down_proj_residual/validation_runtime.py", line 118, in load_strided_2d
    raise ValueError(f'strided load out of bounds: [{start}:{stop}] > {flat.size}')
ValueError: strided load out of bounds: [57600:57856] > 57600
[2026-05-21 15:31:09] ERROR: testcase failed (exit 1): down_proj_residual
out_proj_residual

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260521_150804_merged_pr687/npu_validation/Qwen3DecodeA3/out_proj_residual/./golden.py", line 14, in <module>
    run_case('out_proj_residual')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260521_150804_merged_pr687/npu_validation/Qwen3DecodeA3/out_proj_residual/qwen3_decode_golden_lib.py", line 568, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260521_150804_merged_pr687/npu_validation/Qwen3DecodeA3/out_proj_residual/qwen3_decode_golden_lib.py", line 418, in build_out_proj_residual
    load_strided_2d(
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260521_150804_merged_pr687/npu_validation/Qwen3DecodeA3/out_proj_residual/validation_runtime.py", line 118, in load_strided_2d
    raise ValueError(f'strided load out of bounds: [{start}:{stop}] > {flat.size}')
ValueError: strided load out of bounds: [57600:57856] > 57600
[2026-05-21 15:31:13] ERROR: testcase failed (exit 1): out_proj_residual
qwen3_decode_incore_5

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260521_150804_merged_pr687/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_5/./golden.py", line 14, in <module>
    run_case('qwen3_decode_incore_5')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260521_150804_merged_pr687/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_5/qwen3_decode_golden_lib.py", line 568, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260521_150804_merged_pr687/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_5/qwen3_decode_golden_lib.py", line 287, in build_softmax
    scores_valid = load_strided_2d(buffers["v4"], offset=in_offset, rows=Q_HEAD_BATCH, cols=SEQ_TILE, row_stride=SEQ_TILE).astype(np.float32)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260521_150804_merged_pr687/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_5/validation_runtime.py", line 118, in load_strided_2d
    raise ValueError(f'strided load out of bounds: [{start}:{stop}] > {flat.size}')
ValueError: strided load out of bounds: [198400:198656] > 198401
[2026-05-21 15:31:57] ERROR: testcase failed (exit 1): qwen3_decode_incore_5
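The three ValueError tracebacks are raised inside the golden builders (golden.py → run_case → build_*), so the run stops while building the reference data, before any output comparison is reached. Each message reports a 256-element read whose end passes the flat buffer size. A minimal sketch of such a bounds check follows; it is not the real validation_runtime.load_strided_2d, the function name is hypothetical, and the parameters in the usage note are invented only to reproduce the numbers in the down_proj_residual message.

```python
def strided_row_bounds(size, offset, rows, cols, row_stride):
    """Return the (start, stop) slice of each row in a strided 2-D view.

    The view reads `rows` rows of `cols` elements from a flat buffer of
    `size` elements, with consecutive rows `row_stride` elements apart.
    Raises ValueError on the first row that would read past the buffer.
    """
    bounds = []
    for r in range(rows):
        start = offset + r * row_stride
        stop = start + cols
        if stop > size:
            raise ValueError(
                f"strided load out of bounds: [{start}:{stop}] > {size}")
        bounds.append((start, stop))
    return bounds
```

With the invented parameters `strided_row_bounds(57600, 0, 226, 256, 256)`, the 226th row starts exactly at the end of the 57600-element buffer and the call fails with `[57600:57856] > 57600`, i.e. the reader expects one more 256-element row than the buffer holds.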

