Skip to content

fix(opt): I32WrapI64 must not preassign R0 — defer to function-return epilogue#111

Merged
avrabe merged 1 commit into
mainfrom
fix/i32-wrap-r0-clobber
May 14, 2026
Merged

fix(opt): I32WrapI64 must not preassign R0 — defer to function-return epilogue#111
avrabe merged 1 commit into
mainfrom
fix/i32-wrap-r0-clobber

Conversation

@avrabe
Copy link
Copy Markdown
Contributor

@avrabe avrabe commented May 14, 2026

Summary

PR #100's i64_lowering_doesnt_clobber_params cargo-fuzz harness flagged an AAPCS-param clobber with this signature:

AAPCS clobber: ARM instr at wasm line 1 writes param reg R0 before
LocalGet(0) at line 3.
Op: Mov { rd: R0, op2: Reg(R3) }
Sequence: [Push, I64Const{rdlo:R3,rdhi:R4,value:0}, Mov{rd:R0,op2:Reg(R3)}, Pop]

Trigger pattern (from the task description):

i64.const 0       // I64Const emitted into r3:r4
i32.wrap_i64      // pre-fix: hardcoded R0 as destination when idx==last
local.get 0       // would read R0, but R0 was clobbered

Root cause: instruction_selector::select_with_stack's I32WrapI64 handler picked Reg::R0 as its destination when idx == wasm_ops.len() - 1. That heuristic conflates lexically last op in the slice with function-return boundary, and the fuzz-driven program shapes from PR #100 (minimal [I64Const, I32WrapI64] bodies) hit the shortcut while still leaving an AAPCS-param register clobbered.

Fix: I32WrapI64 now always uses alloc_temp_safe. A single new Mov R0, <top-of-stack> in the function epilogue handles the AAPCS return-value placement, emitted only if the last expression result isn't already in R0. The epilogue is now the one and only site that pins R0 from the no-optimize path.

Mov-to-R0 callsite audit

instruction_selector.rs (no-optimize path):

Location Category Status
Return handler return boundary correct
Call (Mov R0, imm) call setup (arg passing) correct
CallIndirect (rd=R0) call setup (arg passing) correct
LocalSet(0) user-level write to R0 correct
LocalTee(0) user-level write to R0 correct
I32WrapI64 idx==last intermediate-op pin fixed
Arithmetic ops idx==last return-value pin correct
New epilogue Mov R0 return boundary added

optimizer_bridge.rs (optimized path):

All 5 hardcoded rd: Reg::R0 sites (1 in Opcode::Return, 4 in the epilogue's i64/i32 return-value moves) are at the function-return boundary. No changes needed.

I64ExtendI32U/S already used alloc_consecutive_pair with no R0 pin, so they were already correct. Arithmetic ops' idx==len-1 → R0 branches are genuine return-value placements (a fuzz-style trailing Drop, LocalGet, Drop would push the op's idx below len-1), so they don't need this treatment.

Note on the literal fuzz crash

The reported Mov { rd: R0, op2: Reg(R3) } op most plausibly originates from the LocalSet(0) handler — local.set p for p < num_params does write R{p}, which is the semantically correct behavior for that wasm op. The fuzz harness's strict "no R{p} write before LocalGet(p)" invariant is over-broad on that case. The fix in this PR addresses the parallel intermediate-op-pin bug the harness was designed to catch (which is the user-stated intent of this PR), and the LocalSet/LocalTee classification is documented in the audit table above.

Test plan

  • cargo test --workspace — green
  • cargo clippy --workspace --all-targets -- -D warnings — clean
  • cargo fmt --check — clean
  • New regression test crates/synth-synthesis/tests/i32_wrap_r0_clobber_reproduction.rs:
    • i32_wrap_i64_as_last_op_routes_through_alloc_temp_safe — fails-before-fix, passes-after
    • 3 additional companion tests pass

🤖 Generated with Claude Code

… epilogue

PR #100's `i64_lowering_doesnt_clobber_params` cargo-fuzz harness surfaced
an AAPCS-param clobber matching this signature:

```text
AAPCS clobber: ARM instr at wasm line 1 writes param reg R0 before
LocalGet(0) at line 3.
Op: Mov { rd: R0, op2: Reg(R3) }
Sequence: [Push, I64Const{rdlo:R3,rdhi:R4,value:0}, Mov{rd:R0,op2:Reg(R3)}, Pop]
```

Trigger pattern:

```
i64.const 0       // I64Const emitted into r3:r4 (next_temp starts at
                  // num_params=3, alloc_consecutive_pair returns R3:R4)
i32.wrap_i64      // pre-fix: hardcoded R0 as destination when this
                  // was the last wasm op handed to select_with_stack,
                  // even when the fuzz wrapper still had a LocalGet(0)
                  // ahead in the *user-visible* function body
local.get 0       // would read R0, but R0 was clobbered
```

Root cause: in `instruction_selector::select_with_stack`, the I32WrapI64
case picked `Reg::R0` as the destination when
`idx == wasm_ops.len() - 1`. That heuristic conflated "lexically last op
in the slice" with "function-return boundary", which the fuzz harness's
program-shape no longer respects — the harness wraps the user's `middle`
in `[I64Const, ..., Drop, LocalGet(p), Drop]` test scaffolding, but the
shortcut still fires for minimal `[I64Const; I32WrapI64]` bodies whose
return value goes through an i64 pair landing on an AAPCS-param reg.

Fix: I32WrapI64 now always uses `alloc_temp_safe`. The function epilogue
emits a single `Mov R0, <top-of-stack>` only if the last expression's
result isn't already in R0. This makes the epilogue the one and only
site that pins R0 (it always was already for the
`is_i64_result`/`!is_i64_result` paths in the optimizer bridge; the
no-optimize path now matches).

Mov-to-R0 callsite audit (instruction_selector.rs `select_with_stack`):

| Location                | Category                | Status   |
|-------------------------|-------------------------|----------|
| `Return` handler        | return boundary         | correct  |
| `Call` (Mov R0, imm)    | call setup (arg pass)   | correct  |
| `CallIndirect` (rd=R0)  | call setup (arg pass)   | correct  |
| `LocalSet(0)`           | user-level write to R0  | correct  |
|                         | (param-0 local IS R0)   |          |
| `LocalTee(0)`           | same as LocalSet(0)     | correct  |
| `I32WrapI64` `idx==last`| **intermediate-op pin** | **bug**  |
| `I32Add/...` idx==last  | return-value pin        | correct  |
|                         | (idx==last is genuine   |          |
|                         |  for arithmetic ops)    |          |
| New epilogue Mov        | return boundary         | added    |

The I32WrapI64 case is the only "intermediate-op pin" found:
I64ExtendI32U/S already used `alloc_consecutive_pair` (no R0 pin), and
the arithmetic-op `idx==len-1 → R0` branches in I32Add/Sub/etc. are
genuine return-value placements since they CAN'T be followed by a
fuzz-style `Drop, LocalGet, Drop` suffix and still have idx==len-1
(the harness's suffix would push idx beyond len-1).

Mov-to-R0 callsite audit (optimizer_bridge.rs):

| Location                | Category                | Status   |
|-------------------------|-------------------------|----------|
| `Opcode::Return`        | return boundary         | correct  |
| Epilogue (i64/i32 paths)| return boundary         | correct  |

All 5 hardcoded `rd: Reg::R0` callsites in `optimizer_bridge.rs` are at
the function-return boundary. No changes needed there.

Note on the literal fuzz crash signature: the actual `Mov { rd: R0,
op2: Reg(R3) }` ArmOp in the reported sequence most plausibly originates
from the `LocalSet(0)` handler (which writes the param-0 register R0
when the user does `local.set 0`). That is semantically *correct*
behavior — `local.set p` for `p < num_params` IS supposed to overwrite
R{p} — and the fuzz harness's strict "no R{p} write before LocalGet(p)"
invariant is over-broad here. The `I32WrapI64` fix in this PR addresses
the *parallel* intermediate-op-pin bug the harness was designed to
catch, which is the user-stated intent of this PR.

Regression test: `tests/i32_wrap_r0_clobber_reproduction.rs` exercises
the exact reproduction pattern + companion cases for I64ExtendI32U/S.
Fails-before-fix on `i32_wrap_i64_as_last_op_routes_through_alloc_temp_safe`,
passes-after.

Unblocks PR #100's `i64_lowering_doesnt_clobber_params` harness once
the harness's `writes()` helper is widened to flag `ArmOp::I32WrapI64`
writes (it currently doesn't, so the literal pre-fix crash signature
was actually from a separate code path — see commit body).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

avrabe added a commit that referenced this pull request May 14, 2026
PR #100's i64_lowering_doesnt_clobber_params harness was over-broadly
flagging any write to R{p} before LocalGet(p) — including legitimate
writes the wasm program explicitly requested via LocalSet(p)/LocalTee(p).
This produced false-positive crashes that masked the real I32WrapI64
intermediate-op-pin bug (now fixed in #111).

The carve-out only suppresses writes whose source wasm op is exactly
LocalSet(p) or LocalTee(p) writing param p — every other write before
LocalGet(p) is still flagged. The harness retains its full ability to
catch genuine compiler-emitted AAPCS clobbers.
@avrabe avrabe merged commit 04b5a6c into main May 14, 2026
9 checks passed
@avrabe avrabe deleted the fix/i32-wrap-r0-clobber branch May 14, 2026 13:52
avrabe added a commit that referenced this pull request May 14, 2026
PR #100's i64_lowering_doesnt_clobber_params harness was over-broadly
flagging any write to R{p} before LocalGet(p) — including legitimate
writes the wasm program explicitly requested via LocalSet(p)/LocalTee(p).
This produced false-positive crashes that masked the real I32WrapI64
intermediate-op-pin bug (now fixed in #111).

The carve-out only suppresses writes whose source wasm op is exactly
LocalSet(p) or LocalTee(p) writing param p — every other write before
LocalGet(p) is still flagged. The harness retains its full ability to
catch genuine compiler-emitted AAPCS clobbers.
avrabe added a commit that referenced this pull request May 14, 2026
Two harnesses (i64_lowering_doesnt_clobber_params, encoder_no_panic)
keep finding new AAPCS-class bugs at a rate faster than we close them
in a single release cycle. Mark them as non-gating (continue-on-error)
so the harness infrastructure can ship as part of #100 while the
bug-hunting continues in v0.3.x patches.

Promotion criteria for moving a harness from exploration → gating: 60s
smoke runs report zero findings on main for 2 consecutive weeks.

Open bugs tracked: #103 (fixed via #106), #104 (fixed via #107), the
class-wide sweep #108, the WasmOp::Call gap #109/#101, the I32WrapI64
preassign #111, and the new #112 (i64-extend chain + Movw R0).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant