Skip to content

fix(opt): systematic AAPCS-clobber audit — sweep all hardcoded R0..R3 usages#108

Open
avrabe wants to merge 4 commits into
mainfrom
fix/systematic-aapcs-audit
Open

fix(opt): systematic AAPCS-clobber audit — sweep all hardcoded R0..R3 usages#108
avrabe wants to merge 4 commits into
mainfrom
fix/systematic-aapcs-audit

Conversation

@avrabe
Copy link
Copy Markdown
Contributor

@avrabe avrabe commented May 12, 2026

Systematic audit of the recurring "hardcoded R0..R3" bug class that's been
finding new instances each rebase (PR #100's fuzz harness, PR #101's
defensive panic). Fixes four categories — root cause, not just symptoms.

Refs: #93, #94, #100, #101, #103, #104, #107.
Unblocks: PR #100 (fuzz harness), PR #101 (defensive panic).

Categories audited

1. wasm_to_ir slot accounting (optimizer_bridge.rs)

15 i64 arithmetic ops (I64Add/Sub/And/Or/Xor/Mul/Div{S,U}/Rem{S,U}/Shl/ ShrS/ShrU/Rotl/Rotr) plus 3 sign-extends (I64Extend{8,16,32}S)
advanced inst_id by 1 via the wildcard fall-through. They produce an
i64 result occupying 2 vreg slots, so the next op's inst_id - 2
lookup landed inside the previous op's dest_hi. With the issue-#94
fast path enabled, the I32WrapI64 after an I64ShrU ended up
referencing the I64Const's killed dest_hi, panicking via PR #101's
defensive get_arm_reg. Fixed: explicit inst_id += 2; continue;
for each.

2. wasm_to_ir op-handler gaps (optimizer_bridge.rs + synth-opt)

Ten value-producing wasm ops fell through _ => Opcode::Nop,
leaving their result vreg unmapped and triggering PR-101's panic for
any downstream consumer:

  • I32Load8S/U, I32Load16S/U, I32Store8, I32Store16
  • GlobalGet, GlobalSet
  • MemorySize, MemoryGrow

Fixed: added Opcode::{MemLoadSubword, MemStoreSubword, GlobalGet, GlobalSet, MemorySize, MemoryGrow} to synth-opt, matching arms in
wasm_to_ir, and ir_to_arm lowering using the existing
alloc_*_scratch/alloc_i64_pair helpers.

3. CSE alias invalidation on re-definition (synth-opt)

CSE-killed Loads recorded reg_map[killed_dest] = original, but the
optimizer's IR-convention aliasing (e.g. I64ExtendI32U.dest_lo == src) re-defines that vreg. Without flushing reg_map[dest] at the
re-definition site, downstream consumers resolved back to the stale
pre-extend value — sometimes a live AAPCS param.

CSE also lacked a resolve() call for the src vregs of all 29 i64
opcodes (Load/Const/arith/comp/shift/extend/wrap). Fixed: explicit
arms with resolve() for every i64 op; explicit dests_to_invalidate
list cleared post-instruction.

4. Hardcoded R0..R3 dest in ir_to_arm (optimizer_bridge.rs)

20 Opcode::* handlers picked the destination by let rd = Reg::R0;
or Reg::R3; (a couple used Reg::R7 for the SetCond-16-bit-Thumb
constraint). With 4 AAPCS params live, every such op clobbered the 4th
arg before the first downstream local.get could read it.

Operations fixed: Add, Sub, Mul, DivS/U, RemS/U, And, Or,
Xor, Shl, ShrS/U, Rotr, Rotl, Clz, Ctz, Popcnt,
Extend8S, Extend16S, Eqz, all 10 i32 comparisons, Select, Copy,
MemLoad. Introduced alloc_i32_scratch (the single-reg counterpart
to alloc_i64_pair) — searches [R4..R8], skipping live vregs,
non-param locals, and param_reserved_regs. Sources are passed in
extra_avoid so the dest doesn't shadow a still-live source.

5. select_with_stack i64 sub-word loads (instruction_selector.rs)

I64Load{8,16,32}{S,U} hardcoded dst_lo = Reg::R0; dst_hi = Reg::R1;,
clobbering AAPCS params 0 and 1 even when neither was the address.
Fixed: alloc_consecutive_pair(&[addr]).

Counts

  • Fixes applied: 4 commits, 47 op-handler / IR-emission sites
    converted from hardcoded R0..R3 to safe-alloc helpers, plus 2
    structural fixes (slot accounting, CSE alias invalidation).
  • New opcodes: 6 (MemLoadSubword, MemStoreSubword,
    GlobalGet, GlobalSet, MemorySize, MemoryGrow).
  • Regression tests added: 27 across 4 new test files
    • audit_subword_memops.rs (12)
    • audit_aapcs_repro.rs (5)
    • audit_optimized_aapcs.rs (7)
    • audit_i64_subword_aapcs.rs (3)

Verification

  • cargo test --workspace — all green (previously 1 failure:
    test_issue94_hi32_extract_is_smaller_than_generic_shift
    panicked via PR-101's defensive get_arm_reg; now passes).
  • cargo clippy --workspace --all-targets -- -D warnings
    clean (excluding the z3-sys submodule-init issue, which is
    unrelated and pre-existing on this sandbox).
  • cargo fmt --check — clean.
  • cd fuzz && cargo fuzz run i64_lowering_doesnt_clobber_params -- -max_total_time=120 — not runnable in this sandbox (cargo-fuzz
    requires nightly + libfuzzer ASan). Will be exercised on
    merge by PR feat(fuzz): cargo-fuzz harnesses for ARM instruction selection (#82) #100's fuzz-smoke CI.

Test plan

🤖 Generated with Claude Code

avrabe and others added 4 commits May 12, 2026 20:16
Two correctness bugs in the optimizer's wasm→IR→ARM pipeline that produced
silent miscompilation and (post PR #101) defensive panics.

1. **i64 arithmetic slot accounting**: `wasm_to_ir` advanced `inst_id` by 1
   (via the wildcard fallthrough) for `I64Add/Sub/And/Or/Xor/Mul/Div{S,U}/
   Rem{S,U}/Shl/ShrS/ShrU/Rotl/Rotr` and for `I64Extend{8,16,32}S`. These ops
   produce an i64 result occupying 2 vreg slots, so the next op's
   `inst_id.saturating_sub(2)` lookup landed inside the *previous* op's
   `dest_hi` instead of its `dest_lo`. Triggers the issue-94 fast path
   regression (`test_issue94_hi32_extract_is_smaller_than_generic_shift`)
   because `I32WrapI64` after `I64ShrU` resolved its `src_lo` to an
   unmapped vreg.

2. **CSE alias invalidation on redefinition**: when CSE eliminates a
   duplicate `Opcode::Load` (local.get), it records `reg_map[killed] =
   original`. The subsequent `I64ExtendI32U`/`I64ExtendI32S`/`I32WrapI64`
   re-defines the same vreg by IR convention (`dest_lo == src`). Without
   clearing `reg_map[killed]` at the re-definition site, downstream
   consumers would resolve back to the *pre-extend* value — often a live
   AAPCS param register. Same class as #93 (silent R0 fallback) and #104
   (MemLoad/MemStore alias gap).

Also extends CSE to resolve src vregs for every i64 opcode (was a `_ => {}`
fallback) so the rewrite is consistent across the IR.

Refs: #93, #94, #103, #104, #107
Unblocks: #100 (fuzz harness), #101 (defensive panic).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… memory.{size,grow}

A second-wave audit of the optimizer-path AAPCS-clobber bug class. After
fixing the i64 slot accounting (prev commit), the next-most-common gap is
ops that `wasm_to_ir` silently maps to `Opcode::Nop`.

## Findings

`wasm_to_ir` ALSO lacked handlers for every value-producing op below.
Each fell through `_ => Opcode::Nop`, consumed an `inst_id` slot, and
left its produced vreg unmapped. Downstream consumers triggered PR-101's
defensive panic (or, pre-PR-101, returned `Reg::R0` silently — the
issue-#93 / #103 clobber class):

  * `I32Load8S/U`, `I32Load16S/U` — sub-word memory loads
  * `I32Store8`, `I32Store16` — sub-word memory stores
  * `GlobalGet`, `GlobalSet` — module-level globals
  * `MemorySize`, `MemoryGrow` — memory introspection / mutation

Separately, `Opcode::MemLoad`'s `ir_to_arm` handler **hardcoded the dest
register to `Reg::R3`**, clobbering the 4th AAPCS argument on every
i32.load that ran while R3 was a live param. The fuzz harness in PR
#100 hits this every rebase but the previous round of patches missed
it because the bug only surfaces under the optimizer (not select_with_stack).

## Fixes

  * Added `Opcode::{MemLoadSubword,MemStoreSubword,GlobalGet,GlobalSet,
    MemorySize,MemoryGrow}` to `synth_opt::Opcode`.
  * Added matching arms in `wasm_to_ir` (with comments documenting the
    pre-fix silent-Nop bug).
  * Added `ir_to_arm` handlers that emit the right ARM ops (LDRB/LDRH/
    LDRSB/LDRSH, STRB/STRH, LDR/STR via R9 base, MOV from R10, MOVW
    `#-1` stub for `memory.grow` on fixed-memory targets).
  * Introduced `alloc_i32_scratch` — the i32 counterpart to
    `alloc_i64_pair`. Searches `[R4..R8]`, skipping live vregs,
    non-param locals, and the param-reserved set; falls back to R12
    under pressure.
  * Replaced the hardcoded `Reg::R3` in `Opcode::MemLoad` with a call
    to `alloc_i32_scratch`. Both `MemLoad` and the new sub-word
    variants now pick a non-AAPCS-param destination by construction.
  * Extended CSE to resolve src vregs for the new opcodes and to track
    their dests in the redefinition-invalidation set added in the prev
    commit.

## Tests

  * New `audit_subword_memops.rs` — covers all 12 of the previously-
    unmapped op shapes (pure ops + arith consumers + stores), each of
    which used to panic the defensive `get_arm_reg`.
  * New `audit_aapcs_repro.rs` — generic AAPCS-no-clobber harness for
    the `select_with_stack` path. The fuzz crash signature
    (`Mov{rd:R1,op2:Reg(R2)}` before LocalGet(1)) didn't reproduce
    against current code; these tests pin down nearby shapes so the
    fuzz can't silently regress past them.

Refs: #93, #100, #101, #103, #104
Unblocks: #100, #101.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dest

Third-wave audit: `ir_to_arm` had 19 `Opcode::*` handlers that picked the
destination register by a hardcoded `let rd = Reg::R0;` or `Reg::R3;`
(plus a few `Reg::R7`). With 4 AAPCS params live (`num_params=4`,
common in C ABI functions), every such op clobbered the 4th param
register before the first downstream `local.get` could read it.

Operations fixed: `Add`, `Sub`, `Mul`, `DivS`, `DivU`, `RemS`, `RemU`,
`And`, `Or`, `Xor`, `Shl`, `ShrS`, `ShrU`, `Rotr`, `Rotl`, `Clz`, `Ctz`,
`Popcnt`, `Extend8S`, `Extend16S`, `Eqz`, all 10 i32 comparisons, and
`Select`. Each now routes the destination through the new
`alloc_i32_scratch` helper, passing operand registers in `extra_avoid`
so the dest never collides with a still-live src.

The bug was previously masked from `select_with_stack`-targeted tests
(issue #103) because the optimizer-path is a separate code path. New
regression tests (`audit_optimized_aapcs.rs`) cover 7 representative
shapes: i32 add/sub/mul, i32.load, i32.load8_u, global.get, memory.size
— all with `num_params=4` so a clobbered R3 fires the assert.

Refs: #100, #101, #103.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two more hardcoded-AAPCS-param-reg destinations found during the audit:

1. `select_with_stack` for `I64Load{8,16,32}{S,U}` hardcoded `dst_lo =
   Reg::R0; dst_hi = Reg::R1`. Clobbered AAPCS params 0 and 1 on every
   i64 sub-word load, regardless of `num_params`. Replaced with
   `alloc_consecutive_pair(&[addr])` so the destination is callee-saved.

2. `ir_to_arm::Opcode::Copy` (the optimizer's local.tee implementation)
   hardcoded `rd = Reg::R0`, clobbering param 0 on every local.tee in
   the optimized path. Routed through `alloc_i32_scratch`.

Added regression test `audit_i64_subword_aapcs.rs` for the i64 sub-word
class (3 representative shapes — load8_u, load16_s, load32_u — each
with 2 params and the address coming from `i32.const`, so the params
are uninvolved and a clobber is unambiguous).

Refs: #100, #103.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 78.14885% with 229 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
crates/synth-opt/src/lib.rs 67.33% 197 Missing ⚠️
crates/synth-synthesis/src/optimizer_bridge.rs 92.79% 32 Missing ⚠️

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant