fix(opt): systematic AAPCS-clobber audit — sweep all hardcoded R0..R3 usages#108
Open
avrabe wants to merge 4 commits into
Open
fix(opt): systematic AAPCS-clobber audit — sweep all hardcoded R0..R3 usages#108avrabe wants to merge 4 commits into
avrabe wants to merge 4 commits into
Conversation
Two correctness bugs in the optimizer's wasm→IR→ARM pipeline that produced silent miscompilation and (post PR #101) defensive panics. 1. **i64 arithmetic slot accounting**: `wasm_to_ir` advanced `inst_id` by 1 (via the wildcard fallthrough) for `I64Add/Sub/And/Or/Xor/Mul/Div{S,U}/ Rem{S,U}/Shl/ShrS/ShrU/Rotl/Rotr` and for `I64Extend{8,16,32}S`. These ops produce an i64 result occupying 2 vreg slots, so the next op's `inst_id.saturating_sub(2)` lookup landed inside the *previous* op's `dest_hi` instead of its `dest_lo`. Triggers the issue-94 fast path regression (`test_issue94_hi32_extract_is_smaller_than_generic_shift`) because `I32WrapI64` after `I64ShrU` resolved its `src_lo` to an unmapped vreg. 2. **CSE alias invalidation on redefinition**: when CSE eliminates a duplicate `Opcode::Load` (local.get), it records `reg_map[killed] = original`. The subsequent `I64ExtendI32U`/`I64ExtendI32S`/`I32WrapI64` re-defines the same vreg by IR convention (`dest_lo == src`). Without clearing `reg_map[killed]` at the re-definition site, downstream consumers would resolve back to the *pre-extend* value — often a live AAPCS param register. Same class as #93 (silent R0 fallback) and #104 (MemLoad/MemStore alias gap). Also extends CSE to resolve src vregs for every i64 opcode (was a `_ => {}` fallback) so the rewrite is consistent across the IR. Refs: #93, #94, #103, #104, #107 Unblocks: #100 (fuzz harness), #101 (defensive panic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… memory.{size,grow}
A second-wave audit of the optimizer-path AAPCS-clobber bug class. After
fixing the i64 slot accounting (prev commit), the next-most-common gap is
ops that `wasm_to_ir` silently maps to `Opcode::Nop`.
## Findings
`wasm_to_ir` ALSO lacked handlers for every value-producing op below.
Each fell through `_ => Opcode::Nop`, consumed an `inst_id` slot, and
left its produced vreg unmapped. Downstream consumers triggered PR-101's
defensive panic (or, pre-PR-101, returned `Reg::R0` silently — the
issue-#93 / #103 clobber class):
* `I32Load8S/U`, `I32Load16S/U` — sub-word memory loads
* `I32Store8`, `I32Store16` — sub-word memory stores
* `GlobalGet`, `GlobalSet` — module-level globals
* `MemorySize`, `MemoryGrow` — memory introspection / mutation
Separately, `Opcode::MemLoad`'s `ir_to_arm` handler **hardcoded the dest
register to `Reg::R3`**, clobbering the 4th AAPCS argument on every
i32.load that ran while R3 was a live param. The fuzz harness in PR
#100 hits this every rebase but the previous round of patches missed
it because the bug only surfaces under the optimizer (not select_with_stack).
## Fixes
* Added `Opcode::{MemLoadSubword,MemStoreSubword,GlobalGet,GlobalSet,
MemorySize,MemoryGrow}` to `synth_opt::Opcode`.
* Added matching arms in `wasm_to_ir` (with comments documenting the
pre-fix silent-Nop bug).
* Added `ir_to_arm` handlers that emit the right ARM ops (LDRB/LDRH/
LDRSB/LDRSH, STRB/STRH, LDR/STR via R9 base, MOV from R10, MOVW
`#-1` stub for `memory.grow` on fixed-memory targets).
* Introduced `alloc_i32_scratch` — the i32 counterpart to
`alloc_i64_pair`. Searches `[R4..R8]`, skipping live vregs,
non-param locals, and the param-reserved set; falls back to R12
under pressure.
* Replaced the hardcoded `Reg::R3` in `Opcode::MemLoad` with a call
to `alloc_i32_scratch`. Both `MemLoad` and the new sub-word
variants now pick a non-AAPCS-param destination by construction.
* Extended CSE to resolve src vregs for the new opcodes and to track
their dests in the redefinition-invalidation set added in the prev
commit.
## Tests
* New `audit_subword_memops.rs` — covers all 12 of the previously-
unmapped op shapes (pure ops + arith consumers + stores), each of
which used to panic the defensive `get_arm_reg`.
* New `audit_aapcs_repro.rs` — generic AAPCS-no-clobber harness for
the `select_with_stack` path. The fuzz crash signature
(`Mov{rd:R1,op2:Reg(R2)}` before LocalGet(1)) didn't reproduce
against current code; these tests pin down nearby shapes so the
fuzz can't silently regress past them.
Refs: #93, #100, #101, #103, #104
Unblocks: #100, #101.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dest Third-wave audit: `ir_to_arm` had 19 `Opcode::*` handlers that picked the destination register by a hardcoded `let rd = Reg::R0;` or `Reg::R3;` (plus a few `Reg::R7`). With 4 AAPCS params live (`num_params=4`, common in C ABI functions), every such op clobbered the 4th param register before the first downstream `local.get` could read it. Operations fixed: `Add`, `Sub`, `Mul`, `DivS`, `DivU`, `RemS`, `RemU`, `And`, `Or`, `Xor`, `Shl`, `ShrS`, `ShrU`, `Rotr`, `Rotl`, `Clz`, `Ctz`, `Popcnt`, `Extend8S`, `Extend16S`, `Eqz`, all 10 i32 comparisons, and `Select`. Each now routes the destination through the new `alloc_i32_scratch` helper, passing operand registers in `extra_avoid` so the dest never collides with a still-live src. The bug was previously masked from `select_with_stack`-targeted tests (issue #103) because the optimizer-path is a separate code path. New regression tests (`audit_optimized_aapcs.rs`) cover 7 representative shapes: i32 add/sub/mul, i32.load, i32.load8_u, global.get, memory.size — all with `num_params=4` so a clobbered R3 fires the assert. Refs: #100, #101, #103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two more hardcoded-AAPCS-param-reg destinations found during the audit:
1. `select_with_stack` for `I64Load{8,16,32}{S,U}` hardcoded `dst_lo =
Reg::R0; dst_hi = Reg::R1`. Clobbered AAPCS params 0 and 1 on every
i64 sub-word load, regardless of `num_params`. Replaced with
`alloc_consecutive_pair(&[addr])` so the destination is callee-saved.
2. `ir_to_arm::Opcode::Copy` (the optimizer's local.tee implementation)
hardcoded `rd = Reg::R0`, clobbering param 0 on every local.tee in
the optimized path. Routed through `alloc_i32_scratch`.
Added regression test `audit_i64_subword_aapcs.rs` for the i64 sub-word
class (3 representative shapes — load8_u, load16_s, load32_u — each
with 2 params and the address coming from `i32.const`, so the params
are uninvolved and a clobber is unambiguous).
Refs: #100, #103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Systematic audit of the recurring "hardcoded R0..R3" bug class that's been
finding new instances each rebase (PR #100's fuzz harness, PR #101's
defensive panic). Fixes four categories — root cause, not just symptoms.
Refs: #93, #94, #100, #101, #103, #104, #107.
Unblocks: PR #100 (fuzz harness), PR #101 (defensive panic).
Categories audited
1.
wasm_to_irslot accounting (optimizer_bridge.rs)15 i64 arithmetic ops (
I64Add/Sub/And/Or/Xor/Mul/Div{S,U}/Rem{S,U}/Shl/ ShrS/ShrU/Rotl/Rotr) plus 3 sign-extends (I64Extend{8,16,32}S)advanced
inst_idby 1 via the wildcard fall-through. They produce ani64 result occupying 2 vreg slots, so the next op's
inst_id - 2lookup landed inside the previous op's
dest_hi. With the issue-#94fast path enabled, the
I32WrapI64after anI64ShrUended upreferencing the I64Const's killed
dest_hi, panicking via PR #101'sdefensive
get_arm_reg. Fixed: explicitinst_id += 2; continue;for each.
2.
wasm_to_irop-handler gaps (optimizer_bridge.rs+synth-opt)Ten value-producing wasm ops fell through
_ => Opcode::Nop,leaving their result vreg unmapped and triggering PR-101's panic for
any downstream consumer:
I32Load8S/U,I32Load16S/U,I32Store8,I32Store16GlobalGet,GlobalSetMemorySize,MemoryGrowFixed: added
Opcode::{MemLoadSubword, MemStoreSubword, GlobalGet, GlobalSet, MemorySize, MemoryGrow}tosynth-opt, matching arms inwasm_to_ir, andir_to_armlowering using the existingalloc_*_scratch/alloc_i64_pairhelpers.3. CSE alias invalidation on re-definition (
synth-opt)CSE-killed
Loads recordedreg_map[killed_dest] = original, but theoptimizer's IR-convention aliasing (e.g.
I64ExtendI32U.dest_lo == src) re-defines that vreg. Without flushingreg_map[dest]at there-definition site, downstream consumers resolved back to the stale
pre-extend value — sometimes a live AAPCS param.
CSE also lacked a
resolve()call for thesrcvregs of all 29 i64opcodes (Load/Const/arith/comp/shift/extend/wrap). Fixed: explicit
arms with
resolve()for every i64 op; explicitdests_to_invalidatelist cleared post-instruction.
4. Hardcoded R0..R3 dest in
ir_to_arm(optimizer_bridge.rs)20
Opcode::*handlers picked the destination bylet rd = Reg::R0;or
Reg::R3;(a couple usedReg::R7for the SetCond-16-bit-Thumbconstraint). With 4 AAPCS params live, every such op clobbered the 4th
arg before the first downstream
local.getcould read it.Operations fixed:
Add,Sub,Mul,DivS/U,RemS/U,And,Or,Xor,Shl,ShrS/U,Rotr,Rotl,Clz,Ctz,Popcnt,Extend8S,Extend16S,Eqz, all 10 i32 comparisons,Select,Copy,MemLoad. Introducedalloc_i32_scratch(the single-reg counterpartto
alloc_i64_pair) — searches[R4..R8], skipping live vregs,non-param locals, and
param_reserved_regs. Sources are passed inextra_avoidso the dest doesn't shadow a still-live source.5.
select_with_stacki64 sub-word loads (instruction_selector.rs)I64Load{8,16,32}{S,U}hardcodeddst_lo = Reg::R0; dst_hi = Reg::R1;,clobbering AAPCS params 0 and 1 even when neither was the address.
Fixed:
alloc_consecutive_pair(&[addr]).Counts
converted from hardcoded R0..R3 to safe-alloc helpers, plus 2
structural fixes (slot accounting, CSE alias invalidation).
MemLoadSubword,MemStoreSubword,GlobalGet,GlobalSet,MemorySize,MemoryGrow).audit_subword_memops.rs(12)audit_aapcs_repro.rs(5)audit_optimized_aapcs.rs(7)audit_i64_subword_aapcs.rs(3)Verification
cargo test --workspace— all green (previously 1 failure:test_issue94_hi32_extract_is_smaller_than_generic_shiftpanicked via PR-101's defensive
get_arm_reg; now passes).cargo clippy --workspace --all-targets -- -D warnings—clean (excluding the z3-sys submodule-init issue, which is
unrelated and pre-existing on this sandbox).
cargo fmt --check— clean.cd fuzz && cargo fuzz run i64_lowering_doesnt_clobber_params -- -max_total_time=120— not runnable in this sandbox (cargo-fuzzrequires nightly + libfuzzer ASan). Will be exercised on
merge by PR feat(fuzz): cargo-fuzz harnesses for ARM instruction selection (#82) #100's fuzz-smoke CI.
Test plan
rebase — should report no findings against this branch.
🤖 Generated with Claude Code