Skip to content

fix(opt): wasm_to_ir missing handler for i32.load/i32.store result vregs (#104)#107

Merged
avrabe merged 1 commit into
mainfrom
fix/issue-104-i32-loadstore-vreg
May 12, 2026
Merged

fix(opt): wasm_to_ir missing handler for i32.load/i32.store result vregs (#104)#107
avrabe merged 1 commit into
mainfrom
fix/issue-104-i32-loadstore-vreg

Conversation

@avrabe
Copy link
Copy Markdown
Contributor

@avrabe avrabe commented May 11, 2026

Fixes #104. Same class as #93.

Summary

When CommonSubexpressionElimination (in crates/synth-opt/src/lib.rs) eliminates a duplicate Opcode::Load (the IR form of local.get), it replaces the dead producer with a placeholder Const, marks is_dead = true, and records reg_map[dead_vreg] = live_vreg so subsequent instructions can be rewritten to reference the survivor. The CSE match had arms for all i32 arithmetic / comparison / shift / unary ops plus Copy, Return, Select, Eqz, CondBranch — but omitted Opcode::MemLoad / Opcode::MemStore (linear-memory ops), Opcode::Extend8S / Extend16S, and the src field of Store / TeeStore.

For the canonical store-then-load function (tests/wast/i32_memory.wast):

(func (param i32 i32) (result i32)
  local.get 0      ;; addr
  local.get 1      ;; value
  i32.store
  local.get 0      ;; <-- CSE kills this Load against the earlier one
  i32.load)        ;; <-- but its addr vreg never gets rewritten

the second local.get 0 lowered to Opcode::Load { dest: v3, addr: 0 }. CSE killed it (reg_map[v3] = v0). The subsequent Opcode::MemLoad { addr: v3, .. } was then dropped through the CSE match's _ => {} fallback with addr untouched. optimize_full's dead-instruction filter removed the now-is_dead Load, leaving the MemLoad referencing a vreg with no producer. ir_to_arm's get_arm_reg fell back to Reg::R0 — silent miscompile pre-PR #101; defensive panic post-PR #101 with "vreg v3 has no assigned ARM register and no spill slot". The load read from whatever R0 held at that point.

This is exactly the same shape as #93 (I64ExtendI32U / I64ExtendI32S / I32WrapI64 in wasm_to_ir): a wasm op whose result vreg ends up unmapped before ir_to_arm, masked by the silent R0 fallback in get_arm_reg.

What changed

crates/synth-opt/src/lib.rs (CSE pass):

  • Added Opcode::MemLoad arm — resolves addr through reg_map.
  • Added Opcode::MemStore arm — resolves src and addr.
  • Replaced Opcode::Store { addr, .. } => body so src is also resolved (was previously dropped silently).
  • Replaced Opcode::TeeStore { addr, .. } => body so src is also resolved (same reason).
  • Added Opcode::Extend8S / Opcode::Extend16S arms — resolve src.

We intentionally do not CSE-eliminate MemLoad itself: there is no alias analysis for linear memory in this pass, and any MemStore can invalidate any address.

Audit (per task step 5 — don't whack-a-mole)

Walked every Opcode::* variant for which CSE could leak an unresolved operand into ir_to_arm. The variants fixed above are the ones that have real consumers in wasm_to_ir's output.

Remaining unhandled variants in CSE (every i64 op: I64Add/I64Sub/I64And/I64Or/I64Xor/I64Mul/I64Div*/I64Rem*/I64Shl/I64ShrS/I64ShrU/I64Rotl/I64Rotr/I64Eq..I64GeU/I64Eqz/I64Clz/I64Ctz/I64Popcnt/I64Extend*/I64ExtendI32U/I64ExtendI32S/I32WrapI64/I64Load/I64Const) currently cannot trip this bug: CSE never produces reg_map entries for i64 dest vregs because no CSE arm targets an i64 dest. If a future patch extends CSE to handle any i64 op (which would be sound, e.g. I64Add of two constant pairs), those consumer arms must be filled in at the same time. Flagged as follow-up; not a current miscompile.

Regression test

New crates/synth-synthesis/tests/issue_104_i32_loadstore_cse.rs (4 tests):

  • issue_104_store_load_roundtrip_optimized — drives the i32_memory pattern through optimize_full + ir_to_arm and runs the emitted ARM under a mini Thumb-2 interpreter (modeling Movw, Movt, Add, Mov, Str, Ldr against a sparse-byte linear memory). Asserts the function returns the value that was stored.
  • issue_104_store_load_table_driven — same shape, 5 (addr, value) pairs covering 0, ±1, i32::MAX.
  • issue_104_load_addr_does_not_use_value_register — structural assertion that the load's address-computation Add R12, R12, R<src> reads from R1 (the address param), not R0 (the value param).
  • issue_104_load_and_store_use_consistent_base_registerStr and Ldr use the same MemAddr::base register (both should compute linear_mem_base + user_addr in the same scratch).

The first three tests use a swapped-operand variant (param 1 = address, param 0 = value): in the natural param 0 = address layout, the unmapped v3 happens to fall back to Reg::R0 — which is also param_regs[0] for local 0 — so the buggy code path happens to produce the right answer by accident, masking the bug from behavioural tests. Swapping the operands forces the correct register to be R1, so the R0 fallback gives an observably wrong answer.

Verified fails-then-passes: with the CSE fix reverted, 3 of 4 tests fail (the roundtrip returns 0x0 instead of 0xCAFEBABE, the structural Add check reports R0 instead of R1). Restoring the fix makes all 4 pass.

Pre/post-fix on existing tests

Pre-fix, with PR #101's defensive panic on top of v0.2.1:

test compile_i32_memory ... FAILED
thread 'main' panicked at crates/synth-synthesis/src/optimizer_bridge.rs:1442:21:
synth internal compiler error: vreg v3 has no assigned ARM register and no spill slot.

Post-fix, crates/synth-cli/tests/wast_compile.rs::compile_i32_memory (untouched per task instructions) goes from panic to pass. PR #101's defensive panic should now be safe to land.

Test plan

  • cargo test --package synth-synthesis — 401 tests pass (241 unit + 4 new for i32.load/i32.store wasm_to_ir produces unmapped vreg (latent #93-class bug) #104 + 5 prerelease_validation + 9 proptest_robustness + 4 rocq + 22 semantic_correctness + 57 issue_93_memset + 59 semantic_correctness_i64 + 0 doc-tests)
  • cargo test --package synth-opt — 33 tests pass
  • cargo test --package synth-cli --test wast_compile compile_i32_memory — passes
  • cargo fmt --check — clean
  • cargo clippy --workspace --all-targets -- -D warnings — clean on changed files (workspace build of z3-sys can fail with cmake/OOM in concurrent environments; not introduced by this change)
  • Regression test fails without the fix, passes with the fix (verified by git checkout HEAD~1 -- crates/synth-opt/src/lib.rs && cargo test --package synth-synthesis --test issue_104_i32_loadstore_cse)

References

🤖 Generated with Claude Code

#104)

When `CommonSubexpressionElimination` eliminates a duplicate
`Opcode::Load` (the IR-level form of `local.get`), it replaces the dead
producer with a placeholder `Const` and records `reg_map[dead] = live`
so later instructions can be rewritten to reference the survivor. The
CSE match arms covered all i32 arithmetic / comparison / shift / unary
ops plus `Copy`, `Return`, `Select`, `Eqz`, and `CondBranch` — but
omitted `Opcode::MemLoad` and `Opcode::MemStore` (linear-memory ops),
`Opcode::Extend8S` / `Extend16S`, and the `src` operand of `Store` /
`TeeStore`.

For the canonical store-then-load function

    (func (param i32 i32) (result i32)
      local.get 0
      local.get 1
      i32.store
      local.get 0   ;; <-- CSE kills this Load against the earlier one
      i32.load)     ;; <-- but its addr vreg never gets rewritten

the second `local.get 0` lowered to `Opcode::Load { dest: v3, addr: 0 }`,
which CSE killed (recording `reg_map[v3] = v0`). The subsequent
`Opcode::MemLoad { addr: v3, .. }` was then dropped through the CSE
match's `_ => {}` fallback with `addr` untouched. `optimize_full`'s
dead-instruction filter removed the now-`is_dead` Load, leaving the
MemLoad referencing a vreg with no producer. `ir_to_arm`'s
`get_arm_reg` fell back to `Reg::R0` (silent miscompile pre-PR #101;
defensive panic post-PR #101) and the load read from whatever R0 held
at that point — typically the *stored value*, not the *user-supplied
address*.

Same class as #93's `I64ExtendI32U` / `I64ExtendI32S` / `I32WrapI64`
gap: a wasm op whose result vreg ends up unmapped before `ir_to_arm`,
masked by the silent R0 fallback in `get_arm_reg`.

# What changed

`crates/synth-opt/src/lib.rs` (CSE pass):
  * Added `Opcode::MemLoad` arm — resolve `addr`.
  * Added `Opcode::MemStore` arm — resolve `src` and `addr`.
  * Replaced `Opcode::Store { addr, .. } =>` with a body that also
    resolves `src` (was previously dropped silently).
  * Replaced `Opcode::TeeStore { addr, .. } =>` with a body that
    resolves `src` (same reason).
  * Added `Opcode::Extend8S` / `Opcode::Extend16S` arms — resolve `src`.

We intentionally do not CSE-eliminate `MemLoad` itself: there is no
alias analysis for linear memory in this pass, and any `MemStore` can
invalidate any address.

# Audit

Walked every `Opcode::*` variant for which CSE could leak an unresolved
operand into `ir_to_arm`. The fixed ops above are the ones that have
real consumers in `wasm_to_ir`'s output. Remaining unhandled variants
in CSE (every i64 op: `I64Add`/`I64Sub`/`I64And`/`I64Or`/`I64Xor`/
`I64Mul`/`I64Div*`/`I64Rem*`/`I64Shl`/`I64ShrS`/`I64ShrU`/`I64Rotl`/
`I64Rotr`/`I64Eq`..`I64GeU`/`I64Eqz`/`I64Clz`/`I64Ctz`/`I64Popcnt`/
`I64Extend*`/`I64ExtendI32U`/`I64ExtendI32S`/`I32WrapI64`/`I64Load`/
`I64Const`) currently can't trip this bug — CSE never produces
`reg_map` entries for those operations' input registers because no CSE
arm targets an i64 dest. If a future patch extends CSE to handle any
i64 op (which would be sound, e.g. `I64Add` of two constant pairs),
those consumer arms must be filled in at the same time. Flagged as
follow-up; not a current miscompile.

# Regression test

New `crates/synth-synthesis/tests/issue_104_i32_loadstore_cse.rs`
(3 tests):
  * `issue_104_store_load_roundtrip_optimized` — drives the i32_memory
    pattern through `optimize_full + ir_to_arm` and runs the emitted
    ARM under a mini Thumb-2 interpreter (modeling `Movw`, `Movt`,
    `Add`, `Mov`, `Str`, `Ldr` against a sparse-byte linear memory).
    Asserts the function returns the value that was stored.
  * `issue_104_store_load_table_driven` — same shape, 5 (addr, value)
    pairs covering 0, ±1, i32::MAX.
  * `issue_104_load_and_store_use_consistent_base_register` —
    structural check that `Str` and `Ldr` use the same `MemAddr::base`
    register (both should compute `linear_mem_base + user_addr` in the
    same scratch).

Pre-fix: with PR #101's defensive panic on top of v0.2.1, the existing
`crates/synth-cli/tests/wast_compile.rs::compile_i32_memory` test
fails with "vreg v3 has no assigned ARM register and no spill slot".
Post-fix: that test (untouched) goes from panic to pass, and PR #101's
defensive panic should now be safe to land.

References: #104, #93, #101.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 65.21739% with 8 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
crates/synth-opt/src/lib.rs 65.21% 8 Missing ⚠️

📢 Thoughts on this report? Let us know!

@avrabe avrabe merged commit 7e5cf12 into main May 12, 2026
8 of 9 checks passed
@avrabe avrabe deleted the fix/issue-104-i32-loadstore-vreg branch May 12, 2026 04:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

i32.load/i32.store wasm_to_ir produces unmapped vreg (latent #93-class bug)

1 participant