fix(opt): wasm_to_ir missing handler for i32.load/i32.store result vregs (#104)#107
Merged
Merged
Conversation
#104) When `CommonSubexpressionElimination` eliminates a duplicate `Opcode::Load` (the IR-level form of `local.get`), it replaces the dead producer with a placeholder `Const` and records `reg_map[dead] = live` so later instructions can be rewritten to reference the survivor. The CSE match arms covered all i32 arithmetic / comparison / shift / unary ops plus `Copy`, `Return`, `Select`, `Eqz`, and `CondBranch` — but omitted `Opcode::MemLoad` and `Opcode::MemStore` (linear-memory ops), `Opcode::Extend8S` / `Extend16S`, and the `src` operand of `Store` / `TeeStore`. For the canonical store-then-load function (func (param i32 i32) (result i32) local.get 0 local.get 1 i32.store local.get 0 ;; <-- CSE kills this Load against the earlier one i32.load) ;; <-- but its addr vreg never gets rewritten the second `local.get 0` lowered to `Opcode::Load { dest: v3, addr: 0 }`, which CSE killed (recording `reg_map[v3] = v0`). The subsequent `Opcode::MemLoad { addr: v3, .. }` was then dropped through the CSE match's `_ => {}` fallback with `addr` untouched. `optimize_full`'s dead-instruction filter removed the now-`is_dead` Load, leaving the MemLoad referencing a vreg with no producer. `ir_to_arm`'s `get_arm_reg` fell back to `Reg::R0` (silent miscompile pre-PR #101; defensive panic post-PR #101) and the load read from whatever R0 held at that point — typically the *stored value*, not the *user-supplied address*. Same class as #93's `I64ExtendI32U` / `I64ExtendI32S` / `I32WrapI64` gap: a wasm op whose result vreg ends up unmapped before `ir_to_arm`, masked by the silent R0 fallback in `get_arm_reg`. # What changed `crates/synth-opt/src/lib.rs` (CSE pass): * Added `Opcode::MemLoad` arm — resolve `addr`. * Added `Opcode::MemStore` arm — resolve `src` and `addr`. * Replaced `Opcode::Store { addr, .. } =>` with a body that also resolves `src` (was previously dropped silently). * Replaced `Opcode::TeeStore { addr, .. } =>` with a body that resolves `src` (same reason). * Added `Opcode::Extend8S` / `Opcode::Extend16S` arms — resolve `src`. We intentionally do not CSE-eliminate `MemLoad` itself: there is no alias analysis for linear memory in this pass, and any `MemStore` can invalidate any address. # Audit Walked every `Opcode::*` variant for which CSE could leak an unresolved operand into `ir_to_arm`. The fixed ops above are the ones that have real consumers in `wasm_to_ir`'s output. Remaining unhandled variants in CSE (every i64 op: `I64Add`/`I64Sub`/`I64And`/`I64Or`/`I64Xor`/ `I64Mul`/`I64Div*`/`I64Rem*`/`I64Shl`/`I64ShrS`/`I64ShrU`/`I64Rotl`/ `I64Rotr`/`I64Eq`..`I64GeU`/`I64Eqz`/`I64Clz`/`I64Ctz`/`I64Popcnt`/ `I64Extend*`/`I64ExtendI32U`/`I64ExtendI32S`/`I32WrapI64`/`I64Load`/ `I64Const`) currently can't trip this bug — CSE never produces `reg_map` entries for those operations' input registers because no CSE arm targets an i64 dest. If a future patch extends CSE to handle any i64 op (which would be sound, e.g. `I64Add` of two constant pairs), those consumer arms must be filled in at the same time. Flagged as follow-up; not a current miscompile. # Regression test New `crates/synth-synthesis/tests/issue_104_i32_loadstore_cse.rs` (3 tests): * `issue_104_store_load_roundtrip_optimized` — drives the i32_memory pattern through `optimize_full + ir_to_arm` and runs the emitted ARM under a mini Thumb-2 interpreter (modeling `Movw`, `Movt`, `Add`, `Mov`, `Str`, `Ldr` against a sparse-byte linear memory). Asserts the function returns the value that was stored. * `issue_104_store_load_table_driven` — same shape, 5 (addr, value) pairs covering 0, ±1, i32::MAX. * `issue_104_load_and_store_use_consistent_base_register` — structural check that `Str` and `Ldr` use the same `MemAddr::base` register (both should compute `linear_mem_base + user_addr` in the same scratch). Pre-fix: with PR #101's defensive panic on top of v0.2.1, the existing `crates/synth-cli/tests/wast_compile.rs::compile_i32_memory` test fails with "vreg v3 has no assigned ARM register and no spill slot". Post-fix: that test (untouched) goes from panic to pass, and PR #101's defensive panic should now be safe to land. References: #104, #93, #101. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #104. Same class as #93.
Summary
When
CommonSubexpressionElimination(incrates/synth-opt/src/lib.rs) eliminates a duplicateOpcode::Load(the IR form oflocal.get), it replaces the dead producer with a placeholderConst, marksis_dead = true, and recordsreg_map[dead_vreg] = live_vregso subsequent instructions can be rewritten to reference the survivor. The CSE match had arms for all i32 arithmetic / comparison / shift / unary ops plusCopy,Return,Select,Eqz,CondBranch— but omittedOpcode::MemLoad/Opcode::MemStore(linear-memory ops),Opcode::Extend8S/Extend16S, and thesrcfield ofStore/TeeStore.For the canonical store-then-load function (
tests/wast/i32_memory.wast):the second
local.get 0lowered toOpcode::Load { dest: v3, addr: 0 }. CSE killed it (reg_map[v3] = v0). The subsequentOpcode::MemLoad { addr: v3, .. }was then dropped through the CSE match's_ => {}fallback withaddruntouched.optimize_full's dead-instruction filter removed the now-is_deadLoad, leaving the MemLoad referencing a vreg with no producer.ir_to_arm'sget_arm_regfell back toReg::R0— silent miscompile pre-PR #101; defensive panic post-PR #101 with"vreg v3 has no assigned ARM register and no spill slot". The load read from whatever R0 held at that point.This is exactly the same shape as #93 (
I64ExtendI32U/I64ExtendI32S/I32WrapI64inwasm_to_ir): a wasm op whose result vreg ends up unmapped beforeir_to_arm, masked by the silent R0 fallback inget_arm_reg.What changed
crates/synth-opt/src/lib.rs(CSE pass):Opcode::MemLoadarm — resolvesaddrthroughreg_map.Opcode::MemStorearm — resolvessrcandaddr.Opcode::Store { addr, .. } =>body sosrcis also resolved (was previously dropped silently).Opcode::TeeStore { addr, .. } =>body sosrcis also resolved (same reason).Opcode::Extend8S/Opcode::Extend16Sarms — resolvesrc.We intentionally do not CSE-eliminate
MemLoaditself: there is no alias analysis for linear memory in this pass, and anyMemStorecan invalidate any address.Audit (per task step 5 — don't whack-a-mole)
Walked every
Opcode::*variant for which CSE could leak an unresolved operand intoir_to_arm. The variants fixed above are the ones that have real consumers inwasm_to_ir's output.Remaining unhandled variants in CSE (every i64 op:
I64Add/I64Sub/I64And/I64Or/I64Xor/I64Mul/I64Div*/I64Rem*/I64Shl/I64ShrS/I64ShrU/I64Rotl/I64Rotr/I64Eq..I64GeU/I64Eqz/I64Clz/I64Ctz/I64Popcnt/I64Extend*/I64ExtendI32U/I64ExtendI32S/I32WrapI64/I64Load/I64Const) currently cannot trip this bug: CSE never producesreg_mapentries for i64 dest vregs because no CSE arm targets an i64 dest. If a future patch extends CSE to handle any i64 op (which would be sound, e.g.I64Addof two constant pairs), those consumer arms must be filled in at the same time. Flagged as follow-up; not a current miscompile.Regression test
New
crates/synth-synthesis/tests/issue_104_i32_loadstore_cse.rs(4 tests):issue_104_store_load_roundtrip_optimized— drives the i32_memory pattern throughoptimize_full + ir_to_armand runs the emitted ARM under a mini Thumb-2 interpreter (modelingMovw,Movt,Add,Mov,Str,Ldragainst a sparse-byte linear memory). Asserts the function returns the value that was stored.issue_104_store_load_table_driven— same shape, 5 (addr, value) pairs covering 0, ±1, i32::MAX.issue_104_load_addr_does_not_use_value_register— structural assertion that the load's address-computationAdd R12, R12, R<src>reads from R1 (the address param), not R0 (the value param).issue_104_load_and_store_use_consistent_base_register—StrandLdruse the sameMemAddr::baseregister (both should computelinear_mem_base + user_addrin the same scratch).The first three tests use a swapped-operand variant (param 1 = address, param 0 = value): in the natural
param 0 = addresslayout, the unmappedv3happens to fall back toReg::R0— which is alsoparam_regs[0]forlocal 0— so the buggy code path happens to produce the right answer by accident, masking the bug from behavioural tests. Swapping the operands forces the correct register to be R1, so the R0 fallback gives an observably wrong answer.Verified fails-then-passes: with the CSE fix reverted, 3 of 4 tests fail (the roundtrip returns
0x0instead of0xCAFEBABE, the structural Add check reportsR0instead ofR1). Restoring the fix makes all 4 pass.Pre/post-fix on existing tests
Pre-fix, with PR #101's defensive panic on top of v0.2.1:
Post-fix,
crates/synth-cli/tests/wast_compile.rs::compile_i32_memory(untouched per task instructions) goes from panic to pass. PR #101's defensive panic should now be safe to land.Test plan
cargo test --package synth-synthesis— 401 tests pass (241 unit + 4 new for i32.load/i32.store wasm_to_ir produces unmapped vreg (latent #93-class bug) #104 + 5 prerelease_validation + 9 proptest_robustness + 4 rocq + 22 semantic_correctness + 57 issue_93_memset + 59 semantic_correctness_i64 + 0 doc-tests)cargo test --package synth-opt— 33 tests passcargo test --package synth-cli --test wast_compile compile_i32_memory— passescargo fmt --check— cleancargo clippy --workspace --all-targets -- -D warnings— clean on changed files (workspace build of z3-sys can fail with cmake/OOM in concurrent environments; not introduced by this change)git checkout HEAD~1 -- crates/synth-opt/src/lib.rs && cargo test --package synth-synthesis --test issue_104_i32_loadstore_cse)References
🤖 Generated with Claude Code