test: i64 semantic correctness — 25+ tests covering all i64 wasm ops#99
Merged
Conversation
449b4cf to
f7ac520
Compare
Closes the i64-codegen testing gap that allowed issue #93 (silicon-blocking memset miscompile) to ship. Issue #74 introduced semantic_correctness.rs with a mini-ARM interpreter for i32 ops; this companion file extends the same approach to i64. What the new tests verify The mini-ARM interpreter is extended with carry-aware Adds/Adc/Subs/Sbc plus the i64 pseudo-op family (I64Const, I64ExtendI32S/U, I32WrapI64, I64Extend8S/16S/32S, I64Mul, I64Shl/ShrU/ShrS, I64SetCond, I64SetCondZ). Every test compiles a WasmOp sequence via InstructionSelector:: select_with_stack, interprets the resulting ARM stream, and asserts the final (lo, hi) register pair equals the expected u64. Coverage by op family - i64.const (6): zero, small positive, large high-word, MAX, MIN, -1 - i64.add (4): basic, carry-across-halves, neg+pos, MAX+1 wraps to MIN - i64.sub (4): basic, borrow-across-halves, signed wrap, MIN-1 wraps to MAX - i64.mul (4): basic, by zero, high-word-only, cross-product 2^16 * 2^16 - i64.and/or/xor (3 + 2 boundary): full-word, zero-identity, self-xor - i64.shl (3 + 2 boundary): within-low, crosses-word, into-high-only, shl 0, shl 63 - i64.shr_u (2 + 1 boundary): within-high, partial, shr_u 0 - i64.shr_s (2): negative preserves sign, large shift on -1 - i64 comparisons (11): eq T/F, ne T/F, lt_s, lt_u, le_s, gt_s, gt_u, ge_s, ge_u, eq zero-pair - i64.eqz (3): zero, non-zero-low-only, non-zero-high-only - i64.extend_i32_s/u (3): zero-extends, sign-extends-neg, positive-no-change - i32.wrap_i64 (1): drops upper 32 - round-trip extend_i32_u + wrap_i64 (1) - chained (2): add-then-sub, shl-then-shr_u - #93 regression (4): extend_i32_u + shr_u producing input >> 5 in i64 land; zero-high-word check after extend_i32_u; param-flow shr_u by 1; extend_i32_s + shr_s preserves sign Regression for issue #93 issue_93_extend_i32u_then_shr_u tests the exact wasm shape that previously silently dropped the i64 op: local.get 0 (i32); i64.extend_i32_u; i64.const 5; i64.shr_u Verified the regression test fails meaningfully when codegen is wrong by flipping I64ShrU's interpretation to wrapping_shl: test caught the regression cleanly. Reverted before commit. Ops with new coverage i64.const, i64.add, i64.sub, i64.mul, i64.and, i64.or, i64.xor, i64.shl, i64.shr_u, i64.shr_s, i64.eq, i64.ne, i64.lt_s, i64.lt_u, i64.le_s, i64.gt_s, i64.gt_u, i64.ge_s, i64.ge_u, i64.eqz, i64.extend_i32_s, i64.extend_i32_u, i32.wrap_i64. Ops not covered and why - i64.div_s / i64.div_u / i64.rem_s / i64.rem_u: lowering currently goes through select_default with hardcoded R0:R1/R2:R3 and ignores the virtual stack. Mixing them with stack-tracked ops produces inconsistent register state. The pseudo-ops themselves are tested structurally in instruction_selector.rs's existing test suite. - i64.rotl / i64.rotr / i64.clz / i64.ctz / i64.popcnt: same select_default path — left for a follow-up that promotes them to stack-aware emission. - i64.load / i64.store: covered by issue-specific tests (issue_93_memset_i64_codegen and the i64-locals stack-frame tests). - i64.extend_8s/16s/32s: interpreter handlers added but no dedicated test (the lowering ends up in the I64Mul-style select_default path). Mixing i64.mul with i64.add in a single chain is documented as a known limitation — the i64.mul lowering uses hardcoded registers and the follow-up add can't recover the actual pair. Each op is tested in isolation; chained tests stick to ops in the stack-aware family. Verification - cargo test --workspace: all pass (59 new, no regressions) - cargo clippy --workspace --all-targets -- -D warnings: clean - cargo fmt --check: clean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
f7ac520 to
8b0a3ed
Compare
This was referenced May 11, 2026
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the i64-codegen testing gap that allowed issue #93 (silicon-blocking memset miscompile) to ship. Issue #74 (closed) introduced
crates/synth-synthesis/tests/semantic_correctness.rs— a mini-ARM interpreter that verifies emitted ARM code computes the right answer — but only covered i32 ops. This PR addscrates/synth-synthesis/tests/semantic_correctness_i64.rswith 59 new tests covering every stack-aware i64 wasm op.A test like the one in
issue_93_extend_i32u_then_shr_ubelow would have caught #93 pre-merge.What the new tests verify
The mini-ARM interpreter is extended with carry-aware
Adds/Adc/Subs/Sbcplus the i64 pseudo-op family (I64Const,I64ExtendI32S/U,I32WrapI64,I64Extend8S/16S/32S,I64Mul,I64Shl/ShrU/ShrS,I64SetCond,I64SetCondZ). Each test compiles aVec<WasmOp>viaInstructionSelector::select_with_stack, interprets the resulting ARM stream, and asserts the final(lo, hi)register pair equals the expectedu64.Coverage by op family
i64.consti64.addi64.subi64.muli64.and / or / xori64.shli64.shr_ui64.shr_si64.eqzi64.extend_i32_s/ui32.wrap_i64Regression for issue #93
Verified the regression test fails meaningfully when codegen is wrong by flipping
I64ShrU's interpretation towrapping_shl: the test caught the regression cleanly (left: 108_982_130_624; right: 106_427_861). Reverted before commit. Same exercise forAdc → Addin the i64.add lowering — all i64.add tests caught the dropped carry.Ops with new semantic coverage
i64.const,i64.add,i64.sub,i64.mul,i64.and,i64.or,i64.xor,i64.shl,i64.shr_u,i64.shr_s,i64.eq,i64.ne,i64.lt_s,i64.lt_u,i64.le_s,i64.gt_s,i64.gt_u,i64.ge_s,i64.ge_u,i64.eqz,i64.extend_i32_s,i64.extend_i32_u,i32.wrap_i64.Ops not covered and why
i64.div_s/u,i64.rem_s/u,i64.rotl/r,i64.clz/ctz/popcnt,i64.extend_8s/16s/32s: these still go throughselect_defaultwith hardcodedR0:R1/R2:R3and don't track the virtual stack. Mixing them with stack-aware ops yields inconsistent register state, so they're tested structurally in the existinginstruction_selector.rsunit-test suite rather than semantically here. Promoting them to stack-aware lowering is follow-up work — once done, this file is the natural home for their semantic tests.i64.load/store: covered byissue_93_memset_i64_codegen(when merged) and the existing i64-locals stack-frame tests.i64.mulwithi64.add/subin a single chain is documented as a known limitation in the test file —i64.mul's hardcoded-register lowering breaks the pair convention the following stack-aware ops expect. Each is tested in isolation.Files changed
crates/synth-synthesis/tests/semantic_correctness_i64.rs(new, 1,278 lines)No changes to compiler code. No touching
optimizer_bridge.rs(PR #97/#98),instruction_selector.rs(PR #96), the cargo-fuzz work, or the RISC-V Call op work. The existing i32 interpreter insemantic_correctness.rsis unchanged — only new handlers were added to the new sibling file.Test plan
cargo test --test semantic_correctness_i64— 59 passed, 0 failedcargo test --workspace— all green, no regressions; existingsemantic_correctness.rsstill passes 57/57cargo clippy --workspace --all-targets -- -D warnings— cleancargo fmt --check— cleanAdc → Addandshr_u → shlin the lowering; tests caught each regression; reverted🤖 Generated with Claude Code