test: i64 semantic correctness — 25+ tests covering all i64 wasm ops by avrabe · Pull Request #99 · pulseengine/synth

avrabe · 2026-05-10T21:12:17Z

Summary

Closes the i64-codegen testing gap that allowed issue #93 (silicon-blocking memset miscompile) to ship. Issue #74 (closed) introduced crates/synth-synthesis/tests/semantic_correctness.rs — a mini-ARM interpreter that verifies emitted ARM code computes the right answer — but only covered i32 ops. This PR adds crates/synth-synthesis/tests/semantic_correctness_i64.rs with 59 new tests covering every stack-aware i64 wasm op.

A test like the one in issue_93_extend_i32u_then_shr_u below would have caught #93 pre-merge.

What the new tests verify

The mini-ARM interpreter is extended with carry-aware Adds/Adc/Subs/Sbc plus the i64 pseudo-op family (I64Const, I64ExtendI32S/U, I32WrapI64, I64Extend8S/16S/32S, I64Mul, I64Shl/ShrU/ShrS, I64SetCond, I64SetCondZ). Each test compiles a Vec<WasmOp> via InstructionSelector::select_with_stack, interprets the resulting ARM stream, and asserts the final (lo, hi) register pair equals the expected u64.

Coverage by op family

Family	Tests	Boundary values exercised
`i64.const`	6	zero, small +, large hi-word, MAX, MIN, -1
`i64.add`	4	basic, carry across halves, neg+pos, MAX+1 → MIN
`i64.sub`	4	basic, borrow across halves, signed wrap, MIN-1 → MAX
`i64.mul`	4	basic, by zero, high-word-only, cross-product 2¹⁶ × 2¹⁶
`i64.and / or / xor`	5	full-word, zero-identity, self-xor=0
`i64.shl`	5	within-low, crosses-word, into-high-only, shl 0, shl 63
`i64.shr_u`	3	within-high, partial, shr_u 0
`i64.shr_s`	2	-8 >>s 1 = -4; -1 >>s 63 = -1
i64 comparisons	11	both true/false branches; eq/ne/lt_s/lt_u/le_s/gt_s/gt_u/ge_s/ge_u + eq on zero-pair
`i64.eqz`	3	zero, nonzero-low-only, nonzero-high-only
`i64.extend_i32_s/u`	3	zero-extend, sign-extend-neg, positive-no-change
`i32.wrap_i64`	1	drops upper 32
round-trip extend+wrap	1	preserves the original i32
chained (stack-aware)	2	add-then-sub, shl-then-shr_u
#93 regressions	4	extend_i32_u + shr_u; zero-high-word; param-flow shr_u; extend_i32_s + shr_s sign-preserve

Regression for issue #93

// crates/synth-synthesis/tests/semantic_correctness_i64.rs:830
fn issue_93_extend_i32u_then_shr_u() {
    // local.get 0 (i32); i64.extend_i32_u; i64.const 5; i64.shr_u
    // — the exact wasm shape that silently dropped its i64 op pre-#97.
    let instrs = selector.select_with_stack(&[
        WasmOp::LocalGet(0),
        WasmOp::I64ExtendI32U,
        WasmOp::I64Const(5),
        WasmOp::I64ShrU,
    ], 1).unwrap();

    state.set(Reg::R0, 0xCAFE_BABE);
    /* interpret */
    assert_eq!(result, 0xCAFE_BABEu64 >> 5);
}

Verified the regression test fails meaningfully when codegen is wrong by flipping I64ShrU's interpretation to wrapping_shl: the test caught the regression cleanly (left: 108_982_130_624; right: 106_427_861). Reverted before commit. Same exercise for Adc → Add in the i64.add lowering — all i64.add tests caught the dropped carry.

Ops with new semantic coverage

i64.const, i64.add, i64.sub, i64.mul, i64.and, i64.or, i64.xor, i64.shl, i64.shr_u, i64.shr_s, i64.eq, i64.ne, i64.lt_s, i64.lt_u, i64.le_s, i64.gt_s, i64.gt_u, i64.ge_s, i64.ge_u, i64.eqz, i64.extend_i32_s, i64.extend_i32_u, i32.wrap_i64.

Ops not covered and why

i64.div_s/u, i64.rem_s/u, i64.rotl/r, i64.clz/ctz/popcnt, i64.extend_8s/16s/32s: these still go through select_default with hardcoded R0:R1/R2:R3 and don't track the virtual stack. Mixing them with stack-aware ops yields inconsistent register state, so they're tested structurally in the existing instruction_selector.rs unit-test suite rather than semantically here. Promoting them to stack-aware lowering is follow-up work — once done, this file is the natural home for their semantic tests.
i64.load/store: covered by issue_93_memset_i64_codegen (when merged) and the existing i64-locals stack-frame tests.
Mixing i64.mul with i64.add/sub in a single chain is documented as a known limitation in the test file — i64.mul's hardcoded-register lowering breaks the pair convention the following stack-aware ops expect. Each is tested in isolation.

Files changed

crates/synth-synthesis/tests/semantic_correctness_i64.rs (new, 1,278 lines)

No changes to compiler code. No touching optimizer_bridge.rs (PR #97/#98), instruction_selector.rs (PR #96), the cargo-fuzz work, or the RISC-V Call op work. The existing i32 interpreter in semantic_correctness.rs is unchanged — only new handlers were added to the new sibling file.

Test plan

cargo test --test semantic_correctness_i64 — 59 passed, 0 failed
cargo test --workspace — all green, no regressions; existing semantic_correctness.rs still passes 57/57
cargo clippy --workspace --all-targets -- -D warnings — clean
cargo fmt --check — clean
Verified failure detection: flipped Adc → Add and shr_u → shl in the lowering; tests caught each regression; reverted

🤖 Generated with Claude Code

Unblocks every open PR (#96, #97, #99, #100, #101) that was failing on `No space left on device` during z3-sys's C++ build on smithy runners.

Closes the i64-codegen testing gap that allowed issue #93 (silicon-blocking memset miscompile) to ship. Issue #74 introduced semantic_correctness.rs with a mini-ARM interpreter for i32 ops; this companion file extends the same approach to i64. What the new tests verify The mini-ARM interpreter is extended with carry-aware Adds/Adc/Subs/Sbc plus the i64 pseudo-op family (I64Const, I64ExtendI32S/U, I32WrapI64, I64Extend8S/16S/32S, I64Mul, I64Shl/ShrU/ShrS, I64SetCond, I64SetCondZ). Every test compiles a WasmOp sequence via InstructionSelector:: select_with_stack, interprets the resulting ARM stream, and asserts the final (lo, hi) register pair equals the expected u64. Coverage by op family - i64.const (6): zero, small positive, large high-word, MAX, MIN, -1 - i64.add (4): basic, carry-across-halves, neg+pos, MAX+1 wraps to MIN - i64.sub (4): basic, borrow-across-halves, signed wrap, MIN-1 wraps to MAX - i64.mul (4): basic, by zero, high-word-only, cross-product 2^16 * 2^16 - i64.and/or/xor (3 + 2 boundary): full-word, zero-identity, self-xor - i64.shl (3 + 2 boundary): within-low, crosses-word, into-high-only, shl 0, shl 63 - i64.shr_u (2 + 1 boundary): within-high, partial, shr_u 0 - i64.shr_s (2): negative preserves sign, large shift on -1 - i64 comparisons (11): eq T/F, ne T/F, lt_s, lt_u, le_s, gt_s, gt_u, ge_s, ge_u, eq zero-pair - i64.eqz (3): zero, non-zero-low-only, non-zero-high-only - i64.extend_i32_s/u (3): zero-extends, sign-extends-neg, positive-no-change - i32.wrap_i64 (1): drops upper 32 - round-trip extend_i32_u + wrap_i64 (1) - chained (2): add-then-sub, shl-then-shr_u - #93 regression (4): extend_i32_u + shr_u producing input >> 5 in i64 land; zero-high-word check after extend_i32_u; param-flow shr_u by 1; extend_i32_s + shr_s preserves sign Regression for issue #93 issue_93_extend_i32u_then_shr_u tests the exact wasm shape that previously silently dropped the i64 op: local.get 0 (i32); i64.extend_i32_u; i64.const 5; i64.shr_u Verified the regression test fails meaningfully when codegen is wrong by flipping I64ShrU's interpretation to wrapping_shl: test caught the regression cleanly. Reverted before commit. Ops with new coverage i64.const, i64.add, i64.sub, i64.mul, i64.and, i64.or, i64.xor, i64.shl, i64.shr_u, i64.shr_s, i64.eq, i64.ne, i64.lt_s, i64.lt_u, i64.le_s, i64.gt_s, i64.gt_u, i64.ge_s, i64.ge_u, i64.eqz, i64.extend_i32_s, i64.extend_i32_u, i32.wrap_i64. Ops not covered and why - i64.div_s / i64.div_u / i64.rem_s / i64.rem_u: lowering currently goes through select_default with hardcoded R0:R1/R2:R3 and ignores the virtual stack. Mixing them with stack-tracked ops produces inconsistent register state. The pseudo-ops themselves are tested structurally in instruction_selector.rs's existing test suite. - i64.rotl / i64.rotr / i64.clz / i64.ctz / i64.popcnt: same select_default path — left for a follow-up that promotes them to stack-aware emission. - i64.load / i64.store: covered by issue-specific tests (issue_93_memset_i64_codegen and the i64-locals stack-frame tests). - i64.extend_8s/16s/32s: interpreter handlers added but no dedicated test (the lowering ends up in the I64Mul-style select_default path). Mixing i64.mul with i64.add in a single chain is documented as a known limitation — the i64.mul lowering uses hardcoded registers and the follow-up add can't recover the actual pair. Each op is tested in isolation; chained tests stick to ops in the stack-aware family. Verification - cargo test --workspace: all pass (59 new, no regressions) - cargo clippy --workspace --all-targets -- -D warnings: clean - cargo fmt --check: clean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov · 2026-05-11T18:25:07Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

avrabe mentioned this pull request May 11, 2026

fix(ci): move Test + Clippy back to ubuntu-latest (unblocks all open PRs) #102

Merged

avrabe added a commit that referenced this pull request May 11, 2026

fix(ci): move Test + Clippy back to ubuntu-latest (#102)

6febcbe

Unblocks every open PR (#96, #97, #99, #100, #101) that was failing on `No space left on device` during z3-sys's C++ build on smithy runners.

avrabe force-pushed the test/i64-semantic-correctness branch from 449b4cf to f7ac520 Compare May 11, 2026 05:30

avrabe force-pushed the test/i64-semantic-correctness branch from f7ac520 to 8b0a3ed Compare May 11, 2026 17:42

This was referenced May 11, 2026

i32.load/i32.store wasm_to_ir produces unmapped vreg (latent #93-class bug) #104

Closed

docs+test: spectre/csdb policy + aarch64 CVE audit + arXiv 2604.17391 citation #105

Merged

avrabe merged commit 62d865f into main May 11, 2026
8 checks passed

avrabe deleted the test/i64-semantic-correctness branch May 11, 2026 18:17

avrabe mentioned this pull request May 11, 2026

fix(opt): wasm_to_ir missing handler for i32.load/i32.store result vregs (#104) #107

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: i64 semantic correctness — 25+ tests covering all i64 wasm ops#99

test: i64 semantic correctness — 25+ tests covering all i64 wasm ops#99
avrabe merged 1 commit into
mainfrom
test/i64-semantic-correctness

avrabe commented May 10, 2026

Uh oh!

Uh oh!

codecov Bot commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

avrabe commented May 10, 2026

Summary

What the new tests verify

Coverage by op family

Regression for issue #93

Ops with new semantic coverage

Ops not covered and why

Files changed

Test plan

Uh oh!

Uh oh!

codecov Bot commented May 11, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant