Skip to content

test: i64 semantic correctness — 25+ tests covering all i64 wasm ops#99

Merged
avrabe merged 1 commit into
mainfrom
test/i64-semantic-correctness
May 11, 2026
Merged

test: i64 semantic correctness — 25+ tests covering all i64 wasm ops#99
avrabe merged 1 commit into
mainfrom
test/i64-semantic-correctness

Conversation

@avrabe
Copy link
Copy Markdown
Contributor

@avrabe avrabe commented May 10, 2026

Summary

Closes the i64-codegen testing gap that allowed issue #93 (silicon-blocking memset miscompile) to ship. Issue #74 (closed) introduced crates/synth-synthesis/tests/semantic_correctness.rs — a mini-ARM interpreter that verifies emitted ARM code computes the right answer — but only covered i32 ops. This PR adds crates/synth-synthesis/tests/semantic_correctness_i64.rs with 59 new tests covering every stack-aware i64 wasm op.

A test like the one in issue_93_extend_i32u_then_shr_u below would have caught #93 pre-merge.

What the new tests verify

The mini-ARM interpreter is extended with carry-aware Adds/Adc/Subs/Sbc plus the i64 pseudo-op family (I64Const, I64ExtendI32S/U, I32WrapI64, I64Extend8S/16S/32S, I64Mul, I64Shl/ShrU/ShrS, I64SetCond, I64SetCondZ). Each test compiles a Vec<WasmOp> via InstructionSelector::select_with_stack, interprets the resulting ARM stream, and asserts the final (lo, hi) register pair equals the expected u64.

Coverage by op family

Family Tests Boundary values exercised
i64.const 6 zero, small +, large hi-word, MAX, MIN, -1
i64.add 4 basic, carry across halves, neg+pos, MAX+1 → MIN
i64.sub 4 basic, borrow across halves, signed wrap, MIN-1 → MAX
i64.mul 4 basic, by zero, high-word-only, cross-product 2¹⁶ × 2¹⁶
i64.and / or / xor 5 full-word, zero-identity, self-xor=0
i64.shl 5 within-low, crosses-word, into-high-only, shl 0, shl 63
i64.shr_u 3 within-high, partial, shr_u 0
i64.shr_s 2 -8 >>s 1 = -4; -1 >>s 63 = -1
i64 comparisons 11 both true/false branches; eq/ne/lt_s/lt_u/le_s/gt_s/gt_u/ge_s/ge_u + eq on zero-pair
i64.eqz 3 zero, nonzero-low-only, nonzero-high-only
i64.extend_i32_s/u 3 zero-extend, sign-extend-neg, positive-no-change
i32.wrap_i64 1 drops upper 32
round-trip extend+wrap 1 preserves the original i32
chained (stack-aware) 2 add-then-sub, shl-then-shr_u
#93 regressions 4 extend_i32_u + shr_u; zero-high-word; param-flow shr_u; extend_i32_s + shr_s sign-preserve

Regression for issue #93

// crates/synth-synthesis/tests/semantic_correctness_i64.rs:830
fn issue_93_extend_i32u_then_shr_u() {
    // local.get 0 (i32); i64.extend_i32_u; i64.const 5; i64.shr_u
    // — the exact wasm shape that silently dropped its i64 op pre-#97.
    let instrs = selector.select_with_stack(&[
        WasmOp::LocalGet(0),
        WasmOp::I64ExtendI32U,
        WasmOp::I64Const(5),
        WasmOp::I64ShrU,
    ], 1).unwrap();

    state.set(Reg::R0, 0xCAFE_BABE);
    /* interpret */
    assert_eq!(result, 0xCAFE_BABEu64 >> 5);
}

Verified the regression test fails meaningfully when codegen is wrong by flipping I64ShrU's interpretation to wrapping_shl: the test caught the regression cleanly (left: 108_982_130_624; right: 106_427_861). Reverted before commit. Same exercise for Adc → Add in the i64.add lowering — all i64.add tests caught the dropped carry.

Ops with new semantic coverage

i64.const, i64.add, i64.sub, i64.mul, i64.and, i64.or, i64.xor, i64.shl, i64.shr_u, i64.shr_s, i64.eq, i64.ne, i64.lt_s, i64.lt_u, i64.le_s, i64.gt_s, i64.gt_u, i64.ge_s, i64.ge_u, i64.eqz, i64.extend_i32_s, i64.extend_i32_u, i32.wrap_i64.

Ops not covered and why

  • i64.div_s/u, i64.rem_s/u, i64.rotl/r, i64.clz/ctz/popcnt, i64.extend_8s/16s/32s: these still go through select_default with hardcoded R0:R1/R2:R3 and don't track the virtual stack. Mixing them with stack-aware ops yields inconsistent register state, so they're tested structurally in the existing instruction_selector.rs unit-test suite rather than semantically here. Promoting them to stack-aware lowering is follow-up work — once done, this file is the natural home for their semantic tests.
  • i64.load/store: covered by issue_93_memset_i64_codegen (when merged) and the existing i64-locals stack-frame tests.
  • Mixing i64.mul with i64.add/sub in a single chain is documented as a known limitation in the test file — i64.mul's hardcoded-register lowering breaks the pair convention the following stack-aware ops expect. Each is tested in isolation.

Files changed

  • crates/synth-synthesis/tests/semantic_correctness_i64.rs (new, 1,278 lines)

No changes to compiler code. No touching optimizer_bridge.rs (PR #97/#98), instruction_selector.rs (PR #96), the cargo-fuzz work, or the RISC-V Call op work. The existing i32 interpreter in semantic_correctness.rs is unchanged — only new handlers were added to the new sibling file.

Test plan

  • cargo test --test semantic_correctness_i64 — 59 passed, 0 failed
  • cargo test --workspace — all green, no regressions; existing semantic_correctness.rs still passes 57/57
  • cargo clippy --workspace --all-targets -- -D warnings — clean
  • cargo fmt --check — clean
  • Verified failure detection: flipped Adc → Add and shr_u → shl in the lowering; tests caught each regression; reverted

🤖 Generated with Claude Code

avrabe added a commit that referenced this pull request May 11, 2026
Unblocks every open PR (#96, #97, #99, #100, #101) that was failing on `No space left on device` during z3-sys's C++ build on smithy runners.
@avrabe avrabe force-pushed the test/i64-semantic-correctness branch from 449b4cf to f7ac520 Compare May 11, 2026 05:30
Closes the i64-codegen testing gap that allowed issue #93 (silicon-blocking
memset miscompile) to ship. Issue #74 introduced semantic_correctness.rs
with a mini-ARM interpreter for i32 ops; this companion file extends the
same approach to i64.

What the new tests verify

The mini-ARM interpreter is extended with carry-aware Adds/Adc/Subs/Sbc
plus the i64 pseudo-op family (I64Const, I64ExtendI32S/U, I32WrapI64,
I64Extend8S/16S/32S, I64Mul, I64Shl/ShrU/ShrS, I64SetCond, I64SetCondZ).
Every test compiles a WasmOp sequence via InstructionSelector::
select_with_stack, interprets the resulting ARM stream, and asserts the
final (lo, hi) register pair equals the expected u64.

Coverage by op family

- i64.const (6): zero, small positive, large high-word, MAX, MIN, -1
- i64.add (4): basic, carry-across-halves, neg+pos, MAX+1 wraps to MIN
- i64.sub (4): basic, borrow-across-halves, signed wrap, MIN-1 wraps to MAX
- i64.mul (4): basic, by zero, high-word-only, cross-product 2^16 * 2^16
- i64.and/or/xor (3 + 2 boundary): full-word, zero-identity, self-xor
- i64.shl (3 + 2 boundary): within-low, crosses-word, into-high-only, shl 0, shl 63
- i64.shr_u (2 + 1 boundary): within-high, partial, shr_u 0
- i64.shr_s (2): negative preserves sign, large shift on -1
- i64 comparisons (11): eq T/F, ne T/F, lt_s, lt_u, le_s, gt_s, gt_u, ge_s, ge_u, eq zero-pair
- i64.eqz (3): zero, non-zero-low-only, non-zero-high-only
- i64.extend_i32_s/u (3): zero-extends, sign-extends-neg, positive-no-change
- i32.wrap_i64 (1): drops upper 32
- round-trip extend_i32_u + wrap_i64 (1)
- chained (2): add-then-sub, shl-then-shr_u
- #93 regression (4): extend_i32_u + shr_u producing input >> 5 in i64 land;
  zero-high-word check after extend_i32_u; param-flow shr_u by 1;
  extend_i32_s + shr_s preserves sign

Regression for issue #93

issue_93_extend_i32u_then_shr_u tests the exact wasm shape that
previously silently dropped the i64 op:

    local.get 0 (i32); i64.extend_i32_u; i64.const 5; i64.shr_u

Verified the regression test fails meaningfully when codegen is wrong
by flipping I64ShrU's interpretation to wrapping_shl: test caught the
regression cleanly. Reverted before commit.

Ops with new coverage

i64.const, i64.add, i64.sub, i64.mul, i64.and, i64.or, i64.xor, i64.shl,
i64.shr_u, i64.shr_s, i64.eq, i64.ne, i64.lt_s, i64.lt_u, i64.le_s,
i64.gt_s, i64.gt_u, i64.ge_s, i64.ge_u, i64.eqz, i64.extend_i32_s,
i64.extend_i32_u, i32.wrap_i64.

Ops not covered and why

- i64.div_s / i64.div_u / i64.rem_s / i64.rem_u: lowering currently
  goes through select_default with hardcoded R0:R1/R2:R3 and ignores
  the virtual stack. Mixing them with stack-tracked ops produces
  inconsistent register state. The pseudo-ops themselves are tested
  structurally in instruction_selector.rs's existing test suite.
- i64.rotl / i64.rotr / i64.clz / i64.ctz / i64.popcnt: same
  select_default path — left for a follow-up that promotes them to
  stack-aware emission.
- i64.load / i64.store: covered by issue-specific tests
  (issue_93_memset_i64_codegen and the i64-locals stack-frame tests).
- i64.extend_8s/16s/32s: interpreter handlers added but no dedicated
  test (the lowering ends up in the I64Mul-style select_default path).

Mixing i64.mul with i64.add in a single chain is documented as a known
limitation — the i64.mul lowering uses hardcoded registers and the
follow-up add can't recover the actual pair. Each op is tested in
isolation; chained tests stick to ops in the stack-aware family.

Verification

- cargo test --workspace: all pass (59 new, no regressions)
- cargo clippy --workspace --all-targets -- -D warnings: clean
- cargo fmt --check: clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@avrabe avrabe force-pushed the test/i64-semantic-correctness branch from f7ac520 to 8b0a3ed Compare May 11, 2026 17:42
@avrabe avrabe merged commit 62d865f into main May 11, 2026
8 checks passed
@avrabe avrabe deleted the test/i64-semantic-correctness branch May 11, 2026 18:17
@codecov
Copy link
Copy Markdown

codecov Bot commented May 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant