feat(kvs-lance): gridlake step-2 — adaptive batching, WAL/ACID recovery, per-row seq (savant-reviewed)#30
Conversation
The .rustfmt.toml enabled nightly-only options (wrap_comments, imports_granularity, group_imports, comment_width) while the build toolchain is pinned stable 1.95 and the fmt job that would apply them (pinned rust-toolchain.nightly via ci.yml) is not running — GitHub Actions is not enabled on this fork (PR #29 head has zero check runs). So those options were enforced by nothing yet made stable `cargo fmt` warn and reformat divergently. Comment them out so stable `cargo fmt` (the 1.95 pin, matching org policy of 99% stable / nightly only for Miri) is the single authoritative formatter. Re-enable as a block if a nightly-fmt CI job is ever turned on and the tree is normalized under it in one dedicated pass. Findings (no CI on the fork; rustfmt split-brain) recorded in .claude/board/EPIPHANIES.md. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R
… + SoA roadmap Design doc for aligning SurrealDB writes with the Lance columnar/versioned store: the convergent ingest→migrate→compact pattern and its 5 invariants; a pattern→code mapping (WAL+memtable+flusher == RocksDB; CommitGate == write-group leader; merge_insert == SST/part build; tombstone rows == ClickHouse _row_exists mask / Lance deletion vectors; Lance version == seqno/manifest; background_optimizer == compaction); the ACID story pinned property-by-property to its enforcement point; ClickHouse parity and where it breaks down; the per-row seq keystone (decoupling replay fidelity from physical batching); the SoA/gridlake transcode-free flush; compaction GC; and a status-tagged phased roadmap. Forward-looking items carry explicit CONJECTURE/ROADMAP markers so design is never confused with as-built. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R
… pending]
Checkpoint of agent-authored Phase-1 code: ClickHouse-parity adaptive
flush triggers (rows | bytes | tick) with a rate floor so the flusher
cannot out-run compaction.
- FlusherConfig += { max_pending_bytes (8 MiB), min_flush_interval (50ms) }
- Memtable::pending_bytes() (sums key+val; tombstone counts key only)
- flusher_loop trigger extracted into a pure `should_flush`, gated by a
web_time::Instant rate floor
- tests: pending_bytes_*, should_flush_* , config defaults
Agent-reported green (91 kvs::lance tests pass) but NOT yet independently
verified: this is the "unverified code" stage of the review pipeline.
Savant review + orchestrator cargo-verification + marker removal happen
before the PR is opened. DO NOT MERGE as-is.
https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R
…, per-row seq column
P1 ClickHouse-parity adaptive batching: FlusherConfig += {max_pending_bytes
(8 MiB), min_flush_interval (50 ms)}; Memtable::pending_bytes(); the flusher
triggers on rows | bytes | periodic-tick, gated by a web_time::Instant rate
floor (shutdown final-drain bypasses it); trigger logic extracted into a pure
should_flush() for testability.
P2 WAL/ACID: lsm_recovery_atomic_multi_op_batch proves a single multi-op
transaction (2 inserts + 1 delete = one WalRecord) replays all-or-nothing
across a Box::leak crash simulation + reopen.
P3 per-row commit-sequence column: seq:UInt64 added to the Lance schema, to
build_write_batch_lance and build_tombstone_batch_lance (which now take a
parallel &[u64] seqs slice); a Datastore-level commit_seq AtomicU64 is minted
once per transaction and stamped per row through the memtable + flusher; WAL
replay reassigns fresh seqs (on-disk format unchanged); the legacy CommitGate
stamps seq = version. This decouples per-commit replay identity from the
batch-granular `version` column.
Verification: orchestrator independently ran
`cargo test -p surrealdb-core --features kv-lance --lib kvs::lance`
=> 96 passed, 0 failed, 3 ignored (confirms the build agent's result).
Multi-savant logic review (concurrency/ACID, Lance data-model, Rust idiom &
tests) is in progress; any findings land as follow-up fix commits before this
branch is proposed for merge.
Additive only; no new dependencies; stable Rust; no WAL on-disk format change.
https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R
…d max (BLOCKER) A 3-savant read-only review of the step-2 code (which passed 96/0 tests) caught a BLOCKER that single-session tests can't see: `commit_seq` was re-initialised to 0 on every `Datastore::new` and never advanced past the max `seq` already persisted in Lance, so after a restart (WAL truncated) new commits re-minted seqs that collided with / regressed below flushed rows — defeating the `seq` column's purpose as a cross-restart monotonic per-commit replay axis. `generation` is memtable-local and unpersisted (so it need only clear the WAL tail); `seq` is a Lance column and must seed from the persisted max. Fixes applied: - BLOCKER: add `Datastore::max_persisted_seq` (tolerant `seq`-column scan; 0 for empty/legacy datasets) and seed `commit_seq` from it at open; replayed WAL records mint above that floor. - Upgrade the `debug_assert_eq!` seqs/rows length guards in both batch builders to real `ArrowError` checks (enforced in release too). - Add a `flusher_tick_interval` LanceConfig knob and use it to make the seq-coalescing test deterministic (was timing-fragile on slow disks). - NITs: align the re-loop nudge to `>=`; correct the flusher floor-gating module doc; refresh the stale `KvSchema::build_write_batch` doc. - Tests: +`seq_survives_restart_above_persisted_max` (fails on the old code, proves the fix), +`seq_column_gate_path_equals_version`, +`scan_versions_for_tests` helper. Documented (accepted, pre-release) limitations the savants surfaced are recorded in .claude/board/EPIPHANIES.md (pre-`seq` schema migration; WAL reseq on replay; gate-path seq=version; seq/generation two-atomic ordering). Verified by the orchestrator (sole cargo runner): `cargo check --features kv-lance --tests` clean; `cargo test --features kv-lance --lib kvs::lance` → 98 passed, 0 failed, 3 ignored. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e329a7a7c3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
) `max_persisted_seq` treated a dataset whose schema lacks the `seq` column as success (returning 0), but the write paths now always build 5-column batches, so the first flush/merge against such a 4-column dataset would hit an opaque Lance schema mismatch. Return a clear migration error instead — exactly the "fail with a clear migration error before allowing commits" Codex suggested. A fresh dataset created by this code always carries the 5-column schema, so this only fires for genuinely legacy (pre-release) data; all 4 seq tests pass. Also records the Cognitive-RISC substrate↔invariant mapping (and the N1 trap: do not add class_id to the kv-lance schema — it stays policy-free) in .claude/board/EPIPHANIES.md. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R
What & why
Gridlake step 2, stacked on #29 (base =
claude/kvs-lance-timeline, so this PR shows only the step-2 delta; it auto-retargets tomainwhen #29 merges). Goal: align SurrealDB writes with Lance so thekv-lancebackend gets RocksDB-grade WAL/ACID and ClickHouse-grade write batching while staying faithful to Lance's columnar/versioned model. Full architecture in.claude/lance-backend/GRIDLAKE.md(800 lines, status-tagged).Changes (additive, stable-only, no new deps)
FlusherConfig+=max_pending_bytes(8 MiB) andmin_flush_interval(rate floor); the flusher triggers on rows ∣ bytes ∣ tick, gated by the floor so it can't out-run compaction (the "too many parts/versions" discipline). Trigger logic extracted into a pureshould_flush().Memtable::pending_bytes()added.lsm_recovery_atomic_multi_op_batch: a single multi-op transaction (2 inserts + 1 delete = oneWalRecord) replays all-or-nothing across a crash sim.seqcolumnseq: UInt64added to the Lance schema + both batch builders; aDatastoreAtomicU64minted once per transaction and stamped per row through the memtable + flusher. Decouples per-commit replay identity from the batch-granularversioncolumn..rustfmt.tomlmade stable-honest (nightly-only opts that no gate enforced are commented out, per the org's 99%-stable policy).Quality gate: multi-savant review caught a BLOCKER the tests didn't
The agent-authored code passed 96/0 tests, but a 3-savant read-only review (concurrency/ACID, Lance data-model, Rust idiom/tests) found a real BLOCKER single-session tests structurally can't see:
Fix:
Datastore::max_persisted_seq(a tolerantseq-column scan; 0 for empty/legacy datasets) seedscommit_seqat open; replayed WAL records mint above that floor. The distinction that makes this correct:generationis memtable-local and unpersisted (need only clear the WAL tail), butseqis a Lance column and must seed from the persisted max. New regressionseq_survives_restart_above_persisted_maxfails on the old code, passes on the fix.Also fixed from the review:
debug_assert_eq!length guards → realArrowErrorchecks (enforced in release); a deterministicflusher_tick_intervalknob replacing the timing-fragile coalescing test; NITs (>=symmetry, floor-gating doc, stale doc).Verification (orchestrator-run; sole cargo runner)
cargo check -p surrealdb-core --features kv-lance --tests→ Finished, 0 errorscargo test -p surrealdb-core --features kv-lance --lib kvs::lance→ 98 passed, 0 failed, 3 ignored (incl. the restart-regression proving the BLOCKER fix)Documented limitations (accepted; pre-release)
Recorded in
.claude/board/EPIPHANIES.md+ the review log.claude/board/GRIDLAKE_REVIEW.md:seq(4-column) on-disk dataset (fresh datasets only today).seq, so replay renumbers seqs above the persisted max (monotonic + unique) rather than recovering exact pre-crash values.LegacyCommitGatestampsseq = version(true per-commit fidelity is an LSM-path property).seq/generationmint from two atomics, so under concurrency seq order can disagree with commit order (harmless today; reads never consultseq).Scope / non-goals
Step 2 only. Not here: schema-migration/backfill for
seq, persistingseqin the WAL, per-commitseqon the gate path, the columnar/SoA memtable (Phase 3 in GRIDLAKE.md), and compaction-GC of tombstones (Phase 4).https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R
Generated by Claude Code