kv-lance: native lance read/write backend (drop hand-rolled LSM) by AdaWorldAPI · Pull Request #31 · AdaWorldAPI/surrealdb

AdaWorldAPI · 2026-05-30T18:19:35Z

What

Rewrites the SurrealDB kv-lance backend to read & write through lance's native path — the same way lance-graph uses lance — replacing the hand-rolled LSM (memtable + WAL + flusher + commit-gate) prior sessions had layered on top of it.

commit = one MergeInsertBuilder::execute_reader → exactly one lance dataset version per SurrealDB commit (writes + delete-tombstones folded into a single batch).
reads = pending buffer (read-your-writes) → Dataset::checkout_version(v) / latest → scan().filter().project(), pending merged before skip/limit.
versioning / time-travel = native Dataset::versions() / checkout_version().
compaction / GC = lance's own optimize / cleanup_old_versions (background optimizer), not a custom flusher.

Removed (the reinvention)

memtable.rs, wal.rs, flusher.rs, commit_gate.rs, the WritePath enum, and the LSM-era integration_tests.rs — all duplicated lance 6's built-in machinery.

Kept

tx_buffer (lance has no per-row txn buffer), schema, cnf, background_optimizer, and the read-only timeline view over Dataset::versions().

Built & verified

One agent per file (mod.rs / config.rs / tests.rs).
Three independent savant reviews — ACID/contract, lance-6.0.0 API correctness, idiom/clippy — in .claude/board/SAVANT_REVIEW.md. Key fix: commit conflicts now map to the retryable Error::TransactionConflict via From<lance::Error> (+ TooMuchWriteContention arm), restoring CAS-on-commit retry.
Gate: cargo clippy -p surrealdb-core --features kv-lance --tests -- -D warnings → clean. (Heads-up: stock cargo make ci-clippy omits kv-lance and would cfg-strip this code — the feature-gated command is the real gate.)
Tests: cargo test -p surrealdb-core --features kv-lance --lib kvs::lance → 67 passed, 0 failed (get/set/del/CAS/scan/keys/savepoints/versioning/OCC + HashMap differential + timeline).

Notes / follow-ups

Unversioned reads observe lance @ latest (read-committed) — intentional alignment with native lance-graph behaviour, documented in-code.
Reserved/vestigial helper items (unwired timeline surface, schema builders superseded by the *_lance variants, flusher-era cnf knobs) carry #[allow(dead_code)]; pruning is a follow-up.

🤖 https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

Generated by Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Implemented native Lance backend with per-commit sequence tracking for improved transaction semantics.
Refactor
- Simplified kv-lance configuration by removing legacy write-path variants and consolidating commit handling.
Documentation
- Added comprehensive design and specification documentation for the Lance backend architecture.

The .rustfmt.toml enabled nightly-only options (wrap_comments, imports_granularity, group_imports, comment_width) while the build toolchain is pinned stable 1.95 and the fmt job that would apply them (pinned rust-toolchain.nightly via ci.yml) is not running — GitHub Actions is not enabled on this fork (PR #29 head has zero check runs). So those options were enforced by nothing yet made stable `cargo fmt` warn and reformat divergently. Comment them out so stable `cargo fmt` (the 1.95 pin, matching org policy of 99% stable / nightly only for Miri) is the single authoritative formatter. Re-enable as a block if a nightly-fmt CI job is ever turned on and the tree is normalized under it in one dedicated pass. Findings (no CI on the fork; rustfmt split-brain) recorded in .claude/board/EPIPHANIES.md. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

… + SoA roadmap Design doc for aligning SurrealDB writes with the Lance columnar/versioned store: the convergent ingest→migrate→compact pattern and its 5 invariants; a pattern→code mapping (WAL+memtable+flusher == RocksDB; CommitGate == write-group leader; merge_insert == SST/part build; tombstone rows == ClickHouse _row_exists mask / Lance deletion vectors; Lance version == seqno/manifest; background_optimizer == compaction); the ACID story pinned property-by-property to its enforcement point; ClickHouse parity and where it breaks down; the per-row seq keystone (decoupling replay fidelity from physical batching); the SoA/gridlake transcode-free flush; compaction GC; and a status-tagged phased roadmap. Forward-looking items carry explicit CONJECTURE/ROADMAP markers so design is never confused with as-built. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

… pending] Checkpoint of agent-authored Phase-1 code: ClickHouse-parity adaptive flush triggers (rows | bytes | tick) with a rate floor so the flusher cannot out-run compaction. - FlusherConfig += { max_pending_bytes (8 MiB), min_flush_interval (50ms) } - Memtable::pending_bytes() (sums key+val; tombstone counts key only) - flusher_loop trigger extracted into a pure `should_flush`, gated by a web_time::Instant rate floor - tests: pending_bytes_*, should_flush_* , config defaults Agent-reported green (91 kvs::lance tests pass) but NOT yet independently verified: this is the "unverified code" stage of the review pipeline. Savant review + orchestrator cargo-verification + marker removal happen before the PR is opened. DO NOT MERGE as-is. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

…, per-row seq column P1 ClickHouse-parity adaptive batching: FlusherConfig += {max_pending_bytes (8 MiB), min_flush_interval (50 ms)}; Memtable::pending_bytes(); the flusher triggers on rows | bytes | periodic-tick, gated by a web_time::Instant rate floor (shutdown final-drain bypasses it); trigger logic extracted into a pure should_flush() for testability. P2 WAL/ACID: lsm_recovery_atomic_multi_op_batch proves a single multi-op transaction (2 inserts + 1 delete = one WalRecord) replays all-or-nothing across a Box::leak crash simulation + reopen. P3 per-row commit-sequence column: seq:UInt64 added to the Lance schema, to build_write_batch_lance and build_tombstone_batch_lance (which now take a parallel &[u64] seqs slice); a Datastore-level commit_seq AtomicU64 is minted once per transaction and stamped per row through the memtable + flusher; WAL replay reassigns fresh seqs (on-disk format unchanged); the legacy CommitGate stamps seq = version. This decouples per-commit replay identity from the batch-granular `version` column. Verification: orchestrator independently ran `cargo test -p surrealdb-core --features kv-lance --lib kvs::lance` => 96 passed, 0 failed, 3 ignored (confirms the build agent's result). Multi-savant logic review (concurrency/ACID, Lance data-model, Rust idiom & tests) is in progress; any findings land as follow-up fix commits before this branch is proposed for merge. Additive only; no new dependencies; stable Rust; no WAL on-disk format change. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

…d max (BLOCKER) A 3-savant read-only review of the step-2 code (which passed 96/0 tests) caught a BLOCKER that single-session tests can't see: `commit_seq` was re-initialised to 0 on every `Datastore::new` and never advanced past the max `seq` already persisted in Lance, so after a restart (WAL truncated) new commits re-minted seqs that collided with / regressed below flushed rows — defeating the `seq` column's purpose as a cross-restart monotonic per-commit replay axis. `generation` is memtable-local and unpersisted (so it need only clear the WAL tail); `seq` is a Lance column and must seed from the persisted max. Fixes applied: - BLOCKER: add `Datastore::max_persisted_seq` (tolerant `seq`-column scan; 0 for empty/legacy datasets) and seed `commit_seq` from it at open; replayed WAL records mint above that floor. - Upgrade the `debug_assert_eq!` seqs/rows length guards in both batch builders to real `ArrowError` checks (enforced in release too). - Add a `flusher_tick_interval` LanceConfig knob and use it to make the seq-coalescing test deterministic (was timing-fragile on slow disks). - NITs: align the re-loop nudge to `>=`; correct the flusher floor-gating module doc; refresh the stale `KvSchema::build_write_batch` doc. - Tests: +`seq_survives_restart_above_persisted_max` (fails on the old code, proves the fix), +`seq_column_gate_path_equals_version`, +`scan_versions_for_tests` helper. Documented (accepted, pre-release) limitations the savants surfaced are recorded in .claude/board/EPIPHANIES.md (pre-`seq` schema migration; WAL reseq on replay; gate-path seq=version; seq/generation two-atomic ordering). Verified by the orchestrator (sole cargo runner): `cargo check --features kv-lance --tests` clean; `cargo test --features kv-lance --lib kvs::lance` → 98 passed, 0 failed, 3 ignored. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

) `max_persisted_seq` treated a dataset whose schema lacks the `seq` column as success (returning 0), but the write paths now always build 5-column batches, so the first flush/merge against such a 4-column dataset would hit an opaque Lance schema mismatch. Return a clear migration error instead — exactly the "fail with a clear migration error before allowing commits" Codex suggested. A fresh dataset created by this code always carries the 5-column schema, so this only fires for genuinely legacy (pre-release) data; all 4 seq tests pass. Also records the Cognitive-RISC substrate↔invariant mapping (and the N1 trap: do not add class_id to the kv-lance schema — it stays policy-free) in .claude/board/EPIPHANIES.md. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

Fresh-session audit of claude/sleepy-cori-aRK2x (HEAD 55fc45c) against the GRIDLAKE §8 roadmap: adaptive batching (max_pending_bytes + min_flush_interval rate floor) and the per-row seq column are both shipped + tested, while §8 still tags them ROADMAP/IN PROGRESS. Recorded as an append-only EPIPHANIES entry (tee -a) citing the flusher.rs/schema.rs/mod.rs line evidence; the GRIDLAKE doc body is left unmutated per the append-only discipline. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

Introduces the opt-in WritePath::LsmColumnar variant, wired exhaustively through every write_path dispatch in mod.rs — commit routing, get's snapshot selection, and scan_impl's snapshot selection (or-patterns with LsmWithWal), plus the two read-path == LsmWithWal checks. The variant currently aliases the proven LsmWithWal hot path (WAL fsync -> memtable -> async flush); the single-pass columnar flush builder lands next behind this stable seam, keeping the default row path untouched until benchmarked at parity. Adds writepath_lsm_columnar_smoke (set / cross-tx get / overwrite / delete via the public Transactable surface). Verified: cargo test -p surrealdb-core --features kv-lance --lib kvs::lance -> 99 passed; 0 failed; 3 ignored. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

…step 2) On the LsmColumnar path the background flush now builds the Lance merge-insert source in one up-front-sized Arrow columnar pass over the memtable snapshot (build_columnar_merge_batch) — a single fused batch carrying live rows (tombstone=false) and tombstone rows (tombstone=true, empty val) in the shared [key,val,version,tombstone,seq] schema — instead of partitioning into row vecs and concatenating two batches. Trims the section 6.1 transpose tax; emits identical rows + schema, so the resulting Lance state matches the row path. - FlusherConfig gains a columnar flag, set from write_path == LsmColumnar at spawn; do_flush branches on it (row path unchanged, the default). - The MergeInsertBuilder execution is extracted into execute_merge, shared by the row path (single_lance_commit) and the columnar path. - The memtable + WAL stay row-oriented; a fully native-Arrow memtable/WAL (GRIDLAKE section 6.2 SoA, CONJECTURE) remains future work. Verified: cargo test -p surrealdb-core --features kv-lance --lib kvs::lance -> 100 passed; 0 failed; 3 ignored. New writepath_lsm_columnar_flush_persists forces a memtable->Lance flush via the columnar builder and reads it back after reopen (live rows present, tombstoned key gone). https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

…d fns clippy (rust 1.95) flagged 3 default-on lints in get()/scan_impl() — the read-path functions Phase 3 edited for the LsmColumnar variant. All 3 predate this work (confirmed against 348bb4d) but sit in code just modified, so cleared with clippy's verbatim suggestions: - get(): collapse the nested if-let into a let-chain (stable on 1.95) - scan_impl(): two match { Ok(s)=>Some(s), Err(_)=>None } -> .ok() Behaviour-identical. Verified: cargo test -p surrealdb-core --features kv-lance --lib kvs::lance -> 100 passed; 0 failed; 3 ignored. The 6 remaining clippy warnings are unwired TimelineView dead-code from a prior session (intentional pending its consumer), out of scope here. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

Per-file implementation specs for the native rewrite (commit via lance MergeInsert, reads via checkout_version+scan, lance optimize for GC — same path lance-graph uses). One spec per writing agent: mod.rs, config.rs, tests.rs. Code files are being written by background agents and will be committed once they complete + savant review + clippy. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

…gent 2) Part of the native rewrite: LanceConfig drops the WritePath/flusher knobs (compaction/GC delegated to lance optimize/cleanup_old_versions). Now just { versioned: bool }; from_params returns default (other knobs live in kvs/lance/cnf.rs). Crate intentionally won't compile until mod.rs is rewritten to match (clippy is the end-gate). Carries one // ///REVIEW anchor, stripped before the PR. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

…SM (agent 1) commit() = one lance MergeInsert (one commit = one version); get/scan read pending -> lance checkout_version/scan (memtable branch deleted). Datastore/ Transaction slimmed to {dataset, versioned, background_optimizer, commit_seq}. Removed memtable.rs/wal.rs/flusher.rs/commit_gate.rs + their decls + WritePath. Crate non-compiling until tests.rs/integration_tests reconciled (clippy is the end-gate). 4 // ///REVIEW anchors pending (incl. lance-conflict -> Retryable). https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

…integration_tests tests.rs: 57 Transactable-contract tests against the native single-path backend; all LSM/WritePath/seq/WAL/commit_gate tests deleted; every LanceConfig literal is { versioned } only. integration_tests.rs (LSM-era, referenced removed modules) deleted + its mod decl dropped from mod.rs. Native source is now coherent; 8 // ///REVIEW anchors remain for the fix pass (stripped before PR). clippy is the end-gate. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

…ce native rewrite https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

…vant fix pass) - execute_merge: .map_err(Error::from) so lance RetryableCommitConflict/ CommitConflict/IncompatibleTransaction map to retryable Error::TransactionConflict (was opaque Error::Datastore, defeating CAS-on-commit retry) -- savant BLOCKER. - err.rs From<lance::Error>: + TooMuchWriteContention => TransactionConflict. - stripped all // ///REVIEW sentinels; anchors resolved per SAVANT_REVIEW.md (keep read_version+1; timeline ==before+1; get@version sound). https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

…test accessors clippy -D warnings (the real gate: -p surrealdb-core --features kv-lance --tests) flagged dead code: TimelineView (+version/get/scan) is an unwired read surface for a future consumer; dataset_for_tests/scan_versions_for_tests/scan_seqs_for_tests are #[cfg(test)] accessors whose callers were removed with the LSM/seq tests. Both are intentional/test-only, so #[allow(dead_code)] (allow_attributes is permitted). https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

…clippy gate) Native rewrite leaves intentional dead items the lib target flags under -D warnings: the unwired timeline read surface (timeline.rs module + Datastore::timeline()), schema.rs builders superseded by the *_lance variants + tombstone-row deletes (arrow_schema/build_write_batch/ build_tombstone_batch/build_delete_predicate), and cnf.rs flusher-era knobs. Module-level allow(dead_code) on the helper modules; prune as a follow-up. Used builders (build_get/range_predicate) unaffected. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

coderabbitai · 2026-05-30T18:19:47Z

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

This PR refactors the kv-lance backend from a hand-rolled LSM-style architecture (commit gate, WAL, memtable, flusher) to Lance's native read/write API. A new per-commit monotonic seq: UInt64 column is added to decouple commit sequencing from batch granularity. Configuration is simplified, all LSM modules are removed, and reads/writes now flow directly through Lance versioning.

Changes

Lance native path rewrite

Layer / File(s)	Summary
Design documentation, specs, and review boards `.claude/board/`, `.claude/lance-backend/`, `.rustfmt.toml`	Comprehensive GRIDLAKE audit, module-level implementation specs, SAVANT reviews identifying one BLOCKER (OCC conflict mapping), and rustfmt configuration updated to stable-only settings; captures architecture decisions, roadmap phases, and identified test coverage gaps.
Configuration API simplification `surrealdb/core/src/kvs/config.rs`	Remove `WritePath` enum and flusher-related fields; reduce `LanceConfig` to only the `versioned` flag and delegate compaction/GC to Lance's `optimize`/`cleanup_old_versions`.
Schema update: add seq column and batch builders `surrealdb/core/src/kvs/lance/schema.rs`	Add `seq: UInt64` column to Arrow schema; update `build_write_batch` and `build_tombstone_batch` to construct and populate seq arrays, defaulting `seq = version` in these helpers.
Remove LSM and commit-gate components `surrealdb/core/src/kvs/lance/commit_gate.rs`, `flusher.rs`, `memtable.rs`, `wal.rs`, `integration_tests.rs`	Delete entire LSM write pipeline modules (commit gate batching, background flushing, WAL durability, in-memory buffering) and integration smoke tests; no longer needed with Lance native operations.
Native Lance datastore and transaction implementation `surrealdb/core/src/kvs/lance/mod.rs`	Refactor `Datastore` and `Transaction` structs: remove LSM fields, add `commit_seq: Arc<AtomicU64>` for monotonic sequencing; seed `commit_seq` from persisted max seq at startup; rewrite `Transaction::commit` to partition pending writes/deletes, mint one seq value, build write+tombstone batches with aligned seq, and execute via single native `MergeInsertBuilder::execute_reader` call; add timeline and test-only scanning helpers.
Read paths (get, scan) and tracing instrumentation `surrealdb/core/src/kvs/lance/mod.rs`	Simplify `get` and `scan_impl` to check pending buffer first, then read from Lance using `checkout_version(v)` (versioned) or latest (unversioned); remove memtable overlay; add `#[instrument(...)]` tracing to transaction mutation/range/savepoint APIs.
Error handling and dead-code linting `surrealdb/core/src/kvs/err.rs`, `surrealdb/core/src/kvs/lance/cnf.rs`, `timeline.rs`	Map `lance::Error::TooMuchWriteContention` to `Error::TransactionConflict` for proper retry handling; add `#![allow(dead_code)]` to suppress warnings on vestigial helpers after native rewrite.

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

AdaWorldAPI/surrealdb#29: Builds on this PR's native write+delete-in-one-version model by adding timeline view and regression tests around per-commit seq ordering.

Poem

🐰 The memtable hops away, the WAL fades to gray,
Lance native paths now lead the way—one commit, one version, one day.
With seq in every row, monotonic and true,
The rabbit's rewrite is done; check that BLOCKER through! 🚀

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: replacing the hand-rolled LSM stack with Lance's native read/write backend, directly matching the PR's primary objective.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch claude/sleepy-cori-aRK2x

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…AriGraph witness, ractor mailbox/Rubicon kanban) Located the existing plans (lance-graph .claude/specs pr-ce64-mb-* series + ndarray 3DGS-4x4 SoA + shipped AriGraph) and how they sit on the just-merged kv-lance/timeline substrate (#31). New .claude/board/INTEGRATION_PLANS.md + EPIPHANIES FINDING. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

claude added 22 commits May 30, 2026 07:36

docs(board): log Phase 3 step-1 (LsmColumnar seam, 00f0e12)

8dd4b3f

https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

docs(board): log Phase 3 step-2 (columnar flush, d9bfca7)

55f5ab6

https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

docs(board): log Phase 3 clippy hygiene

d737168

https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

docs(board): agent-2 completion log entry

ef0edfc

https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

docs(board): savant review (A/B/C) + consolidated fix plan for kv-lan…

9bcd8e6

…ce native rewrite https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

AdaWorldAPI merged commit 1686c57 into main May 30, 2026
1 check was pending

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kv-lance: native lance read/write backend (drop hand-rolled LSM)#31

kv-lance: native lance read/write backend (drop hand-rolled LSM)#31
AdaWorldAPI merged 22 commits into
mainfrom
claude/sleepy-cori-aRK2x

AdaWorldAPI commented May 30, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 30, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Possibly related PRs

Poem

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AdaWorldAPI commented May 30, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Removed (the reinvention)

Kept

Built & verified

Notes / follow-ups

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Possibly related PRs

Poem

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

AdaWorldAPI commented May 30, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 30, 2026 •

edited

Loading