feat(kvs-lance): read-only Timeline view over Lance version history (Rubicon step 1) by AdaWorldAPI · Pull Request #29 · AdaWorldAPI/surrealdb

AdaWorldAPI · 2026-05-30T06:15:42Z

What & why

First plateau of the SoA / SurrealDB-as-view / ractor arc. Adds the read-only time-series view into the Lance-backed SoA — the concrete surface for the Rubicon ruling: lance-graph is the leading store; SurrealDB is one VIEW over it, never a store.

Per-key time-travel already existed (checkout_version + tombstone-as-data in get()/scan_impl()). This adds the missing timeline enumeration + a read-only view handle, so a downstream kanban/replay consumer can walk the version history and materialise the SoA as it stood at each commit — with no write path back into Lance.

Changes (additive only — no existing signature touched)

File	What
`kvs/lance/timeline.rs` (new, 264 LOC)	`Timeline` (read-only view over the dataset version history), `TimelineView` (immutable snapshot pinned to one version, tombstone-aware `get`/`scan`), `VersionInfo { version: u64, timestamp_us: Option<i64> }`
`kvs/lance/mod.rs`	`Datastore::timeline()` accessor — shares the live dataset handle, no second open
`kvs/mvcc_source.rs` (new, 170 LOC)	`MvccSource` trait + `LocalGeneratedMvcc`, recovered verbatim from the reverted PR #24 (`2a54a32`); additive, `dead_code`-gated until its consumer lands
`kvs/lance/tests.rs`	2 integration tests (below)
`kvs/mod.rs`	register `mvcc_source` module

Correctness guardrails

Read-only by construction. TimelineView owns a checked-out snapshot and exposes no set/del/commit. "SurrealDB never mutates the leading store" is a type-system guarantee, not a convention.
Confirmed API only. Every Lance call was verified against fetched lance-6.0.0 source — Dataset::versions() -> Vec<Version{version:u64, timestamp:DateTime<Utc>, ..}> (dataset.rs:202/2000), checkout_version, version().version, Scanner::project/filter/limit — and cross-checked against in-org usage in lance-graph/.../graph/versioned.rs. No guessed signatures.
Additive shape mirrors PR feat(integration): surrealdb-ractor + cf::stream + kvs::mvcc_source (Sprint 0/1) #24's discipline: new modules + mod lines only.

⚠️ Design finding surfaced by this work (important for step 2)

Timeline granularity is write-path-dependent:

WritePath::LsmWithWal (default): commits batch through WAL+memtable; the background flusher migrates them into Lance asynchronously → the timeline reflects flush boundaries, not individual commits.
WritePath::LegacyCommitGate: commit() returns only after its own Lance commit lands → 1 commit = 1 version.

A per-commit Rubicon kanban (each commit/plan/prune a distinct entry) must run on the gate path (or force a flush). Recorded in .claude/board/EPIPHANIES.md.

Verification

cargo check -p surrealdb-core --features kv-lance → Finished, 0 errors
cargo test -p surrealdb-core --features kv-lance --lib kvs::lance::tests::test_timeline → 2 passed; 0 failed
- test_timeline_versions_grow_with_commits — history grows + is monotone; tail == current_version
- test_timeline_view_reads_historical_soa — key present in the view at its write version, absent in the view before it

Remaining warnings are all never used on Timeline/TimelineView/timeline() — expected, because the consumer (the ractor/kanban side) is the next step; they clear the moment step 2 calls Datastore::timeline().

Scope / non-goals

Step 1 only: the read-only view. Not in this PR: the ractor mailbox that owns the SoA and publishes onto the timeline, EpisodicWitness64, BindSpace replacement, the cognitive-shader-driver wiring. Those land next on claude/sleepy-cori-aRK2x.

https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

Generated by Claude Code

Summary by CodeRabbit

New Features
- Read-only Lance time-travel: enumerate dataset versions and open pinned timeline views to query historical snapshots (point reads and scans).
Bug Fixes
- Mixed write+delete commits now produce one atomic dataset version via tombstone batching, avoiding intermediate-visible states.
Tests
- Added integration and regression tests validating timeline behavior, version growth, historical reads, and atomic write+delete commits.
Chores
- Removed obsolete delete-via-tombstone config flag.

…Rubicon step 1) Adds the "SurrealDB-as-view-over-Lance" surface: a time-series view into the SoA backed entirely by Lance 6.0.0's native dataset versioning. No new storage — the Lance dataset already IS the source of truth (the Rubicon ruling: Lance leads, SurrealDB is one read-only view, never a store). New: - kvs/lance/timeline.rs — `Timeline` enumerates the dataset version history (`Dataset::versions()` -> Vec<Version>) and hands out `TimelineView`s, each an immutable snapshot pinned via `checkout_version(u64)`. `VersionInfo` carries {version:u64, timestamp_us}. Reads are tombstone-aware (a delete at/before the pinned version reads back absent). Read-only BY CONSTRUCTION: the view owns a checked-out snapshot and exposes no set/del/commit, so SurrealDB structurally cannot mutate the leading store. - kvs/lance/mod.rs — `Datastore::timeline()` shares the live dataset handle (no second open). - kvs/mvcc_source.rs — `MvccSource` trait + `LocalGeneratedMvcc`, recovered verbatim from the reverted PR #24 (additive, dead_code-gated until its consumer lands). - kvs/lance/tests.rs — 2 timeline tests (versions grow + monotone with commits; a historical view reads the SoA as it stood: present at the write version, absent before it). Every Lance call is confirmed against fetched lance-6.0.0 source (Version struct dataset.rs:202; versions() dataset.rs:2000; Scanner project/filter/limit) and against in-org usage in lance-graph graph/versioned.rs. Additive only — no existing signature changes. Verified: `cargo check -p surrealdb-core --features kv-lance` → Finished, 0 errors. Remaining warnings are never-used on the not-yet-wired consumer side (resolve when the ractor/kanban consumer calls `timeline()` in the next step). https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

…rsions The 2026-05-29 timeline tests assumed 1 commit = 1 Lance dataset version, but that only holds on WritePath::LegacyCommitGate. On the default LsmWithWal path, commits batch through WAL+memtable and the background flusher migrates them into Lance asynchronously, so the version timeline reflects flush boundaries, not individual commits (observed: 2 commits -> 1 version; single commit left latest_version unchanged). Both timeline tests now construct LegacyCommitGate configs, where commit() returns only after its own Lance commit lands. Result: 2 passed; 0 failed. The timeline.rs code was correct — only the test harness used the wrong write-path. Design note recorded in EPIPHANIES: the ractor/kanban consumer that publishes onto the timeline must run on the gate path (or force a flush) to get one timeline entry per Rubicon commit. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

coderabbitai · 2026-05-30T06:15:54Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 2ff03dc3-4316-4220-9507-b8ea5fd1f17c

📥 Commits

Reviewing files that changed from the base of the PR and between 5997eea and 38f8df0.

📒 Files selected for processing (1)

surrealdb/core/src/kvs/mvcc_source.rs

📝 Walkthrough

Walkthrough

Adds a kvs-lance read-only timeline: MVCC version source, Timeline/TimelineView types (version enumeration, pinned snapshots, tombstone-aware get/scan), datastore wiring and re-exports, tombstone batching so writes+deletes commit in one Lance version, tests, and session notes.

Changes

Lance Timeline Versioning

Layer / File(s)	Summary
MVCC version source trait and implementation `surrealdb/core/src/kvs/mod.rs`, `surrealdb/core/src/kvs/mvcc_source.rs`	`MvccSource` trait defines async `next_version()`; `LocalGeneratedMvcc` is a process-local `AtomicU64` counter starting at 1 with unit tests for monotonicity and instance independence.
Timeline types and version/snapshot read API `surrealdb/core/src/kvs/lance/timeline.rs`	Adds `VersionInfo`, `Timeline` (versions(), latest_version(), view_at/view_latest) and `TimelineView` (version(), get(), scan()) that return tombstone-filtered point and range reads from Lance snapshots.
Datastore wiring and tombstone batch builder `surrealdb/core/src/kvs/lance/mod.rs`	Re-exports `Timeline`, `TimelineView`, `VersionInfo`; adds `Datastore::timeline()` and `build_tombstone_batch_lance` helper creating tombstone Arrow RecordBatches (`tombstone=true`, empty `val`).
Commit and flusher: single-merge write+delete via tombstones `surrealdb/core/src/kvs/lance/commit_gate.rs`, `surrealdb/core/src/kvs/lance/flusher.rs`	`single_lance_commit` and flusher now fold deletes into tombstone rows and perform one `MergeInsertBuilder::execute_reader` over combined write+tombstone batches, removing the previous `Dataset::delete` path and `KvSchema` predicate usage.
Lance config cleanup `surrealdb/core/src/kvs/config.rs`	Removes `delete_via_tombstone_row` from `LanceConfig` and its default initializer.
Timeline integration and regression tests `surrealdb/core/src/kvs/lance/tests.rs`	Adds `test_timeline_versions_grow_with_commits`, `test_timeline_view_reads_historical_soa`, and `test_timeline_write_delete_commit_is_single_atomic_version` (regression asserting mixed write+delete yields a single Lance version).
Session documentation `.claude/board/AGENT_LOG.md`, `.claude/board/EPIPHANIES.md`	Appends AGENT_LOG and EPIPHANIES entries describing the timeline work, write-path granularity findings, the tombstone-based write+delete fix, and test/check results.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibble through snapshots, one by one,

Tombstones tucked where deletes are done,
Versions grow and histories sing,
Time-travel reads make soft hearts spring,
Hoppity hops — the timeline's fun!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately describes the main change: introducing a read-only Timeline view over Lance version history as part of the Rubicon initiative (step 1).
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch claude/kvs-lance-timeline

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9205453720

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@surrealdb/core/src/kvs/lance/timeline.rs`:
- Around line 87-139: Public async methods latest_version, versions, view_at,
view_latest (and TimelineView::get/scan) are missing tracing instrumentation;
add the #[instrument(...)] attribute to each public async fn so traces include
inputs and results. For each function add #[instrument(skip(self), name =
"<module>::<fn_name>", err)] (or at least #[instrument(skip(self))] if you don't
want to record self), and add #[instrument(skip(self, snapshot))] as appropriate
for TimelineView methods; also ensure tracing::instrument is imported. Use the
exact function names (latest_version, versions, view_at, view_latest, get, scan)
to locate and annotate the methods.

In `@surrealdb/core/src/kvs/mvcc_source.rs`:
- Around line 45-66: The trait MvccSource is currently non-object-safe because
next_version returns an impl Future; make next_version object-safe by changing
its signature to return a boxed future (e.g., Pin<Box<dyn Future<Output =
anyhow::Result<u64>> + Send>> or use
futures::future::BoxFuture<anyhow::Result<u64>>), update all implementors
(functions/structs that implement MvccSource::next_version) to return the boxed
future, and keep the doc comment about Arc<dyn MvccSource> intact (or remove
that doc claim if you prefer not to support trait objects).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 357d85b4-7107-451a-af82-e84506b32131

📥 Commits

Reviewing files that changed from the base of the PR and between 08ce3d4 and 9205453.

📒 Files selected for processing (7)

.claude/board/AGENT_LOG.md
.claude/board/EPIPHANIES.md
surrealdb/core/src/kvs/lance/mod.rs
surrealdb/core/src/kvs/lance/tests.rs
surrealdb/core/src/kvs/lance/timeline.rs
surrealdb/core/src/kvs/mod.rs
surrealdb/core/src/kvs/mvcc_source.rs

A batch carrying both writes and deletes applied writes via MergeInsertBuilder::execute_reader and then deletes via a separate Dataset::delete, producing TWO native Lance versions. The intermediate (writes-applied, delete-pending) version was hidden from live readers by the datastore write lock, but Timeline::versions() surfaced it, so a replayer's view_at() could materialize a torn state that was never an atomic SurrealDB commit (codex P1 on PR #29). Fold deletes into tombstone rows (tombstone=true) streamed into the SAME merge_insert, so a commit/flush is exactly one Lance version. The read path already filters `tombstone = false`, so get/scan/keys are unchanged; the physical Dataset::delete path is removed. Add Transaction::build_tombstone_batch_lance (mirrors build_write_batch_lance) and retire the never-read delete_via_tombstone_row config flag. Regression: test_timeline_write_delete_commit_is_single_atomic_version asserts a mixed write+delete commit adds exactly one version and the resulting snapshot reflects all ops atomically. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

coderabbitai

🧹 Nitpick comments (2)

surrealdb/core/src/kvs/lance/commit_gate.rs (1)

389-393: 💤 Low value

Minor inefficiency: unnecessary collect.

The RecordBatchIterator::new accepts any IntoIterator, so the intermediate .collect::<Vec<_>>() allocates an extra vector that isn't needed.

♻️ Suggested simplification

 	let schema_ref = batches[0].schema();
 	let reader = arrow_array::RecordBatchIterator::new(
-		batches.into_iter().map(Ok::<_, arrow_schema::ArrowError>).collect::<Vec<_>>(),
+		batches.into_iter().map(Ok::<_, arrow_schema::ArrowError>),
 		schema_ref,
 	);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@surrealdb/core/src/kvs/lance/commit_gate.rs` around lines 389 - 393, The code
unnecessarily collects into a Vec before constructing the RecordBatchIterator;
change the RecordBatchIterator::new call in commit_gate.rs to take the iterator
directly by passing batches.into_iter().map(Ok::<_, arrow_schema::ArrowError>)
(remove the .collect::<Vec<_>>() step) while keeping schema_ref as before so the
iterator type matches RecordBatchIterator::new.

surrealdb/core/src/kvs/lance/flusher.rs (1)

288-292: 💤 Low value

Same minor inefficiency as commit_gate.rs: unnecessary collect.

♻️ Suggested simplification

 	let schema_ref = batches[0].schema();
 	let reader = arrow_array::RecordBatchIterator::new(
-		batches.into_iter().map(Ok::<_, arrow_schema::ArrowError>).collect::<Vec<_>>(),
+		batches.into_iter().map(Ok::<_, arrow_schema::ArrowError>),
 		schema_ref,
 	);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@surrealdb/core/src/kvs/lance/flusher.rs` around lines 288 - 292, The code
builds a Vec unnecessarily before creating the RecordBatchIterator; instead of
collecting the mapped iterator, pass the mapped iterator directly to
arrow_array::RecordBatchIterator::new using the existing batches iterator and
schema_ref (reference the variables/function names batches, schema_ref, and
arrow_array::RecordBatchIterator::new) so remove the .collect::<Vec<_>>() step
and feed the mapped iterator (batches.into_iter().map(Ok::<_,
arrow_schema::ArrowError>)) directly to the constructor.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@surrealdb/core/src/kvs/lance/commit_gate.rs`:
- Around line 389-393: The code unnecessarily collects into a Vec before
constructing the RecordBatchIterator; change the RecordBatchIterator::new call
in commit_gate.rs to take the iterator directly by passing
batches.into_iter().map(Ok::<_, arrow_schema::ArrowError>) (remove the
.collect::<Vec<_>>() step) while keeping schema_ref as before so the iterator
type matches RecordBatchIterator::new.

In `@surrealdb/core/src/kvs/lance/flusher.rs`:
- Around line 288-292: The code builds a Vec unnecessarily before creating the
RecordBatchIterator; instead of collecting the mapped iterator, pass the mapped
iterator directly to arrow_array::RecordBatchIterator::new using the existing
batches iterator and schema_ref (reference the variables/function names batches,
schema_ref, and arrow_array::RecordBatchIterator::new) so remove the
.collect::<Vec<_>>() step and feed the mapped iterator
(batches.into_iter().map(Ok::<_, arrow_schema::ArrowError>)) directly to the
constructor.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: deefbcbc-af5c-49bc-9ee3-8c48140039ea

📥 Commits

Reviewing files that changed from the base of the PR and between 9205453 and 5997eea.

📒 Files selected for processing (7)

.claude/board/AGENT_LOG.md
.claude/board/EPIPHANIES.md
surrealdb/core/src/kvs/config.rs
surrealdb/core/src/kvs/lance/commit_gate.rs
surrealdb/core/src/kvs/lance/flusher.rs
surrealdb/core/src/kvs/lance/mod.rs
surrealdb/core/src/kvs/lance/tests.rs

💤 Files with no reviewable changes (1)

surrealdb/core/src/kvs/config.rs

✅ Files skipped from review due to trivial changes (2)

.claude/board/AGENT_LOG.md
.claude/board/EPIPHANIES.md

The .rustfmt.toml enabled nightly-only options (wrap_comments, imports_granularity, group_imports, comment_width) while the build toolchain is pinned stable 1.95 and the fmt job that would apply them (pinned rust-toolchain.nightly via ci.yml) is not running — GitHub Actions is not enabled on this fork (PR #29 head has zero check runs). So those options were enforced by nothing yet made stable `cargo fmt` warn and reformat divergently. Comment them out so stable `cargo fmt` (the 1.95 pin, matching org policy of 99% stable / nightly only for Miri) is the single authoritative formatter. Re-enable as a block if a nightly-fmt CI job is ever turned on and the tree is normalized under it in one dedicated pass. Findings (no CI on the fork; rustfmt split-brain) recorded in .claude/board/EPIPHANIES.md. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

`MvccSource::next_version` uses return-position `impl Trait` (RPITIT), which makes the trait non-object-safe, so the `Arc<dyn MvccSource>` the doc comment claimed will not compile. The trait is `#[allow(dead_code)]` (no consumer yet); rather than box the future (a per-call heap allocation for an unused trait-object capability), correct the doc to state zero-cost static dispatch and that a boxed-future variant can be added if runtime-pluggable sources are ever needed. Doc-only change. https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R

claude added 2 commits May 29, 2026 23:59

chatgpt-codex-connector Bot reviewed May 30, 2026

View reviewed changes

Comment thread surrealdb/core/src/kvs/lance/timeline.rs

coderabbitai Bot reviewed May 30, 2026

View reviewed changes

Comment thread surrealdb/core/src/kvs/lance/timeline.rs

Comment thread surrealdb/core/src/kvs/mvcc_source.rs

coderabbitai Bot reviewed May 30, 2026

View reviewed changes

AdaWorldAPI mentioned this pull request May 30, 2026

feat(kvs-lance): gridlake step-2 — adaptive batching, WAL/ACID recovery, per-row seq (savant-reviewed) #30

Merged

AdaWorldAPI merged commit 4eb33dd into main May 30, 2026
1 check passed

coderabbitai Bot mentioned this pull request May 30, 2026

kv-lance: native lance read/write backend (drop hand-rolled LSM) #31

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(kvs-lance): read-only Timeline view over Lance version history (Rubicon step 1)#29

feat(kvs-lance): read-only Timeline view over Lance version history (Rubicon step 1)#29
AdaWorldAPI merged 4 commits into
mainfrom
claude/kvs-lance-timeline

AdaWorldAPI commented May 30, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 30, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AdaWorldAPI commented May 30, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What & why

Changes (additive only — no existing signature touched)

Correctness guardrails

⚠️ Design finding surfaced by this work (important for step 2)

Verification

Scope / non-goals

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

AdaWorldAPI commented May 30, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 30, 2026 •

edited

Loading