Skip to content

feat(kvs-lance): read-only Timeline view over Lance version history (Rubicon step 1)#29

Merged
AdaWorldAPI merged 4 commits into
mainfrom
claude/kvs-lance-timeline
May 30, 2026
Merged

feat(kvs-lance): read-only Timeline view over Lance version history (Rubicon step 1)#29
AdaWorldAPI merged 4 commits into
mainfrom
claude/kvs-lance-timeline

Conversation

@AdaWorldAPI
Copy link
Copy Markdown
Owner

@AdaWorldAPI AdaWorldAPI commented May 30, 2026

What & why

First plateau of the SoA / SurrealDB-as-view / ractor arc. Adds the read-only time-series view into the Lance-backed SoA — the concrete surface for the Rubicon ruling: lance-graph is the leading store; SurrealDB is one VIEW over it, never a store.

Per-key time-travel already existed (checkout_version + tombstone-as-data in get()/scan_impl()). This adds the missing timeline enumeration + a read-only view handle, so a downstream kanban/replay consumer can walk the version history and materialise the SoA as it stood at each commit — with no write path back into Lance.

Changes (additive only — no existing signature touched)

File What
kvs/lance/timeline.rs (new, 264 LOC) Timeline (read-only view over the dataset version history), TimelineView (immutable snapshot pinned to one version, tombstone-aware get/scan), VersionInfo { version: u64, timestamp_us: Option<i64> }
kvs/lance/mod.rs Datastore::timeline() accessor — shares the live dataset handle, no second open
kvs/mvcc_source.rs (new, 170 LOC) MvccSource trait + LocalGeneratedMvcc, recovered verbatim from the reverted PR #24 (2a54a32); additive, dead_code-gated until its consumer lands
kvs/lance/tests.rs 2 integration tests (below)
kvs/mod.rs register mvcc_source module

Correctness guardrails

  • Read-only by construction. TimelineView owns a checked-out snapshot and exposes no set/del/commit. "SurrealDB never mutates the leading store" is a type-system guarantee, not a convention.
  • Confirmed API only. Every Lance call was verified against fetched lance-6.0.0 source — Dataset::versions() -> Vec<Version{version:u64, timestamp:DateTime<Utc>, ..}> (dataset.rs:202/2000), checkout_version, version().version, Scanner::project/filter/limit — and cross-checked against in-org usage in lance-graph/.../graph/versioned.rs. No guessed signatures.
  • Additive shape mirrors PR feat(integration): surrealdb-ractor + cf::stream + kvs::mvcc_source (Sprint 0/1) #24's discipline: new modules + mod lines only.

⚠️ Design finding surfaced by this work (important for step 2)

Timeline granularity is write-path-dependent:

  • WritePath::LsmWithWal (default): commits batch through WAL+memtable; the background flusher migrates them into Lance asynchronously → the timeline reflects flush boundaries, not individual commits.
  • WritePath::LegacyCommitGate: commit() returns only after its own Lance commit lands → 1 commit = 1 version.

A per-commit Rubicon kanban (each commit/plan/prune a distinct entry) must run on the gate path (or force a flush). Recorded in .claude/board/EPIPHANIES.md.

Verification

  • cargo check -p surrealdb-core --features kv-lanceFinished, 0 errors
  • cargo test -p surrealdb-core --features kv-lance --lib kvs::lance::tests::test_timeline2 passed; 0 failed
    • test_timeline_versions_grow_with_commits — history grows + is monotone; tail == current_version
    • test_timeline_view_reads_historical_soa — key present in the view at its write version, absent in the view before it

Remaining warnings are all never used on Timeline/TimelineView/timeline() — expected, because the consumer (the ractor/kanban side) is the next step; they clear the moment step 2 calls Datastore::timeline().

Scope / non-goals

Step 1 only: the read-only view. Not in this PR: the ractor mailbox that owns the SoA and publishes onto the timeline, EpisodicWitness64, BindSpace replacement, the cognitive-shader-driver wiring. Those land next on claude/sleepy-cori-aRK2x.

https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R


Generated by Claude Code

Summary by CodeRabbit

  • New Features
    • Read-only Lance time-travel: enumerate dataset versions and open pinned timeline views to query historical snapshots (point reads and scans).
  • Bug Fixes
    • Mixed write+delete commits now produce one atomic dataset version via tombstone batching, avoiding intermediate-visible states.
  • Tests
    • Added integration and regression tests validating timeline behavior, version growth, historical reads, and atomic write+delete commits.
  • Chores
    • Removed obsolete delete-via-tombstone config flag.

Review Change Stack

claude added 2 commits May 29, 2026 23:59
…Rubicon step 1)

Adds the "SurrealDB-as-view-over-Lance" surface: a time-series view into the
SoA backed entirely by Lance 6.0.0's native dataset versioning. No new storage
— the Lance dataset already IS the source of truth (the Rubicon ruling: Lance
leads, SurrealDB is one read-only view, never a store).

New:
- kvs/lance/timeline.rs — `Timeline` enumerates the dataset version history
  (`Dataset::versions()` -> Vec<Version>) and hands out `TimelineView`s, each an
  immutable snapshot pinned via `checkout_version(u64)`. `VersionInfo` carries
  {version:u64, timestamp_us}. Reads are tombstone-aware (a delete at/before the
  pinned version reads back absent). Read-only BY CONSTRUCTION: the view owns a
  checked-out snapshot and exposes no set/del/commit, so SurrealDB structurally
  cannot mutate the leading store.
- kvs/lance/mod.rs — `Datastore::timeline()` shares the live dataset handle
  (no second open).
- kvs/mvcc_source.rs — `MvccSource` trait + `LocalGeneratedMvcc`, recovered
  verbatim from the reverted PR #24 (additive, dead_code-gated until its
  consumer lands).
- kvs/lance/tests.rs — 2 timeline tests (versions grow + monotone with commits;
  a historical view reads the SoA as it stood: present at the write version,
  absent before it).

Every Lance call is confirmed against fetched lance-6.0.0 source (Version
struct dataset.rs:202; versions() dataset.rs:2000; Scanner project/filter/limit)
and against in-org usage in lance-graph graph/versioned.rs. Additive only — no
existing signature changes.

Verified: `cargo check -p surrealdb-core --features kv-lance` → Finished, 0
errors. Remaining warnings are never-used on the not-yet-wired consumer side
(resolve when the ractor/kanban consumer calls `timeline()` in the next step).

https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R
…rsions

The 2026-05-29 timeline tests assumed 1 commit = 1 Lance dataset version, but
that only holds on WritePath::LegacyCommitGate. On the default LsmWithWal path,
commits batch through WAL+memtable and the background flusher migrates them into
Lance asynchronously, so the version timeline reflects flush boundaries, not
individual commits (observed: 2 commits -> 1 version; single commit left
latest_version unchanged).

Both timeline tests now construct LegacyCommitGate configs, where commit()
returns only after its own Lance commit lands. Result: 2 passed; 0 failed.

The timeline.rs code was correct — only the test harness used the wrong
write-path. Design note recorded in EPIPHANIES: the ractor/kanban consumer that
publishes onto the timeline must run on the gate path (or force a flush) to get
one timeline entry per Rubicon commit.

https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 2ff03dc3-4316-4220-9507-b8ea5fd1f17c

📥 Commits

Reviewing files that changed from the base of the PR and between 5997eea and 38f8df0.

📒 Files selected for processing (1)
  • surrealdb/core/src/kvs/mvcc_source.rs

📝 Walkthrough

Walkthrough

Adds a kvs-lance read-only timeline: MVCC version source, Timeline/TimelineView types (version enumeration, pinned snapshots, tombstone-aware get/scan), datastore wiring and re-exports, tombstone batching so writes+deletes commit in one Lance version, tests, and session notes.

Changes

Lance Timeline Versioning

Layer / File(s) Summary
MVCC version source trait and implementation
surrealdb/core/src/kvs/mod.rs, surrealdb/core/src/kvs/mvcc_source.rs
MvccSource trait defines async next_version(); LocalGeneratedMvcc is a process-local AtomicU64 counter starting at 1 with unit tests for monotonicity and instance independence.
Timeline types and version/snapshot read API
surrealdb/core/src/kvs/lance/timeline.rs
Adds VersionInfo, Timeline (versions(), latest_version(), view_at/view_latest) and TimelineView (version(), get(), scan()) that return tombstone-filtered point and range reads from Lance snapshots.
Datastore wiring and tombstone batch builder
surrealdb/core/src/kvs/lance/mod.rs
Re-exports Timeline, TimelineView, VersionInfo; adds Datastore::timeline() and build_tombstone_batch_lance helper creating tombstone Arrow RecordBatches (tombstone=true, empty val).
Commit and flusher: single-merge write+delete via tombstones
surrealdb/core/src/kvs/lance/commit_gate.rs, surrealdb/core/src/kvs/lance/flusher.rs
single_lance_commit and flusher now fold deletes into tombstone rows and perform one MergeInsertBuilder::execute_reader over combined write+tombstone batches, removing the previous Dataset::delete path and KvSchema predicate usage.
Lance config cleanup
surrealdb/core/src/kvs/config.rs
Removes delete_via_tombstone_row from LanceConfig and its default initializer.
Timeline integration and regression tests
surrealdb/core/src/kvs/lance/tests.rs
Adds test_timeline_versions_grow_with_commits, test_timeline_view_reads_historical_soa, and test_timeline_write_delete_commit_is_single_atomic_version (regression asserting mixed write+delete yields a single Lance version).
Session documentation
.claude/board/AGENT_LOG.md, .claude/board/EPIPHANIES.md
Appends AGENT_LOG and EPIPHANIES entries describing the timeline work, write-path granularity findings, the tombstone-based write+delete fix, and test/check results.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibble through snapshots, one by one,

Tombstones tucked where deletes are done,
Versions grow and histories sing,
Time-travel reads make soft hearts spring,
Hoppity hops — the timeline's fun!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main change: introducing a read-only Timeline view over Lance version history as part of the Rubicon initiative (step 1).
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/kvs-lance-timeline

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9205453720

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread surrealdb/core/src/kvs/lance/timeline.rs
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@surrealdb/core/src/kvs/lance/timeline.rs`:
- Around line 87-139: Public async methods latest_version, versions, view_at,
view_latest (and TimelineView::get/scan) are missing tracing instrumentation;
add the #[instrument(...)] attribute to each public async fn so traces include
inputs and results. For each function add #[instrument(skip(self), name =
"<module>::<fn_name>", err)] (or at least #[instrument(skip(self))] if you don't
want to record self), and add #[instrument(skip(self, snapshot))] as appropriate
for TimelineView methods; also ensure tracing::instrument is imported. Use the
exact function names (latest_version, versions, view_at, view_latest, get, scan)
to locate and annotate the methods.

In `@surrealdb/core/src/kvs/mvcc_source.rs`:
- Around line 45-66: The trait MvccSource is currently non-object-safe because
next_version returns an impl Future; make next_version object-safe by changing
its signature to return a boxed future (e.g., Pin<Box<dyn Future<Output =
anyhow::Result<u64>> + Send>> or use
futures::future::BoxFuture<anyhow::Result<u64>>), update all implementors
(functions/structs that implement MvccSource::next_version) to return the boxed
future, and keep the doc comment about Arc<dyn MvccSource> intact (or remove
that doc claim if you prefer not to support trait objects).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 357d85b4-7107-451a-af82-e84506b32131

📥 Commits

Reviewing files that changed from the base of the PR and between 08ce3d4 and 9205453.

📒 Files selected for processing (7)
  • .claude/board/AGENT_LOG.md
  • .claude/board/EPIPHANIES.md
  • surrealdb/core/src/kvs/lance/mod.rs
  • surrealdb/core/src/kvs/lance/tests.rs
  • surrealdb/core/src/kvs/lance/timeline.rs
  • surrealdb/core/src/kvs/mod.rs
  • surrealdb/core/src/kvs/mvcc_source.rs

Comment thread surrealdb/core/src/kvs/lance/timeline.rs
Comment thread surrealdb/core/src/kvs/mvcc_source.rs
A batch carrying both writes and deletes applied writes via
MergeInsertBuilder::execute_reader and then deletes via a separate
Dataset::delete, producing TWO native Lance versions. The intermediate
(writes-applied, delete-pending) version was hidden from live readers by
the datastore write lock, but Timeline::versions() surfaced it, so a
replayer's view_at() could materialize a torn state that was never an
atomic SurrealDB commit (codex P1 on PR #29).

Fold deletes into tombstone rows (tombstone=true) streamed into the SAME
merge_insert, so a commit/flush is exactly one Lance version. The read
path already filters `tombstone = false`, so get/scan/keys are unchanged;
the physical Dataset::delete path is removed. Add
Transaction::build_tombstone_batch_lance (mirrors build_write_batch_lance)
and retire the never-read delete_via_tombstone_row config flag.

Regression: test_timeline_write_delete_commit_is_single_atomic_version
asserts a mixed write+delete commit adds exactly one version and the
resulting snapshot reflects all ops atomically.

https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
surrealdb/core/src/kvs/lance/commit_gate.rs (1)

389-393: 💤 Low value

Minor inefficiency: unnecessary collect.

The RecordBatchIterator::new accepts any IntoIterator, so the intermediate .collect::<Vec<_>>() allocates an extra vector that isn't needed.

♻️ Suggested simplification
 	let schema_ref = batches[0].schema();
 	let reader = arrow_array::RecordBatchIterator::new(
-		batches.into_iter().map(Ok::<_, arrow_schema::ArrowError>).collect::<Vec<_>>(),
+		batches.into_iter().map(Ok::<_, arrow_schema::ArrowError>),
 		schema_ref,
 	);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@surrealdb/core/src/kvs/lance/commit_gate.rs` around lines 389 - 393, The code
unnecessarily collects into a Vec before constructing the RecordBatchIterator;
change the RecordBatchIterator::new call in commit_gate.rs to take the iterator
directly by passing batches.into_iter().map(Ok::<_, arrow_schema::ArrowError>)
(remove the .collect::<Vec<_>>() step) while keeping schema_ref as before so the
iterator type matches RecordBatchIterator::new.
surrealdb/core/src/kvs/lance/flusher.rs (1)

288-292: 💤 Low value

Same minor inefficiency as commit_gate.rs: unnecessary collect.

♻️ Suggested simplification
 	let schema_ref = batches[0].schema();
 	let reader = arrow_array::RecordBatchIterator::new(
-		batches.into_iter().map(Ok::<_, arrow_schema::ArrowError>).collect::<Vec<_>>(),
+		batches.into_iter().map(Ok::<_, arrow_schema::ArrowError>),
 		schema_ref,
 	);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@surrealdb/core/src/kvs/lance/flusher.rs` around lines 288 - 292, The code
builds a Vec unnecessarily before creating the RecordBatchIterator; instead of
collecting the mapped iterator, pass the mapped iterator directly to
arrow_array::RecordBatchIterator::new using the existing batches iterator and
schema_ref (reference the variables/function names batches, schema_ref, and
arrow_array::RecordBatchIterator::new) so remove the .collect::<Vec<_>>() step
and feed the mapped iterator (batches.into_iter().map(Ok::<_,
arrow_schema::ArrowError>)) directly to the constructor.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@surrealdb/core/src/kvs/lance/commit_gate.rs`:
- Around line 389-393: The code unnecessarily collects into a Vec before
constructing the RecordBatchIterator; change the RecordBatchIterator::new call
in commit_gate.rs to take the iterator directly by passing
batches.into_iter().map(Ok::<_, arrow_schema::ArrowError>) (remove the
.collect::<Vec<_>>() step) while keeping schema_ref as before so the iterator
type matches RecordBatchIterator::new.

In `@surrealdb/core/src/kvs/lance/flusher.rs`:
- Around line 288-292: The code builds a Vec unnecessarily before creating the
RecordBatchIterator; instead of collecting the mapped iterator, pass the mapped
iterator directly to arrow_array::RecordBatchIterator::new using the existing
batches iterator and schema_ref (reference the variables/function names batches,
schema_ref, and arrow_array::RecordBatchIterator::new) so remove the
.collect::<Vec<_>>() step and feed the mapped iterator
(batches.into_iter().map(Ok::<_, arrow_schema::ArrowError>)) directly to the
constructor.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: deefbcbc-af5c-49bc-9ee3-8c48140039ea

📥 Commits

Reviewing files that changed from the base of the PR and between 9205453 and 5997eea.

📒 Files selected for processing (7)
  • .claude/board/AGENT_LOG.md
  • .claude/board/EPIPHANIES.md
  • surrealdb/core/src/kvs/config.rs
  • surrealdb/core/src/kvs/lance/commit_gate.rs
  • surrealdb/core/src/kvs/lance/flusher.rs
  • surrealdb/core/src/kvs/lance/mod.rs
  • surrealdb/core/src/kvs/lance/tests.rs
💤 Files with no reviewable changes (1)
  • surrealdb/core/src/kvs/config.rs
✅ Files skipped from review due to trivial changes (2)
  • .claude/board/AGENT_LOG.md
  • .claude/board/EPIPHANIES.md

AdaWorldAPI pushed a commit that referenced this pull request May 30, 2026
The .rustfmt.toml enabled nightly-only options (wrap_comments,
imports_granularity, group_imports, comment_width) while the build
toolchain is pinned stable 1.95 and the fmt job that would apply them
(pinned rust-toolchain.nightly via ci.yml) is not running — GitHub
Actions is not enabled on this fork (PR #29 head has zero check runs).
So those options were enforced by nothing yet made stable `cargo fmt`
warn and reformat divergently.

Comment them out so stable `cargo fmt` (the 1.95 pin, matching org
policy of 99% stable / nightly only for Miri) is the single authoritative
formatter. Re-enable as a block if a nightly-fmt CI job is ever turned on
and the tree is normalized under it in one dedicated pass.

Findings (no CI on the fork; rustfmt split-brain) recorded in
.claude/board/EPIPHANIES.md.

https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R
`MvccSource::next_version` uses return-position `impl Trait` (RPITIT), which
makes the trait non-object-safe, so the `Arc<dyn MvccSource>` the doc comment
claimed will not compile. The trait is `#[allow(dead_code)]` (no consumer
yet); rather than box the future (a per-call heap allocation for an unused
trait-object capability), correct the doc to state zero-cost static dispatch
and that a boxed-future variant can be added if runtime-pluggable sources are
ever needed. Doc-only change.

https://claude.ai/code/session_01R9AWgFa65uPnLyS2my2d2R
@AdaWorldAPI AdaWorldAPI merged commit 4eb33dd into main May 30, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants