
WIP: CNTRLPLANE-3461: refactor defrag to minimize database lock time#378

Open
dusk125 wants to merge 8 commits into openshift:main from dusk125:non-blocking-defrag

Conversation


@dusk125 dusk125 commented May 14, 2026

Split the monolithic defrag operation into three phases to allow writes to continue during the slow bulk copy:

Phase 1 (unlocked): Commit pending writes, take a bbolt MVCC snapshot, install a write journal on batchTxBuffered, then unlock. The bulk copy runs against the snapshot while normal writes continue and are captured by the journal.

Phase 2 (locked): Lock batchTx, drain the journal, replay delta writes into the temp DB. This is fast since it's only the writes that occurred during the copy.

Phase 3 (locked): Atomic switchover — close old DB, rename temp file, reopen, create new transactions, unlock.

Write blocking duration changes from O(db_size) to O(writes_during_copy) + fixed overhead for close/rename/reopen.
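
A toy, map-based Go sketch of this control flow (illustrative names only, not the PR's actual code; deletes are omitted for brevity):

    package main

    import (
        "fmt"
        "sync"
    )

    // A map stands in for bbolt, a mutex for the batchTx lock, and a key list
    // for the write journal.
    type backend struct {
        mu         sync.Mutex
        db         map[string]string
        journal    []string
        journaling bool
    }

    func (b *backend) put(k, v string) {
        b.mu.Lock()
        defer b.mu.Unlock()
        b.db[k] = v
        if b.journaling {
            b.journal = append(b.journal, k) // captured for replay
        }
    }

    func (b *backend) defrag() {
        // Phase 1 (unlocked): snapshot, install the journal, release the lock.
        b.mu.Lock()
        snapshot := make(map[string]string, len(b.db))
        for k, v := range b.db {
            snapshot[k] = v
        }
        b.journaling = true
        b.mu.Unlock()

        tmp := make(map[string]string, len(snapshot))
        for k, v := range snapshot { // slow bulk copy; writes keep flowing
            tmp[k] = v
        }

        b.mu.Lock()
        defer b.mu.Unlock()
        // Phase 2 (locked): replay only the delta captured during the copy.
        for _, k := range b.journal {
            tmp[k] = b.db[k]
        }
        b.journal, b.journaling = nil, false
        // Phase 3 (locked): switch over to the compacted copy.
        b.db = tmp
    }

    func main() {
        b := &backend{db: map[string]string{"a": "1"}}
        b.put("b", "2")
        b.defrag()
        fmt.Println(len(b.db), "keys after defrag") // 2
    }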

Summary by CodeRabbit

  • New Features

    • Defrag now uses a journaling, three-phase process so writes remain available and consistent during maintenance.
  • Bug Fixes

    • Safer switchover and improved handling of concurrent writes, reads, deletions and bucket changes; ignores missing items during cleanup; better large-replay robustness.
  • Tests

    • Large suite validating concurrent defrag scenarios, journal semantics, visibility guarantees, replay behavior, and repeated/empty defrags.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 14, 2026

coderabbitai Bot commented May 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Implements a journaling-based, three-phase BoltDB defragmentation: snapshot/copy to a temp DB while allowing writes, record concurrent mutations into an in-memory defrag journal, drain and replay the journal into the temp DB, then atomically switch to the compacted DB. Adds journal types, wiring into batch transactions, and concurrency tests.

Changes

Defragmentation journaling & switchover

  • Journal core implementation & BatchTx wiring (server/storage/backend/defrag_journal.go, server/storage/backend/batch_tx.go): adds an in-memory, thread-safe defragJournal and wires it into batchTxBuffered (defragJournal *defragJournal), making Unlock/Commit conditional and recording create/put/delete events when journaling is active.
  • Snapshot-to-temp copy + replay & switchover (server/storage/backend/backend.go): refactors defrag to open a temp DB (Options.Mlock = false), take a snapshot tx, copy snapshot contents into the temp DB with defragFromTx(srcTx, tmpdb, limit) (sets FillPercent=0.9, periodic commits), drain the in-memory journal, replay ops with replayJournal(tmpdb, ops, limit) (batched commits, tolerates ErrBucketNotFound on delete), then perform the atomic unsafe commit, file swap, and transaction reset. Adds an errors import and a bolt errors alias for sentinel checks.
  • Concurrency-heavy defrag tests (server/storage/backend/backend_test.go): adds tests exercising defrag under concurrent writes/reads, deletes during defrag, overwrite/new-bucket races during copy, repeated defrags, empty-DB defrag, large journal replay, buffered-read visibility during defrag, batch-limit stress, and commit-during-defrag scenarios.
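
Based on that summary, here is a sketch of what a defragFromTx-style copy loop could look like against bbolt's public API (Tx.ForEach, Bucket.ForEach, Bucket.FillPercent). It is illustrative only, assumes limit > 0 and a go.etcd.io/bbolt dependency, and is not necessarily the PR's exact code:

    package backend

    import (
        bolt "go.etcd.io/bbolt"
    )

    // defragCopySketch iterates every bucket in the read-only snapshot tx and
    // writes it into the temp DB with a high fill percent, committing every
    // `limit` puts so the write transaction stays bounded.
    func defragCopySketch(snapTx *bolt.Tx, tmp *bolt.DB, limit int) error {
        tmpTx, err := tmp.Begin(true)
        if err != nil {
            return err
        }
        count := 0
        ferr := snapTx.ForEach(func(name []byte, src *bolt.Bucket) error {
            dst, cerr := tmpTx.CreateBucketIfNotExists(name)
            if cerr != nil {
                return cerr
            }
            dst.FillPercent = 0.9 // pack pages tightly in the compacted copy
            return src.ForEach(func(k, v []byte) error {
                if perr := dst.Put(k, v); perr != nil {
                    return perr
                }
                if count++; count%limit == 0 {
                    // periodic commit, then reopen the bucket in a fresh tx
                    if cerr := tmpTx.Commit(); cerr != nil {
                        return cerr
                    }
                    if tmpTx, err = tmp.Begin(true); err != nil {
                        return err
                    }
                    if dst, err = tmpTx.CreateBucketIfNotExists(name); err != nil {
                        return err
                    }
                    dst.FillPercent = 0.9
                }
                return nil
            })
        })
        if ferr != nil {
            tmpTx.Rollback()
            return ferr
        }
        return tmpTx.Commit()
    }

The sequence diagram below shows where this copy sits relative to the journal replay and switchover.
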
sequenceDiagram
    participant Caller as Caller (Defrag initiator)
    participant Backend as Backend
    participant BatchTx as BatchTx
    participant Journal as Journal
    participant TempDB as Temp DB
    participant SourceDB as Source DB

    rect rgba(100, 150, 200, 0.5)
    Note over Backend,TempDB: Phase 1: Snapshot & Copy
    Caller->>Backend: Defrag()
    Backend->>Backend: Open TempDB (Options.Mlock=false)
    Backend->>SourceDB: Commit pending writes, take snapshot tx
    Backend->>BatchTx: Install defragJournal
    Backend->>Backend: defragFromTx(snapshot, tmpdb, limit)
    Backend->>SourceDB: Iterate buckets from snapshot tx
    Backend->>TempDB: Create buckets, copy key/value pairs (periodic commits)
    end

    rect rgba(150, 100, 200, 0.5)
    Note over Backend,TempDB: Phase 2: Concurrent Writes & Replay
    par Concurrent writes
        Caller->>BatchTx: UnsafePut/UnsafeDelete (during defrag)
        BatchTx->>Journal: Append ops to journal
    and Journal replay
        Backend->>BatchTx: Lock writes
        Backend->>Journal: Drain accumulated operations
        Backend->>Backend: replayJournal(tmpdb, ops, limit)
        Backend->>TempDB: Apply batched creates/puts/deletes (ignore missing-bucket delete errors)
    end
    end

    rect rgba(200, 150, 100, 0.5)
    Note over Backend,SourceDB: Phase 3: Atomic Switchover
    Backend->>Backend: Lock tx/db paths for switchover
    Backend->>TempDB: Unsafe commit
    Backend->>SourceDB: Close & rename DB files
    Backend->>TempDB: Reopen as primary DB
    Backend->>BatchTx: Reset transaction state
    Backend->>Caller: Return (defrag complete)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 4.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Topology-Aware Scheduling Compatibility: ❓ Inconclusive. No result was produced after verification. Resolution: re-run the check or adjust instructions to produce a final result.
✅ Passed checks (10 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title directly addresses the main change: refactoring defragmentation to minimize database lock time, which is the core focus of all file modifications.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Stable And Deterministic Test Names: ✅ Passed. All 16 test functions in this PR have static, deterministic names with no dynamic content. The custom check refers to Ginkgo patterns, which don't apply; this codebase uses standard Go testing only.
  • Test Structure And Quality: ✅ Passed. PR contains standard Go testing package tests, not Ginkgo tests. Ginkgo-specific patterns (Describe, It, BeforeEach, Eventually, Consistently) are absent. Custom check is inapplicable.
  • Microshift Test Compatibility: ✅ Passed. PR adds standard Go unit tests, not Ginkgo e2e tests. The custom check applies only to Ginkgo tests (It(), Describe(), etc.), which are not present here.
  • Single Node Openshift (Sno) Test Compatibility: ✅ Passed. This PR adds standard Go unit tests for etcd's storage backend defragmentation, not Ginkgo e2e tests. SNO compatibility checks only apply to e2e tests. No Ginkgo framework or constructs are present.
  • Ote Binary Stdout Contract: ✅ Passed. No stdout writes in process-level code. Changes are library/test packages only. No main(), init(), TestMain(), or suite setup functions. No fmt.Print*, klog, or os.Stdout usage.
  • Ipv6 And Disconnected Network Test Compatibility: ✅ Passed. New tests are standard Go unit tests, not Ginkgo e2e tests. No IPv4 hardcoding, external URLs, or network connectivity. Check not applicable.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from hasbro17 and tjungblu May 14, 2026 16:00

openshift-ci Bot commented May 14, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dusk125

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 14, 2026

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (3)
server/storage/backend/backend.go (1)

540-541: ⚡ Quick win

Potential resource leak if defragFromTx panics.

If defragFromTx panics at line 540, the snapTx opened on line 524 is never rolled back, and the cleanup in lines 544-551 is not reached. While bbolt will eventually clean up when the backend is closed, consider wrapping with a deferred rollback:

 	if snapErr != nil {
 		b.batchTx.Unlock()
 		tmpdb.Close()
 		os.Remove(tdbp)
 		return fmt.Errorf("failed to begin snapshot tx for defrag: %w", snapErr)
 	}
+	defer snapTx.Rollback()  // no-op if already rolled back
 
 	journal := newDefragJournal()
 	b.batchTx.defragJournal = journal
 	b.batchTx.Unlock()
 
 	b.lg.Info("defrag: copying data (writes unlocked)")
 
 	// gofail: var defragBeforeCopy struct{}
 	err = defragFromTx(snapTx, tmpdb, defragLimit)
-	snapTx.Rollback()

This ensures cleanup even if defragFromTx panics (though etcd typically uses Fatal on critical errors, making this a defensive safeguard).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/storage/backend/backend.go` around lines 540 - 541, The call to
defragFromTx may panic, which would skip the explicit snapTx.Rollback() and
cause a resource leak; immediately defer snapTx.Rollback() right after snapTx is
created (before calling defragFromTx) so the transaction is rolled back even if
defragFromTx panics, and keep the existing explicit rollback/commit/error
handling logic around defragFromTx (or check/ignore the rollback error if
necessary) to preserve current behavior for tmpdb and defragLimit.
server/storage/backend/batch_tx.go (1)

389-394: 💤 Low value

Verify journal ordering: append happens after the operation.

The journal records the operation after t.batchTx.UnsafeCreateBucket(bucket) completes. If the underlying operation fails and logs fatal (lines 118-122 in batchTx), the journal append is never reached. This is correct behavior for the current implementation, but worth documenting.

The same pattern applies to all other unsafe operations (Put, Delete, DeleteBucket).
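
A toy illustration of that ordering (made-up names, not etcd's types): the journal entry is appended only after the underlying write succeeds, so a failed write is never replayed into the temp DB:

    package main

    import (
        "errors"
        "fmt"
    )

    type tx struct {
        journal []string
        db      map[string]string
    }

    func (t *tx) unsafePut(k, v string) error {
        if t.db == nil {
            return errors.New("write failed") // journal append below is never reached
        }
        t.db[k] = v
        t.journal = append(t.journal, "put "+k) // recorded only on success
        return nil
    }

    func main() {
        t := &tx{db: map[string]string{}}
        _ = t.unsafePut("a", "1")
        fmt.Println(t.journal) // [put a]
    }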

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/storage/backend/batch_tx.go` around lines 389 - 394, The journal
append in UnsafeCreateBucket currently occurs after calling
t.batchTx.UnsafeCreateBucket(bucket) which means if the underlying operation
fails fatally the journal won't record it; add a brief comment above
UnsafeCreateBucket explaining this ordering and rationale (i.e., we
intentionally append to defragJournal only after the operation succeeds to avoid
recording failed operations), and add the same clarifying comment to the other
unsafe methods (UnsafePut, UnsafeDelete, UnsafeDeleteBucket) referencing the
calls t.batchTx.UnsafeCreateBucket / UnsafePut / UnsafeDelete /
UnsafeDeleteBucket and the corresponding j.appendCreateBucket / j.appendPut /
j.appendDelete / j.appendDeleteBucket to make the behavior explicit for future
readers.
server/storage/backend/defrag_journal.go (1)

101-107: 💤 Low value

Consider preserving slice capacity in drain().

Setting j.ops = nil on line 105 discards the allocated capacity. If drain() is called repeatedly during defrag replay batching, this forces reallocation. Consider:

 func (j *defragJournal) drain() []defragJournalOp {
 	j.mu.Lock()
 	defer j.mu.Unlock()
 	ops := j.ops
-	j.ops = nil
+	j.ops = j.ops[:0]
 	return ops
 }

This retains the underlying capacity for the next append batch.
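
One caveat worth weighing before adopting j.ops[:0]: the returned slice shares its backing array with j.ops, so appends that arrive before the replayer finishes with the batch can overwrite entries it is still reading. A variant that keeps the amortization without the aliasing, as a sketch:

    func (j *defragJournal) drain() []defragJournalOp {
        j.mu.Lock()
        defer j.mu.Unlock()
        ops := j.ops
        // Hand the filled array to the caller and start a fresh one sized to
        // the last batch, so later appends cannot clobber the drained ops.
        j.ops = make([]defragJournalOp, 0, len(ops))
        return ops
    }

This trades one allocation per drain for isolation of the drained batch.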

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/storage/backend/defrag_journal.go` around lines 101 - 107, The drain()
implementation drops the underlying capacity by setting j.ops = nil; change it
to reset the slice length without releasing capacity (use a slice reset like
j.ops = j.ops[:0]) so repeated drain() calls reuse the allocated backing array;
update the defragJournal.drain method to return the current ops slice content
while preserving capacity for subsequent appends (reference: defragJournal.drain
and the j.ops slice).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bcd07fa1-6ba1-4075-a8e0-cdb9b4036291

📥 Commits

Reviewing files that changed from the base of the PR and between c543fe1 and 9e6971c.

📒 Files selected for processing (5)
  • server/storage/backend/backend.go
  • server/storage/backend/backend_test.go
  • server/storage/backend/batch_tx.go
  • server/storage/backend/defrag_journal.go
  • server/storage/backend/defrag_journal_test.go

@dusk125 dusk125 changed the title WIP: refactor defrag to minimize database lock time WIP: CNTRLPLANE-3461: refactor defrag to minimize database lock time May 14, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 14, 2026

openshift-ci-robot commented May 14, 2026

@dusk125: This pull request references CNTRLPLANE-3461 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Split the monolithic defrag operation into three phases to allow writes to continue during the slow bulk copy:

Phase 1 (unlocked): Commit pending writes, take a bbolt MVCC snapshot, install a write journal on batchTxBuffered, then unlock. The bulk copy runs against the snapshot while normal writes continue and are captured by the journal.

Phase 2 (locked): Lock batchTx, drain the journal, replay delta writes into the temp DB. This is fast since it's only the writes that occurred during the copy.

Phase 3 (locked): Atomic switchover — close old DB, rename temp file, reopen, create new transactions, unlock.

Write blocking duration changes from O(db_size) to O(writes_during_copy) + fixed overhead for close/rename/reopen.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved database defragmentation to safely handle concurrent write and delete operations during maintenance.

    • Enhanced reliability of the defragmentation process through better operation tracking.

  • Tests

    • Added test coverage for concurrent operations during database defragmentation scenarios.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

dusk125 and others added 2 commits May 14, 2026 13:15
Add tests that validate the non-blocking defrag implementation:
- TestBackendDefragConcurrentReads: reads work throughout defrag
- TestBackendDefragWriteAvailability: writes stay unblocked during copy
- TestBackendDefragConcurrentReadsAndWrites: simultaneous readers/writers
- TestBackendDefragLogicalConsistency: hash + key-by-key verification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- TestBackendDefragOverwriteDuringCopy: key overwritten during copy
  gets the new value after replay
- TestBackendDefragNewBucketDuringCopy: bucket created during copy
  is present after defrag with its data
- TestBackendDefragMultipleSequential: 3 sequential defrags with
  mutations between each, hash verified after every round
- TestBackendDefragEmptyDatabase: defrag on an empty db is a no-op
- TestBackendDefragLargeJournalReplay: enough writes during copy to
  exercise the batched commit path in replayJournal

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
server/storage/backend/backend_test.go (2)

649-657: 💤 Low value

Verify deleted keys are gone after defrag.

The test only checks that keys 50-99 survived but doesn't verify that concurrently deleted keys (0-49) are actually removed. This would strengthen confidence that journal replay correctly handles deletes during defrag.

Suggested addition after line 657
 	for i := 50; i < 100; i++ {
 		keys, _ := rtx.UnsafeRange(schema.Test, []byte(fmt.Sprintf("key_%04d", i)), nil, 0)
 		require.Lenf(t, keys, 1, "key_%04d should exist", i)
 	}
+	// verify deleted keys are gone
+	for i := 0; i < 50; i++ {
+		keys, _ := rtx.UnsafeRange(schema.Test, []byte(fmt.Sprintf("key_%04d", i)), nil, 0)
+		assert.Emptyf(t, keys, "key_%04d should be deleted", i)
+	}
 	rtx.Unlock()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/storage/backend/backend_test.go` around lines 649 - 657, Add
assertions after the defrag verification to ensure keys 0-49 were actually
deleted: after calling b.ForceCommit() and obtaining rtx := b.BatchTx() (and
rtx.Lock()), iterate i from 0 to 49 and use rtx.UnsafeRange(schema.Test,
[]byte(fmt.Sprintf("key_%04d", i)), nil, 0) to assert the returned keys slice is
empty (or length 0) for each deleted key; then unlock rtx. This complements the
existing checks for keys 50-99 and verifies deletes survived defrag/journal
replay.

965-974: ⚡ Quick win

Strengthen assertion to verify all journal writes are present.

The test aims to exercise the batched commit path in replayJournal, but only asserts that "at least some" journal keys exist. Since writes during defrag should be captured by the journal and replayed into the new DB, all written keys should be present. The current assertion would pass even if most keys were lost.

Suggested change
 	// verify at least some journal keys exist
 	var found int
 	for i := 0; i < written; i++ {
 		keys, _ := rtx.UnsafeRange(schema.Test, []byte(fmt.Sprintf("journal_%06d", i)), nil, 0)
 		if len(keys) > 0 {
 			found++
 		}
 	}
 	rtx.Unlock()
-	assert.Greater(t, found, 0, "journal writes should be present after defrag")
+	assert.Equal(t, written, found, "all journal writes should be present after defrag")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/storage/backend/backend_test.go` around lines 965 - 974, The test
currently only checks that some journal entries exist after defrag; change it to
assert that every expected journal key written during the test is present. Use
the existing loop that calls rtx.UnsafeRange(schema.Test,
[]byte(fmt.Sprintf("journal_%06d", i)), nil, 0) for i in 0..written-1 and count
missing keys (or increment a present counter) and then assert that present ==
written (or missing == 0). This ensures replayJournal/defrag behavior is
validated for all written keys rather than just >0; keep the same identifiers
(rtx.UnsafeRange, schema.Test, journal_%06d, written) when locating and updating
the assertion.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ebe9587a-2ca3-482e-b145-ec5431475c50

📥 Commits

Reviewing files that changed from the base of the PR and between 9e6971c and 853fb9a.

📒 Files selected for processing (1)
  • server/storage/backend/backend_test.go

dusk125 and others added 2 commits May 14, 2026 14:17
If a panic occurs after batchTx.tx is set to nil during the Phase 3
switchover, inflight requests could use the invalid tx and produce
secondary panics that shadow the real cause. Restore the deferred
recover that catches the original panic and calls Fatal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
During defrag, writes to the old database are wasted I/O since it is
about to be replaced. When the defrag journal is active, mutation
methods on batchTxBuffered now write to the journal and in-memory
buffer only, skipping the bbolt write. Commits are also skipped during
defrag to prevent readTx.buf from being cleared — since puts are not
in bbolt, clearing the read buffer would make them invisible.

This addresses feedback to decouple transaction buffering from bbolt:
writes are buffered on the etcd side and handed over as a batch to
the new database via journal replay, without writing to a database
that is about to be replaced.

Adds three new tests:
- TestBackendDefragReadsDuringDefragSeeBufferedPuts: verifies
  ConcurrentReadTx sees puts made during defrag via the buffer
- TestBackendDefragExceedBatchLimit: writes exceed batchLimit during
  defrag, exercising the counter-reset path in Unlock
- TestBackendDefragCommitDuringDefragPreservesData: ForceCommit
  during defrag does not clear buffered data

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
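
A toy sketch of the write path this commit describes (illustrative names, not etcd's actual types): with the journal installed, a put lands in the read buffer and the journal, and the write to the old database is skipped:

    package main

    import "fmt"

    // Stand-ins: buf is the in-memory read buffer, db is the old bbolt file
    // that is about to be replaced, journal captures ops for replay.
    type batchTx struct {
        journal []string
        buf     map[string]string
        db      map[string]string
    }

    func (t *batchTx) put(k, v string) {
        t.buf[k] = v // reads during defrag are served from the buffer
        if t.journal != nil {
            t.journal = append(t.journal, k) // replayed into the temp DB later
            return                           // skip the doomed database
        }
        t.db[k] = v
    }

    func main() {
        t := &batchTx{buf: map[string]string{}, db: map[string]string{}}
        t.journal = []string{} // defrag in flight: journaling active
        t.put("a", "1")
        fmt.Printf("buffered=%q journaled=%v db writes=%d\n", t.buf["a"], t.journal, len(t.db))
    }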

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
server/storage/backend/backend_test.go (2)

535-539: 💤 Low value

Consider aligning the threshold with the documented expectation.

The comment states latency should be "well under a second" but the assertion allows up to 5 seconds. This generous threshold likely exists for CI stability, but the discrepancy could mask real performance regressions. Consider either lowering the threshold (with appropriate CI tolerance) or updating the comment to reflect the actual bound.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/storage/backend/backend_test.go` around lines 535 - 539, The test's
comment claims "well under a second" but the assertion uses 5*time.Second;
update the test to make these consistent by either lowering the assertion bound
on maxWriteLatency in the assert.Less call to 1*time.Second (or another tighter
CI-approved threshold) or change the comment to reflect the 5-second bound;
modify the assert.Less invocation and/or the preceding comment near the
maxWriteLatency assertion so the documented expectation and the actual timeout
match.

1000-1001: ⚖️ Poor tradeoff

Timing-based synchronization may cause test flakiness.

The time.Sleep(10 * time.Millisecond) to wait for defrag to start is timing-dependent and could cause flakiness on slow CI machines or under resource contention. Consider exposing a hook or using a synchronization primitive from the backend to signal when defrag has entered the copy phase, if such a mechanism is feasible without overly complicating the production code.
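
A sketch of what such a hook could look like; the field and channel names here are hypothetical, not anything the PR currently defines:

    package main

    import (
        "fmt"
        "time"
    )

    // A test-only channel that defrag closes when it enters the copy phase,
    // giving tests a deterministic signal instead of a sleep.
    type backend struct {
        defragCopyStarted chan struct{}
    }

    func (b *backend) defrag() {
        // ... phase 1: snapshot taken, journal installed ...
        if b.defragCopyStarted != nil {
            close(b.defragCopyStarted) // signal: copy phase entered
        }
        time.Sleep(50 * time.Millisecond) // stand-in for the bulk copy
    }

    func main() {
        b := &backend{defragCopyStarted: make(chan struct{})}
        go b.defrag()
        select {
        case <-b.defragCopyStarted:
            fmt.Println("defrag is in the copy phase; start concurrent writes")
        case <-time.After(5 * time.Second):
            fmt.Println("timed out waiting for defrag to start")
        }
    }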

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/storage/backend/backend_test.go` around lines 1000 - 1001, The test
uses a timing-based wait (time.Sleep(10 * time.Millisecond)) to assume defrag
has started; replace this with a deterministic synchronization point by adding a
test hook or synchronization primitive in the backend that signals when defrag
enters the copy phase (e.g., a channel or condition variable exposed as a
test-only field or a method like WaitForDefragCopy/DefragStarted channel) and
have backend_test.go wait on that signal instead of sleeping; update the defrag
code paths (the goroutine that begins copy work) to close/send on that hook when
entering the copy phase and in backend_test.go replace the Sleep with receiving
from the hook with a reasonable timeout to avoid hangs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ecb071bb-e474-42cb-9c92-c829a8856968

📥 Commits

Reviewing files that changed from the base of the PR and between 88a9d87 and d794ada.

📒 Files selected for processing (2)
  • server/storage/backend/backend_test.go
  • server/storage/backend/batch_tx.go

Tests that readers holding ConcurrentReadTx snapshots from the old
database are not disrupted when Phase 3 closes and replaces the DB:

- TestBackendDefragConcurrentReadTxSurvivesSwitchover: a single
  ConcurrentReadTx acquired before defrag is held across the entire
  switchover, verifying that Phase 3 blocks on db.Close() until the
  reader releases, and that reads on the old snapshot work throughout
- TestBackendDefragMultipleConcurrentReadTxDuringSwitchover: five
  concurrent readers continuously create and release ConcurrentReadTx
  during defrag, exercising the txWg synchronization across the
  Phase 3 boundary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
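
A minimal sketch of the txWg interaction these tests exercise, assuming the backend counts outstanding read transactions with a WaitGroup (names are illustrative):

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    func main() {
        var txWg sync.WaitGroup // one slot per outstanding ConcurrentReadTx

        // A reader acquires a snapshot from the old database.
        txWg.Add(1)
        go func() {
            defer txWg.Done() // reader releases its snapshot
            time.Sleep(100 * time.Millisecond)
            fmt.Println("reader released old-DB snapshot")
        }()

        // Phase 3 blocks here until every reader on the old DB is done,
        // so the file can be closed and swapped safely.
        txWg.Wait()
        fmt.Println("all readers drained; closing old DB and switching")
    }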

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/storage/backend/backend_test.go`:
- Around line 375-399: Make the test deterministic by gating the concurrent
writer on an explicit "defrag in progress" signal and asserting at least one
write happened; add a defragStarted := make(chan struct{}) and an atomic counter
(e.g., var opsDone int32) that the writer increments after each UnsafePut inside
the goroutine, have the writer wait for <-defragStarted before starting the
loop, start Defrag in a separate goroutine and close(defragStarted) immediately
before launching it so the writer runs during Defrag, then after Defrag returns
close(stop), wg.Wait(), and require.Greater(t, atomic.LoadInt32(&opsDone),
int32(0)) to ensure at least one operation occurred during the defrag run
(referencing b.Defrag(), b.BatchTx(), tx.UnsafePut and the stop/wg
synchronization).
- Around line 924-955: The test only logs how many journal ops were written
during Defrag but doesn't assert that the threshold was crossed, so add an
assertion after reading written (after b.Defrag(), wg.Wait(), written :=
<-totalWritten) that verifies written meets or exceeds DefragLimitForTest()
(e.g. require.GreaterOrEqual(t, written, DefragLimitForTest())) to ensure the
batched replay path in replayJournal is actually exercised; reference the test's
use of b.Defrag(), the written variable, and DefragLimitForTest() when adding
this check.
- Around line 1204-1217: The test currently sleeps 50ms but doesn't prove
Defrag() is actually blocked by the old reader; add a non-blocking receive on
the defragDone channel before calling longReader.RUnlock() to assert it's still
pending. Specifically, before longReader.RUnlock(), perform a select { case err
:= <-defragDone: require.FailNowf(t, "Defrag completed early", "err=%v", err)
default: } (or equivalent using require/assert) to fail if defragDone already
delivered; keep the rest of the test (longReader.UnsafeRange, RUnlock, then
receive from defragDone) unchanged so the test now verifies Defrag() is blocked
by longReader.
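
The non-blocking receive that the last comment asks for would look roughly like this inside the test (a fragment; defragDone is the test's existing channel and t is the *testing.T):

    // Before releasing the long-lived reader, prove Defrag() has not finished.
    select {
    case err := <-defragDone:
        t.Fatalf("Defrag completed before the reader released its snapshot: %v", err)
    default:
        // still blocked on the outstanding reader, as expected
    }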

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 54bf14cd-7e69-440f-aa8b-c5756bc82e8f

📥 Commits

Reviewing files that changed from the base of the PR and between d794ada and 0fedc74.

📒 Files selected for processing (1)
  • server/storage/backend/backend_test.go

dusk125 and others added 2 commits May 15, 2026 10:20
Address CodeRabbit review feedback:
- Add atomic counter to TestBackendDefragConcurrentWrites to assert
  writes actually occurred during defrag
- Increase pre-populated data in TestBackendDefragLargeJournalReplay
  and assert written ops exceed defragLimit to ensure batched replay
  path is exercised
- Add non-blocking select in TestBackendDefragConcurrentReadTxSurvivesSwitchover
  to prove defrag is blocked by the outstanding reader

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TestBackendDefragConcurrentWrites: gate writer on defragStarted
channel so it only runs during defrag, increase pre-populated data
to 5000x1KB keys to ensure Phase 1 takes long enough for the writer
to execute.

TestBackendDefragExceedBatchLimit: increase pre-populated data from
1 seed key to 5000x1KB keys and batch 10 puts per lock cycle. On CI,
the single seed key made Phase 1 trivially fast — only 56 keys were
written, failing the >100 assertion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented May 15, 2026

@dusk125: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  • ci/prow/upstream-integration (commit b9a0708, required: false): /test upstream-integration
  • ci/prow/upstream-e2e (commit b9a0708, required: false): /test upstream-e2e

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
