[performance-profiler] Reduce diskqueue reader_loop frame metadata read overhead in nextFrame #49629

@github-actions

Description

Hot Path

(*readerLoop).nextFrame in libbeat/publisher/queue/diskqueue/reader_loop.go (L191-L246) parses each frame's metadata with several small reads (L201, L229, L240) and reads the payload with a single Read call (L222).

Profiling Data

Before:

go test ./libbeat/publisher/queue/diskqueue -run=^$ -bench=BenchmarkSync1k -benchmem -count=5 -cpuprofile=/tmp/gh-aw/agent/dq_cpu.prof -memprofile=/tmp/gh-aw/agent/dq_mem.prof
BenchmarkSync1k-4  99  12022922 ns/op  3466193 B/op  33291 allocs/op
BenchmarkSync1k-4  99  12762895 ns/op  3462574 B/op  32998 allocs/op
BenchmarkSync1k-4  96  12038660 ns/op  3465094 B/op  33219 allocs/op
BenchmarkSync1k-4  99  11884420 ns/op  3466151 B/op  33329 allocs/op
BenchmarkSync1k-4  99  12225757 ns/op  3465529 B/op  33288 allocs/op

CPU profile top (before):

go tool pprof -top /tmp/gh-aw/agent/dq_cpu.prof
... total samples = 23.45s
internal/runtime/syscall.Syscall6 11.58s (49.38%)

Proposed Change

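Replace the per-field binary.Read calls with io.ReadFull plus binary.LittleEndian.Uint32, read the payload with io.ReadFull instead of Read, and read the footer (checksum and duplicate length) in a single 8-byte read.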

diff --git a/libbeat/publisher/queue/diskqueue/reader_loop.go b/libbeat/publisher/queue/diskqueue/reader_loop.go
@@
-var frameLength uint32
-err := binary.Read(reader, binary.LittleEndian, &frameLength)
+var header [4]byte
+_, err := io.ReadFull(reader, header[:])
+frameLength := binary.LittleEndian.Uint32(header[:])
@@
-_, err = reader.Read(bytes)
+_, err = io.ReadFull(reader, bytes)
@@
-var checksum uint32
-err = binary.Read(reader, binary.LittleEndian, &checksum)
+var footer [8]byte
+_, err = io.ReadFull(reader, footer[:])
+checksum := binary.LittleEndian.Uint32(footer[0:4])
@@
-var duplicateLength uint32
-err = binary.Read(reader, binary.LittleEndian, &duplicateLength)
+duplicateLength := binary.LittleEndian.Uint32(footer[4:8])

Results

After (same command):

go test ./libbeat/publisher/queue/diskqueue -run=^$ -bench=BenchmarkSync1k -benchmem -count=5 -cpuprofile=/tmp/gh-aw/agent/dq_cpu_after.prof -memprofile=/tmp/gh-aw/agent/dq_mem_after.prof
BenchmarkSync1k-4  114  10441470 ns/op  3468320 B/op  31744 allocs/op
BenchmarkSync1k-4  115  11351108 ns/op  3466468 B/op  31614 allocs/op
BenchmarkSync1k-4  106  10575430 ns/op  3466453 B/op  31720 allocs/op
BenchmarkSync1k-4  115  10384787 ns/op  3467907 B/op  31805 allocs/op
BenchmarkSync1k-4  100  10794191 ns/op  3467981 B/op  31906 allocs/op

Averages (5 runs):

  • Time: 12,186,930.8 -> 10,709,397.2 ns/op (12.12% less time per op)
  • Memory: 3,465,108.2 -> 3,467,425.8 B/op (roughly flat)
  • Allocs: 33,225.0 -> 31,757.8 allocs/op (4.42% fewer allocs)

Improvement: 12.12% latency reduction on a diskqueue hot path benchmark.
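The averages and the percentage can be reproduced from the ns/op values listed in the benchmark output. The sketch below uses two hypothetical helpers, mean and percentReduction, and computes a plain arithmetic mean with no variance or significance analysis.

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs, or 0 for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// percentReduction returns how much smaller after is than before, in percent.
func percentReduction(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// ns/op values copied from the before and after benchmark runs.
	before := []float64{12022922, 12762895, 12038660, 11884420, 12225757}
	after := []float64{10441470, 11351108, 10575430, 10384787, 10794191}

	b, a := mean(before), mean(after)
	fmt.Printf("%.1f -> %.1f ns/op (%.2f%% less time per op)\n", b, a, percentReduction(b, a))
}
```

With only five runs per side, a tool such as benchstat would give a better picture of whether the difference is statistically significant.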

Verification

  • Tests run: go test ./libbeat/publisher/queue/diskqueue -run 'Test.*' -count=1 (result: ok)
  • Behavior preservation: checksum validation, duplicate-length validation, and decode flow are unchanged; only byte-read/parsing mechanics changed.

Evidence

Commands executed:

  • Baseline benchmark/profile:
    • go test ./libbeat/publisher/queue/diskqueue -run=^$ -bench=BenchmarkSync1k -benchmem -count=5 -cpuprofile=/tmp/gh-aw/agent/dq_cpu.prof -memprofile=/tmp/gh-aw/agent/dq_mem.prof
  • Post-change benchmark/profile (same command):
    • go test ./libbeat/publisher/queue/diskqueue -run=^$ -bench=BenchmarkSync1k -benchmem -count=5 -cpuprofile=/tmp/gh-aw/agent/dq_cpu_after.prof -memprofile=/tmp/gh-aw/agent/dq_mem_after.prof
  • Verification tests:
    • go test ./libbeat/publisher/queue/diskqueue -run 'Test.*' -count=1

This appears distinct from existing open issue #49519 (buffer reuse in serialize.go), as this targets metadata parsing in reader_loop.go.



  • expires on Mar 31, 2026, 2:42 PM UTC
