Hot Path
(*readerLoop).nextFrame in libbeat/publisher/queue/diskqueue/reader_loop.go (L191-L246) parses frame metadata with several small per-field reads (L201, L229, L240) and uses a bare Read, which is allowed to return fewer bytes than requested, for the payload (L222).
Profiling Data
Before:
go test ./libbeat/publisher/queue/diskqueue -run=^$ -bench=BenchmarkSync1k -benchmem -count=5 -cpuprofile=/tmp/gh-aw/agent/dq_cpu.prof -memprofile=/tmp/gh-aw/agent/dq_mem.prof
BenchmarkSync1k-4 99 12022922 ns/op 3466193 B/op 33291 allocs/op
BenchmarkSync1k-4 99 12762895 ns/op 3462574 B/op 32998 allocs/op
BenchmarkSync1k-4 96 12038660 ns/op 3465094 B/op 33219 allocs/op
BenchmarkSync1k-4 99 11884420 ns/op 3466151 B/op 33329 allocs/op
BenchmarkSync1k-4 99 12225757 ns/op 3465529 B/op 33288 allocs/op
CPU profile top (before):
go tool pprof -top /tmp/gh-aw/agent/dq_cpu.prof
... total samples = 23.45s
internal/runtime/syscall.Syscall6 11.58s (49.38%)
Proposed Change
Replace the per-field binary.Read calls with io.ReadFull + binary.LittleEndian.Uint32, and read the footer (checksum and duplicate length) in a single 8-byte read.
diff --git a/libbeat/publisher/queue/diskqueue/reader_loop.go b/libbeat/publisher/queue/diskqueue/reader_loop.go
@@
-var frameLength uint32
-err := binary.Read(reader, binary.LittleEndian, &frameLength)
+var header [4]byte
+_, err := io.ReadFull(reader, header[:])
+frameLength := binary.LittleEndian.Uint32(header[:])
@@
-_, err = reader.Read(bytes)
+_, err = io.ReadFull(reader, bytes)
@@
-var checksum uint32
-err = binary.Read(reader, binary.LittleEndian, &checksum)
+var footer [8]byte
+_, err = io.ReadFull(reader, footer[:])
+checksum := binary.LittleEndian.Uint32(footer[0:4])
@@
-var duplicateLength uint32
-err = binary.Read(reader, binary.LittleEndian, &duplicateLength)
+duplicateLength := binary.LittleEndian.Uint32(footer[4:8])
Results
After (same command):
go test ./libbeat/publisher/queue/diskqueue -run=^$ -bench=BenchmarkSync1k -benchmem -count=5 -cpuprofile=/tmp/gh-aw/agent/dq_cpu_after.prof -memprofile=/tmp/gh-aw/agent/dq_mem_after.prof
BenchmarkSync1k-4 114 10441470 ns/op 3468320 B/op 31744 allocs/op
BenchmarkSync1k-4 115 11351108 ns/op 3466468 B/op 31614 allocs/op
BenchmarkSync1k-4 106 10575430 ns/op 3466453 B/op 31720 allocs/op
BenchmarkSync1k-4 115 10384787 ns/op 3467907 B/op 31805 allocs/op
BenchmarkSync1k-4 100 10794191 ns/op 3467981 B/op 31906 allocs/op
Averages (5 runs):
- Time:
12,186,930.8 -> 10,709,397.2 ns/op (12.12% faster)
- Memory:
3,465,108.2 -> 3,467,425.8 B/op (roughly flat)
- Allocs:
33,225.0 -> 31,757.8 allocs/op (4.42% fewer allocs)
Improvement: 12.12% latency reduction on a diskqueue hot path benchmark.
Verification
- Tests run:
go test ./libbeat/publisher/queue/diskqueue -run 'Test.*' -count=1 → ok
- Behavior preservation: checksum validation, duplicate-length validation, and decode flow are unchanged; only byte-read/parsing mechanics changed.
Evidence
Commands executed:
- Baseline benchmark/profile:
go test ./libbeat/publisher/queue/diskqueue -run=^$ -bench=BenchmarkSync1k -benchmem -count=5 -cpuprofile=/tmp/gh-aw/agent/dq_cpu.prof -memprofile=/tmp/gh-aw/agent/dq_mem.prof
- Post-change benchmark/profile (same command):
go test ./libbeat/publisher/queue/diskqueue -run=^$ -bench=BenchmarkSync1k -benchmem -count=5 -cpuprofile=/tmp/gh-aw/agent/dq_cpu_after.prof -memprofile=/tmp/gh-aw/agent/dq_mem_after.prof
- Verification tests:
go test ./libbeat/publisher/queue/diskqueue -run 'Test.*' -count=1
This appears distinct from existing open issue #49519 (buffer reuse in serialize.go), as it targets metadata parsing in reader_loop.go.
What is this? | From workflow: Performance Profiler
Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.