Hot Path
(*readerLoop).nextFrame in libbeat/publisher/queue/diskqueue/reader_loop.go (L191-L246) parses frame metadata with several small per-field reads (L201, L229, L240) and uses a bare Read, which is allowed to return fewer bytes than requested, for the payload (L222).
Profiling Data
Before:
go test ./libbeat/publisher/queue/diskqueue -run=^$ -bench=BenchmarkSync1k -benchmem -count=5 -cpuprofile=/tmp/gh-aw/agent/dq_cpu.prof -memprofile=/tmp/gh-aw/agent/dq_mem.prof
BenchmarkSync1k-4 99 12022922 ns/op 3466193 B/op 33291 allocs/op
BenchmarkSync1k-4 99 12762895 ns/op 3462574 B/op 32998 allocs/op
BenchmarkSync1k-4 96 12038660 ns/op 3465094 B/op 33219 allocs/op
BenchmarkSync1k-4 99 11884420 ns/op 3466151 B/op 33329 allocs/op
BenchmarkSync1k-4 99 12225757 ns/op 3465529 B/op 33288 allocs/op
CPU profile top (before):
go tool pprof -top /tmp/gh-aw/agent/dq_cpu.prof
... total samples = 23.45s
internal/runtime/syscall.Syscall6 11.58s (49.38%)
Proposed Change
Replace the per-field binary.Read calls with io.ReadFull + binary.LittleEndian.Uint32, and read the footer (checksum and duplicate length) in a single 8-byte read.
diff --git a/libbeat/publisher/queue/diskqueue/reader_loop.go b/libbeat/publisher/queue/diskqueue/reader_loop.go
@@
-var frameLength uint32
-err := binary.Read(reader, binary.LittleEndian, &frameLength)
+var header [4]byte
+_, err := io.ReadFull(reader, header[:])
+frameLength := binary.LittleEndian.Uint32(header[:])
@@
-_, err = reader.Read(bytes)
+_, err = io.ReadFull(reader, bytes)
@@
-var checksum uint32
-err = binary.Read(reader, binary.LittleEndian, &checksum)
+var footer [8]byte
+_, err = io.ReadFull(reader, footer[:])
+checksum := binary.LittleEndian.Uint32(footer[0:4])
@@
-var duplicateLength uint32
-err = binary.Read(reader, binary.LittleEndian, &duplicateLength)
+duplicateLength := binary.LittleEndian.Uint32(footer[4:8])
Results
After (same command):
go test ./libbeat/publisher/queue/diskqueue -run=^$ -bench=BenchmarkSync1k -benchmem -count=5 -cpuprofile=/tmp/gh-aw/agent/dq_cpu_after.prof -memprofile=/tmp/gh-aw/agent/dq_mem_after.prof
BenchmarkSync1k-4 114 10441470 ns/op 3468320 B/op 31744 allocs/op
BenchmarkSync1k-4 115 11351108 ns/op 3466468 B/op 31614 allocs/op
BenchmarkSync1k-4 106 10575430 ns/op 3466453 B/op 31720 allocs/op
BenchmarkSync1k-4 115 10384787 ns/op 3467907 B/op 31805 allocs/op
BenchmarkSync1k-4 100 10794191 ns/op 3467981 B/op 31906 allocs/op
Averages (5 runs):
- Time:
12,186,930.8 -> 10,709,397.2 ns/op (12.12% faster)
- Memory:
3,465,108.2 -> 3,467,425.8 B/op (roughly flat)
- Allocs:
33,225.0 -> 31,757.8 allocs/op (4.42% fewer allocs)
Improvement: 12.12% latency reduction on a diskqueue hot path benchmark.
Verification
- Tests run:
go test ./libbeat/publisher/queue/diskqueue -run 'Test.*' -count=1 → ok
- Behavior preservation: checksum validation, duplicate-length validation, and decode flow are unchanged; only byte-read/parsing mechanics changed.
Evidence
Commands executed:
- Baseline benchmark/profile:
go test ./libbeat/publisher/queue/diskqueue -run=^$ -bench=BenchmarkSync1k -benchmem -count=5 -cpuprofile=/tmp/gh-aw/agent/dq_cpu.prof -memprofile=/tmp/gh-aw/agent/dq_mem.prof
- Post-change benchmark/profile (same command):
go test ./libbeat/publisher/queue/diskqueue -run=^$ -bench=BenchmarkSync1k -benchmem -count=5 -cpuprofile=/tmp/gh-aw/agent/dq_cpu_after.prof -memprofile=/tmp/gh-aw/agent/dq_mem_after.prof
- Verification tests:
go test ./libbeat/publisher/queue/diskqueue -run 'Test.*' -count=1
This appears distinct from existing open issue #49519 (buffer reuse in serialize.go), as it targets metadata parsing in reader_loop.go.
What is this? | From workflow: Performance Profiler
Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.