feat: support GroupsAccumulator for first_value and last_value with string/binary types by UBarney · Pull Request #21090 · apache/datafusion

UBarney · 2026-03-21T09:53:40Z

Which issue does this PR close?

Closes #17899 ("Implement GroupsAccumulator for first_value aggregate (speed up first_value and DISTINCT ON queries)").

Rationale for this change

| ID | SQL | main (s) | This PR (s) | Performance change |
|---|---|---|---|---|
| 1 | `select t.id1, first_value(t.id3 order by t.id2, t.id4) as r2 from 'benchmarks/data/h2o/G1_1e8_1e8_100_0.parquet' as t group by t.id1, t.v1;` | 1.162 | 0.716 | +1.62x faster 🚀 |
| 2 | `select l_shipmode, first_value(l_partkey order by l_orderkey, l_linenumber, l_comment, l_suppkey, l_tax) from 'benchmarks/data/tpch_sf10/lineitem' group by l_shipmode;` | 0.967 | 0.823 | +1.17x faster 🚀 |
| 3 | `select t.id2, t.id4, first_value(t.v1 order by t.id2, t.id4) as r2 from 'benchmarks/data/h2o/G1_1e8_1e8_100_0.parquet' as t group by t.id2, t.id4;` | 7.567 | 6.998 | +1.08x faster 🚀 |
| 4 | `select t.id1, last_value(t.id3 order by t.id2, t.id4) as r2 from 'benchmarks/data/h2o/G1_1e8_1e8_100_0.parquet' as t group by t.id1, t.v1;` | 1.068 | 0.721 | +1.48x faster 🚀 |
| 5 | `select l_shipmode, last_value(l_partkey order by l_orderkey, l_linenumber, l_comment, l_suppkey, l_tax) from 'benchmarks/data/tpch_sf10/lineitem' group by l_shipmode;` | 0.728 | 0.714 | +1.02x faster 🚀 |
| 6 | `select t.id2, t.id4, last_value(t.v1 order by t.id2, t.id4) as r2 from 'benchmarks/data/h2o/G1_1e8_1e8_100_0.parquet' as t group by t.id2, t.id4;` | 6.937 | 7.040 | 1.01x slower 🐌 |

Note: SQL queries for Q2, Q3, Q5, and Q6 are sourced from this PR.

Previously, the first_value and last_value aggregate functions only supported GroupsAccumulator for primitive types. For string or binary types (Utf8, LargeUtf8, Binary, etc.), they fell back to the slower row-based Accumulator path.

This change implements specialized state management for byte-based types, enabling fast grouped aggregation of string and binary data, especially when used with ORDER BY.

What changes are included in this PR?

New ValueState trait: abstracts the state management for first_value and last_value so that different storage backends can be plugged in.
PrimitiveValueState: re-implements the existing primitive-type handling on top of the new trait.
BytesValueState: a new state implementation for Utf8, LargeUtf8, Utf8View, Binary, LargeBinary, and BinaryView. It reduces allocation overhead by reusing each group's Vec<u8> buffer across updates.
Refactored FirstLastGroupsAccumulator: migrated the accumulator to the generic ValueState trait, so it handles primitive and byte types uniformly.
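To make the refactoring above more concrete, here is a minimal Rust sketch of what such a value-state abstraction might look like. Only the names ValueState, PrimitiveValueState and BytesValueState, plus the `Vec<Option<Vec<u8>>>` field that appears in the reviewed diff, come from this PR; the methods `resize`, `update`, `get` and the helper `keep_first` are hypothetical, and the real first_value/last_value logic also compares ORDER BY keys, which is omitted here.

```rust
/// Hypothetical per-group value store; method names are illustrative only.
pub trait ValueState {
    /// Borrowed form of one stored value (e.g. `i64` or `[u8]`).
    type Value: ?Sized;
    /// Make room for `total_num_groups` groups; new groups start empty.
    fn resize(&mut self, total_num_groups: usize);
    /// Replace the value stored for `group_idx`.
    fn update(&mut self, group_idx: usize, value: &Self::Value);
    /// Current value for `group_idx`, or `None` if none has been stored yet.
    fn get(&self, group_idx: usize) -> Option<&Self::Value>;
}

#[derive(Default)]
pub struct PrimitiveValueState<T: Copy> {
    vals: Vec<Option<T>>,
}

impl<T: Copy> ValueState for PrimitiveValueState<T> {
    type Value = T;
    fn resize(&mut self, total_num_groups: usize) {
        self.vals.resize(total_num_groups, None);
    }
    fn update(&mut self, group_idx: usize, value: &T) {
        self.vals[group_idx] = Some(*value);
    }
    fn get(&self, group_idx: usize) -> Option<&T> {
        self.vals[group_idx].as_ref()
    }
}

#[derive(Default)]
pub struct BytesValueState {
    // Same field shape as in the reviewed diff.
    vals: Vec<Option<Vec<u8>>>,
}

impl ValueState for BytesValueState {
    type Value = [u8];
    fn resize(&mut self, total_num_groups: usize) {
        self.vals.resize(total_num_groups, None);
    }
    fn update(&mut self, group_idx: usize, value: &[u8]) {
        // Reuse the group's existing buffer: `clear` keeps the capacity,
        // so no reallocation happens when the new value fits.
        let buf = self.vals[group_idx].get_or_insert_with(Vec::new);
        buf.clear();
        buf.extend_from_slice(value);
    }
    fn get(&self, group_idx: usize) -> Option<&[u8]> {
        self.vals[group_idx].as_deref()
    }
}

/// Hypothetical accumulator step written once for any backend:
/// keep the first value seen for a group (no ORDER BY comparison).
pub fn keep_first<S: ValueState>(state: &mut S, group_idx: usize, value: &S::Value) {
    if state.get(group_idx).is_none() {
        state.update(group_idx, value);
    }
}
```

Because `keep_first` is generic over the trait, the same accumulator code serves both the primitive and the bytes backend, which mirrors the uniform handling described for FirstLastGroupsAccumulator.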

Are these changes tested?

YES

Are there any user-facing changes?

Dandandan · 2026-03-21T12:37:29Z

datafusion/functions-aggregate/src/first_last/state.rs

+///    to correctly implement `RESPECT NULLS` behavior.
+///
+pub(crate) struct BytesValueState {
+    vals: Vec<Option<Vec<u8>>>,


I think this can be stored much more efficiently as a values `Vec<u8>` plus an offsets `Vec<OffsetType>`.
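For reference, the suggested shape matches Arrow's layout for variable-length arrays: all bytes live in one `values` buffer and there are `n + 1` offsets, so value `i` is `values[offsets[i]..offsets[i + 1]]`. The sketch below is a hypothetical illustration (the `FlatBytes` type and its methods are invented), with `i32` standing in for `OffsetType`; Arrow uses `i32` offsets for Utf8/Binary and `i64` for LargeUtf8/LargeBinary.

```rust
/// Hypothetical flat byte storage: one values buffer plus Arrow-style offsets.
pub struct FlatBytes {
    values: Vec<u8>,
    // Always starts with 0; holds one more entry than there are values.
    offsets: Vec<i32>,
}

impl FlatBytes {
    pub fn new() -> Self {
        Self { values: Vec::new(), offsets: vec![0] }
    }

    /// Append a value: copy its bytes and record the new end offset.
    pub fn push(&mut self, v: &[u8]) {
        self.values.extend_from_slice(v);
        let end = i32::try_from(self.values.len()).expect("offset overflow for i32");
        self.offsets.push(end);
    }

    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Value `i` is the byte range between two consecutive offsets.
    pub fn get(&self, i: usize) -> &[u8] {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        &self.values[start..end]
    }
}
```

This needs two allocations in total instead of one per group, but it is append-only: replacing a value in the middle with a longer one would shift every later byte, which is the problem the lengths/capacities plan in the reply works around.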

UBarney · 2026-03-24

I plan to implement it using the following approach and then run some benchmarks:

Data Structures

vals: Vec<u8>: A single, contiguous flat buffer for all raw bytes.

offsets: Vec<usize>: The starting position of each group's data in the buffer.

lengths: Vec<usize>: The logical length of the current value for each group.

capacities: Vec<usize>: The physical space allocated for each group (enables in-place overwrites if new_len <= capacity).

active_bytes: usize: A running counter of the sum of all current lengths (used to track fragmentation and trigger GC).

Update Logic

In-place Overwrite: If new_len <= capacity, we overwrite the existing slot at the current offset. We update the logical length, while capacity and offset remain unchanged.

Append: If new_len > capacity, we append the value to the end of vals and update the offset, length, and capacity to point to the new location.
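The data structures and update rules above can be sketched in Rust roughly as follows. This is a hypothetical illustration using `usize` offsets as in the proposal; the type name `FlatGroupBytes` and its methods are invented, and it does not track NULLs (a `RESPECT NULLS` implementation would need a separate per-group validity flag).

```rust
/// Hypothetical flat per-group byte store following the proposed fields.
#[derive(Default)]
pub struct FlatGroupBytes {
    vals: Vec<u8>,          // one contiguous buffer for every group's bytes
    offsets: Vec<usize>,    // start of each group's slot in `vals`
    lengths: Vec<usize>,    // logical length of each group's current value
    capacities: Vec<usize>, // physical size of each group's slot
    active_bytes: usize,    // sum of all `lengths`
}

impl FlatGroupBytes {
    /// Grow to at least `n` groups; new groups get an empty, zero-capacity slot.
    pub fn ensure_groups(&mut self, n: usize) {
        if n > self.offsets.len() {
            self.offsets.resize(n, 0);
            self.lengths.resize(n, 0);
            self.capacities.resize(n, 0);
        }
    }

    pub fn set(&mut self, group: usize, value: &[u8]) {
        let new_len = value.len();
        // Keep the running total of live bytes in sync.
        self.active_bytes = self.active_bytes - self.lengths[group] + new_len;
        if new_len <= self.capacities[group] {
            // In-place overwrite: offset and capacity stay unchanged.
            let start = self.offsets[group];
            self.vals[start..start + new_len].copy_from_slice(value);
        } else {
            // Append: the old slot becomes dead bytes until compaction.
            self.offsets[group] = self.vals.len();
            self.capacities[group] = new_len;
            self.vals.extend_from_slice(value);
        }
        self.lengths[group] = new_len;
    }

    pub fn get(&self, group: usize) -> &[u8] {
        let start = self.offsets[group];
        &self.vals[start..start + self.lengths[group]]
    }
}
```

In this sketch, a shorter value never allocates, and a longer one costs a single append; the gap between `vals.len()` and `active_bytes` is exactly the dead space that the compaction step is meant to reclaim.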

GC (Compaction) Logic

Trigger: When the buffer grows too large (e.g., vals.len() > active_bytes * 2).

Action: Re-allocate a new buffer and copy only the latest valid data for each group to clear "dead" bytes left behind by the append path.
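A minimal sketch of the compaction step, assuming the parallel `offsets`/`lengths`/`capacities` vectors described above are passed in as slices. The function names `should_compact` and `compact` are hypothetical; the trigger is the example threshold from the proposal (`vals.len() > active_bytes * 2`).

```rust
/// Example trigger from the proposal: buffer more than twice the live data.
pub fn should_compact(buffer_len: usize, active_bytes: usize) -> bool {
    buffer_len > active_bytes * 2
}

/// Hypothetical compaction: copy only each group's live bytes into a fresh
/// buffer and repoint the offsets, dropping dead bytes left by appends.
pub fn compact(
    vals: &mut Vec<u8>,
    offsets: &mut [usize],
    lengths: &[usize],
    capacities: &mut [usize],
) {
    debug_assert_eq!(offsets.len(), lengths.len());
    debug_assert_eq!(offsets.len(), capacities.len());
    let active: usize = lengths.iter().sum();
    let mut new_vals = Vec::with_capacity(active);
    for ((offset, &len), cap) in offsets.iter_mut().zip(lengths).zip(capacities.iter_mut()) {
        let start = *offset;
        *offset = new_vals.len();
        new_vals.extend_from_slice(&vals[start..start + len]);
        // Shrink the slot to its live length; a longer value later takes the append path.
        *cap = len;
    }
    *vals = new_vals;
}
```

Shrinking each slot to its live length packs the buffer tightly; keeping the old capacities instead would preserve room for in-place overwrites at the cost of copying slack bytes on every compaction.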

Dandandan · 2026-03-24

This is only needed for last_value, no?

Dandandan · 2026-03-24

Or wait, never mind: I see the queries have an explicit order.

feat: implement FirstValueState with byte handling optimization

330a6e1

github-actions bot added the labels sqllogictest ("SQL Logic Tests (.slt)") and functions ("Changes to functions implementation") on Mar 21, 2026.

Dandandan reviewed Mar 21, 2026

UBarney wants to merge 1 commit into apache:main from UBarney:first_val_group_acc_string

2 participants