
fix attributes dropped when decoding OTAP -> OTLP proto bytes when IDs are out of order#2421

Merged
jmacd merged 7 commits into open-telemetry:main from albertlockett:albert/2270
Mar 25, 2026

Conversation

@albertlockett
Member

@albertlockett albertlockett commented Mar 24, 2026

Change Summary

Fixes an issue decoding OTAP to OTLP proto bytes when the ID column of the root record batch is not in sorted order.

When we do this decoding, we initialize cursors for the order in which to visit the rows of each record batch. When we visit some row in the root record batch, we then try to find its attributes by advancing the attributes cursor until the parent_id column matches the ID of the current row in the root record batch.

The whole scheme is predicated on the assumption that we'll visit the id and parent_id columns in the same order. However, while we initialize the cursor for the attributes record batch in sorted parent_id order, we were initializing the cursor for the root record batch only in sorted order of resource/scope ID.

The fix is to add the ID column to the multi-column sort which is used to initialize the cursor for the root record batch.
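To make the failure concrete, here is a minimal sketch of that cursor scheme (the arrays and function here are hypothetical simplifications, not the actual decoder code): with a forward-only cursor over an attributes batch sorted by parent_id, attributes are found only when the root rows are visited in that same ID order.

```rust
// Simplified sketch (hypothetical, not the actual decoder): a forward-only
// cursor over attributes sorted by parent_id. Attributes are found only if
// the root rows are visited in the same id order the attributes are sorted by.
fn collect_attrs(root_ids_in_visit_order: &[u16], attr_parent_ids: &[u16]) -> Vec<(u16, usize)> {
    let mut cursor = 0;
    let mut found = Vec::new();
    for &id in root_ids_in_visit_order {
        // Advance the cursor; it never moves backwards.
        while cursor < attr_parent_ids.len() && attr_parent_ids[cursor] < id {
            cursor += 1;
        }
        while cursor < attr_parent_ids.len() && attr_parent_ids[cursor] == id {
            found.push((id, cursor));
            cursor += 1;
        }
    }
    found
}

fn main() {
    let attrs = [0u16, 1, 2]; // sorted by parent_id
    // Visiting root rows in sorted id order finds every attribute...
    assert_eq!(collect_attrs(&[0, 1, 2], &attrs).len(), 3);
    // ...but visiting out of order silently drops attributes: once the
    // cursor has passed parent_id 2, the rows with ids 0 and 1 find nothing.
    assert_eq!(collect_attrs(&[2, 0, 1], &attrs).len(), 1);
}
```

Because the cursor never rewinds, any root row whose ID has already been passed silently loses its attributes, which is exactly the dropped-attributes symptom.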

This PR also changes the root record batch cursor init to avoid using RowConverter for the multi-column sort, instead simply packing the IDs into a Vec of some unsigned int (with the width depending on how many ID columns we're packing) and sorting that. This slightly improves performance (even after adding the additional column to the sort), and I added a benchmark to measure it. Bench: main 337µs, after: 305µs.

Note: one side-effect of this change (which in my opinion is OK) is that all the rows with a null ID (e.g. rows w/ no attributes) will appear first in the decoded OTLP message. This is b/c Arrow typically uses 0 as a placeholder in int arrays for null rows, and for best performance the sort we are doing does not bother looking at the null buffer.
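A rough sketch of that pack-and-sort (the function name and bit layout here are illustrative assumptions; the real code chooses the packed width from the number of ID columns, and this toy version ignores the null buffer entirely, which is why null IDs stored as the placeholder 0 sort first):

```rust
// Sketch of the pack-and-sort cursor init (hypothetical; the real code picks
// the packed width based on how many ID columns are present). Each row's
// (resource.id, scope.id, id) is packed into one u64, with the original row
// index in the low bits, so sorting the Vec yields the visit order directly.
fn visit_order(resource_ids: &[u16], scope_ids: &[u16], ids: &[u16]) -> Vec<usize> {
    let mut packed: Vec<u64> = (0..ids.len())
        .map(|row| {
            (resource_ids[row] as u64) << 48
                | (scope_ids[row] as u64) << 32
                | (ids[row] as u64) << 16
                | row as u64
        })
        .collect();
    packed.sort_unstable(); // plain integer sort; no RowConverter needed
    packed.iter().map(|&p| (p & 0xFFFF) as usize).collect()
}

fn main() {
    // Root batch with an out-of-order id column: rows must be visited by
    // (resource.id, scope.id, id), i.e. rows 1, 2, 0.
    let order = visit_order(&[0, 0, 0], &[1, 0, 0], &[5, 3, 4]);
    assert_eq!(order, vec![1, 2, 0]);
}
```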

What issue does this PR close?

How are these changes tested?

Unit tests

Are there any user-facing changes?

No

@github-actions github-actions Bot added rust Pull requests that update Rust code query-engine Query Engine / Transform related tasks query-engine-columnar Columnar query engine which uses DataFusion to process OTAP Batches labels Mar 24, 2026
@codecov

codecov Bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 93.67089% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.01%. Comparing base (f1dfdbb) to head (f0c6d8c).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2421      +/-   ##
==========================================
- Coverage   88.02%   88.01%   -0.01%     
==========================================
  Files         582      582              
  Lines      204775   204888     +113     
==========================================
+ Hits       180247   180342      +95     
- Misses      24002    24020      +18     
  Partials      526      526              
Component             Coverage Δ
otap-dataflow         90.06% <93.67%> (-0.01%) ⬇️
query_abstraction     80.61% <ø> (ø)
query_engine          90.61% <ø> (ø)
syslog_cef_receivers  ∅ <ø> (∅)
otel-arrow-go         52.44% <ø> (ø)
quiver                91.91% <ø> (ø)

@albertlockett albertlockett marked this pull request as ready for review March 24, 2026 15:15
@albertlockett albertlockett requested a review from a team as a code owner March 24, 2026 15:15
    [Some(ids), None] => {
    let ids = ids.as_any().downcast_ref::<UInt16Array>().ok_or_else(|| {
    if let Some(ids) = record_batch.column_by_name(consts::ID) {
    sort_columns[sort_columns_idx] =
Contributor


nit: sort_columns_idx += 1; in case it is used in future?

Member Author


That would be a clippy error because the value we assign never gets read. If a new ID column gets added to the list of sort columns in the future, we can update sort_columns_idx then. However, I doubt we'll add more columns to this multi-column sort.


    match sort_columns {
Contributor


Worth mentioning I think that we do something pretty similar here, but for the two-column case:

    pub fn sort_by_parent_then_id(rb: RecordBatch) -> Result<RecordBatch> {
        let schema = rb.schema();

I don't think you have to do it now, but I wonder if we can create one "sort by ids" function that always does the best thing based on the inputs.

@albertlockett
Member Author

albertlockett commented Mar 25, 2026

After some discussion w/ @JakeDern, we've determined this only fixes the issue when there is an invariant such that all the values in the ID column in rows having some scope.id "X" will be less than all the values in the ID column in rows having some scope.id "Y" where X < Y. Or stated more simply: as the scope/resource IDs increase, so too do we expect the ID column to.

We currently maintain this invariant by pure coincidence by virtue of how we encode OTAP data, and by the fact that as we're transforming the data, we have no good reason to start mixing up the values in the ID columns. That said, there's nothing in the OTAP spec that says we must have our ID columns ordered in such a way.

Indeed, it's possible to hand-craft a truly pathological record batch that still doesn't decode properly:

        // Build an otap batch where:
        //
        //   Row 0: resource_id=0, scope_id=5   (high scope_id in first resource)
        //   Row 1: resource_id=1, scope_id=2   (low scope_id in second resource)
        //   Row 2: resource_id=1, scope_id=3   (another low scope_id in second resource)
        //
        // After sorting by (resource_id, scope_id), visit order is rows [0, 1, 2]
        // with scope_ids visited: [5, 2, 3] - non-monotonic at the 5 -> 2 boundary.
        let str_type = 1u8; // AttributeValueType::Str
        #[rustfmt::skip]
        let mut otap_batch: OtapArrowRecords = logs!(
            (Logs,
                ("id", UInt16, vec![0u16, 1, 2]),
                ("resource.id", UInt16, vec![0u16, 1, 1]),
                ("scope.id", UInt16, vec![5u16, 2, 3]),
                ("scope.name", Utf8, vec!["scope-five", "scope-two", "scope-three"])),
            (ScopeAttrs,
                ("parent_id", UInt16, vec![2u16, 3, 5]),
                ("type", UInt8, vec![str_type, str_type, str_type]),
                ("key", Utf8, vec!["sk", "sk", "sk"]),
                ("str", Utf8, vec!["scope2-attr", "scope3-attr", "scope5-attr"])),
        ).into();

I'm tempted to handle this additional case as a followup (I don't believe the solution will conflict), as this PR already solves an issue that can easily be encountered in the OPL/Transform processor. However, I throw myself upon the mercy of my reviewers, and if anyone feels this all must be fixed at once I will kindly do so.

@JakeDern
Contributor


Truly pathological record batches are something of an interest of mine 😉 - I think we should file another bug with this test case for follow-up and merge this PR as it's strictly an improvement IMO.

@jmacd jmacd added this pull request to the merge queue Mar 25, 2026
Merged via the queue into open-telemetry:main with commit 9cdb34e Mar 25, 2026
69 checks passed
