
fix attributes dropped when decoding OTAP -> OTLP proto bytes when IDs are out of order#2421

Merged
jmacd merged 7 commits into open-telemetry:main from albertlockett:albert/2270
Mar 25, 2026

Conversation

@albertlockett
Member

@albertlockett albertlockett commented Mar 24, 2026

Change Summary

Fixes an issue decoding OTAP to OTLP proto bytes when the ID column of the root record batch is not in sorted order.

When we do this decoding, we initialize cursors for the order in which to visit the rows of each record batch. When we visit some row in the root record batch, we then try to find its attributes by advancing the attributes cursor until the parent_id column matches the ID of the current row in the root record batch.

The whole scheme is predicated on the assumption that we'll visit the id and parent_id columns in the same order. However, while we initialize the cursor for the attributes record batch in sorted parent_id order, we were initializing the cursor for the root record batch only in sorted order of resource/scope ID.

The fix is to add the ID column to the multi-column sort which is used to initialize the cursor for the root record batch.
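To make the failure concrete, here is a minimal sketch of that cursor scheme (the arrays and function here are hypothetical simplifications, not the actual decoder code): with a forward-only cursor over an attributes batch sorted by parent_id, attributes are found only when the root rows are visited in that same ID order.

```rust
// Simplified sketch (hypothetical, not the actual decoder): a forward-only
// cursor over attributes sorted by parent_id. Attributes are found only if
// the root rows are visited in the same id order the attributes are sorted by.
fn collect_attrs(root_ids_in_visit_order: &[u16], attr_parent_ids: &[u16]) -> Vec<(u16, usize)> {
    let mut cursor = 0;
    let mut found = Vec::new();
    for &id in root_ids_in_visit_order {
        // Advance the cursor; it never moves backwards.
        while cursor < attr_parent_ids.len() && attr_parent_ids[cursor] < id {
            cursor += 1;
        }
        while cursor < attr_parent_ids.len() && attr_parent_ids[cursor] == id {
            found.push((id, cursor));
            cursor += 1;
        }
    }
    found
}

fn main() {
    let attrs = [0u16, 1, 2]; // sorted by parent_id
    // Visiting root rows in sorted id order finds every attribute...
    assert_eq!(collect_attrs(&[0, 1, 2], &attrs).len(), 3);
    // ...but visiting out of order silently drops attributes: once the
    // cursor has passed parent_id 2, the rows with ids 0 and 1 find nothing.
    assert_eq!(collect_attrs(&[2, 0, 1], &attrs).len(), 1);
}
```

Because the cursor never rewinds, any root row whose ID has already been passed silently loses its attributes, which is exactly the dropped-attributes symptom.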

This PR also changes the root record batch cursor init to avoid using RowConverter for the multi-column sort, instead simply packing the IDs into a Vec of some unsigned int (with the width depending on how many ID columns we're packing) and sorting that. This slightly improves performance (even after adding the additional column to the sort), and I added a benchmark to measure it. Bench: main 337µs, after: 305µs.

Note: one side-effect of this change (which in my opinion is OK) is that all the rows with a null ID (e.g. rows w/ no attributes) will appear first in the decoded OTLP message. This is b/c Arrow typically uses 0 as a placeholder in int arrays for null rows, and for best performance the sort we are doing does not bother looking at the null buffer.
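A rough sketch of that pack-and-sort (the function name and bit layout here are illustrative assumptions; the real code chooses the packed width from the number of ID columns, and this toy version ignores the null buffer entirely, which is why null IDs stored as the placeholder 0 sort first):

```rust
// Sketch of the pack-and-sort cursor init (hypothetical; the real code picks
// the packed width based on how many ID columns are present). Each row's
// (resource.id, scope.id, id) is packed into one u64, with the original row
// index in the low bits, so sorting the Vec yields the visit order directly.
fn visit_order(resource_ids: &[u16], scope_ids: &[u16], ids: &[u16]) -> Vec<usize> {
    let mut packed: Vec<u64> = (0..ids.len())
        .map(|row| {
            (resource_ids[row] as u64) << 48
                | (scope_ids[row] as u64) << 32
                | (ids[row] as u64) << 16
                | row as u64
        })
        .collect();
    packed.sort_unstable(); // plain integer sort; no RowConverter needed
    packed.iter().map(|&p| (p & 0xFFFF) as usize).collect()
}

fn main() {
    // Root batch with an out-of-order id column: rows must be visited by
    // (resource.id, scope.id, id), i.e. rows 1, 2, 0.
    let order = visit_order(&[0, 0, 0], &[1, 0, 0], &[5, 3, 4]);
    assert_eq!(order, vec![1, 2, 0]);
}
```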

What issue does this PR close?

How are these changes tested?

Unit tests

Are there any user-facing changes?

No

@github-actions github-actions Bot added rust Pull requests that update Rust code query-engine Query Engine / Transform related tasks query-engine-columnar Columnar query engine which uses DataFusion to process OTAP Batches labels Mar 24, 2026
@codecov

codecov Bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 93.67089% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.01%. Comparing base (f1dfdbb) to head (f0c6d8c).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2421      +/-   ##
==========================================
- Coverage   88.02%   88.01%   -0.01%     
==========================================
  Files         582      582              
  Lines      204775   204888     +113     
==========================================
+ Hits       180247   180342      +95     
- Misses      24002    24020      +18     
  Partials      526      526              
Component             Coverage Δ
otap-dataflow         90.06% <93.67%> (-0.01%) ⬇️
query_abstraction     80.61% <ø> (ø)
query_engine          90.61% <ø> (ø)
syslog_cef_receivers  ∅ <ø> (∅)
otel-arrow-go         52.44% <ø> (ø)
quiver                91.91% <ø> (ø)

@albertlockett albertlockett marked this pull request as ready for review March 24, 2026 15:15
@albertlockett albertlockett requested a review from a team as a code owner March 24, 2026 15:15
    [Some(ids), None] => {
    let ids = ids.as_any().downcast_ref::<UInt16Array>().ok_or_else(|| {
    if let Some(ids) = record_batch.column_by_name(consts::ID) {
    sort_columns[sort_columns_idx] =
Contributor


nit: sort_columns_idx += 1; in case it is used in future?

Member Author


That would be a clippy error because the value we assign never gets read. If a new ID column gets added to the list of sort columns in the future, we can update sort_columns_idx then. However, I doubt we'll add more columns to this multi-column sort.


    match sort_columns {
Contributor


Worth mentioning I think that we do something pretty similar here, but for the two-column case:

    pub fn sort_by_parent_then_id(rb: RecordBatch) -> Result<RecordBatch> {
        let schema = rb.schema();

I don't think you have to do it now, but I wonder if we can create one "sort by ids" function that always does the best thing based on the inputs.

@albertlockett
Member Author

albertlockett commented Mar 25, 2026

After some discussion w/ @JakeDern, we've determined this only fixes the issue when there is an invariant such that all the values in the ID column in rows having some scope.id "X" will be less than all the values in the ID column in rows having some scope.id "Y" where X < Y. Or stated more simply: as the scope/resource IDs increase, so too do we expect the ID column to.

We currently maintain this invariant by pure coincidence by virtue of how we encode OTAP data, and by the fact that as we're transforming the data, we have no good reason to start mixing up the values in the ID columns. That said, there's nothing in the OTAP spec that says we must have our ID columns ordered in such a way.

Indeed, it's possible to hand-craft a truly pathological record batch that still doesn't decode properly:

        // Build an otap batch where:
        //
        //   Row 0: resource_id=0, scope_id=5   (high scope_id in first resource)
        //   Row 1: resource_id=1, scope_id=2   (low scope_id in second resource)
        //   Row 2: resource_id=1, scope_id=3   (another low scope_id in second resource)
        //
        // After sorting by (resource_id, scope_id), visit order is rows [0, 1, 2]
        // with scope_ids visited: [5, 2, 3] - non-monotonic at the 5 -> 2 boundary.
        let str_type = 1u8; // AttributeValueType::Str
        #[rustfmt::skip]
        let mut otap_batch: OtapArrowRecords = logs!(
            (Logs,
                ("id", UInt16, vec![0u16, 1, 2]),
                ("resource.id", UInt16, vec![0u16, 1, 1]),
                ("scope.id", UInt16, vec![5u16, 2, 3]),
                ("scope.name", Utf8, vec!["scope-five", "scope-two", "scope-three"])),
            (ScopeAttrs,
                ("parent_id", UInt16, vec![2u16, 3, 5]),
                ("type", UInt8, vec![str_type, str_type, str_type]),
                ("key", Utf8, vec!["sk", "sk", "sk"]),
                ("str", Utf8, vec!["scope2-attr", "scope3-attr", "scope5-attr"])),
        ).into();

I'm tempted to handle this additional case as a followup (I don't believe the solution will conflict), as this PR already solves an issue that can easily be encountered in the OPL/Transform processor. However, I throw myself upon the mercy of my reviewers, and if anyone feels this all must be fixed at once I will kindly do so.

@JakeDern
Contributor


Truly pathological record batches are something of an interest of mine 😉 - I think we should file another bug with this test case for follow-up and merge this PR as it's strictly an improvement IMO.

@jmacd jmacd added this pull request to the merge queue Mar 25, 2026
Merged via the queue into open-telemetry:main with commit 9cdb34e Mar 25, 2026
69 checks passed
