Skip to content

JS: Add regression test for YAML extraction#21571

Open
henrymercer wants to merge 1 commit intomainfrom
henrymercer/yaml-regression-test
Open

JS: Add regression test for YAML extraction#21571
henrymercer wants to merge 1 commit intomainfrom
henrymercer/yaml-regression-test

Conversation

@henrymercer
Copy link
Contributor

CodeQL 2.25.0 updated SnakeYAML to version 2.3. However this version of SnakeYAML has a bug where it crashes with an IndexOutOfBoundsException when a Unicode surrogate pair (e.g. an emoji) straddles the 1024 character internal buffer boundary. This happens because the high surrogate can end up as the last character in the data window, and the reader tries to read the low surrogate past the end of the buffer.

This caused languages that extract YAML, most notably JavaScript and Actions, to fail when the codebase contained a YAML file with an emoji at an unlucky position in the file.

This PR adds a regression test to validate the fix (when it's in), and help avoid regressions.

SnakeYAML 2.3 has [a bug](https://bitbucket.org/snakeyaml/snakeyaml/issues/1098) where it crashes with an `IndexOutOfBoundsException` when a Unicode surrogate pair (e.g. an emoji) straddles the 1024 character internal buffer boundary.  This happens because the high surrogate can end up as the last character in the data window, and the reader tries to read the low surrogate past the end of the buffer.

This caused languages that extract YAML, most notably JavaScript and Actions, to fail when the codebase contained a YAML file with an emoji at an unlucky position in the file.
@henrymercer henrymercer requested review from a team as code owners March 24, 2026 18:54
Copilot AI review requested due to automatic review settings March 24, 2026 18:54
@github-actions github-actions bot added the JS label Mar 24, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a JavaScript YAML-extractor regression test for the SnakeYAML 2.3 surrogate-pair-at-buffer-boundary crash described in the PR.

Changes:

  • Add a crafted YAML input containing an emoji positioned to exercise the internal 1024-char buffer boundary behavior.
  • Add the corresponding expected TRAP output for YAML extraction.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
javascript/extractor/tests/yaml/input/emoji_buffer_boundary.yml New YAML input intended to reproduce the surrogate-pair boundary condition.
javascript/extractor/tests/yaml/output/trap/emoji_buffer_boundary.yml.trap Expected TRAP output for the new YAML input, used by the extractor TRAP comparison tests.

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds a regression test to validate the fix (when it's in)

So what is the .trap we're seeing now? I was expecting some kind of evidence of a crash.

@henrymercer
Copy link
Contributor Author

It's the TRAP from running the test with the YAML extraction fix applied — once that fix is merged to main internally, we should be able to rerun this test and have it pass.

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which case I'll approve, obviously this won't be mergeable until the above has happened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants