JS: Add regression test for YAML extraction#21571
Conversation
SnakeYAML 2.3 has [a bug](https://bitbucket.org/snakeyaml/snakeyaml/issues/1098) where it crashes with an `IndexOutOfBoundsException` when a Unicode surrogate pair (e.g. an emoji) straddles the 1024 character internal buffer boundary. This happens because the high surrogate can end up as the last character in the data window, and the reader tries to read the low surrogate past the end of the buffer. This caused languages that extract YAML, most notably JavaScript and Actions, to fail when the codebase contained a YAML file with an emoji at an unlucky position in the file.
There was a problem hiding this comment.
Pull request overview
Adds a JavaScript YAML-extractor regression test for the SnakeYAML 2.3 surrogate-pair-at-buffer-boundary crash described in the PR.
Changes:
- Add a crafted YAML input containing an emoji positioned to exercise the internal 1024-char buffer boundary behavior.
- Add the corresponding expected TRAP output for YAML extraction.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| javascript/extractor/tests/yaml/input/emoji_buffer_boundary.yml | New YAML input intended to reproduce the surrogate-pair boundary condition. |
| javascript/extractor/tests/yaml/output/trap/emoji_buffer_boundary.yml.trap | Expected TRAP output for the new YAML input, used by the extractor TRAP comparison tests. |
javascript/extractor/tests/yaml/output/trap/emoji_buffer_boundary.yml.trap
Show resolved
Hide resolved
javascript/extractor/tests/yaml/output/trap/emoji_buffer_boundary.yml.trap
Show resolved
Hide resolved
geoffw0
left a comment
There was a problem hiding this comment.
This PR adds a regression test to validate the fix (when it's in)
So what is the .trap we're seeing now? I was expecting some kind of evidence of a crash.
|
It's the TRAP from running the test with the YAML extraction fix applied — once that fix is merged to main internally, we should be able to rerun this test and have it pass. |
geoffw0
left a comment
There was a problem hiding this comment.
In which case I'll approve, obviously this won't be mergeable until the above has happened.
CodeQL 2.25.0 updated SnakeYAML to version 2.3. However this version of SnakeYAML has a bug where it crashes with an
IndexOutOfBoundsExceptionwhen a Unicode surrogate pair (e.g. an emoji) straddles the 1024 character internal buffer boundary. This happens because the high surrogate can end up as the last character in the data window, and the reader tries to read the low surrogate past the end of the buffer.This caused languages that extract YAML, most notably JavaScript and Actions, to fail when the codebase contained a YAML file with an emoji at an unlucky position in the file.
This PR adds a regression test to validate the fix (when it's in), and help avoid regressions.