Skip to content

fix(mdxish): properly parse JSX comments in editor deserialization#1419

Merged
maximilianfalco merged 18 commits intonextfrom
falco/rm-15676-does-not-deserialize-magic-block-within-jsx-comment
Apr 15, 2026
Merged

fix(mdxish): properly parse JSX comments in editor deserialization#1419
maximilianfalco merged 18 commits intonextfrom
falco/rm-15676-does-not-deserialize-magic-block-within-jsx-comment

Conversation

@maximilianfalco
Copy link
Copy Markdown
Contributor

@maximilianfalco maximilianfalco commented Apr 1, 2026

🎫 Resolve RM-15676

🎯 What does this PR do?

Warning

This PR should not affect rendering whatsoever. The mdxish() rendering pipeline still strips JSX comments at the string level via removeJSXComments before the tokenizer ever runs, so rendered output is byte-identical.

JSX comments ({/* ... */}) wrapping magic blocks (or any multiline content with blank lines) were being split into separate AST nodes during editor deserialization, causing commented-out content to render as if it were active.

  1. escapeProblematicBraces in preprocessing escaped { to \{ whenever an expression spanned a blank line, which includes JSX comments because they are essentially expressions
  2. Without the mdxExpression flow construct (removed to prevent breaking magic blocks), the parser had no way to recognize {/* ... */} as a single expression

Note

Technically, introducing the flow construct of the mdxExpression would partly fix this issue but we cannot do this because it would break magic block parsing. magic blocks usually contain { \n \n } JSON payloads, and the flow construct would eat them before the magic block tokenizer got a chance to run.

created a dedicated jsxComment micromark tokenizer for JSX comments. enabled for all branches instead of being gated by newEditorTypes.

Note

In the future as a follow up, we can work in integrating this new tokenizer and fully removing the removeJSXComments preprocessing step. similar to what we do in the stripComments

🧪 QA tips

Try pasting this in the editor (both rich and raw):

{/* Dropbox hosted. What was it?

[block:image]
{
  "images": [
    {
      "image": [
        "https://paper-attachments.dropboxusercontent.com/s_924615C008BE616D2ABE30441190A9401D739971D9AD94E8B5B46A426C5F05E8_1683123013834_Dev+Dash+-+hero+docs+shot.png",
        null,
        "The My Developers page in your ReadMe project dashboard"
      ],
      "align": "center",
      "border": true,
      "caption": "The My Developers page in your ReadMe project dashboard"
    }
  ]
}
[/block]

*/}

It should be treated as one big comment block instead of a rendered image.

📸 Screenshot or Loom

Screen.Recording.2026-04-02.at.01.32.19.mov

@maximilianfalco maximilianfalco changed the title fix(mdxish): properly parse jsx comments in editor deserialization fix(mdxish): properly parse JSX comments in editor deserialization Apr 1, 2026
Comment thread package.json Outdated
Comment thread processor/transform/mdxish/preprocess-jsx-expressions.ts Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question, If the opening comment section is indented, should this tokenizer still work? Seems like it does with MDX, example:

   {/*
[block:image]
{
  "images": [{"image": ["https://example.com/img.png", null, "Alt"]}]
}
[/block]
*/}

Hi there!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it should work! are you seeing something different?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a test case for this here 05e3622

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay this looks to be broken in the editor but it looks like its a monorepo issue not an engine issue

Comment thread lib/mdxish.ts Outdated
Comment thread processor/transform/mdxish/preprocess-jsx-expressions.ts Outdated
…rm-15676-does-not-deserialize-magic-block-within-jsx-comment
Comment thread lib/mdxish.ts Outdated
Comment thread __tests__/helpers.ts Outdated
return mdastV6(doc, { rdmd });
};

export const parseMdxishMdast = (md: string, opts: MdxishOpts = {}): MdastRoot => {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be useful for other test files so hoisted this up in the general helpers file

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ideal path is that sometime in the future, we can revisit this work and remove the removeJsxComment preprocessing so we dont have to keep this file and no longer worry about diverging regex rules. but this should act as a bandaid to keep them in check for the mean time

happy to revisit in the future to unify those two

Copy link
Copy Markdown
Contributor

@kevinports kevinports Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I barely comprehend how what's happening in this PR fits into the greater pipeline. But why are we emitting custom namesjsxComment, jsxCommentLineEnding, jsxCommentMarkerEnd, etc. that you have defined on L7-L15?

Couldn't this tokenizer just emit mdxFlowExpression and mdxFlowExpressionChunk types? I think those are the expected types that'd normally come from micromark-extension-mdx-expression if we weren't excluding detection of flow constructs here:

markdown/lib/mdxish.ts

Lines 152 to 157 in f20d1d0

// Get mdxExpression extension and remove its flow construct to prevent
// `{...}` from interrupting paragraphs (which breaks multiline magic blocks)
const mdxExprExt = mdxExpression({ allowEmpty: true });
const mdxExprTextOnly: Extension = {
text: mdxExprExt.text,
};

That'd altogether remove the need to process these special types downstream with lib/mdast-util/jsx-comment/index.ts and clean this PR quite a bit I think.


Another question is have you considered what it'd look like to refactor the magic block tokenizer to be robust enough to coexist with the full mdxExpression extension from micromark-extension-mdx-expression instead of limiting it to non-flow constructs? I just wonder if we're gonna keep running into issues with multi-line expressions.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this tokenizer just emit mdxFlowExpression and mdxFlowExpressionChunk types? I think those are the expected types that'd normally come from micromark-extension-mdx-expression if we weren't excluding detection of flow constructs here:

yea i think i was overthinking this, thanks for catching this. going to simplify it!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simplified in 4177147

@maximilianfalco
Copy link
Copy Markdown
Contributor Author

Another question is have you considered what it'd look like to refactor the magic block tokenizer to be robust enough to coexist with the full mdxExpression extension from micromark-extension-mdx-expression instead of limiting it to non-flow constructs? I just wonder if we're gonna keep running into issues with multi-line expressions.

@kevinports i did try to consider this wayyy back when i introduced the magic block tokenizer but the main issue was that if say we have this:

[block:image]

{
  "images": [{"image": ["https://files.readme.io/6f52e22-man-eating-pizza-and-making-an-ok-gesture.jpg", null, "Alt"]}]
}

[/block]

this is technically valid magic block and it is rendered atm, but the main issue is that there is a gap between the [block] markers and the actual data (the one encased in {}). the issue is that since there is a new line, the opening and closing {} are considered standalone and hence being consumed by the flowExpression tokenizer when it should be consumed by the magic block tokenizer instead

ill keep trying to make it more robust atm but it might be worth experimenting on that in a follow up PR

Comment thread __tests__/transformers/preprocess-jsx-expressions.test.ts
Comment thread __tests__/helpers.ts
@maximilianfalco maximilianfalco merged commit 95e1969 into next Apr 15, 2026
10 of 11 checks passed
@maximilianfalco maximilianfalco deleted the falco/rm-15676-does-not-deserialize-magic-block-within-jsx-comment branch April 15, 2026 01:15
rafegoldberg pushed a commit that referenced this pull request Apr 16, 2026
## Version 14.0.0
### ⚠ BREAKING CHANGES

* **mdxish:** magic blocks), the parser had no way to recognize `{/* ... */}`
as a single expression

> [!NOTE]
> Technically, introducing the flow construct of the `mdxExpression`
would partly fix this issue but we cannot do this because it would break
magic block parsing. magic blocks usually contain `{ \n \n }` JSON
payloads, and the flow construct would eat them before the magic block
tokenizer got a chance to run.

created a dedicated `jsxComment` micromark tokenizer for JSX comments.
enabled for all branches instead of being gated by `newEditorTypes`.

> [!NOTE]
In the future as a follow up, we can work in integrating this new
tokenizer and fully removing the `removeJSXComments` preprocessing step.
similar to what we do in the `stripComments`

## 🧪 QA tips

Try pasting this in the editor (both rich and raw):
```
{/* Dropbox hosted. What was it?

[block:image]
{
  "images": [
    {
      "image": [
        "https://paper-attachments.dropboxusercontent.com/s_924615C008BE616D2ABE30441190A9401D739971D9AD94E8B5B46A426C5F05E8_1683123013834_Dev+Dash+-+hero+docs+shot.png",
        null,
        "The My Developers page in your ReadMe project dashboard"
      ],
      "align": "center",
      "border": true,
      "caption": "The My Developers page in your ReadMe project dashboard"
    }
  ]
}
[/block]

*/}
```
It should be treated as one big comment block instead of a rendered
image.

## 📸 Screenshot or Loom

https://github.com/user-attachments/assets/9b619f54-3d6e-456f-83be-27d4076a7a3a

### ✨ New & Improved

* **mdxish:** add micromark tokenizer for MDX component parsing ([#1426](#1426)) ([a77707d](a77707d))
* move numbered lists for 1 -> a -> i ([#1438](#1438)) ([7e614d4](7e614d4))

### 🛠 Fixes & Updates

* **mdxish:** properly parse JSX comments in editor deserialization ([#1419](#1419)) ([95e1969](95e1969))

<!--SKIP CI-->
@rafegoldberg
Copy link
Copy Markdown
Contributor

This PR was released!

🚀 Changes included in v14.0.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants