fix(mdxish): properly parse JSX comments in editor deserialization#1419
Conversation
There was a problem hiding this comment.
Question, If the opening comment section is indented, should this tokenizer still work? Seems like it does with MDX, example:
{/*
[block:image]
{
"images": [{"image": ["https://example.com/img.png", null, "Alt"]}]
}
[/block]
*/}
Hi there!
There was a problem hiding this comment.
yes it should work! are you seeing something different?
There was a problem hiding this comment.
okay this looks to be broken in the editor but it looks like its a monorepo issue not an engine issue
…rm-15676-does-not-deserialize-magic-block-within-jsx-comment
| return mdastV6(doc, { rdmd }); | ||
| }; | ||
|
|
||
| export const parseMdxishMdast = (md: string, opts: MdxishOpts = {}): MdastRoot => { |
There was a problem hiding this comment.
might be useful for other test files so hoisted this up in the general helpers file
There was a problem hiding this comment.
the ideal path is that sometime in the future, we can revisit this work and remove the removeJsxComment preprocessing so we dont have to keep this file and no longer worry about diverging regex rules. but this should act as a bandaid to keep them in check for the mean time
happy to revisit in the future to unify those two
There was a problem hiding this comment.
Ok I barely comprehend how what's happening in this PR fits into the greater pipeline. But why are we emitting custom namesjsxComment, jsxCommentLineEnding, jsxCommentMarkerEnd, etc. that you have defined on L7-L15?
Couldn't this tokenizer just emit mdxFlowExpression and mdxFlowExpressionChunk types? I think those are the expected types that'd normally come from micromark-extension-mdx-expression if we weren't excluding detection of flow constructs here:
Lines 152 to 157 in f20d1d0
That'd altogether remove the need to process these special types downstream with lib/mdast-util/jsx-comment/index.ts and clean this PR quite a bit I think.
Another question is have you considered what it'd look like to refactor the magic block tokenizer to be robust enough to coexist with the full mdxExpression extension from micromark-extension-mdx-expression instead of limiting it to non-flow constructs? I just wonder if we're gonna keep running into issues with multi-line expressions.
There was a problem hiding this comment.
Couldn't this tokenizer just emit mdxFlowExpression and mdxFlowExpressionChunk types? I think those are the expected types that'd normally come from micromark-extension-mdx-expression if we weren't excluding detection of flow constructs here:
yea i think i was overthinking this, thanks for catching this. going to simplify it!
@kevinports i did try to consider this wayyy back when i introduced the magic block tokenizer but the main issue was that if say we have this: this is technically valid magic block and it is rendered atm, but the main issue is that there is a gap between the ill keep trying to make it more robust atm but it might be worth experimenting on that in a follow up PR |
…rm-15676-does-not-deserialize-magic-block-within-jsx-comment
## Version 14.0.0
### ⚠ BREAKING CHANGES
* **mdxish:** magic blocks), the parser had no way to recognize `{/* ... */}`
as a single expression
> [!NOTE]
> Technically, introducing the flow construct of the `mdxExpression`
would partly fix this issue but we cannot do this because it would break
magic block parsing. magic blocks usually contain `{ \n \n }` JSON
payloads, and the flow construct would eat them before the magic block
tokenizer got a chance to run.
created a dedicated `jsxComment` micromark tokenizer for JSX comments.
enabled for all branches instead of being gated by `newEditorTypes`.
> [!NOTE]
In the future as a follow up, we can work in integrating this new
tokenizer and fully removing the `removeJSXComments` preprocessing step.
similar to what we do in the `stripComments`
## 🧪 QA tips
Try pasting this in the editor (both rich and raw):
```
{/* Dropbox hosted. What was it?
[block:image]
{
"images": [
{
"image": [
"https://paper-attachments.dropboxusercontent.com/s_924615C008BE616D2ABE30441190A9401D739971D9AD94E8B5B46A426C5F05E8_1683123013834_Dev+Dash+-+hero+docs+shot.png",
null,
"The My Developers page in your ReadMe project dashboard"
],
"align": "center",
"border": true,
"caption": "The My Developers page in your ReadMe project dashboard"
}
]
}
[/block]
*/}
```
It should be treated as one big comment block instead of a rendered
image.
## 📸 Screenshot or Loom
https://github.com/user-attachments/assets/9b619f54-3d6e-456f-83be-27d4076a7a3a
### ✨ New & Improved
* **mdxish:** add micromark tokenizer for MDX component parsing ([#1426](#1426)) ([a77707d](a77707d))
* move numbered lists for 1 -> a -> i ([#1438](#1438)) ([7e614d4](7e614d4))
### 🛠 Fixes & Updates
* **mdxish:** properly parse JSX comments in editor deserialization ([#1419](#1419)) ([95e1969](95e1969))
<!--SKIP CI-->
This PR was released!🚀 Changes included in v14.0.0 |

🎯 What does this PR do?
Warning
This PR should not affect rendering whatsoever. The
mdxish()rendering pipeline still strips JSX comments at the string level viaremoveJSXCommentsbefore the tokenizer ever runs, so rendered output is byte-identical.JSX comments (
{/* ... */}) wrapping magic blocks (or any multiline content with blank lines) were being split into separate AST nodes during editor deserialization, causing commented-out content to render as if it were active.escapeProblematicBracesin preprocessing escaped{to\{whenever an expression spanned a blank line, which includes JSX comments because they are essentially expressionsmdxExpressionflow construct (removed to prevent breaking magic blocks), the parser had no way to recognize{/* ... */}as a single expressionNote
Technically, introducing the flow construct of the
mdxExpressionwould partly fix this issue but we cannot do this because it would break magic block parsing. magic blocks usually contain{ \n \n }JSON payloads, and the flow construct would eat them before the magic block tokenizer got a chance to run.created a dedicated
jsxCommentmicromark tokenizer for JSX comments. enabled for all branches instead of being gated bynewEditorTypes.Note
In the future as a follow up, we can work in integrating this new tokenizer and fully removing the
removeJSXCommentspreprocessing step. similar to what we do in thestripComments🧪 QA tips
Try pasting this in the editor (both rich and raw):
It should be treated as one big comment block instead of a rendered image.
📸 Screenshot or Loom
Screen.Recording.2026-04-02.at.01.32.19.mov