Auto-transform the Bible HTML from getPassage so the consumer doesn't have extra steps#216
Auto-transform the Bible HTML from getPassage so the consumer doesn't have extra steps#216cameronapak wants to merge 8 commits into
Conversation
getPassage now automatically sanitizes and transforms HTML content
before returning — verse wrapping, footnote extraction, nbsp, and
table fixes all happen at the root. Uses native DOMParser in browser,
dynamic import('linkedom') on server. Added data-yv-transformed
idempotency marker so double-transforms are a no-op.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
🦋 Changeset detectedLatest commit: 4494016 The changes in this PR will be included in the next version bump. This PR includes changesets to release 4 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
Run XSS sanitization before idempotency check so data-yv-transformed cannot bypass sanitizeBibleHtmlDocument. Add clear error message when linkedom is missing on server instead of opaque module-not-found error. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|
||
| expect(result.html).not.toContain('onclick'); | ||
| expect(result.html).toContain('<p>'); | ||
| expect(result.html).toContain('<p'); |
There was a problem hiding this comment.
note: if you're wondering why this tag is seemingly cut off, it's because the tags would contain a data attribute in this new PR, which would then make it where the tag is something like <p data-yv-attribute> versus <p> standalone
There was a problem hiding this comment.
can this expect() call do regular expressions? we don't have pre tags yet, that I know of, but still... it'd be great to tighten this up if that's not too difficult.
davidfedor
left a comment
There was a problem hiding this comment.
Big picture: I love the idea of being helpful, without requiring the developer to have to make another call. My comments and questions are around whether this is the best way to do that. (Maybe it is! I'm not sure yet.)
I notice this would be blurring the lines between Core being merely an API helper-layer, but now it would be doing some of the prep-work of the UI (visualization layer). So at the least having that be optional seems wise.
I'm wondering if that parameter should default to do the transformation, or not... or whether we need to force the dev to make a choice (to attempt to force them to make an informed choice).
| } catch { | ||
| throw new Error( | ||
| 'Server-side HTML transformation requires "linkedom". ' + | ||
| 'Install it as a dependency or pass format: "text" to skip transformation.', |
There was a problem hiding this comment.
This might be better if there was a supported way to get raw html (untransformed), for people who don't want to import linkedom or who (for whatever reason) want the original data. How about a new format option, "rawhtml" or something like that?
There was a problem hiding this comment.
(I'm writing this here because this error path is not something a builder will probably be excited to be in. The fix would mostly be elsewhere.)
There was a problem hiding this comment.
... or add another parameter so that the format can stay "html". That feels like a better idea to me.
There was a problem hiding this comment.
I like what you're processing. I've added a new commit to have the escape hatch allowing users to intentionally seek raw html versus transformed: aece62a
This PR is ready for re-review and re-consideration @davidfedor
There was a problem hiding this comment.
A few things:
- I'm curious, why
linkedomas opposed to a more widely supported library likejsdom? Is there any risk of supply chain pollution with the newer library? - Have the docs been updated to reflect the need for a third party dependency?
- Can the dependency be added as an optional peer dependency so it shows up in install logs?
There was a problem hiding this comment.
I ask as someone who is doing RSC data loading, and will need to have this work server-side :)
There was a problem hiding this comment.
Hey Bryson (@arinthros)!
Based on your comment, I've moved from linkedom to jsdom.
The YV dev docs will need to be updated immediatley after this (I will do that)
The dep has been added as an optional peer dep, so it can show up in install logs.
Anything else blocking this?
|
(FYI I've asked for thoughts from Bryson H; not sure if he's got cycles to contribute or not) |
Add `transform` param to `getPassage` (default: true) so consumers can receive untransformed HTML without needing linkedom on the server. CSS now handles verse label spacing for raw HTML via ::after pseudo-element. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019df363-8170-750b-866b-d30055111f9b Co-authored-by: Amp <amp@ampcode.com>
|
@davidfedor @cameronapak this looks good to me, I just don't see an "approve" button in my UI. Approved by me! |
Summary
Auto-transforms Bible HTML inside
getPassageso consumers never need to calltransformBibleHtmlmanually. Uses nativeDOMParserin browser, dynamicimport('linkedom')on server. Addeddata-yv-transformedidempotency marker so double-transforms are a no-op.5 files changed across core and ui:
bible.ts(getHtmlAdapters + transform in getPassage),bible-html-transformer.ts(idempotency guard),bible-html-transformer.test.ts(3 idempotency tests),bible.test.ts(updated assertions for transformed output),verse.tsx(kept transform as XSS safety net for direct callers).Verse.Htmlretains itstransformBibleHtmlcall as defense-in-depth — the idempotency marker makes it a no-op for HTML that already went throughgetPassage.All 609 tests pass (290 core, 258 hooks, 61 ui). Build, typecheck, lint green.
Context: Why transformBibleHtml Exists — And Where It May Not Be Needed
Test plan
getPassagewithformat: 'html'returns transformed content (data-yv-transformedpresent)getPassagewithformat: 'text'returns raw content (no transformation)data-verse-footnoteattributesVerse.Htmlstill sanitizes raw HTML passed directly (XSS protection)🤖 Generated with Claude Code
Greptile Summary
This PR auto-transforms Bible HTML inside
getPassageso consumers no longer need to calltransformBibleHtmlmanually. Runtime-specific DOM adapters (DOMParserin browser, dynamicimport('jsdom')on server) are resolved insidegetHtmlAdapters(); adata-yv-transformedidempotency marker ensures double-transforms are a no-op, and sanitization always runs first so XSS protection is preserved even for already-transformed HTML.bible.tsgainsgetHtmlAdapters()and invokestransformBibleHtmlat the end ofgetPassage(HTML format only, skipped whentransform: false).bible-html-transformer.tsadds theTRANSFORMED_ATTRconstant and moves the idempotency check aftersanitizeBibleHtmlDocument, correctly addressing the XSS concern from earlier review threads.linkedomtojsdom; the changeset, server wrapper, and lock file are updated consistently, though the PR description summary still mentionslinkedom(leftover text).Confidence Score: 5/5
Safe to merge; the core transform path is well-tested, sanitization runs unconditionally before the idempotency guard, and the jsdom fallback provides a clear error message when the optional peer dep is absent.
All 609 tests pass and the two open findings are a devDep/peer-dep version gap for
@types/jsdomand a stalelinkedomreference in the PR description body — neither affects runtime behavior or the generated changelog entry.No files require special attention beyond the minor
@types/jsdomversion alignment inpackages/core/package.json.Important Files Changed
getHtmlAdapters()helper for runtime-agnostic DOM adapter resolution and auto-transforms HTML ingetPassage; browser path uses native DOMParser, server path uses dynamicimport('jsdom')with a clear error message fallback; addstransform?: booleanescape hatch as 6th positional parameter.TRANSFORMED_ATTR = 'data-yv-transformed'idempotency constant and guard; sanitization correctly runs before the idempotency check, so the previously-flagged XSS bypass concern is now addressed.linkedom@^0.18.12peer dependency withjsdom@^24.0.0(optional); adds@types/jsdom@^28.0.1as a devDependency — the major-version gap between the peer dep range (^24) and the type definitions (28.x) is worth verifying.data-yv-transformedpresence for HTML format, absence for text format andtransform: false, and footnote transformation; covers all new code paths.Verse.Htmlstill appliestransformBibleHtmlclient-side as defense-in-depth; sanitization runs first in the transformer, so XSS protection is preserved even for already-transformed HTML.Sequence Diagram
sequenceDiagram participant C as Consumer participant G as getPassage() participant A as getHtmlAdapters() participant T as transformBibleHtml() participant API as YouVersion API C->>G: "getPassage(versionId, usfm, html)" G->>API: "GET /v1/bibles/{id}/passages/{usfm}" API-->>G: "raw BiblePassage (untransformed HTML)" G->>A: getHtmlAdapters() alt Browser — DOMParser available A-->>G: "{ parseHtml: DOMParser, serializeHtml: body.innerHTML }" else Server — DOMParser absent A->>A: "import('jsdom') [Node module cache]" A-->>G: "{ parseHtml: JSDOM, serializeHtml: body.innerHTML }" end G->>T: "transformBibleHtml(passage.content, adapters)" T->>T: sanitizeBibleHtmlDocument(doc) T->>T: "check [data-yv-transformed] — idempotency guard" alt Already transformed T-->>G: "{ html } — sanitized, no structural changes" else Not yet transformed T->>T: "wrapVerseContent / footnotes / nbsp / tables" T->>T: "set data-yv-transformed on root element" T-->>G: "{ html } — fully transformed" end G-->>C: "BiblePassage { content: transformedHtml }"Prompt To Fix All With AI
Reviews (6): Last reviewed commit: "Merge branch 'main' into transform-bible..." | Re-trigger Greptile