Skip to content

Auto-transform the Bible HTML from getPassage so the consumer doesn't have extra steps#216

Open
cameronapak wants to merge 7 commits intomainfrom
transform-bible-html
Open

Auto-transform the Bible HTML from getPassage so the consumer doesn't have extra steps#216
cameronapak wants to merge 7 commits intomainfrom
transform-bible-html

Conversation

@cameronapak
Copy link
Copy Markdown
Collaborator

@cameronapak cameronapak commented Apr 20, 2026

Summary

Auto-transforms Bible HTML inside getPassage so consumers never need to call transformBibleHtml manually. Uses native DOMParser in browser, dynamic import('linkedom') on server. Added data-yv-transformed idempotency marker so double-transforms are a no-op.

5 files changed across core and ui: bible.ts (getHtmlAdapters + transform in getPassage), bible-html-transformer.ts (idempotency guard), bible-html-transformer.test.ts (3 idempotency tests), bible.test.ts (updated assertions for transformed output), verse.tsx (kept transform as XSS safety net for direct callers).

Verse.Html retains its transformBibleHtml call as defense-in-depth — the idempotency marker makes it a no-op for HTML that already went through getPassage.

All 609 tests pass (290 core, 258 hooks, 61 ui). Build, typecheck, lint green.

Context: Why transformBibleHtml Exists — And Where It May Not Be Needed

Test plan

  • Verify getPassage with format: 'html' returns transformed content (data-yv-transformed present)
  • Verify getPassage with format: 'text' returns raw content (no transformation)
  • Verify double-transform produces identical output (idempotency)
  • Verify footnotes are extracted into data-verse-footnote attributes
  • Verify Verse.Html still sanitizes raw HTML passed directly (XSS protection)

🤖 Generated with Claude Code

Greptile Summary

This PR auto-applies the full Bible HTML transformation pipeline (verse wrapping, footnote extraction, sanitization, table fixes) inside getPassage so consumers no longer need to call transformBibleHtml manually. A data-yv-transformed idempotency marker on the root element makes double-transforms a no-op, and Verse.Html's existing transformBibleHtml call becomes a safe no-op for pre-transformed content.

  • bible.ts gains an async getHtmlAdapters() that picks between the native browser DOMParser and a dynamically-imported jsdom on the server, with a clear error if jsdom is absent; a transform: false escape hatch is provided for callers that need raw HTML or want to avoid the dependency.
  • bible-html-transformer.ts adds the TRANSFORMED_ATTR constant and idempotency guard; sanitization was moved to run unconditionally before the early-return path, correctly addressing the previously flagged XSS concern.
  • linkedom is dropped in favour of jsdom (already present as a vitest peer), and all tests, types, and the lockfile are updated accordingly.

Confidence Score: 5/5

Safe to merge — the XSS and missing-dependency concerns from the prior round have both been addressed; remaining observations are edge-case quality issues that do not affect production Bible HTML paths.

Sanitization now runs unconditionally before the idempotency early-return, closing the prior XSS bypass. The jsdom missing-dependency path throws a clear, actionable error. The only open item is a theoretical idempotency failure when doc.body has no element children — a state that real Bible API HTML never produces. All 609 tests pass and the transform: false escape hatch is verified.

packages/core/src/bible-html-transformer.ts — the root ?? doc.body fallback in the idempotency marker is logically incorrect (body attributes are lost in innerHTML serialization), though the condition is unreachable with real API payloads.

Important Files Changed

Filename Overview
packages/core/src/bible.ts Adds getHtmlAdapters() for env-aware DOM wiring and integrates auto-transform into getPassage; new transform param opt-out works correctly but sits at the 6th positional position
packages/core/src/bible-html-transformer.ts Adds TRANSFORMED_ATTR constant and idempotency guard; sanitization now runs unconditionally before the early-return, correctly addressing the prior XSS concern; edge case where firstElementChild is null causes the marker to be set on body and lost via innerHTML serialization
packages/core/src/bible-html-transformer-server.ts Swaps static linkedom import for static jsdom import; API usage is identical and safe
packages/ui/src/components/verse.tsx Comment update only — Verse.Html logic unchanged; idempotency marker makes the redundant transformBibleHtml call a no-op in practice
packages/core/src/tests/bible.test.ts Tests updated to assert transformed output; new transform: false coverage added; text format also checks absence of marker
packages/core/package.json Replaces linkedom peer dependency with jsdom; both remain optional; @types/jsdom added as devDependency

Sequence Diagram

sequenceDiagram
    participant C as Consumer
    participant GP as getPassage()
    participant API as YouVersion API
    participant GA as getHtmlAdapters()
    participant T as transformBibleHtml()

    C->>GP: getPassage(id, usfm, "html")
    GP->>API: GET /v1/bibles/{id}/passages/{usfm}
    API-->>GP: raw BiblePassage (untransformed HTML)
    GP->>GA: getHtmlAdapters()
    alt Browser (DOMParser available)
        GA-->>GP: { parseHtml: DOMParser, serializeHtml: body.innerHTML }
    else Server (DOMParser absent)
        GA->>GA: await import("jsdom")
        GA-->>GP: { parseHtml: JSDOM, serializeHtml: body.innerHTML }
    end
    GP->>T: transformBibleHtml(passage.content, adapters)
    T->>T: sanitizeBibleHtmlDocument() [always]
    alt data-yv-transformed present
        T-->>GP: { html: sanitized-only HTML }
    else not yet transformed
        T->>T: wrapVerseContent / footnotes / nbsp / tables
        T->>T: root.setAttribute("data-yv-transformed", "")
        T-->>GP: { html: fully transformed HTML }
    end
    GP-->>C: BiblePassage { content: transformed HTML }
Loading

Fix All in Claude Code Fix All in Cursor

Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
packages/core/src/bible-html-transformer.ts:358-360
When `doc.body` has no element children (i.e. `firstElementChild` is `null`), `root` falls back to `doc.body` itself. Setting `TRANSFORMED_ATTR` on the `<body>` element is silently lost when `serializeHtml` returns `doc.body.innerHTML`, which serializes only the *inner* content — the `<body>` tag's own attributes are not included in the string. A subsequent call to `transformBibleHtml` will therefore not detect the idempotency marker and will run all structural transforms again, breaking the double-transform guarantee. Bible API HTML always has a root `<div>`, so this path only triggers on edge-case or synthetic input, but the fallback arm is logically incorrect.

```suggestion
  // Mark as transformed for idempotency.
  // Must target an element that survives `serializeHtml` (i.e. body.innerHTML),
  // so we use firstElementChild only — placing the attribute on <body> itself
  // would be serialized away by body.innerHTML.
  const root = doc.body?.firstElementChild;
  if (root) {
    root.setAttribute(TRANSFORMED_ATTR, '');
  }
```

### Issue 2 of 3
packages/core/src/bible.ts:299-306
**Opt-out requires placeholder `undefined` arguments**

`transform` is the 6th positional parameter. To disable transformation while accepting all other defaults, callers must write `getPassage(id, usfm, 'html', undefined, undefined, false)` — as demonstrated in the JSDoc example and the new test. An options-object overload or trailing options bag would let callers write `getPassage(id, usfm, { transform: false })` without the placeholder chain.

### Issue 3 of 3
packages/core/src/bible.ts:17-40
**`getHtmlAdapters` re-evaluated on every invocation**

`await import('jsdom')` is called inside `getPassage` on every request that reaches the transform branch. Node.js's module cache means the actual file is loaded only once, but the async bookkeeping (Promise allocation, microtask tick) still occurs per call. Since the returned adapters object is stateless, caching it at the module level after the first successful resolution would make subsequent calls synchronous on hot paths.

Reviews (5): Last reviewed commit: "Merge branch 'main' into transform-bible..." | Re-trigger Greptile

getPassage now automatically sanitizes and transforms HTML content
before returning — verse wrapping, footnote extraction, nbsp, and
table fixes all happen at the root. Uses native DOMParser in browser,
dynamic import('linkedom') on server. Added data-yv-transformed
idempotency marker so double-transforms are a no-op.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Apr 20, 2026

🦋 Changeset detected

Latest commit: 753776e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages
Name Type
@youversion/platform-core Minor
@youversion/platform-react-hooks Minor
@youversion/platform-react-ui Minor
vite-react Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@cameronapak cameronapak changed the title feat(core): auto-transform Bible HTML in getPassage Auto-transform the Bible HTML from getPassage so the consumer doesn't have extra steps Apr 20, 2026
Comment thread packages/core/src/bible-html-transformer.ts Outdated
Comment thread packages/core/src/bible.ts Outdated
cameronapak and others added 2 commits April 20, 2026 14:10
Run XSS sanitization before idempotency check so data-yv-transformed
cannot bypass sanitizeBibleHtmlDocument. Add clear error message when
linkedom is missing on server instead of opaque module-not-found error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

expect(result.html).not.toContain('onclick');
expect(result.html).toContain('<p>');
expect(result.html).toContain('<p');
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: if you're wondering why this tag is seemingly cut off, it's because the tags would contain a data attribute in this new PR, which would then make it where the tag is something like <p data-yv-attribute> versus <p> standalone

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this expect() call do regular expressions? we don't have pre tags yet, that I know of, but still... it'd be great to tighten this up if that's not too difficult.

Copy link
Copy Markdown
Member

@davidfedor davidfedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big picture: I love the idea of being helpful, without requiring the developer to have to make another call. My comments and questions are around whether this is the best way to do that. (Maybe it is! I'm not sure yet.)
I notice this would be blurring the lines between Core being merely an API helper-layer, but now it would be doing some of the prep-work of the UI (visualization layer). So at the least having that be optional seems wise.
I'm wondering if that parameter should default to do the transformation, or not... or whether we need to force the dev to make a choice (to attempt to force them to make an informed choice).

Comment thread packages/core/src/bible.ts Outdated
} catch {
throw new Error(
'Server-side HTML transformation requires "linkedom". ' +
'Install it as a dependency or pass format: "text" to skip transformation.',
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be better if there was a supported way to get raw html (untransformed), for people who don't want to import linkedom or who (for whatever reason) want the original data. How about a new format option, "rawhtml" or something like that?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm writing this here because this error path is not something a builder will probably be excited to be in. The fix would mostly be elsewhere.)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or add another parameter so that the format can stay "html". That feels like a better idea to me.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like what you're processing. I've added a new commit to have the escape hatch allowing users to intentionally seek raw html versus transformed: aece62a

This PR is ready for re-review and re-consideration @davidfedor

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things:

  • I'm curious, why linkedom as opposed to a more widely supported library like jsdom? Is there any risk of supply chain pollution with the newer library?
  • Have the docs been updated to reflect the need for a third party dependency?
  • Can the dependency be added as an optional peer dependency so it shows up in install logs?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ask as someone who is doing RSC data loading, and will need to have this work server-side :)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Bryson (@arinthros)!

Based on your comment, I've moved from linkedom to jsdom.

The YV dev docs will need to be updated immediatley after this (I will do that)

The dep has been added as an optional peer dep, so it can show up in install logs.

Anything else blocking this?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arinthros following up ^

@davidfedor
Copy link
Copy Markdown
Member

(FYI I've asked for thoughts from Bryson H; not sure if he's got cycles to contribute or not)

cameronapak and others added 2 commits April 24, 2026 11:23
Add `transform` param to `getPassage` (default: true) so consumers can
receive untransformed HTML without needing linkedom on the server.
CSS now handles verse label spacing for raw HTML via ::after pseudo-element.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cameronapak cameronapak requested a review from davidfedor April 24, 2026 16:51
@arinthros
Copy link
Copy Markdown

@davidfedor @cameronapak this looks good to me, I just don't see an "approve" button in my UI. Approved by me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants