
fix: stop uppercasing script, style, code, and pre content #2

Merged
jergason merged 1 commit into main from
fix/skip-uppercasing-script-style-code
Mar 23, 2026

Conversation

@jergason
Owner

Summary

  • Bug: the `body *` HTMLRewriter selector was uppercasing ALL text inside <body>, including:
    • Inline <script> tags → var became VAR, breaking JavaScript
    • Inline <style> tags → display: block became DISPLAY: BLOCK, breaking CSS
    • <code>/<pre> blocks → code examples on Stack Overflow etc. got mangled
  • Fix: SkipElementTracker + onEndTag() depth counting to suppress uppercasing inside script, style, code, pre, textarea, noscript, svg
  • CSS fix: explicit text-transform: none !important on code, pre, textarea, svg to prevent CSS inheritance from parent elements
  • Adds 15 new tests (6 unit + 9 integration via wrangler unstable_dev)
  • Configures knip to ignore @typescript/native-preview (provides tsgo binary)
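
For readers unfamiliar with the depth-counting approach, here is a minimal sketch of the idea. It is not the PR's actual code: `SkipDepthTracker`, `RewriterEvent`, and `uppercaseOutsideSkipped` are hypothetical names, and the event list stands in for the callbacks HTMLRewriter would deliver. In a Worker, the element handler would presumably call `enter(el.tagName)` and register `el.onEndTag(() => tracker.exit())`, while the text handler checks `skipping`.

```typescript
// Hypothetical sketch; the real SkipElementTracker in src/rewriter.ts may differ.
const SKIP_TAGS = new Set([
  "script", "style", "code", "pre", "textarea", "noscript", "svg",
]);

class SkipDepthTracker {
  private depth = 0;

  // Called for every start tag. Returns true when the tag is a skip element,
  // meaning the caller should arrange for exit() to run at the end tag.
  enter(tagName: string): boolean {
    if (!SKIP_TAGS.has(tagName.toLowerCase())) return false;
    this.depth += 1;
    return true;
  }

  // Called at the end tag of a skip element.
  exit(): void {
    if (this.depth > 0) this.depth -= 1;
  }

  // True while inside at least one skip element (handles nesting such as <pre><code>).
  get skipping(): boolean {
    return this.depth > 0;
  }
}

// Simplified stand-in for the element and text callbacks of a streaming rewriter.
type RewriterEvent =
  | { kind: "open"; tag: string }
  | { kind: "close"; tag: string }
  | { kind: "text"; value: string };

function uppercaseOutsideSkipped(events: RewriterEvent[]): string[] {
  const tracker = new SkipDepthTracker();
  const out: string[] = [];
  for (const ev of events) {
    if (ev.kind === "open") {
      tracker.enter(ev.tag);
    } else if (ev.kind === "close") {
      if (SKIP_TAGS.has(ev.tag.toLowerCase())) tracker.exit();
    } else {
      out.push(tracker.skipping ? ev.value : ev.value.toUpperCase());
    }
  }
  return out;
}
```

A counter rather than a boolean flag matters for nested skip elements: with `<pre><code>…</code> more</pre>`, a flag cleared at `</code>` would wrongly uppercase the trailing text that is still inside `<pre>`.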

Test plan

  • All 31 tests pass (16 existing + 15 new)
  • pnpm lint — 0 warnings, 0 errors
  • pnpm format:check — clean
  • npx tsgo --noEmit — no type errors
  • npx knip — no unused exports/deps
  • Manually tested via chrome devtools screenshots: Wikipedia, HN, NPR, BBC, Reuters, Cloudflare Blog, Stack Overflow, Project Gutenberg, Smashing Magazine

🤖 Generated with Claude Code

the `body *` HTMLRewriter selector was uppercasing ALL text inside
<body>, including inline <script> and <style> tags (breaking JS/CSS)
and <code>/<pre> blocks (mangling code examples on sites like SO).

fixes:
- add SkipElementTracker + onEndTag depth counting to suppress
  uppercasing inside script/style/code/pre/textarea/noscript/svg
- add explicit `text-transform: none` CSS reset for code/pre/textarea/svg
  to prevent CSS inheritance from parent elements
- add integration tests via wrangler unstable_dev
- configure knip to ignore @typescript/native-preview (provides tsgo binary)

tested against: wikipedia, HN, NPR, BBC, reuters, cloudflare blog,
stack overflow, project gutenberg, smashing magazine

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 23, 2026 15:23
@jergason jergason merged commit 68c92be into main Mar 23, 2026
3 checks passed

Copilot AI left a comment

Pull request overview

Fixes the HTML rewriting pipeline so text uppercasing no longer corrupts content inside tags where casing is semantically meaningful (e.g., <script>, <style>, <code>, <pre>), and adds tests/config updates to support the change.

Changes:

  • Introduces a depth-based skip tracker to suppress text uppercasing inside specific elements (script/style/code/pre/textarea/noscript/svg).
  • Updates injected CSS to reset text-transform for certain elements to avoid inherited uppercasing.
  • Adds unit + integration tests and configures knip to ignore @typescript/native-preview.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/rewriter.ts | Adds skip-depth tracking for selective uppercasing suppression and updates injected CSS rules. |
| src/rewriter.test.ts | Adds unit tests for entity-preserving uppercasing and integration tests using wrangler unstable_dev. |
| package.json | Adds knip config to ignore @typescript/native-preview. |
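
As context for the "entity-preserving uppercasing" that the unit tests cover, the sketch below shows one way such a function could work. It is an illustration under assumptions, not the implementation in src/rewriter.ts: `uppercaseKeepingEntities` and `ENTITY_SPLIT` are hypothetical names, and the pattern only recognizes entities of the form `&name;`, `&#123;`, or `&#x1F;`.

```typescript
// Hypothetical sketch; the PR's uppercasePreservingEntities may be written differently.
// The capture group makes split() keep each matched entity in the result array,
// so the array alternates: text, entity, text, entity, ...
const ENTITY_SPLIT = /(&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/;

function uppercaseKeepingEntities(text: string): string {
  return text
    .split(ENTITY_SPLIT)
    // Odd indices hold the captured entities and are left untouched;
    // even indices hold plain text and are uppercased.
    .map((part, index) => (index % 2 === 1 ? part : part.toUpperCase()))
    .join("");
}
```

Named entity references are case-sensitive, so a naive `toUpperCase()` over the whole string would turn `&nbsp;` into `&NBSP;`, which is the kind of breakage this guards against.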

Comment thread src/rewriter.test.ts
Comment on lines +44 to +50
it("uppercases regular text in proxied HTML", async () => {
  const resp = await worker.fetch("/browse/https://httpbin.org/html");
  if (resp.status !== 200) return; // skip if httpbin is down
  const html = await resp.text();
  // httpbin /html returns a page with "Herman Melville": should be uppercased
  expect(html).toContain("HERMAN MELVILLE");
});
Copilot AI Mar 23, 2026

Several integration tests return early when the proxied fetch doesn’t yield a 200 (e.g. if the upstream site is down or outbound network is blocked). That makes the test pass without asserting anything, which can hide regressions. Prefer making these tests deterministic (fixture HTML + direct HTMLRewriter transform, or mocking fetch in the worker), or explicitly skipping with a clear condition (e.g. it.skip/it.skipIf) rather than returning mid-test.

Comment thread src/rewriter.test.ts
Comment on lines +52 to +67
it("preserves inline script content in body", async () => {
  const resp = await worker.fetch("/browse/https://www.wikipedia.org");
  if (resp.status !== 200) return;
  const html = await resp.text();
  // wikipedia has inline scripts with 'var': should NOT be uppercased
  expect(html).toContain("var ");
  expect(html).not.toMatch(/\bVAR rtlLangs\b/);
});

it("preserves inline style content in body", async () => {
  const resp = await worker.fetch("/browse/https://www.wikipedia.org");
  if (resp.status !== 200) return;
  const html = await resp.text();
  // wikipedia has inline styles: should NOT be uppercased
  expect(html).toMatch(/display:\s*block/i);
  expect(html).not.toMatch(/DISPLAY:\s*BLOCK/);
Copilot AI Mar 23, 2026

These integration assertions depend on third-party page contents (e.g. Wikipedia having specific inline var/CSS patterns). This is brittle and can fail (or be skipped) if the upstream HTML changes. Consider using a minimal local HTML fixture that includes representative <script>, <style>, <code>/<pre>, etc., and asserting the transformed output against that fixture instead of live sites.

Suggested change
- it("preserves inline script content in body", async () => {
-   const resp = await worker.fetch("/browse/https://www.wikipedia.org");
-   if (resp.status !== 200) return;
-   const html = await resp.text();
-   // wikipedia has inline scripts with 'var': should NOT be uppercased
-   expect(html).toContain("var ");
-   expect(html).not.toMatch(/\bVAR rtlLangs\b/);
- });
- it("preserves inline style content in body", async () => {
-   const resp = await worker.fetch("/browse/https://www.wikipedia.org");
-   if (resp.status !== 200) return;
-   const html = await resp.text();
-   // wikipedia has inline styles: should NOT be uppercased
-   expect(html).toMatch(/display:\s*block/i);
-   expect(html).not.toMatch(/DISPLAY:\s*BLOCK/);
+ it("preserves inline script content in body", () => {
+   const html = `
+     <html>
+       <head></head>
+       <body>
+         <p>some body text</p>
+         <script>
+           var rtlLangs = ["ar", "he"];
+         </script>
+       </body>
+     </html>
+   `;
+   const transformed = uppercasePreservingEntities(html);
+   // inline script contents with 'var' should NOT be uppercased
+   expect(transformed).toContain("var rtlLangs");
+   expect(transformed).not.toMatch(/\bVAR rtlLangs\b/);
+ });
+ it("preserves inline style content in body", () => {
+   const html = `
+     <html>
+       <head></head>
+       <body>
+         <p>some other body text</p>
+         <style>
+           .example {
+             display: block;
+           }
+         </style>
+       </body>
+     </html>
+   `;
+   const transformed = uppercasePreservingEntities(html);
+   // inline style contents should NOT be uppercased
+   expect(transformed).toMatch(/display:\s*block/i);
+   expect(transformed).not.toMatch(/DISPLAY:\s*BLOCK/);

Comment thread src/rewriter.ts
  // URLRewriter already resolves all relative URLs to absolute proxy paths
  el.append(
-   `<style>*:not(input):not(textarea):not(select):not(code):not(pre):not(script):not(style) { text-transform: uppercase !important; }</style>`,
+   `<style>*:not(input):not(textarea):not(select):not(code):not(pre):not(script):not(style) { text-transform: uppercase !important; } code, pre, textarea, svg { text-transform: none !important; }</style>`,
Copilot AI Mar 23, 2026

The injected CSS forces text-transform: uppercase on all elements except a small exclude list, but noscript is not excluded/reset. Even though the HTMLRewriter now skips uppercasing inside <noscript>, the CSS inheritance will still render noscript content uppercased when scripts are disabled. Consider adding :not(noscript) to the uppercase selector and/or adding noscript { text-transform: none !important; } alongside the other resets.

Suggested change
- `<style>*:not(input):not(textarea):not(select):not(code):not(pre):not(script):not(style) { text-transform: uppercase !important; } code, pre, textarea, svg { text-transform: none !important; }</style>`,
+ `<style>*:not(input):not(textarea):not(select):not(code):not(pre):not(script):not(style):not(noscript) { text-transform: uppercase !important; } code, pre, textarea, svg, noscript { text-transform: none !important; }</style>`,
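
One way to keep the `:not(...)` chain and the reset list from drifting apart, as happened with `noscript` here, would be to generate the style string from two arrays. The sketch below is only an illustration: `buildUppercaseStyle` and its parameter names are hypothetical and do not exist in the PR. With the arrays shown, it produces the same string as the suggested replacement line.

```typescript
// Hypothetical helper; src/rewriter.ts currently inlines this string.
function buildUppercaseStyle(
  excludeFromUppercase: string[],
  resetToNone: string[],
): string {
  // Build the chained :not(...) selectors, e.g. ":not(input):not(textarea)".
  const notChain = excludeFromUppercase.map((tag) => `:not(${tag})`).join("");
  const uppercaseRule = `*${notChain} { text-transform: uppercase !important; }`;
  // Explicit reset so these elements do not inherit uppercase from a parent.
  const resetRule = `${resetToNone.join(", ")} { text-transform: none !important; }`;
  return `<style>${uppercaseRule} ${resetRule}</style>`;
}

const injectedStyle = buildUppercaseStyle(
  ["input", "textarea", "select", "code", "pre", "script", "style", "noscript"],
  ["code", "pre", "textarea", "svg", "noscript"],
);
```

Adding another element later would then be a one-word change to each array rather than an edit inside a long selector string.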
