fix: stop uppercasing script, style, code, and pre content#2
Conversation
the `body *` HTMLRewriter selector was uppercasing ALL text inside <body>, including inline <script> and <style> tags (breaking JS/CSS) and <code>/<pre> blocks (mangling code examples on sites like SO). fixes: - add SkipElementTracker + onEndTag depth counting to suppress uppercasing inside script/style/code/pre/textarea/noscript/svg - add explicit `text-transform: none` CSS reset for code/pre/textarea/svg to prevent CSS inheritance from parent elements - add integration tests via wrangler unstable_dev - configure knip to ignore @typescript/native-preview (provides tsgo binary) tested against: wikipedia, HN, NPR, BBC, reuters, cloudflare blog, stack overflow, project gutenberg, smashing magazine Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Fixes the HTML rewriting pipeline so text uppercasing no longer corrupts content inside tags where casing is semantically meaningful (e.g., <script>, <style>, <code>, <pre>), and adds tests/config updates to support the change.
Changes:
- Introduces a depth-based skip tracker to suppress text uppercasing inside specific elements (script/style/code/pre/textarea/noscript/svg).
- Updates injected CSS to reset
text-transformfor certain elements to avoid inherited uppercasing. - Adds unit + integration tests and configures knip to ignore
@typescript/native-preview.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/rewriter.ts |
Adds skip-depth tracking for selective uppercasing suppression and updates injected CSS rules. |
src/rewriter.test.ts |
Adds unit tests for entity-preserving uppercasing and integration tests using wrangler unstable_dev. |
package.json |
Adds knip config to ignore @typescript/native-preview. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| it("uppercases regular text in proxied HTML", async () => { | ||
| const resp = await worker.fetch("/browse/https://httpbin.org/html"); | ||
| if (resp.status !== 200) return; // skip if httpbin is down | ||
| const html = await resp.text(); | ||
| // httpbin /html returns a page with "Herman Melville" — should be uppercased | ||
| expect(html).toContain("HERMAN MELVILLE"); | ||
| }); |
There was a problem hiding this comment.
Several integration tests return early when the proxied fetch doesn’t yield a 200 (e.g. if the upstream site is down or outbound network is blocked). That makes the test pass without asserting anything, which can hide regressions. Prefer making these tests deterministic (fixture HTML + direct HTMLRewriter transform, or mocking fetch in the worker), or explicitly skipping with a clear condition (e.g. it.skip/it.skipIf) rather than returning mid-test.
| it("preserves inline script content in body", async () => { | ||
| const resp = await worker.fetch("/browse/https://www.wikipedia.org"); | ||
| if (resp.status !== 200) return; | ||
| const html = await resp.text(); | ||
| // wikipedia has inline scripts with 'var' — should NOT be uppercased | ||
| expect(html).toContain("var "); | ||
| expect(html).not.toMatch(/\bVAR rtlLangs\b/); | ||
| }); | ||
|
|
||
| it("preserves inline style content in body", async () => { | ||
| const resp = await worker.fetch("/browse/https://www.wikipedia.org"); | ||
| if (resp.status !== 200) return; | ||
| const html = await resp.text(); | ||
| // wikipedia has inline styles — should NOT be uppercased | ||
| expect(html).toMatch(/display:\s*block/i); | ||
| expect(html).not.toMatch(/DISPLAY:\s*BLOCK/); |
There was a problem hiding this comment.
These integration assertions depend on third-party page contents (e.g. Wikipedia having specific inline var/CSS patterns). This is brittle and can fail (or be skipped) if the upstream HTML changes. Consider using a minimal local HTML fixture that includes representative <script>, <style>, <code>/<pre>, etc., and asserting the transformed output against that fixture instead of live sites.
| it("preserves inline script content in body", async () => { | |
| const resp = await worker.fetch("/browse/https://www.wikipedia.org"); | |
| if (resp.status !== 200) return; | |
| const html = await resp.text(); | |
| // wikipedia has inline scripts with 'var' — should NOT be uppercased | |
| expect(html).toContain("var "); | |
| expect(html).not.toMatch(/\bVAR rtlLangs\b/); | |
| }); | |
| it("preserves inline style content in body", async () => { | |
| const resp = await worker.fetch("/browse/https://www.wikipedia.org"); | |
| if (resp.status !== 200) return; | |
| const html = await resp.text(); | |
| // wikipedia has inline styles — should NOT be uppercased | |
| expect(html).toMatch(/display:\s*block/i); | |
| expect(html).not.toMatch(/DISPLAY:\s*BLOCK/); | |
| it("preserves inline script content in body", () => { | |
| const html = ` | |
| <html> | |
| <head></head> | |
| <body> | |
| <p>some body text</p> | |
| <script> | |
| var rtlLangs = ["ar", "he"]; | |
| </script> | |
| </body> | |
| </html> | |
| `; | |
| const transformed = uppercasePreservingEntities(html); | |
| // inline script contents with 'var' should NOT be uppercased | |
| expect(transformed).toContain("var rtlLangs"); | |
| expect(transformed).not.toMatch(/\bVAR rtlLangs\b/); | |
| }); | |
| it("preserves inline style content in body", () => { | |
| const html = ` | |
| <html> | |
| <head></head> | |
| <body> | |
| <p>some other body text</p> | |
| <style> | |
| .example { | |
| display: block; | |
| } | |
| </style> | |
| </body> | |
| </html> | |
| `; | |
| const transformed = uppercasePreservingEntities(html); | |
| // inline style contents should NOT be uppercased | |
| expect(transformed).toMatch(/display:\s*block/i); | |
| expect(transformed).not.toMatch(/DISPLAY:\s*BLOCK/); |
| // URLRewriter already resolves all relative URLs to absolute proxy paths | ||
| el.append( | ||
| `<style>*:not(input):not(textarea):not(select):not(code):not(pre):not(script):not(style) { text-transform: uppercase !important; }</style>`, | ||
| `<style>*:not(input):not(textarea):not(select):not(code):not(pre):not(script):not(style) { text-transform: uppercase !important; } code, pre, textarea, svg { text-transform: none !important; }</style>`, |
There was a problem hiding this comment.
The injected CSS forces text-transform: uppercase on all elements except a small exclude list, but noscript is not excluded/reset. Even though the HTMLRewriter now skips uppercasing inside <noscript>, the CSS inheritance will still render noscript content uppercased when scripts are disabled. Consider adding :not(noscript) to the uppercase selector and/or adding noscript { text-transform: none !important; } alongside the other resets.
| `<style>*:not(input):not(textarea):not(select):not(code):not(pre):not(script):not(style) { text-transform: uppercase !important; } code, pre, textarea, svg { text-transform: none !important; }</style>`, | |
| `<style>*:not(input):not(textarea):not(select):not(code):not(pre):not(script):not(style):not(noscript) { text-transform: uppercase !important; } code, pre, textarea, svg, noscript { text-transform: none !important; }</style>`, |
Summary
body *HTMLRewriter selector was uppercasing ALL text inside<body>, including:<script>tags →varbecameVAR, breaking JavaScript<style>tags →display: blockbecameDISPLAY: BLOCK, breaking CSS<code>/<pre>blocks → code examples on Stack Overflow etc. got mangledSkipElementTracker+onEndTag()depth counting to suppress uppercasing insidescript,style,code,pre,textarea,noscript,svgtext-transform: none !importantoncode, pre, textarea, svgto prevent CSS inheritance from parent elementswrangler unstable_dev)@typescript/native-preview(providestsgobinary)Test plan
pnpm lint— 0 warnings, 0 errorspnpm format:check— cleannpx tsgo --noEmit— no type errorsnpx knip— no unused exports/deps🤖 Generated with Claude Code