Skip to content

fix: rewrite data-src/data-srcset for lazy-loaded images#4

Merged
jergason merged 1 commit intomainfrom
fix/rewrite-data-src-lazy-load
Mar 23, 2026
Merged

fix: rewrite data-src/data-srcset for lazy-loaded images#4
jergason merged 1 commit intomainfrom
fix/rewrite-data-src-lazy-load

Conversation

@jergason
Copy link
Copy Markdown
Owner

Summary

  • Many sites use data-src and data-srcset attributes for lazy loading images (Wired had 41 images using this pattern, all bypassing the proxy)
  • Added data-src rewriting on img, source elements via URLRewriter
  • Added data-srcset rewriting on img, source elements via SrcsetRewriter (now accepts configurable attribute name)
  • Added integration test verifying Wired's data-src attributes get proxied

Test plan

  • 32 tests pass (including new data-src test)
  • All checks clean: lint, format, types, knip
  • Visually verified on Wired — lazy-loaded images now load when scrolled into view
  • Verified no regressions on Wikipedia, HN, NPR, BBC, Reuters, ESPN, Guardian, Verge

🤖 Generated with Claude Code

many sites (wired, conde nast, etc.) use data-src/data-srcset
attributes for lazy loading. these bypassed the proxy because the
HTMLRewriter only handled src/srcset. now we rewrite both variants
on img and source elements.

also makes SrcsetRewriter configurable for which attribute to target.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 23, 2026 15:54
@jergason jergason merged commit 2ad8682 into main Mar 23, 2026
2 checks passed
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the HTML rewriting proxy so that lazy-loaded images using data-src / data-srcset are also rewritten to proxied URLs, preventing them from bypassing the proxy when they load later.

Changes:

  • Make SrcsetRewriter accept a configurable attribute name (default srcset).
  • Rewrite data-src for img, source via URLRewriter.
  • Rewrite data-srcset for img, source via SrcsetRewriter, and add an integration test targeting Wired’s markup.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/rewriter.ts Adds rewriting support for data-src and data-srcset, and generalizes SrcsetRewriter to support non-srcset attribute names.
src/rewriter.test.ts Adds an integration test intended to verify data-src rewriting on a real-world site (Wired).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/rewriter.test.ts
Comment on lines +112 to +117
const resp = await worker.fetch("/browse/https://www.wired.com");
if (resp.status !== 200) return;
const html = await resp.text();
const dataSrcMatches = html.match(/data-src="([^"]*)"/g) || [];
if (dataSrcMatches.length === 0) return; // skip if wired changed their pattern
const allProxied = dataSrcMatches.every((m) => m.includes("/browse/"));
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test can pass without asserting the new behavior: it returns early when Wired is down/non-200, and also returns early when no data-src attributes are present. That makes CI green even if the rewrite regresses or Wired changes/blocks requests. Consider using Vitest's skip mechanism (so it shows as skipped) and/or make the test deterministic by transforming a small HTML fixture through buildRewriter() and asserting the rewritten data-src value(s).

Copilot uses AI. Check for mistakes.
Comment thread src/rewriter.test.ts
Comment on lines +115 to +117
const dataSrcMatches = html.match(/data-src="([^"]*)"/g) || [];
if (dataSrcMatches.length === 0) return; // skip if wired changed their pattern
const allProxied = dataSrcMatches.every((m) => m.includes("/browse/"));
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertion m.includes("/browse/") is quite loose and can produce false positives if the original URL happens to contain /browse/ as part of its path/query. It would be more robust to capture the attribute value and assert it starts with the proxy prefix (e.g., data-src="/browse/), or parse out the value and check value.startsWith("/browse/").

Suggested change
const dataSrcMatches = html.match(/data-src="([^"]*)"/g) || [];
if (dataSrcMatches.length === 0) return; // skip if wired changed their pattern
const allProxied = dataSrcMatches.every((m) => m.includes("/browse/"));
const dataSrcMatches = [...html.matchAll(/data-src="([^"]*)"/g)];
if (dataSrcMatches.length === 0) return; // skip if wired changed their pattern
const allProxied = dataSrcMatches.every((m) => m[1].startsWith("/browse/"));

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants