Skip to content

feat: CDN citation-key resolution, WSL2 auth fix, content_word_match method#386

Merged
bensonwong merged 9 commits intomainfrom
feat/cdn-citation-key-and-wsl-auth
Mar 29, 2026
Merged

feat: CDN citation-key resolution, WSL2 auth fix, content_word_match method#386
bensonwong merged 9 commits intomainfrom
feat/cdn-citation-key-and-wsl-auth

Conversation

@bensonwong
Copy link
Copy Markdown
Collaborator

@bensonwong bensonwong commented Mar 29, 2026

Summary

  • CDN keymap: Extend resolveKeyMap() to resolve human-readable data-citation-key attributes (current /verify workflow) in addition to legacy data-cite attributes. Extracted shared loop logic into a resolveEls helper and fixed a double-processing bug where legacy elements would be re-resolved by the second pass.
  • Auth WSL fix: Bind the OAuth callback server to 0.0.0.0 instead of 127.0.0.1 so Windows browsers can reach it under WSL. WSL_DISTRO_NAME is set in both WSL1 and WSL2; binding is unnecessary but harmless in WSL1. The 64-char random nonce prevents unauthorized callers. Updated a stale CORS comment that still referenced the old 127.0.0.1 binding.
  • New search method: Add content_word_match to the SearchMethod type union with i18n support across all locales (en, es, fr, vi).

Test plan

  • bun test src/__tests__/cdnPopover.test.tsx — 6 new tests for data-citation-key resolution (happy path, unknown value pass-through, mixed elements, double-processing guard, legacy overwrite edge case, hash-like value edge case)
  • bun test src/__tests__/i18nLocales.test.ts — verify locale parity for new contentWordMatch key
  • bun test — full suite passes
  • Manual: run deepcitation login from WSL2 and confirm the browser callback completes

🤖 Generated with Claude Code

Benson and others added 6 commits March 29, 2026 11:58
The /verify workflow now emits data-citation-key directly (instead of
data-cite). Extend resolveKeyMap() to also replace human-readable
data-citation-key values with their hashed counterparts from the key map.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Windows browsers cannot reach 127.0.0.1 inside WSL2. Binding to 0.0.0.0
makes the callback server reachable. The 64-char random nonce prevents
unauthorized callers from exploiting the broader bind address.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The second loop (data-citation-key resolution) used to re-process elements
that the first loop had already set data-citation-key on. Use :not([data-cite])
selector to exclude them. Adds a regression test for the overlap case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The comment incorrectly stated the callback server only runs on 127.0.0.1.
Updated to reflect the WSL2 binding change and nonce-based security model.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New SearchMethod variant for content-word matching. Adds the type
discriminant, narrative mapping, and translations for all locales.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both attribute-resolution loops shared identical logic differing only in
selector and source attribute. Extracted into a private helper and dropped
the redundant Object.hasOwn guard (missing keys yield undefined which
already fails the typeof check).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel bot commented Mar 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

4 Skipped Deployments
Project Deployment Actions Updated (UTC)
agui-chat-deepcitation Ignored Ignored Preview Mar 29, 2026 5:57am
deepcitation-langchain-rag-chat Ignored Ignored Preview Mar 29, 2026 5:57am
mastra-rag-deepcitation Ignored Ignored Preview Mar 29, 2026 5:57am
nextjs-ai-sdk-deepcitation Ignored Ignored Preview Mar 29, 2026 5:57am

@claude
Copy link
Copy Markdown

claude bot commented Mar 29, 2026

Code Review

Overall this is a well-structured PR with clear separation of concerns. A few things worth discussing:


src/vanilla/runtime/cdn-keymap.tsresolveEls helper

Removed Object.hasOwn check (minor)

The original loop explicitly used Object.hasOwn(keyMap, humanKey) before accessing keyMap[humanKey]. The new resolveEls drops that and relies solely on typeof hashedKey !== "string". In practice this is safe — prototype-inherited properties like toString and constructor are functions, not strings, so they won't pass the type guard. But the Object.hasOwn guard was semantically more precise and explicitly documented intent (own-property-only lookup). Consider restoring it:

const hashedKey = Object.hasOwn(keyMap, humanKey) ? keyMap[humanKey] : undefined;
if (typeof hashedKey !== "string") continue;

Double-processing guard approach

The :not([data-cite]) selector correctly prevents legacy elements from being re-resolved in the second pass. This works because resolveEls for legacy elements adds data-citation-key but never removes data-cite. Clean approach.


src/auth.ts — WSL2 bind fix

Security: expanded network exposure

Changing from 127.0.0.1 to 0.0.0.0 solves the WSL2 problem but now exposes the OAuth callback server on all network interfaces (Wi-Fi, Ethernet, VPN, etc.) for up to 5 minutes. The 64-char nonce is strong protection, but this removes a meaningful defense-in-depth layer that loopback-only binding provided.

A few alternatives worth considering:

  1. WSL2-targeted binding: Only bind to 0.0.0.0 when a WSL2 environment is detected (e.g., check process.env.WSL_DISTRO_NAME or /proc/version), otherwise keep 127.0.0.1.
  2. Document the tradeoff: If 0.0.0.0 is kept, a comment spelling out the exposure (not just "WSL2 compatibility") would help future maintainers understand the risk model.
  3. Short-lived port: The server already uses a random ephemeral port (listen(0, ...)), which limits discoverability. This somewhat mitigates the exposure, but is worth calling out explicitly in the comment.

If the WSL2 use case is common enough that 0.0.0.0 is the pragmatic choice, at minimum the comment should reflect the real tradeoff rather than just "the nonce prevents abuse" — a nonce on an exposed port is weaker than loopback + no nonce.


src/types/search.ts + src/analysis/narrative.ts + locales — content_word_match

Consistent, complete implementation: type union, METHOD_KEY_MAP, and all four locales (en, es, fr, vi) updated in lockstep. No issues here.

The French translation "Correspondance de mots-clés" translates more precisely as "keyword match" rather than "content word match". Whether that's intentional (content words ≈ keywords in this context) or a subtle mistranslation is worth a native-speaker confirmation.


Tests (src/__tests__/cdnPopover.test.tsx)

Good coverage of the new data-citation-key resolution path. The four cases (happy path, unknown value, mixed elements, double-processing guard) map directly to the behavioral requirements.

One gap: the double-processing test covers the legacy path (data-cite element whose resolved hash also appears as a key). There's no symmetric test for the current path — a data-citation-key element whose human-readable value happens to be a hash that maps to something else. By design the current path should resolve it (unlike legacy), so a test making that behavior explicit would help future readers understand the asymmetry.

it("resolves data-citation-key even when value looks like a hash", () => {
  injectKeyMap({ "hash-a": "double-hashed" });
  const el = addCitationKeyEl("hash-a");
  resolveKeyMap();
  // current-path elements are always resolved if the key exists
  expect(el.getAttribute("data-citation-key")).toBe("double-hashed");
});

Summary

Area Verdict
CDN keymap refactor ✅ Approve with nit (restore Object.hasOwn)
Auth WSL2 fix ⚠️ Needs discussion on security exposure
content_word_match ✅ Approve
Tests ✅ Approve, suggested one additional case

The WSL2 change is the only item I'd want resolved before merge. Everything else is low-risk.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 29, 2026

✅ Playwright Test Report

Status: Tests passed

📊 Download Report & Snapshots (see Artifacts section)

What's in the Visual Snapshots

The gallery includes visual snapshots for:

  • 🖥️ Desktop showcase (all variants × all states)
  • 📱 Mobile showcase (iPhone SE viewport)
  • 📟 Tablet showcase (iPad viewport)
  • 🔍 Popover states (verified, partial, not found)
  • 🔗 URL citation variants

Run ID: 23702718837

Benson and others added 2 commits March 29, 2026 12:37
- Restore Object.hasOwn guard in resolveEls for explicit own-property intent
- WSL2-conditional binding: only use 0.0.0.0 when WSL_DISTRO_NAME is set,
  keep 127.0.0.1 loopback-only elsewhere for defense-in-depth
- Update CORS comment to reflect conditional binding logic
- Add symmetric test: current-path elements resolve even when value looks
  like a hash (documents the asymmetry with legacy double-processing guard)
- Fix French translation: "mots de contenu" (content words) not "mots-clés"
  (keywords)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Spanish: "palabras clave" (keywords) → "palabras de contenido" (content
  words) to match en/fr/vi translations
- Test: element with both data-cite and pre-existing data-citation-key
  verifies legacy path overwrites and current path skips via :not([data-cite])

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Mar 29, 2026

Code Review

Overall this is a clean, well-scoped PR. A few things worth flagging before merge:


CDN keymap refactor ✅

The resolveEls helper is a nice extraction — it removes ~8 lines of duplication and the signature is clear. The :not([data-cite]) guard on the second pass is the right way to prevent double-processing.

One minor nit: the PR description says "4 new tests" but the diff adds 6 it() blocks. Not a functional problem, just a stale description.


WSL2 auth fix — security surface worth noting

const host = process.env.WSL_DISTRO_NAME ? "0.0.0.0" : "127.0.0.1";

Two observations:

1. WSL_DISTRO_NAME is set in both WSL1 and WSL2. WSL1 shares the Windows loopback adapter, so binding 0.0.0.0 there is unnecessary (though not harmful). If you want WSL2-only detection, WSL_INTEROP is the additional signal. The current check is safe to ship but the comment says "WSL2" when it also affects WSL1 users.

2. Binding 0.0.0.0 exposes the callback server on all interfaces, including LAN/Wi-Fi, for the full 5-minute login window. The 64-char nonce guards the /callback endpoint, but the /health check responds to any caller without a nonce:

if (req.method === "GET" && req.url === "/health") {
  sendJson(res, 200, { ok: true }, origin);
  return;
}

On a shared or corporate network, this lets anyone on the LAN confirm the server is listening on the ephemeral port (though they can't complete auth without the nonce). Reasonable to accept this tradeoff for a developer tool, but it should be documented. Suggesting either:

  • A brief code comment noting the LAN exposure trade-off, or
  • Restricting /health to same-origin/localhost-origin requests when host === "0.0.0.0".

content_word_match addition ✅

Type union, METHOD_KEY_MAP, and all four locales (en, es, fr, vi) are updated consistently. Placement in the union is alphabetically reasonable among the fallback variants. No issues here.


Summary

Area Status
CDN keymap refactor ✅ Approve
Double-processing guard + tests ✅ Approve
content_word_match + i18n ✅ Approve
WSL2 auth: WSL_DISTRO_NAME matches WSL1 too Minor / worth noting in comments
WSL2 auth: /health exposed on 0.0.0.0 Minor / acceptable risk, but worth a comment

Happy to approve once the WSL1 vs WSL2 comment is clarified (or acknowledged as intentional).

Address code review feedback: WSL_DISTRO_NAME is set in both WSL1 and
WSL2, not just WSL2. Also document that binding 0.0.0.0 exposes the
/health endpoint on LAN during the login window (acceptable tradeoff
since auth completion still requires the 64-char nonce).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@bensonwong bensonwong merged commit b48b8c5 into main Mar 29, 2026
13 checks passed
@bensonwong bensonwong deleted the feat/cdn-citation-key-and-wsl-auth branch March 29, 2026 08:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant