feat: CLI validation, CDN drawer integration, and citation data quality #387
bensonwong merged 13 commits into main
Adds downloadUrl, urlAccessStatus, and urlVerificationError to the CDN verification data types and mapper so the popover can show download links and URL access diagnostics for injected HTML reports. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- stripExistingInjection removes dc-data, dc-key-map, init script, and CDN bundle before injecting fresh copies, preventing stale duplicates
- --indicator <icon|dot|none> flag for both inject and verify-html
- Validation warnings from validateCitationData logged in verify-html

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
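The strip-then-inject idea above can be sketched as follows. This is a minimal illustration, not the project's `stripExistingInjection`: the ids `dc-data` and `dc-key-map` come from the commit message, but the assumption that they are `<script>` elements, and the helper names `stripInjected` and `inject`, are hypothetical. The init script and CDN bundle are left out for brevity.

```typescript
// Hypothetical sketch; the real stripExistingInjection in reportUtils.ts may differ.
// Assumes injected payloads are <script id="dc-data"> and <script id="dc-key-map"> elements.
const INJECTED_IDS = ["dc-data", "dc-key-map"];

function stripInjected(html: string): string {
  let out = html;
  for (const id of INJECTED_IDS) {
    // Non-greedy body match so only the tagged <script> element (plus trailing whitespace) is removed.
    const pattern = new RegExp(`<script[^>]*\\bid="${id}"[^>]*>[\\s\\S]*?</script>\\s*`, "g");
    out = out.replace(pattern, "");
  }
  return out;
}

function inject(html: string, data: unknown, keyMap: unknown): string {
  const payload =
    `<script type="application/json" id="dc-data">${JSON.stringify(data)}</script>\n` +
    `<script type="application/json" id="dc-key-map">${JSON.stringify(keyMap)}</script>\n`;
  // Always strip first, so a second run replaces the old copies instead of stacking new ones.
  const clean = stripInjected(html);
  const idx = clean.lastIndexOf("</body>");
  return idx === -1 ? clean + payload : clean.slice(0, idx) + payload + clean.slice(idx);
}
```

Running `inject` twice on the same document should yield the same output as running it once, which is the property the "no duplication" check in the test plan exercises.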
Adds validateCitationData and detectExtractionArtifacts utilities that catch common PDF/HTML text extraction issues before they reach the API: collapsed spaces, broken hyphens, fi-ligature loss, table fragments, missing punctuation spaces, and broken words. Based on Round 3 QA analysis of 527 citations / 71 partials. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
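As a rough illustration of this kind of check, the sketch below flags three of the artifact classes named above. It is not the PR's `detectExtractionArtifacts`: the function name `detectArtifacts`, the 25-letter threshold, and the small ligature word list are all assumptions chosen for the example.

```typescript
// Hypothetical sketch of heuristic artifact detection; thresholds and word list are assumptions.
type Artifact = "collapsed-spaces" | "missing-punctuation-space" | "ligature-loss";

function detectArtifacts(text: string): Artifact[] {
  const found: Artifact[] = [];
  // A very long unbroken run of letters suggests that spaces were dropped during extraction.
  if (/[A-Za-z]{25,}/.test(text)) found.push("collapsed-spaces");
  // Lowercase letter, sentence punctuation, then uppercase with no space: "overruled.We".
  if (/[a-z][.;:!?][A-Z]/.test(text)) found.push("missing-punctuation-space");
  // Words that typically appear when an "fi" ligature glyph is lost: "de ned", "bene t".
  if (/\b(?:de ned|con rm|bene t|speci c)\b/i.test(text)) found.push("ligature-loss");
  return found;
}
```

Called on `"The court overruled.We agree"`, the sketch reports only `missing-punctuation-space`; a clean sentence returns an empty array.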
- normalizeSnippetText fixes collapsed spaces, missing punctuation spacing, and quote boundaries in verification API snippets. Uses reference-guided spacing from fullPhrase when available, falls back to regex heuristics.
- EvidenceTray shows "Exact location not specified" note when a verified citation lacks page number or line IDs.
- i18n: evidence.impreciseLocation added for en, es, fr, vi.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
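Reference-guided spacing can be sketched like this: align the snippet against the reference phrase with whitespace ignored, then copy the reference's spaces back while keeping the snippet's own characters. The name `applyReferenceSpacingSketch` and the `null` return for "no match, fall back to heuristics" are assumptions for illustration, not the PR's actual signature.

```typescript
// Hypothetical sketch of reference-guided spacing. Assumes lowercasing does not change
// string length (true for ASCII), so indices in the lowercased copies line up.
function applyReferenceSpacingSketch(snippet: string, reference: string): string | null {
  const strip = (s: string) => s.replace(/\s+/g, "");
  const snippetChars = strip(snippet);
  const refChars = strip(reference);
  // Locate the snippet inside the reference, ignoring whitespace and case.
  const start = refChars.toLowerCase().indexOf(snippetChars.toLowerCase());
  if (snippetChars.length === 0 || start === -1) return null; // caller falls back to regex heuristics

  let out = "";
  let seen = 0; // non-whitespace reference characters consumed so far
  let taken = 0; // snippet characters emitted so far
  for (let i = 0; i < reference.length && taken < snippetChars.length; i++) {
    const ch = reference[i];
    if (/\s/.test(ch)) {
      // Copy one space, but only between two emitted snippet characters.
      if (taken > 0 && !out.endsWith(" ")) out += " ";
      continue;
    }
    if (seen >= start) {
      out += snippetChars[taken]; // keep the snippet's own character, including its casing
      taken++;
    }
    seen++;
  }
  return out;
}
```

With the snippet `"overruled.Weagree"` and the reference `"The case was overruled. We agree with that."`, the sketch returns `"overruled. We agree"`. Only spacing is taken from the reference; casing in the snippet is left untouched.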
- citationKeyStability: verifies getCitationKey determinism, sensitivity to field changes, URL citation hashing, and regression fixtures
- columnDetection: validates detectColumns for single/multi-column PDFs, narrow gaps, and real-world CDC immunization schedule patterns

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
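The properties these tests check (determinism and sensitivity to field changes) can be illustrated with a toy key function. This is not `getCitationKey`: the `CitationLike` shape, the field selection, and the FNV-1a-style hash over UTF-16 code units are assumptions made only to keep the example self-contained.

```typescript
// Hypothetical sketch of a deterministic citation key; the real getCitationKey may hash
// different fields with a different algorithm.
interface CitationLike {
  fullPhrase?: string;
  pageNumber?: number;
  url?: string;
}

// FNV-1a-style 32-bit hash applied to UTF-16 code units (a simplification of the byte-wise original).
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function citationKeySketch(c: CitationLike): string {
  // Fixed field order plus an explicit separator keeps the key independent of
  // the order in which properties were set on the object.
  const parts = [c.fullPhrase ?? "", String(c.pageNumber ?? ""), c.url ?? ""];
  return fnv1a(parts.join("\u0000"));
}
```

A stability test in this style would assert that equal inputs give equal keys, that changing any hashed field changes the key, and that keys for a set of frozen fixtures never drift between releases.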
Review fixes:
- Extract stripExistingInjection to reportUtils.ts (single canonical location, no test duplication, no CLI side-effect import)
- Tighten CDN bundle regex to require assignment (window.DeepCitationPopover=) instead of mere mention, avoiding false strips
- Use initParts pattern in verifyHtml for consistency with inject
- Add validateRegexInput guard in detectExtractionArtifacts (ReDoS)
- Add "v." exclusion to missing-space-after-punctuation detector
- Document legitimateHyphens multi-segment matching behavior
- Populate citationKeyStability fixtures with frozen hash values
- Add JSDoc clarifying applyReferenceSpacing preserves garbled casing
- Document @filelasso/shared workspace-only import (circular dep)
- Auto-format via biome

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add drawer support to the CDN popover API: bind [data-dc-drawer-trigger] elements, expose showDrawer/hideDrawer on window.DeepCitationPopover, and refresh drawer content on update(). Include React-vs-CDN comparison showcase for visual testing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix showDrawer() idempotency: unmount+remove container on hideDrawer() so next showDrawer() re-mounts with initialOpen=true (useState ignores initialProp on reconciliation)
- Fix update() not refreshing the programmatic drawer portal
- Replace render(null as unknown as ...) with unmountComponentAtNode
- Remove nested ternary / no-op reactVariant mapping
- Move cdn-comparison-showcase.html to src/vanilla/testing/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Map<HTMLElement, true> → Set<HTMLElement> (value was never read)
- Extract renderDrawer() helper to deduplicate 4 identical render sites
- Compute buildDrawerGroups() once in update() instead of 3 times
- Accept pre-built groups in bind/refresh to avoid redundant computation
- Remove redundant JSDoc comments restating function names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
// Insert space after sentence-ending punctuation + uppercase: "overruled.We" → "overruled. We"
result = result.replace(/([.;:!?])([A-Z])/g, "$1 $2");
// Insert space between letter+quote+letter: 'equal"has' → 'equal" has'
result = result.replace(/([a-zA-Z])(["'"'"])([a-zA-Z])/g, "$1$2 $3");
✅ Playwright Test Report
Status: Tests passed
Run ID: 23725157556
PR Review - feat: CLI validation, CDN drawer integration, and citation data quality

Solid PR overall. The test coverage (290+ new cases, regression fixtures for hash stability) is excellent and the feature scope is well-motivated by QA data. A few issues worth addressing before merge:

Bugs

1.
- Remove columnDetection test that depends on @filelasso/shared (moved to shared package)
- Replace unsafe `render(null as unknown as ...)` with `unmountComponentAtNode`
- Remove unused `detectSnippetDisplayArtifacts` alias
- Fix biome formatting in test files and EvidenceTray.tsx
- Fix pre-existing lint error in pdfWordSpacing.test.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test imports from @filelasso/shared which isn't available in CI where deepcitation runs tests in isolation. This test belongs in the shared package. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- … (`validateCitationData.ts`): detect extraction artifacts (garbled text, encoding errors, truncation markers) and flag unreliable citation data before it reaches the UI
- … (`normalizeSnippetText` in `react/utils.ts`): clean up mojibake, collapsed whitespace, and encoding artifacts in snippet text; show "imprecise location" note when snippet quality is degraded
- `--indicator` flag for indicator variant selection, improved report utilities
- … `CitationDrawer` into the vanilla/CDN runtime with `[data-dc-drawer-trigger]` binding, programmatic `showDrawer()`/`hideDrawer()` API, and proper lifecycle cleanup
- … `downloadUrl` and `urlAccessStatus` through to CDN popover content
- … `.deepcitation/` CLI output directory, address prior review findings

Test plan
- `bun run test`: all 1654 tests pass
- `bunx biome ci ./src`: lint clean (pre-existing errors in `columnDetection.test.ts` only)
- … `<div data-dc-drawer-trigger></div>` to static HTML example, confirm drawer renders and opens
- … `showDrawer()`/`hideDrawer()` programmatic API: call twice to confirm idempotency fix
- … `deepcitation inject` twice on same file, confirm no duplication