Releases: agent-ecosystem/afdocs
v0.14.1
v0.14.0
What's new
Skip specific checks (#43)
New --skip-checks flag: an exclude-list counterpart to --checks. Useful when you want to disable one or two checks without enumerating all the others.
```bash
afdocs check https://docs.example.com --skip-checks markdown-content-parity
```

Also available as skipChecks in the config file and skipCheckIds in the programmatic API. Skipped checks emit status: "skip" and are excluded from scoring.
Thanks to @mvvmm for the implementation.
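Here is a minimal sketch of the scoring behavior. It marks excluded checks as skipped and computes an unweighted pass ratio over the remaining checks. The `CheckResult` shape, `runChecksSketch`, and `scoreSketch` are hypothetical. afdocs' real runner, statuses, and scoring formula may differ.

```typescript
// Hypothetical result shape; afdocs' real types may differ.
type Status = "pass" | "warn" | "fail" | "skip";

interface CheckResult {
  id: string;
  status: Status;
}

// Run every check except those in the exclude list, which are recorded
// as "skip" instead. The actual check logic is stubbed out via `run`.
function runChecksSketch(
  ids: string[],
  skipCheckIds: string[],
  run: (id: string) => Exclude<Status, "skip">,
): CheckResult[] {
  const skipped = new Set(skipCheckIds);
  return ids.map((id): CheckResult =>
    skipped.has(id) ? { id, status: "skip" } : { id, status: run(id) },
  );
}

// Unweighted score: passing checks divided by checks that were not skipped.
function scoreSketch(results: CheckResult[]): number {
  const scored = results.filter((r) => r.status !== "skip");
  if (scored.length === 0) return 0;
  return scored.filter((r) => r.status === "pass").length / scored.length;
}

const results = runChecksSketch(
  ["llms-txt-freshness", "markdown-content-parity"],
  ["markdown-content-parity"],
  () => "pass",
);
console.log(results, scoreSketch(results));
```

In this sketch, a skipped check is removed from the denominator. It does not count as a failure.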
Auto-detect .md URL pattern (#45)
markdown-url-support tries two URL forms per page (page.md and page/index.md). After the first batch, afdocs now detects which form the site uses and tries the winning form first for subsequent batches. This reduces check runtime by ~60% for sites that use the index.md convention.
Thanks to @mvvmm for the implementation.
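The batch-level detection can be sketched as a majority vote over one batch's results. The winning form is then tried first for later pages. `MdForm`, `candidateUrls`, `detectWinningForm`, and the majority rule are illustrative assumptions, not afdocs' internals.

```typescript
// The two .md URL forms the check tries for each page.
type MdForm = "page.md" | "index.md";

// Build both candidate .md URLs for a page, preferred form first.
function candidateUrls(pageUrl: string, preferred: MdForm): string[] {
  const base = pageUrl.replace(/\/+$/, ""); // drop trailing slashes
  const forms: Record<MdForm, string> = {
    "page.md": `${base}.md`,
    "index.md": `${base}/index.md`,
  };
  const other: MdForm = preferred === "page.md" ? "index.md" : "page.md";
  return [forms[preferred], forms[other]];
}

// After a batch, count which form succeeded for each page (null = neither)
// and prefer the more common one; keep the current preference on a tie.
function detectWinningForm(hits: (MdForm | null)[], current: MdForm): MdForm {
  const pageMd = hits.filter((h) => h === "page.md").length;
  const indexMd = hits.filter((h) => h === "index.md").length;
  if (indexMd > pageMd) return "index.md";
  if (pageMd > indexMd) return "page.md";
  return current;
}

// Example: the first batch found mostly index.md pages.
const preferred = detectWinningForm(["index.md", "index.md", null], "page.md");
console.log(candidateUrls("https://docs.example.com/guide/", preferred));
```

In this sketch, the losing form stays in the list as a fallback. A site that mixes both conventions still gets both forms tried.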
Bug fixes
- Normalize .md URLs during page discovery (#44): .md URLs from llms.txt are now normalized to their HTML equivalents before deduplication, preventing the same page from being tested twice. Observed ~50% fewer duplicate pages and ~3x faster runs on affected sites. Thanks to @mvvmm.
- Strip code blocks before HTML detection (#42): looksLikeHtml() no longer reports false positives on markdown pages that contain HTML tags inside fenced code blocks or inline code spans. Thanks to @mvvmm.
- Boundary-safe stripCode (#47): Fixed a truncation risk in stripCode when code block markers span chunk slice boundaries.
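The idea behind #42 is to remove code before looking for HTML tags. A minimal sketch of that follows. `stripCodeSketch` and `looksLikeHtmlSketch` are hypothetical stand-ins. The real stripCode() and looksLikeHtml() work on chunked input (see #47) and likely use different heuristics.

```typescript
// Remove fenced code blocks (``` or ~~~ fences at line start) and inline
// code spans. Simplified: ignores indented fences and multi-backtick spans.
function stripCodeSketch(markdown: string): string {
  return markdown
    .replace(/^(```|~~~)[^\n]*\n[\s\S]*?^\1[^\n]*$/gm, "")
    .replace(/`[^`\n]*`/g, "");
}

// Crude HTML heuristic applied only to text outside code.
function looksLikeHtmlSketch(text: string): boolean {
  const outsideCode = stripCodeSketch(text);
  return /<!doctype html|<html[\s>]|<(head|body|div|table)[\s>]/i.test(outsideCode);
}

const markdownPage = [
  "# Embedding the widget",
  "",
  "```html",
  '<div class="widget"></div>',
  "```",
  "",
  "Wrap tabular data in `<table>` elements.",
].join("\n");

console.log(looksLikeHtmlSketch(markdownPage)); // false
console.log(looksLikeHtmlSketch("<!DOCTYPE html><html><body></body></html>")); // true
```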
Docs and housekeeping
v0.13.0
What's new
Preview and staging deployment testing (#40, #41)
When running afdocs against a preview deployment (Vercel, Netlify, localhost, etc.), the site's llms.txt and sitemap.xml often contain hardcoded production URLs. Origin comparisons fail because the fetch origin and content origins don't match, causing checks like llms-txt-freshness to skip or report 0% coverage.
The new --canonical-origin flag tells afdocs which production domain to expect, so it can rewrite those URLs automatically:
```bash
afdocs check https://preview-xyz.vercel.app/docs --canonical-origin https://example.com
```

Also available as canonicalOrigin in the config file:

```yaml
options:
  canonicalOrigin: https://example.com
```

The rewrite happens at the HTTP layer. After every fetch, the canonical origin is replaced with the target origin in text response bodies. All downstream checks see consistent same-origin URLs without any changes to URL extraction, sitemap parsing, or origin comparison logic.
Thanks to @philip for the feature proposal and implementation.
Follow-up hardening
- Boundary-aware rewriting: the origin regex now uses a lookahead (/, whitespace, quote, >, or end-of-string) so https://example.com won't accidentally match inside https://example.com.evil.com
- Same-origin warning: if --canonical-origin matches the target URL, afdocs prints a warning and skips the no-op rewrite instead of silently doing nothing
- CLI help text: the --help description now matches the reference docs ("The production domain your content links to")
- Additional test coverage for port-based origins, partial domain protection, and repeated text() calls
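A small string-rewrite function can illustrate the boundary rule from the first bullet. This sketch assumes a plain replacement over response text. `rewriteOrigin` and `escapeRegExp` are hypothetical helpers, and afdocs' actual regex and HTTP-layer hook may differ.

```typescript
// Escape regex metacharacters so the origin is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace the canonical origin with the target origin, but only where the
// match is followed by "/", whitespace, a quote, ">", or the end of the text.
function rewriteOrigin(body: string, canonical: string, target: string): string {
  if (canonical === target) return body; // same origin: nothing to rewrite
  const pattern = new RegExp(escapeRegExp(canonical) + "(?=[/\\s\"'>]|$)", "g");
  return body.replace(pattern, () => target);
}

const preview = "https://preview-xyz.vercel.app";
console.log(rewriteOrigin('<a href="https://example.com/docs/a">', "https://example.com", preview));
// <a href="https://preview-xyz.vercel.app/docs/a">
console.log(rewriteOrigin("https://example.com.evil.com/x", "https://example.com", preview));
// unchanged: "." is not a boundary character
```

In this sketch, a port suffix such as :8080 is also not a boundary character. As a result, https://example.com does not match inside https://example.com:8080.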
v0.12.0
What's new
Discovery source fallback
When llms.txt yields fewer URLs than --max-links, sitemap discovery now runs automatically and merges the results. llms.txt URLs come first (curated/intentional), sitemap URLs fill the remaining slots (deduplicated). Previously, discovery stopped at llms.txt even when the count was far below the requested limit.
For example, Confluent's llms.txt provides 52 URLs but their sitemap has ~4,970. With --max-links 250, the tool now discovers 550 candidate pages (52 from llms.txt + 498 from sitemap) and samples 250, compared to only 52 previously.
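Below is a minimal sketch of the fallback merge. It assumes that URLs are deduplicated after trivial normalization (trailing slashes only) and that the sitemap is loaded only when llms.txt comes up short. afdocs does have a mergeUrlSets() helper, but `mergeUrlSetsSketch` and `discoverCandidatesSketch` are hypothetical and do not reproduce its signature.

```typescript
// Merge URL lists in priority order, keeping the first occurrence of each URL.
// Normalization is deliberately minimal here (trailing slashes only).
function mergeUrlSetsSketch(...lists: string[][]): string[] {
  const seen = new Set<string>();
  const merged: string[] = [];
  for (const list of lists) {
    for (const url of list) {
      const key = url.replace(/\/+$/, "");
      if (!seen.has(key)) {
        seen.add(key);
        merged.push(url);
      }
    }
  }
  return merged;
}

// llms.txt URLs come first; the sitemap is consulted only when they fall
// short of maxLinks. The result is a candidate pool, sampled down afterward.
function discoverCandidatesSketch(
  llmsTxtUrls: string[],
  maxLinks: number,
  loadSitemapUrls: () => string[],
): { urls: string[]; sources: string[] } {
  const fromLlms = mergeUrlSetsSketch(llmsTxtUrls);
  const sources: string[] = fromLlms.length > 0 ? ["llms-txt"] : [];
  if (fromLlms.length >= maxLinks) return { urls: fromLlms, sources };
  const urls = mergeUrlSetsSketch(fromLlms, loadSitemapUrls());
  if (urls.length > fromLlms.length) sources.push("sitemap");
  return { urls, sources };
}
```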
Discovery source tracking
A new discoverySources field in the JSON report shows which methods contributed to page discovery: llms-txt, sitemap, or fallback. This makes it easier to understand where tested URLs came from and diagnose discovery gaps.
```json
{
  "discoverySources": ["llms-txt", "sitemap"]
}
```

Internal
- New DiscoverySource type and mergeUrlSets() helper for deduplicating URLs across sources
- sources field propagated through PageUrlResult → SampledPages → ReportResult
- All existing filtering (path prefix, locale, version dedup) preserved for both URL sources independently before merge
- Filed #38 for robots.txt Disallow filtering as a separate concern
v0.11.1
Content negotiation hardening
Fixes two sources of false positives in the content-negotiation check (#29, #33):
Soft-404 detection (#29)
Some sites return HTTP 200 with text/markdown for error pages (e.g. Next.js returns # Page Not Found as markdown). These were incorrectly counted as successful content negotiation, inflating scores.
The check now inspects the first markdown heading for error-page patterns. Responses whose title indicates a "not found" or 404 page are rejected and excluded from the page cache. The detection is heading-aware to avoid false positives on documentation that legitimately mentions "404" in body content (e.g. GitHub Pages docs about custom 404 pages).
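The heading-aware rule can be sketched as follows. Only the first markdown heading is compared against error-page patterns, so body text that mentions "404" does not trigger a rejection. The patterns below are illustrative assumptions. The actual list lives in SOFT_404_PATTERNS and may differ.

```typescript
// Illustrative patterns only; not the contents of afdocs' SOFT_404_PATTERNS.
// They are anchored to the start of the title so headings like
// "Creating a custom 404 page" are not flagged.
const SOFT_404_PATTERNS_SKETCH: RegExp[] = [
  /^(?:error\s+)?404\b/i, // "404", "404 - Not Found", "Error 404"
  /^(?:page\s+)?not\s+found\b/i, // "Not Found", "Page Not Found"
  /\bpage (?:does not|doesn't) exist\b/i,
];

// Return the text of the first ATX heading (# to ######), or null if none.
function firstHeading(markdown: string): string | null {
  const match = /^#{1,6}[ \t]+(.+?)[ \t]*#*[ \t]*$/m.exec(markdown);
  return match ? match[1] : null;
}

// Only the first heading is tested; the body is never scanned.
function isSoft404Sketch(markdown: string): boolean {
  const heading = firstHeading(markdown);
  if (heading === null) return false;
  return SOFT_404_PATTERNS_SKETCH.some((pattern) => pattern.test(heading));
}
```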
.md URL normalization (#33)
When pages are discovered via llms.txt with .md URLs, the content-negotiation check now strips the extension and tests the canonical HTML URL instead. Previously, sending Accept: text/markdown to a .md endpoint passed trivially without actually testing whether the server supports HTTP content negotiation.
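This normalization can be sketched under the assumption that `page.md` mirrors `page` and `page/index.md` mirrors `page/`. `isMdUrlSketch` and `toHtmlUrlSketch` are hypothetical stand-ins. The real isMdUrl() and toHtmlUrl() in src/helpers/to-md-urls.ts may treat trailing slashes and other edge cases differently.

```typescript
// True when the URL path ends in ".md" (query string and fragment ignored).
function isMdUrlSketch(url: string): boolean {
  return new URL(url).pathname.endsWith(".md");
}

// Map a .md URL to the HTML page it mirrors:
//   /guide/index.md -> /guide/
//   /guide.md       -> /guide
function toHtmlUrlSketch(url: string): string {
  const u = new URL(url);
  if (u.pathname.endsWith("/index.md")) {
    u.pathname = u.pathname.slice(0, -"index.md".length);
  } else if (u.pathname.endsWith(".md")) {
    u.pathname = u.pathname.slice(0, -".md".length);
  }
  return u.toString();
}

// Deduplicating after normalization collapses a .md URL and its HTML page.
const discovered = ["https://docs.example.com/guide.md", "https://docs.example.com/guide"];
console.log(new Set(discovered.map(toHtmlUrlSketch)).size); // 1
```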
Shared helpers
- SOFT_404_PATTERNS extracted to src/helpers/detect-soft-404.ts for reuse across checks
- toHtmlUrl() and isMdUrl() consolidated into src/helpers/to-md-urls.ts (previously duplicated in two checks)
v0.11.0
What's new
Locale and version preferences
New --doc-locale and --doc-version CLI flags (and corresponding preferredLocale/preferredVersion config options) let you explicitly control which locale and version URLs are selected during sitemap discovery. When not set, both are auto-detected from the base URL.
```bash
afdocs check https://docs.example.com --doc-locale fr --doc-version v3
```

Sitemap discovery improvements
- Subpath sitemap discovery: sites that host docs under a subpath (e.g. /docs/) now have their sitemaps discovered correctly
- Sitemap index locale filtering: when a sitemap index contains per-locale sub-sitemaps, only the preferred locale is followed
- Path-prefix filtering before URL cap: URL filtering now runs before the 500-URL collection cap, so large sites get properly scoped results
- Pre-release version ranking: version deduplication now ranks dev, next, nightly, and canary below stable versions, so you get the latest stable release by default
- Broader collection before dedup: URLs are collected more broadly before locale/version filtering and deduplication, then capped afterward, preventing arbitrary sitemap ordering from biasing results
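One way to express the pre-release ranking is a sort comparator that puts stable numeric versions above dev, next, nightly, and canary. This sketch assumes versions appear as path segments like `v3` or `2.1`. `compareVersions` and `pickPreferredVersion` are hypothetical names, not afdocs' implementation.

```typescript
const PRE_RELEASE = new Set(["dev", "next", "nightly", "canary"]);

// Parse "v3", "3", "2.1", "v2.10.1" into numeric parts; null if not numeric.
function parseVersion(segment: string): number[] | null {
  const m = /^v?(\d+(?:\.\d+)*)$/i.exec(segment);
  return m ? m[1].split(".").map(Number) : null;
}

// Order by preference: stable numeric versions first (highest first),
// then other labels, then known pre-release labels last.
function compareVersions(a: string, b: string): number {
  const tier = (s: string) =>
    parseVersion(s) ? 0 : PRE_RELEASE.has(s.toLowerCase()) ? 2 : 1;
  const ta = tier(a);
  const tb = tier(b);
  if (ta !== tb) return ta - tb;
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  if (pa && pb) {
    for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
      const diff = (pb[i] ?? 0) - (pa[i] ?? 0);
      if (diff !== 0) return diff;
    }
  }
  return 0;
}

function pickPreferredVersion(versions: string[]): string | undefined {
  return [...versions].sort(compareVersions)[0];
}
```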
Internal
- Refactored getUrlsFromSitemap from positional arguments to an options object
- Freshness check now skips locale/version refinement to avoid double-filtering and distorted coverage measurement
- Improved branch coverage for get-page-urls.ts and CLI command parsing
v0.10.2
What's Changed
Details
- headingFollowedByContent() now recognizes markdown table rows and raw HTML <table>/<tr> tags as content signals
- htmlToMarkdown() uses the turndown-plugin-gfm tables plugin so HTML tables convert to proper markdown table syntax instead of being flattened to bare cell text
- Fixes false fail results on pages where content after a heading is in table format (closes #20)
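The table-aware check can be pictured as a line classifier applied to whatever follows a heading. This sketch assumes that prose, list items, markdown table rows, and raw <table>/<tr> tags count as content. `isContentSignal` and `headingHasContentSketch` are hypothetical and do not reproduce headingFollowedByContent() itself.

```typescript
// Hypothetical content signals: prose, list items, markdown table rows,
// or raw HTML table tags. Other lines (e.g. a bare <div>) are ignored.
function isContentSignal(line: string): boolean {
  const t = line.trim();
  if (t === "") return false;
  if (/^\|.*\|$/.test(t)) return true; // markdown table row
  if (/^<(table|tr)[\s>]/i.test(t)) return true; // raw HTML table tag
  if (/^([-*+]|\d+\.)\s/.test(t)) return true; // list item
  return /^[A-Za-z0-9]/.test(t); // prose starting with a letter or digit
}

// True if a content signal appears between the heading at `index`
// and the next heading (or the end of the document).
function headingHasContentSketch(lines: string[], index: number): boolean {
  for (let i = index + 1; i < lines.length; i++) {
    if (/^#{1,6}\s/.test(lines[i])) return false;
    if (isContentSignal(lines[i])) return true;
  }
  return false;
}
```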
Full Changelog: v0.10.1...v0.10.2