Crawl sites, scrape pages, render PDFs and screenshots, extract structured JSON, and convert local files to markdown using the Cloudflare Browser Rendering API.
The CLI supports three broad workflows:

- Async site crawls via `/crawl`
- Synchronous single-page operations such as `scrape`, `markdown`, `content`, `links`, `json`, `pdf`, `screenshot`, and `snapshot`
- Local file conversion via Workers AI `/ai/tomarkdown`
npm install
cp .env.example .env
# Fill in your Cloudflare credentials in .env
# Install the Claude Code skill
mkdir -p .claude/skills/cf-crawl
cp skill.md .claude/skills/cf-crawl/SKILL.md

You need:
- CF_ACCOUNT_ID — found in the Cloudflare dashboard under any domain -> Overview -> right sidebar
- CF_API_TOKEN — create at https://dash.cloudflare.com/profile/api-tokens with Browser Rendering: Edit permission
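Both variables are needed for every API call, so it is reasonable to fail fast at startup when one is missing. A minimal sketch of such a check (`requireEnv` is a hypothetical helper, not necessarily the CLI's actual code):

```typescript
// Hypothetical startup guard: resolve a required environment variable
// or fail with a clear message before any API call is attempted.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage: read the two credentials the setup steps above ask for.
// const accountId = requireEnv("CF_ACCOUNT_ID");
// const apiToken = requireEnv("CF_API_TOKEN");
```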
npx tsx index.ts --help
npx tsx index.ts crawl example.com --limit 100
npx tsx index.ts markdown https://example.com/article

URLs can be passed with or without https://. The CLI normalizes them internally.
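The normalization step could look roughly like this (a sketch, not the CLI's actual implementation):

```typescript
// Sketch of scheme-less URL normalization: prepend https:// when no
// scheme is present, then round-trip through the WHATWG URL parser.
function normalizeUrl(input: string): string {
  const withScheme = /^https?:\/\//i.test(input) ? input : `https://${input}`;
  return new URL(withScheme).toString();
}
```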
npx tsx index.ts crawl <url> [<url2> ...] [options]

`crawl` submits async jobs to Cloudflare and, unless `--no-wait` is passed, polls until completion and downloads the results.
Options:
- `--render` — use full browser rendering; default is fast HTML-only crawl
- `--limit N` — max pages to crawl per site, default `100000`
- `--max_depth N` — max link depth to follow
- `--no-wait` — submit the job and exit without polling
- `--format json|jsonl` — output format when results are downloaded
Examples:
npx tsx index.ts crawl https://example.com
npx tsx index.ts crawl example.com blog.example.com --render --limit 100
npx tsx index.ts crawl site.com --limit 1000 --format jsonl
npx tsx index.ts crawl site.com --no-wait

Notes:
- Multiple URLs are submitted concurrently.
- Completed crawl downloads stream directly into the final output file; there is no separate `.partial` artifact.
- Cloudflare keeps crawl results for 14 days after completion.
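The wait-then-download behavior reduces to a poll-until-done loop. A generic sketch (the interval, attempt cap, and injected status check are illustrative; the CLI's actual polling logic may differ):

```typescript
// Generic poll-until-complete loop: call an injected status check on a
// fixed interval until it reports done or the attempt budget runs out.
async function pollUntil<T>(
  check: () => Promise<T>,
  isDone: (value: T) => boolean,
  intervalMs = 5000,
  maxAttempts = 720,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const value = await check();
    if (isDone(value)) return value;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("crawl did not complete within the polling window");
}
```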
npx tsx index.ts status <jobId>
npx tsx index.ts download <jobId> [--format json|jsonl]
npx tsx index.ts jobs

- `status` shows the current crawl state, finished/skipped counts, and browser seconds used when available
- `download` fetches the current result set for any crawl job ID, including still-running jobs
- `jobs` prints the locally tracked crawl job log from `output/jobs.jsonl`
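Because the job log is JSONL, other tooling can consume it line by line. A sketch of a parser (the `JobRecord` fields are assumptions about what the CLI logs):

```typescript
// Illustrative parser for output/jobs.jsonl: one JSON object per
// non-empty line. Field names here are hypothetical.
interface JobRecord {
  jobId: string;
  url: string;
  submittedAt: string;
}

function parseJobLog(text: string): JobRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as JobRecord);
}
```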
npx tsx index.ts scrape <url> [<url2> ...] [--wait N]

Extracts:
- title
- meta description
- headings (`h1`, `h2`, `h3`)
- paragraphs
- list items
- table cells (`td`, `th`)
- links
- images
`/scrape` always uses browser rendering. `--wait N` adds a fixed delay before extraction; without it, the command waits for an `h1`.
npx tsx index.ts markdown <url> [<url2> ...]

Converts live pages to clean markdown via `/markdown`. One `.md` file is written per URL.
npx tsx index.ts content <url> [<url2> ...]

Fetches fully rendered HTML from `/content` and saves it as `.html`.
npx tsx index.ts links <url> [<url2> ...] [--visible-only] [--exclude-external]

Returns the page hyperlinks as JSON.
- `--visible-only` — include only visible links
- `--exclude-external` — exclude off-domain links
npx tsx index.ts json <url> [<url2> ...] --prompt "..." [--schema ./schema.json]

Runs the `/json` endpoint with a required natural-language prompt.
- `--prompt "..."` — required extraction instruction
- `--schema <path>` — optional JSON schema file; sent as `response_format: { type: "json_schema", json_schema: ... }`
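Assembling that payload can be sketched as follows; the `response_format` shape matches the description above, while the `url` and `prompt` top-level fields are assumptions about the request body:

```typescript
// Hypothetical payload builder for the /json endpoint: attach the
// parsed schema under response_format only when one is supplied.
interface JsonPayload {
  url: string;
  prompt: string;
  response_format?: { type: "json_schema"; json_schema: object };
}

function buildJsonPayload(url: string, prompt: string, schema?: object): JsonPayload {
  const payload: JsonPayload = { url, prompt };
  if (schema) {
    payload.response_format = { type: "json_schema", json_schema: schema };
  }
  return payload;
}
```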
Example:
npx tsx index.ts json https://example.com --prompt "Extract the page title and main CTA"
npx tsx index.ts json https://example.com --prompt "Extract product details" --schema ./schema.json

npx tsx index.ts pdf <url> [<url2> ...]

Renders each page to a PDF via `/pdf`.
npx tsx index.ts screenshot <url> [<url2> ...] [--full-page] [--format png|jpeg|webp]

- `--full-page` — capture the full scrollable page
- `--format png|jpeg|webp` — image format, default `png`
When jpeg is selected, the file is saved with a .jpg extension.
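The jpeg-to-`.jpg` rule is a one-line mapping; a sketch:

```typescript
// Map the --format flag value to the saved file extension; jpeg is the
// only format whose extension differs from its name.
function extensionFor(format: "png" | "jpeg" | "webp"): string {
  return format === "jpeg" ? ".jpg" : `.${format}`;
}
```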
npx tsx index.ts snapshot <url> [<url2> ...]

Calls `/snapshot` and saves the JSON response containing rendered HTML plus a base64 screenshot.
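To reuse a saved snapshot, the embedded screenshot can be decoded back to binary. A sketch, assuming a top-level `screenshot` field holds the base64 data (the exact response shape may differ):

```typescript
// Decode the base64 screenshot embedded in a saved snapshot response
// back into raw image bytes.
function decodeSnapshotScreenshot(snapshot: { screenshot: string }): Buffer {
  return Buffer.from(snapshot.screenshot, "base64");
}
```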
npx tsx index.ts tomarkdown <file> [<file2> ...]

Uploads local files such as PDFs, DOCX files, images, or HTML to Workers AI `/ai/tomarkdown` and writes one `.md` file per successful result.
Notes:
- This command accepts only local file paths
- Live `http(s)` URLs are rejected intentionally
- Use `markdown` for live webpages
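The local-path guard described in the notes above could be as simple as the following sketch (not necessarily the CLI's actual check):

```typescript
// Reject live http(s) URLs so tomarkdown only ever receives local
// file paths; live pages should go through the markdown command.
function assertLocalPath(input: string): void {
  if (/^https?:\/\//i.test(input)) {
    throw new Error(`tomarkdown accepts local files only; use the markdown command for ${input}`);
  }
}
```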
npm run crawl -- <url> [<url2> ...]
npm run crawl:render -- <url> [<url2> ...]
npm run scrape -- <url> [<url2> ...]
npm run markdown -- <url> [<url2> ...]
npm run content -- <url> [<url2> ...]
npm run links -- <url> [<url2> ...] [--visible-only] [--exclude-external]
npm run json -- <url> [<url2> ...] --prompt "..." [--schema ./schema.json]
npm run pdf -- <url> [<url2> ...]
npm run screenshot -- <url> [<url2> ...] [--full-page] [--format png|jpeg|webp]
npm run snapshot -- <url> [<url2> ...]
npm run tomarkdown -- <file> [<file2> ...]
npm run status -- <jobId>
npm run download -- <jobId> [--format json|jsonl]
npm run jobs

Results are written to output/ with timestamped names:
- crawl_{hostname}_{shortJobId}_{timestamp}.json
- crawl_{hostname}_{shortJobId}_{timestamp}.jsonl
- scrape_{slug}_{timestamp}.json
- markdown_{slug}_{timestamp}.md
- content_{slug}_{timestamp}.html
- links_{slug}_{timestamp}.json
- json_{slug}_{timestamp}.json
- pdf_{slug}_{timestamp}.pdf
- screenshot_{slug}_{timestamp}.png
- screenshot_{slug}_{timestamp}.jpg
- screenshot_{slug}_{timestamp}.webp
- snapshot_{slug}_{timestamp}.json
- tomarkdown_{stem}_{timestamp}.md
- jobs.jsonl
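The naming pattern can be reproduced for scripting around the output directory; a sketch (the CLI's exact slug and timestamp rules may differ):

```typescript
// Rebuild the {command}_{slug}_{timestamp}{ext} pattern: hostname as
// the slug, ISO timestamp with filesystem-unfriendly characters swapped.
function outputName(command: string, url: string, ext: string): string {
  const slug = new URL(url).hostname.replace(/[^a-z0-9]+/gi, "-");
  const timestamp = new Date().toISOString().replace(/[:.]/g, "-");
  return `${command}_${slug}_${timestamp}${ext}`;
}
```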
npm run build # compile TypeScript to dist/
node dist/index.js --help # run the compiled CLI
npm test
npm run typecheck
npm run lint
npm run format
npm run format:check

Standalone bash equivalents remain in scripts/ for quick one-off reference runs:
./scripts/crawl.sh <url>
./scripts/scrape.sh <url>

They are not kept at full feature parity with the TypeScript CLI.