Cloudflare Browser Run CLI — Firecrawl-compatible, cost-efficient at scale.
CLI that wraps Cloudflare's Browser Run API with the same command structure as Firecrawl. Supports scraping, crawling, URL discovery, screenshots, PDFs, and AI-powered data extraction — all running on Cloudflare's headless Chromium infrastructure. Now with direct Chrome DevTools Protocol (CDP) access for persistent browser sessions, real-time debugging, and authenticated scraping. Cost-efficient alternative for high-volume use cases (free 10 min/day, then $0.09/hr).
| Version | Date | Changes |
|---|---|---|
| v0.22.0 | 2026-04-21 | Secure credential storage. OS keyring via flarecrawl[secure] (Forma protocol §07). Auto-migrates legacy plaintext config.json. Priority: env > keyring > .env > legacy. 1112 tests |
| v0.21.0 | 2026-04-20 | Auth + crawl fixes. --browser-cookies on scrape/interact/design (was videos-only). --session on crawl. --ignore-robots made actionable. Live test suite for design extraction |
| v0.20.0 | 2026-04-20 | Actionable CDP errors. _enrich_cdp_error detects bot detection, timeouts, redirects, network errors, auth failures and appends suggestions. CHANGELOG.md restored as source of truth |
| v0.19.0 | 2026-04-19 | Video discovery. flarecrawl videos finds video URLs (mp4, webm, m3u8, YouTube, Vimeo embeds, OpenGraph, JSON-LD). --export-cookies for yt-dlp Netscape cookie format. Works behind login with --session/--interactive. Pipe to yt-dlp for downloading |
| v0.18.0 | 2026-04-18 | Security hardening. Path-traversal fix on --resume JOB_ID, blocks non-http(s) URLs, caps robots.txt/sitemap downloads. Crawl loop refactored. PEP 561 py.typed marker. 1027 tests |
For older releases, see CHANGELOG.md.
| | Firecrawl | Flarecrawl |
|---|---|---|
| Pricing | Per-page credits ($99/100K) | Time-based (free 10 min/day, then $0.09/hr) |
| Crossover | Cheaper below ~8K pages/mo | 11x cheaper at 100K pages/mo |
| JS rendering | Yes | Yes (headless Chromium on edge) |
| Persistent browser sessions | No | Yes (--cdp --keep-alive) |
| Live browser debugging | No | Yes (--live-view via Chrome DevTools) |
| Interactive auth (OAuth/2FA/CAPTCHA) | No | Yes (--interactive — login in DevTools) |
| Form filling | No | Yes (interact command with human-like timing) |
| Session recordings | No | Yes (--record — rrweb format, 30-day retention) |
| WebMCP tool discovery | No | Yes (webmcp discover/call) |
| AI extraction | Spark models | Workers AI (included) |
| Agent-safe sanitisation | No | Yes (--agent-safe — 13 sanitisers, DeepMind taxonomy) |
| Paywall bypass | Limited | Multi-strategy cascade (--paywall --stealth) |
| Content negotiation | No | Yes (auto Accept: text/markdown, zero browser time) |
| PDF generation | No | Yes |
| Web search | Yes | Yes (Jina Search API) |
| Cookie persistence | No | Yes (--save-cookies/--load-cookies) |
| Real HAR capture | No | Yes (CDP Network.enable) |
| Custom CDP backends | No | Yes (FLARECRAWL_CDP_ENDPOINT — Oxylabs, Bright Data, local Chrome) |
| Batch mode | Limited | Yes (file input, NDJSON output, up to 50 workers) |
| Direct HTTP spider | Single-engine | flarecrawl spider — direct HTTP, no browser cost, 50-100 concurrent, built for millions of URLs |
| Resume interrupted crawls | No | Yes (--resume JOB_ID) — kill the process, come back hours later, pick up exactly where you left off |
| Weekly-refresh mode | No | Yes (--refresh-days 7) — only re-fetch pages that changed since last run; unchanged = zero browser cost |
| Fair multi-domain scheduling | No | Yes — round-robin so one fast site can't starve 53,999 others |
| Adaptive politeness | No | Yes (--adaptive-delay) — automatically slows on slow hosts, honours Retry-After |
| Auto-retry with backoff | Limited | Yes — per-URL retry budget, exp backoff, dead-letter inspection |
| Circuit breaker per domain | No | Yes — pauses domains after 10 consecutive failures |
| robots.txt compliance | Partial | Yes (protego, with per-host crawl-delay) |
| URL canonicalisation | No | Yes — strips utm_*, gclid, fbclid, etc. so tracking-param permutations don't duplicate crawls |
| OpenTelemetry tracing | No | Yes (--tracing console|json|otlp) |
| Self-hosted | Yes | Cloudflare infrastructure (or any CDP backend) |
| Favicon extraction | Via branding | Dedicated command |
| Branding extraction | Yes | Not yet |
Different pricing models suit different use cases. Flarecrawl's time-based pricing is particularly cost-efficient for high-volume crawls.
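As a back-of-envelope check on that crossover claim, assuming roughly 4 seconds of browser time per page (actual time varies with JS rendering and page weight): 100,000 pages × 4 s ≈ 111 hr/mo; minus the paid tier's 10 included hours, that is ≈ 101 hr × $0.09/hr ≈ $9/mo, versus ≈ $99/mo in per-page credits, or roughly 11x.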
Flarecrawl is often the first thing an AI agent sees when it reads the web.
--agent-safe sanitises scraped content against adversarial attacks
(AI Agent Traps, Google DeepMind 2026)
before it enters an LLM context window or RAG pipeline. 13 sanitisers
defend against hidden text injection, prompt injection, and semantic
manipulation — with findings reported in structured JSON metadata.
The hardest scraping problem isn't parsing HTML — it's getting past the login
page. --interactive opens a real browser in Chrome DevTools where you
complete OAuth flows, solve CAPTCHAs, or handle 2FA manually. Cookies are
auto-saved and reused on subsequent scrapes. No more extracting tokens from
browser DevTools and pasting them into headers.
Batch mode with up to 50 parallel workers (--workers 50) scrapes thousands
of pages concurrently. Combine with --paywall (multi-strategy extraction
cascade), --stealth (browser TLS fingerprinting), and --agent-safe for
production pipelines that handle the real web — paywalls, bot detection, and
adversarial content included.
--diff compares current content against the cached version. Combine with
--har for network-level visibility into what changed. --record saves
full session recordings for audit trails. Use --keep-alive for cost-efficient
repeated checks on the same site.
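A minimal monitoring sketch built from these flags; the cron schedule and log path are illustrative, not part of flarecrawl:

```bash
# Hypothetical cron entry: hourly change check on one page, appending the JSON envelope
0 * * * * flarecrawl scrape https://example.com --diff --json >> "$HOME/site-changes.ndjson"
```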
flarecrawl download saves entire sites as markdown files. flarecrawl discover
finds every URL via sitemaps, RSS feeds, and link crawling. Content negotiation
(Accept: text/markdown) fetches server-rendered markdown directly from
compatible sites — zero browser time, higher quality output.
flarecrawl extract uses Cloudflare Workers AI for natural language data
extraction ("Get all product names and prices"). flarecrawl schema extracts
LD+JSON, OpenGraph, and Twitter Cards. flarecrawl openapi discovers and
downloads API specifications.
Flarecrawl v0.14.1 adds direct Chrome DevTools Protocol (CDP) access via
Cloudflare's Browser Run WebSocket endpoint. This is an opt-in mode (--cdp)
that gives you a persistent, controllable browser session instead of
fire-and-forget REST calls.
REST (default) — each command spins up a fresh browser, navigates, extracts, and destroys. Stateless and cheap. Good for 90% of scraping.
CDP (--cdp) — you get a live browser session that stays open. Navigate,
interact, inspect, then navigate again — all within the same browser context.
The browser remembers cookies, localStorage, and DOM state between operations.
| Scenario | Why CDP |
|---|---|
| Scraping behind login | --interactive opens DevTools, you log in manually (OAuth, 2FA, CAPTCHA), cookies auto-saved for future scrapes |
| Debugging failed scrapes | --live-view opens Chrome DevTools pointed at the remote browser — see console errors, inspect DOM, watch network |
| Complex JS execution | --js-eval via CDP uses real Runtime.evaluate — async/await works, returns typed objects, not the REST addScriptTag hack |
| Multi-step workflows | --keep-alive 60 holds the browser open — scrape, screenshot, extract without re-navigating (1/3 the browser time cost) |
| Network debugging | --har captures real browser network traffic (redirects, blocked resources, timing waterfall), not just flarecrawl API metadata |
| Audit trails | --record saves rrweb session recordings — replay exactly what the browser did |
# Interactive auth: login in DevTools, cookies saved, then scrape
flarecrawl scrape https://private-app.example.com --interactive --json
# Debug a scrape in real-time
flarecrawl scrape https://broken-site.com --live-view
# Proper JS execution (async, typed returns)
flarecrawl scrape https://spa.example.com --cdp --js-eval "await fetch('/api/data').then(r => r.json())"
# Persistent session: navigate once, do multiple things
flarecrawl scrape https://example.com --cdp --keep-alive 120 \
--save-cookies session.json --har traffic.har --json
# Record a session for debugging later
flarecrawl scrape https://example.com --record --record-output debug.json
# Reuse saved cookies for authenticated scraping
flarecrawl scrape https://private-app.example.com --load-cookies session.json --cdp --json
# Manage CDP sessions
flarecrawl cdp sessions --json
flarecrawl cdp close

CDP requires the websockets package (optional dependency):

uv pip install websockets
# or with extras
uv pip install flarecrawl[cdp]

Without websockets, all non-CDP features work normally. The --cdp flag
will print a clear install instruction if the dependency is missing.
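Because --cdp speaks the standard DevTools Protocol, FLARECRAWL_CDP_ENDPOINT (see the comparison table above) can also point at a local Chrome instead of Cloudflare. A sketch, assuming the variable takes a browser-level WebSocket URL; the exact format flarecrawl expects is not specified here, so adjust to your backend:

```bash
# Launch a local headless Chrome with DevTools exposed (standard Chrome flags)
google-chrome --headless --remote-debugging-port=9222 &
# Chrome serves its browser-level WebSocket URL at /json/version
curl -s http://127.0.0.1:9222/json/version | jq -r .webSocketDebuggerUrl
# Point flarecrawl at it and scrape over CDP
export FLARECRAWL_CDP_ENDPOINT="ws://127.0.0.1:9222/devtools/browser/<id>"
flarecrawl scrape https://example.com --cdp --json
```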
git clone https://github.com/0xDarkMatter/flarecrawl.git
cd flarecrawl
uv tool install --editable .
flarecrawl auth login
flarecrawl scrape https://example.com
flarecrawl scrape https://example.com --json | jq '.data.content'

| Command | Description |
|---|---|
| `flarecrawl scrape URL` | Scrape page to markdown (or html/links/images/summary) |
| `flarecrawl crawl URL --wait --limit N` | Crawl site with async job system |
| `flarecrawl download URL --limit N` | Save site pages to disk as markdown/html |
| `flarecrawl extract PROMPT --urls URL` | AI-powered structured data extraction |
| `flarecrawl fetch URL -o file` | Content-type aware download (binary, JSON, HTML) |
| `flarecrawl openapi URL --probe` | Discover OpenAPI/Swagger specs on a site |
| `flarecrawl screenshot URL -o page.png` | Capture full or partial page screenshots |
| `flarecrawl pdf URL -o page.pdf` | Render page as PDF |
| `flarecrawl map URL` | List all links on a page |
| `flarecrawl discover URL` | Discover URLs via sitemaps, feeds, and links |
| `flarecrawl search QUERY` | Web search via Jina API |
| `flarecrawl schema URL` | Extract LD+JSON, OpenGraph, Twitter Cards |
| `flarecrawl favicon URL` | Extract favicon/icon URLs |
| `flarecrawl session list` | Manage saved cookie sessions |
git clone https://github.com/0xDarkMatter/flarecrawl.git
cd flarecrawl
uv tool install --editable .

By default, flarecrawl stores credentials in ~/.config/flarecrawl/config.json (plaintext). For OS-native keyring storage:
uv tool install --editable . --with keyring
# or: uv pip install flarecrawl[secure]

With keyring installed, tokens move to:
- Windows: Credential Manager
- macOS: Keychain
- Linux: Secret Service / GNOME Keyring / KWallet
Existing plaintext credentials are migrated automatically on first use.
- Go to https://dash.cloudflare.com/profile/api-tokens
- Click Create Token
- Select Create Custom Token
- Configure:
  - Token name: Flarecrawl (or anything)
  - Permissions: Account → Browser Rendering → Edit
  - Account Resources: Include → your account
- Click Continue to summary → Create Token
- Copy the token (shown only once)
- Go to https://dash.cloudflare.com
- Click any domain (or the account overview)
- Look in the right sidebar under Account ID
- Copy the 32-character hex string
# Interactive — opens browser to Cloudflare dashboard for guided setup
flarecrawl auth login
# Non-interactive (CI/CD)
flarecrawl auth login --account-id YOUR_ACCOUNT_ID --token YOUR_TOKEN

flarecrawl auth status
flarecrawl --status  # Shows auth + pricing info

export FLARECRAWL_ACCOUNT_ID="your-account-id"
export FLARECRAWL_API_TOKEN="your-api-token"

# Default: markdown output to stdout
flarecrawl scrape https://example.com
# HTML output
flarecrawl scrape https://example.com --format html
# JSON envelope (for piping)
flarecrawl scrape https://example.com --json
# Multiple URLs (scraped concurrently)
flarecrawl scrape https://a.com https://b.com https://c.com --json
# Batch mode: file input with NDJSON output and configurable workers
flarecrawl scrape --batch urls.txt --workers 5
# From a file of URLs (backward-compatible alias for --batch)
flarecrawl scrape --input urls.txt --json
# With timing info
flarecrawl scrape https://example.com --timing
# Filter JSON fields
flarecrawl scrape https://example.com --json --fields url,content
# Extract links only
flarecrawl scrape https://example.com --format links --json
# Take screenshot via scrape
flarecrawl scrape https://example.com --screenshot -o page.png
# Wait for JS rendering (SPAs, Swagger UIs)
flarecrawl scrape https://example.com --js
# Bypass response cache
flarecrawl scrape https://example.com --no-cache
# Custom page load strategy
flarecrawl scrape https://example.com --wait-until networkidle2

Formats: markdown (default), html, links, screenshot, json (AI extraction)
All commands support --auth user:password for sites protected by HTTP Basic Auth:
flarecrawl scrape https://intranet.example.com --auth admin:secret
flarecrawl crawl https://intranet.example.com --wait --limit 50 --auth admin:secret
flarecrawl download https://intranet.example.com --limit 20 --auth user:pass
flarecrawl screenshot https://intranet.example.com --auth user:pass -o page.png

# Strip nav/header/footer, keep main article content
flarecrawl scrape https://example.com --only-main-content
# Keep only specific CSS selectors
flarecrawl scrape https://example.com --include-tags "article,.post"
# Remove specific elements
flarecrawl scrape https://example.com --exclude-tags "nav,footer,.sidebar"

# Custom HTTP headers
flarecrawl scrape https://example.com --headers "Accept-Language: fr"
flarecrawl scrape https://example.com --headers '{"X-Api-Key": "abc123"}'
# Custom User-Agent (identify your crawler, or try bypassing paywalls)
flarecrawl scrape https://example.com --user-agent "MyBot/1.0 (contact@example.com)"
flarecrawl scrape https://paywalled.example.com --user-agent "Googlebot/2.1"
# Mobile device emulation (iPhone 14 Pro viewport)
flarecrawl scrape https://example.com --mobile
flarecrawl screenshot https://example.com --mobile -o mobile.png

# Extract all image URLs from a page
flarecrawl scrape https://example.com --format images --json
# AI-powered content summary
flarecrawl scrape https://example.com --format summary --json
# Extract LD+JSON, OpenGraph, Twitter Cards
flarecrawl scrape https://example.com --format schema --json
# Dedicated schema command with type filtering
flarecrawl schema https://example.com --json
flarecrawl schema https://example.com --type ld-json --json
flarecrawl schema https://example.com --type opengraph --json

# POST crawl results to a URL when complete
flarecrawl crawl https://example.com --wait --limit 10 --webhook https://hooks.example.com/crawl
# With custom headers (e.g. auth token)
flarecrawl crawl https://example.com --wait --limit 10 \
--webhook https://hooks.example.com/crawl \
--webhook-headers "Authorization: Bearer token123"

# Extract content from specific CSS selector
flarecrawl scrape https://example.com --selector "main" --json
# Wait for a CSS element before capturing (SPAs, lazy-load)
flarecrawl scrape https://example.com --wait-for-selector ".loaded" --json
# Run JavaScript and return the result
flarecrawl scrape https://example.com --js-eval "document.title" --json
flarecrawl scrape https://example.com --js-eval "document.querySelectorAll('a').length" --json# Process local HTML without API call
cat page.html | flarecrawl scrape --stdin --only-main-content
curl https://example.com | flarecrawl scrape --stdin --format schema --json
# Save request metadata to HAR file
flarecrawl scrape https://example.com --har requests.har --json
# Save raw HTML alongside output (for archival/reprocessing)
flarecrawl scrape https://example.com --backup-dir ./html-backup
flarecrawl download https://example.com --limit 20 --backup-dir ./html-backup

# Discover all URLs via sitemaps, RSS feeds, and page links
flarecrawl discover https://example.com --json
# Sitemaps only
flarecrawl discover https://example.com --no-feed --no-links --json
# With limit
flarecrawl discover https://example.com --limit 100 --json

# Remove cookie banners, GDPR modals, newsletter popups
flarecrawl scrape https://eu-site.example.com --magic
# Request content in a specific language
flarecrawl scrape https://example.com --language de
# Fallback to Internet Archive if page returns 404
flarecrawl scrape https://dead-link.example.com --archived

Multi-strategy content extraction that applies per-site optimisations before falling back to standard browser rendering. Useful for sites that serve content in SSR HTML but hide it with client-side JavaScript, or sites behind bot detection that block default request patterns.
# Enhanced extraction with multi-strategy cascade
flarecrawl scrape https://example.com/article --paywall
# JSON output shows which extraction strategy worked
flarecrawl scrape https://example.com/article --paywall --json
# metadata.source indicates the strategy used
# Batch mode
flarecrawl scrape --batch urls.txt --paywall --workers 5
# No Cloudflare auth required (direct HTTP strategies)
# With auth: falls through to browser rendering if direct strategies fail

Strategies are tried in order of speed and cost. Direct HTTP strategies consume zero browser time. When Cloudflare auth is configured, per-site header optimisations are also applied to browser rendering requests as a fallback.
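To confirm which strategy produced the content, a quick check; the `.data.metadata.source` path is assumed from the JSON envelope and metadata fields documented in this README:

```bash
flarecrawl scrape https://example.com/article --paywall --json | jq -r '.data.metadata.source'
```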
Optional dependency for improved compatibility: pip install curl_cffi
Opt-in browser TLS fingerprint impersonation for direct HTTP requests. Uses
curl_cffi to send requests with a real Safari/Chrome TLS handshake, avoiding
bot detection systems that fingerprint JA3/JA4 hashes.
# Stealth mode for content negotiation and direct fetches
flarecrawl scrape https://example.com --stealth
# Combine with paywall extraction
flarecrawl scrape https://example.com --paywall --stealth --json
# Batch mode
flarecrawl scrape --batch urls.txt --stealth --workers 5

Requires: pip install curl_cffi
Without --stealth, requests use Python's default TLS stack (httpx/OpenSSL)
which is identifiable by bot detection systems. With --stealth, the TLS
Client Hello is indistinguishable from a real Safari browser.
Markdown output is automatically cleaned of common ad placeholders, share buttons, newsletter prompts, copyright lines, and navigation chrome. No flag needed - applied to all markdown output by default.
For HTML output, use --clean to strip ad/promo DOM elements:
# Strip ad containers, social share widgets, cookie banners from HTML
flarecrawl scrape https://example.com --format html --clean --json

Sanitise scraped content against adversarial AI agent traps. See Agent Safety for full details.
flarecrawl scrape https://example.com --agent-safe --json
flarecrawl scrape https://example.com --agent-safe --paywall --stealth --json
flarecrawl crawl https://example.com --wait --limit 50 --agent-safe

Search the web and optionally scrape each result. Uses Jina Search API.
# Search and get results as JSON
flarecrawl search "python web scraping" --json
# Search and scrape each result through the normal pipeline
flarecrawl search "topic" --scrape --limit 5 --paywall --json
# Pipeline: search -> extract URLs -> batch scrape
flarecrawl search "query" --json | jq -r '.data[].url' > urls.txt
flarecrawl scrape --batch urls.txt --paywall --stealth --workers 5

Requires JINA_API_KEY env var (free at https://jina.ai/api-key).
With --scrape, all scrape flags apply (--paywall, --stealth, --only-main-content, --clean).
Route all HTTP requests through a proxy. Affects CLI-to-Cloudflare API connections and direct HTTP (stealth, negotiate, paywall). Does NOT affect the Cloudflare browser-to-target connection (CF's browser uses its own IP).
# HTTP proxy
flarecrawl scrape https://example.com --proxy http://proxy:8080
# SOCKS5 proxy
flarecrawl scrape https://example.com --proxy socks5://localhost:9050
# Set default via env var
export FLARECRAWL_PROXY=socks5://localhost:9050
flarecrawl scrape https://example.com  # uses env var proxy

Supported on all commands: scrape, crawl, download, extract, search.
Customisable per-site header rules for enhanced extraction. Default rules ship
with the package; user overrides are loaded from ~/.config/flarecrawl/rules.yaml.
# List all loaded rules
flarecrawl rules list --json
# Show rules for a domain
flarecrawl rules show www.nytimes.com
# Add a custom rule
flarecrawl rules add example.com --referer https://www.google.com/ --cookie "auth=1"
# Show file paths
flarecrawl rules path

Rules use Ladder-compatible YAML format:
- domain: www.example.com
  headers:
    Referer: "https://www.google.com/"
    Cookie: "session=abc"
- domains:
    - a.com
    - b.com
  headers:
    Referer: "https://t.co/x?amp=1"

Sites on Cloudflare (Pro+) can serve markdown directly via Accept: text/markdown
content negotiation. Flarecrawl auto-detects this on every scrape — when a site
supports it, content is fetched via a simple HTTP GET instead of headless Chromium.
# Auto-detect (default) — tries content negotiation first
flarecrawl scrape https://blog.cloudflare.com/some-post
# Force browser rendering (skip negotiation)
flarecrawl scrape https://blog.cloudflare.com/some-post --no-negotiate
# JSON output shows the source
flarecrawl scrape https://blog.cloudflare.com/some-post --json
# metadata.source: "content-negotiation" (no browser) or "browser-rendering"
# metadata.markdownTokens: 1234 (from x-markdown-tokens header)
# metadata.contentSignal: {"ai-train": "yes", ...}

Benefits when negotiation succeeds:
- Faster — ~100-200ms vs 2-3s for browser rendering
- Cheaper — zero browser time consumed
- Higher quality — server-side conversion by the site owner
- Domain cached — one probe per domain, batch-friendly
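What the probe looks like at the HTTP level; illustrative only, flarecrawl performs this internally, and it only succeeds on sites with markdown negotiation enabled:

```bash
# -i shows response headers, including x-markdown-tokens when negotiation works
curl -si -H "Accept: text/markdown" https://blog.cloudflare.com/some-post | head -30
```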
# Compare current content against cached version
flarecrawl scrape https://example.com --diff --json

# Start crawl and wait for results
flarecrawl crawl https://example.com --wait --limit 50
# With progress indicator
flarecrawl crawl https://example.com --wait --progress --limit 100
# Fire and forget (returns job ID)
flarecrawl crawl https://example.com --limit 50
# Check status of running crawl
flarecrawl crawl JOB_ID --status
# Get results from completed crawl
flarecrawl crawl JOB_ID
# Filter paths
flarecrawl crawl https://docs.example.com --wait --limit 200 \
--include-paths "/docs,/api" --exclude-paths "/zh,/ja"
# Stream results as NDJSON (one record per line)
flarecrawl crawl JOB_ID --ndjson --fields url,markdown
# Skip JS rendering for faster crawl
flarecrawl crawl https://example.com --wait --limit 100 --no-render
# Follow subdomains
flarecrawl crawl https://example.com --wait --allow-subdomains
# Save to file
flarecrawl crawl https://example.com --wait --limit 50 -o results.json

# List all links on a page
flarecrawl map https://example.com
# JSON output
flarecrawl map https://example.com --json
# Include subdomains
flarecrawl map https://example.com --include-subdomains
# Limit results
flarecrawl map https://example.com --limit 20 --json

# Download as markdown files to .flarecrawl/
flarecrawl download https://docs.example.com --limit 50
# Download as HTML
flarecrawl download https://example.com --limit 20 --format html
# Filter paths
flarecrawl download https://docs.example.com --limit 100 \
--include-paths "/docs" --exclude-paths "/changelog"
# Skip confirmation prompt
flarecrawl download https://example.com --limit 10 -y

Files are saved to .flarecrawl/<domain>/ with sanitized filenames.
# Extract structured data with a natural language prompt
flarecrawl extract "Get all product names and prices" \
--urls https://shop.example.com --json
# With JSON schema for structured output
flarecrawl extract "Extract article metadata" \
--urls https://blog.example.com \
--schema '{"type":"json_schema","schema":{"type":"object","properties":{"title":{"type":"string"},"date":{"type":"string"}}}}'
# Schema from file
flarecrawl extract "Extract data" --urls https://example.com --schema-file schema.json
# Multiple URLs
flarecrawl extract "Get page title" --urls https://a.com,https://b.com --json
# Batch mode: parallel extraction with NDJSON output
flarecrawl extract "Get page title" --batch urls.txt --workers 5Uses Cloudflare Workers AI for extraction (no additional cost).
# Default: saves to screenshot.png
flarecrawl screenshot https://example.com
# Custom output path
flarecrawl screenshot https://example.com -o hero.png
# Full page
flarecrawl screenshot https://example.com -o full.png --full-page
# Specific element
flarecrawl screenshot https://example.com --selector "main" -o main.png
# Custom viewport
flarecrawl screenshot https://example.com --width 1440 --height 900 -o wide.png
# JPEG format
flarecrawl screenshot https://example.com --format jpeg -o page.jpg
# JSON output (base64 encoded)
flarecrawl screenshot https://example.com --json

# Default: saves to page.pdf
flarecrawl pdf https://example.com
# Custom output
flarecrawl pdf https://example.com -o report.pdf
# Landscape A4
flarecrawl pdf https://example.com -o report.pdf --landscape --format a4
# JSON output (base64 encoded)
flarecrawl pdf https://example.com --json

# Get the best (largest) favicon
flarecrawl favicon https://example.com
# Show all found icons
flarecrawl favicon https://example.com --all
# JSON output
flarecrawl favicon https://example.com --json
flarecrawl favicon https://example.com --all --json

Renders the page, parses <link rel="icon">, <link rel="apple-touch-icon">, and related tags. Returns the largest icon found. Falls back to /favicon.ico if no <link> tags found.
# Show today's usage and history
flarecrawl usage
# JSON output
flarecrawl usage --json

Tracks the X-Browser-Ms-Used header from each API response locally. Free tier is 600,000ms (10 minutes) per day.
flarecrawl auth login # Interactive
flarecrawl auth login --account-id ID --token TOKEN # Non-interactive
flarecrawl auth status # Human-readable
flarecrawl auth status --json # Machine-readable
flarecrawl auth logout  # Clear credentials

flarecrawl cache status  # Show entries, size, path
flarecrawl cache status --json # Machine-readable
flarecrawl cache clear  # Remove all cached responses

Responses are cached for 1 hour by default. Use --no-cache on any command to bypass.
flarecrawl negotiate status # Show domains that support text/markdown
flarecrawl negotiate status --json # Machine-readable
flarecrawl negotiate clear  # Reset domain cache

Tracks which domains respond to Accept: text/markdown. Positive results cached 7 days, negative 24 hours.
- Response caching — 1-hour TTL, saves redundant browser renders
- Connection pooling — persistent httpx session with HTTP/2 support
- Resource rejection — skips images/fonts/media/stylesheets for text extraction
- JS rendering — opt-in via --js flag (waits for networkidle0)
| Variable | Default | Description |
|---|---|---|
| `FLARECRAWL_ACCOUNT_ID` | — | Cloudflare account ID |
| `FLARECRAWL_API_TOKEN` | — | Cloudflare API token |
| `FLARECRAWL_CACHE_TTL` | 3600 | Cache TTL in seconds |
| `FLARECRAWL_MAX_RETRIES` | 3 | Max retry attempts |
| `FLARECRAWL_MAX_WORKERS` | 10 | Max parallel workers |
| `FLARECRAWL_TIMEOUT` | 120 | Request timeout in seconds |
| `FLARECRAWL_PROXY` | — | Default proxy URL (http/https/socks5) |
| `JINA_API_KEY` | — | Jina API key for search command |
Flarecrawl follows the same command structure as the firecrawl CLI:
| firecrawl command | flarecrawl equivalent | Notes |
|---|---|---|
| `firecrawl scrape URL` | `flarecrawl scrape URL` | Same flags |
| `firecrawl scrape URL1 URL2` | `flarecrawl scrape URL1 URL2` | Concurrent |
| `firecrawl crawl URL --wait` | `flarecrawl crawl URL --wait` | Same flags |
| `firecrawl map URL` | `flarecrawl map URL` | Same flags |
| `firecrawl download URL` | `flarecrawl download URL` | Saves to `.flarecrawl/` |
| `firecrawl agent PROMPT` | `flarecrawl extract PROMPT` | Uses Workers AI |
| `firecrawl credit-usage` | `flarecrawl usage` | Local tracking |
| `firecrawl search QUERY` | `flarecrawl search QUERY` | Via Jina Search API |
| `firecrawl --status` | `flarecrawl --status` | Same |
- `search` command — uses Jina Search API (requires `JINA_API_KEY`), with `--scrape` to scrape results
- `extract` instead of `agent` — same concept, different name to avoid confusion
- `favicon` command — bonus: extract favicon/apple-touch-icon URLs from pages
- `schema` command — bonus: extract LD+JSON, OpenGraph, Twitter Cards
- PDF command — bonus: Cloudflare supports PDF rendering, Firecrawl doesn't
- Output directory — `.flarecrawl/` instead of `.firecrawl/`
- `--only-main-content` — client-side via BeautifulSoup (Firecrawl uses server-side extraction)
All --json output follows a consistent envelope:
{
"data": { ... },
"meta": { "format": "markdown", "count": 1 }
}

Errors:
{
"error": { "code": "AUTH_REQUIRED", "message": "Not authenticated..." }
}

| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1 | Error | Check stderr for details |
| 2 | Auth required | Run flarecrawl auth login |
| 3 | Not found | Check job ID |
| 4 | Validation | Fix arguments |
| 5 | Forbidden | Check token permissions |
| 7 | Rate limited | Wait and retry |
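A minimal sketch of scripting against these codes; the retry delay and file names are illustrative:

```bash
flarecrawl scrape https://example.com --json > out.json
case $? in
  0) echo "ok" ;;
  2) flarecrawl auth login ;;
  7) sleep 60; flarecrawl scrape https://example.com --json > out.json ;;
  *) echo "failed, see stderr for details" >&2 ;;
esac
```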
Commands that operate on multiple URLs support batch mode with configurable parallelism.
# Plain text file (one URL per line, # comments supported)
flarecrawl scrape --batch urls.txt --workers 5
# JSON array
flarecrawl scrape --batch urls.json
# NDJSON (one JSON object per line)
flarecrawl extract "Get title" --batch urls.ndjson --workers 3Input format is auto-detected: starts with [ → JSON array, starts with { → NDJSON, otherwise plain text.
Batch mode outputs NDJSON (one JSON object per line) with index correlation:
{"index": 0, "status": "ok", "data": {"url": "https://a.com", "content": "...", "elapsed": 1.2}}
{"index": 1, "status": "error", "error": {"code": "TIMEOUT", "message": "Request timed out..."}}
{"index": 2, "status": "ok", "data": {"url": "https://c.com", "content": "...", "elapsed": 0.8}}Results are sorted by index. Failed URLs don't stop processing — errors are reported inline.
| Flag | Default | Max | Notes |
|---|---|---|---|
| `--workers` / `-w` | 3 | 10 | Matches CF paid tier concurrency limit |
# Conservative (free tier: 3 concurrent browsers)
flarecrawl scrape --batch urls.txt --workers 3
# Aggressive (paid tier: up to 10)
flarecrawl scrape --batch urls.txt --workers 10

| Command | `--batch` | `--workers` | Notes |
|---|---|---|---|
| `scrape` | Yes | Yes | Also supports `--input` (alias) |
| `extract` | Yes | Yes | Supplements `--urls` |
| `crawl` | No | No | Has its own async job system |
| `screenshot` | No | No | Single URL |
| `pdf` | No | No | Single URL |
Every command supports --body to send a raw JSON payload directly to the CF API, bypassing all flag processing:
flarecrawl scrape --body '{
"url": "https://example.com",
"gotoOptions": {"waitUntil": "networkidle0", "timeout": 60000},
"rejectResourceTypes": ["image", "media"]
}' --json

# Map URLs then batch scrape them
flarecrawl map https://docs.example.com --json | \
jq -r '.data[]' | head -10 > urls.txt
flarecrawl scrape --batch urls.txt --workers 5
# Crawl and extract just the markdown
flarecrawl crawl https://example.com --wait --limit 10 --json | \
jq -r '.data.records[] | select(.status=="completed") | .markdown'
# Stream crawl results through jq
flarecrawl crawl JOB_ID --ndjson --fields url,markdown | \
jq -r '.url + "\t" + (.markdown | length | tostring)'

Requests automatically retry up to 3 times on HTTP 429 (rate limited), 502, and 503 errors with exponential backoff. Timeouts also trigger retries.
Flarecrawl is often the perception layer for AI agent workflows - scraped
content flows directly into LLM context windows and RAG systems. The
--agent-safe flag sanitises content against adversarial attacks before it
reaches the consuming agent.
This implementation is informed by AI Agent Traps (Franklin, Tomasev, Jacobs, Leibo, Osindero - Google DeepMind, March 2026), which identifies six categories of adversarial attacks against autonomous AI agents navigating the web:
| Category | Target | Flarecrawl Defence |
|---|---|---|
| Content Injection | Agent perception | Defended - hidden text, comments, attributes, unicode tricks, iframes, hidden inputs, CSS class hiding, meta tags |
| Semantic Manipulation | Agent reasoning | Defended - urgency clusters and authority claims flagged (not removed) |
| Behavioural Control | Agent actions | Defended - prompt injection patterns detected and stripped |
| Cognitive State | Agent memory/RAG | Upstream - content provenance via metadata.source |
| Systemic | Multi-agent networks | Upstream - outside scraping layer |
| Human-in-the-Loop | Human supervisor | Upstream - outside scraping layer |
# Sanitise content for safe agent consumption
flarecrawl scrape https://example.com --agent-safe --json
# Combine with extraction flags
flarecrawl scrape https://example.com --agent-safe --paywall --stealth --json
# Batch mode
flarecrawl scrape --batch urls.txt --agent-safe --workers 5
# All commands support --agent-safe
flarecrawl crawl https://example.com --wait --limit 50 --agent-safe
flarecrawl download https://docs.example.com --limit 50 --agent-safe
flarecrawl extract "Get products" --urls https://shop.example.com --agent-safe --json
# Stdin pipe (sanitises local HTML)
cat page.html | flarecrawl scrape --stdin --agent-safe --json

Content passes through 13 sanitisers in two phases:
Phase 1 (HTML) - 8 sanitisers run on the DOM before markdown conversion:
| Sanitiser | Attack Vector | Action |
|---|---|---|
| Hidden text | CSS hiding (`display:none`, `opacity:0`, off-screen, `clip-path`, `color:transparent`, etc.) | Removed |
| HTML comments | Instruction-like content in `<!-- -->` comments | Removed |
| Suspicious attributes | Injection patterns in `data-*`, `aria-label`, `alt`, `title` | Removed |
| Unicode tricks | Zero-width characters, bidirectional text overrides | Removed |
| Hidden iframes | `<iframe src="...">` with external content | Removed |
| Hidden inputs | `<input type="hidden" value="...">` with instruction patterns | Removed |
| CSS class hiding | `.d-none`, `.hidden`, `[hidden]` attribute with injection content | Removed |
| Meta injection | Custom `<meta>` tags with adversarial content (standard tags preserved) | Removed |
Phase 2 (Text) - 5 sanitisers run on markdown after conversion:
| Sanitiser | Attack Vector | Action |
|---|---|---|
| Prompt injection | "ignore previous instructions", "SYSTEM:", role-play, delimiter injection | Removed |
| Semantic manipulation | Urgency clusters, authority claims | Flagged only |
| Homoglyph evasion | Cyrillic/Greek lookalike characters bypassing patterns | Removed |
| Markdown exfiltration | Image URLs with suspicious query params (IP addresses, exfil params) | Flagged only |
| HTML entity evasion | Entity-encoded injection patterns ("ignore" written as HTML entities) | Removed |
| Technique | How it works |
|---|---|
| Short-line bias | Prompt injection only strips lines <200 chars - articles about injection are preserved |
| Accessibility preservation | .sr-only/.visually-hidden text kept unless it matches injection patterns |
| Standard meta allowlist | description, og:*, twitter:*, charset meta tags never touched |
| Mixed-script detection | Homoglyph normalisation only on lines with both Latin and Cyrillic/Greek chars |
| Threshold gates | Hidden elements need 20+ chars; hidden inputs need 50+ chars + pattern match |
| Flag vs remove | Semantic manipulation and exfiltration are flagged, never removed |
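A sketch of gating a pipeline on what was found; the `.data.metadata.agentSafety` path is assumed from the envelope shape and the metadata example just below:

```bash
flarecrawl scrape https://example.com --agent-safe --json \
  | jq -e '[.data.metadata.agentSafety.findings[]? | select(.severity == "high")] | length == 0' \
  > /dev/null || echo "high-severity adversarial content detected" >&2
```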
When --json is used, metadata.agentSafety reports what was detected:
{
"metadata": {
"agentSafety": {
"sanitised": true,
"findings": [
{"category": "content_injection", "severity": "high",
"description": "Hidden text via CSS (2 elements)", "action": "removed", "count": 2},
{"category": "prompt_injection", "severity": "high",
"description": "Prompt injection patterns removed (1 lines)", "action": "removed", "count": 1},
{"category": "semantic_manipulation", "severity": "medium",
"description": "Urgency language clusters (1 lines)", "action": "flagged", "count": 1}
],
"stats": {
"removed": 3,
"flagged": 1,
"byCategory": {
"content_injection": 2,
"prompt_injection": 1,
"semantic_manipulation": 1
}
}
}
}
}

New attack vectors are added by writing a function and decorating it:
from flarecrawl.sanitise import register_html, register_text, Finding

@register_html
def sanitise_new_vector(soup):
    """HTML-level sanitiser - mutates soup, returns findings."""
    # ... detection and removal logic (n = number of elements removed) ...
    return [Finding(category="content_injection", severity="high",
                    description="New vector (N elements)", action="removed", count=n)]

@register_text
def sanitise_new_text_vector(text):
    """Text-level sanitiser - returns (cleaned_text, findings)."""
    # ... detection logic ...
    return cleaned_text, findings

Add a corpus fixture to tests/corpus/attacks/ and the parametrised test
suite picks it up automatically.
The tests/corpus/ directory contains 61 fixture files that serve as both
test data and a living catalogue of known attack vectors:
attacks/(45 files) - adversarial fixtures across 12 categoriesbenign/(16 files) - false-positive traps: security articles, responsive CSS, ARIA accessibility, admin UIs, medical urgency, i18n content, code tutorials, system docs, journalism
Parametrised tests validate that every attack file produces findings and every benign file produces zero removals.
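To run just the corpus validation, something like the following should work; the -k expression is a guess at the test names, adjust as needed:

```bash
pytest tests/test_sanitise.py -v -k corpus
```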
| Tier | Browser Time | Concurrent Browsers |
|---|---|---|
| Free | 10 min/day | Up to 120 (default) |
| Paid | 10 hr/month included, then $0.09/hr | Up to 120 (default), higher on request |
Browser time is shared between REST API calls and Workers bindings. Track your usage with flarecrawl usage.
flarecrawl/
├── pyproject.toml # Package config
├── AGENTS.md # AI agent context
├── README.md # This file
├── src/flarecrawl/
│ ├── __init__.py # Version
│ ├── authcrawl.py # Authenticated BFS crawler (session cookie propagation)
│ ├── batch.py # Batch processing (parse + parallel workers)
│ ├── cache.py # File-based response cache
│ ├── cli.py # Typer CLI (all commands)
│ ├── cdp.py # CDP WebSocket client (persistent sessions, DevTools Protocol)
│ ├── client.py # CF Browser Run REST API client (httpx pooling, HTTP/2)
│ ├── config.py # Credentials, usage tracking, env-var config, session storage
│ ├── cookies.py # Cookie loading (Puppeteer/Netscape/Chrome), conversion, validation
│ ├── design.py # Design system extraction (tokens, coherence scoring, formatting)
│ ├── extract.py # HTML extraction (main content, images, schema, tags)
│ ├── fetch.py # Content-type aware download (binary, JSON, HTML)
│ ├── negotiate.py # Markdown content negotiation (Accept: text/markdown)
│ ├── openapi.py # OpenAPI/Swagger spec discovery and validation
│ ├── paywall.py # Paywall bypass cascade (SSR, Referer, Wayback, Jina)
│ ├── rules.py # Per-site YAML rulesets (load, merge, cache)
│ ├── search.py # Web search via Jina Search API
│ ├── sanitise.py # Agent-safety sanitisation (hidden text, injection, manipulation)
│ ├── stealth.py # Stealth HTTP (curl_cffi TLS impersonation)
│ └── default_rules.yaml # Shipped per-site header rules
└── tests/
├── conftest.py # Test fixtures
├── corpus.py # Feature test corpus (80 live tests x 8 sites)
├── corpus/ # Agent-safety attack/benign fixture corpus (61 files)
│ ├── attacks/ # Adversarial fixtures (45 files, 12 categories)
│ └── benign/ # False-positive traps (16 files)
├── test_batch.py # Batch module tests
├── test_cache.py # Cache module tests
├── test_cli.py # CLI tests
├── test_client.py # Client tests
├── test_design.py # Design extraction tests
├── test_extract.py # Extract module tests
├── test_paywall.py # Paywall bypass tests
├── test_rules.py # Per-site rules tests
├── test_cdp.py # CDP client tests (69 tests)
├── test_sanitise.py # Agent-safety tests (137 tests + corpus validation)
└── test_search.py # Search module tests
# Install dev dependencies
uv tool install --editable . --with pytest --with ruff
# Install CDP support (optional — needed for --cdp, interact, webmcp)
uv pip install websockets
# Run unit tests (723 tests, no network)
pytest tests/ -v
# Run live tests against public sites (needs CF auth)
PYTHONPATH=src pytest tests/live/ -v -m live
# Run CDP live tests (needs CF auth + websockets)
PYTHONPATH=src pytest tests/live/ -v -m cdp
# Lint
ruff check src/
# Reinstall after changes
uv tool install --editable .

This tool follows the Forma Protocol.
MIT