The Kryfto REST API (v3.8.0) provides programmable access to the headless browser fleet, extraction engine, federated search, domain crawling, recipe management, and admin dashboard.
http://localhost:8080/v1
All endpoints (except health checks, metrics, and /docs) require a Bearer token in the Authorization header:
Authorization: Bearer <your_api_token>
Authentication is enforced via a Fastify preHandler hook that runs before every request:
- Public bypass: Requests to `/v1/healthz`, `/v1/readyz`, `/v1/metrics`, `/docs`, and `/docs/openapi.yaml` skip auth entirely.
- Token resolution: The `Authorization: Bearer <token>` header is parsed. The raw token is SHA-256 hashed and looked up in the `api_tokens` Postgres table. Only non-revoked tokens (`revoked_at IS NULL`) are accepted.
- Auth context: A matched token yields an `AuthContext` containing `tokenId`, `projectId`, `role`, and `tokenHash`. This is attached to `request.auth` for downstream route handlers.
- Artifact download fallback: Requests to `/v1/artifacts/:id` with a valid `downloadToken` query param are allowed through without a Bearer token (for embedding in `<img>` tags, etc.).
- Unauthorized: If none of the above match, the request gets a `401 AUTH_UNAUTHORIZED` response.
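The resolution order above can be sketched roughly as follows. This is a Python illustration, not the actual Fastify hook: the in-memory `token_rows` dict stands in for the `api_tokens` table, the column names other than `revoked_at` are assumptions, and the `downloadToken` fallback is left out for brevity.

```python
import hashlib
from typing import Optional, Union

# Paths the text lists as skipping auth entirely.
PUBLIC_PATHS = {"/v1/healthz", "/v1/readyz", "/v1/metrics", "/docs", "/docs/openapi.yaml"}


def hash_token(raw_token: str) -> str:
    """SHA-256 hex digest of the raw bearer token (UTF-8 encoding is an assumption)."""
    return hashlib.sha256(raw_token.encode("utf-8")).hexdigest()


def resolve_auth(path: str, authorization: Optional[str], token_rows: dict) -> Union[str, dict, None]:
    """Return "public", an auth-context dict, or None (the caller would answer 401)."""
    if path in PUBLIC_PATHS:
        return "public"
    if not authorization or not authorization.startswith("Bearer "):
        return None
    token_hash = hash_token(authorization[len("Bearer "):].strip())
    row = token_rows.get(token_hash)  # hypothetical stand-in for the api_tokens lookup
    if row is None or row["revoked_at"] is not None:
        return None
    return {
        "tokenId": row["id"],
        "projectId": row["project_id"],
        "role": row["role"],
        "tokenHash": token_hash,
    }
```

Only the hash is ever compared, so a lookup table like `token_rows` never needs to hold raw token values.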
Tokens are scoped to one of three roles. Each route calls requireRole(req.auth, [...allowedRoles]) which returns the auth context or throws AUTH_FORBIDDEN:
| Role | Description |
|---|---|
| `admin` | Full access. Can create tokens, manage recipes, and perform all operations. |
| `developer` | Can create/cancel jobs, run searches, start crawls, extract data, and validate recipes. Cannot create tokens or upload recipes. |
| `readonly` | Can read job status, stream logs, list artifacts, download artifacts, run searches, list recipes, and view crawl status. Cannot create or mutate resources. |
| Endpoint | admin | developer | readonly |
|---|---|---|---|
| `POST /v1/admin/tokens` | x | | |
| `GET /v1/admin/tokens` | x | | |
| `GET /v1/admin/tokens/:tokenId` | x | | |
| `DELETE /v1/admin/tokens/:tokenId` | x | | |
| `PATCH /v1/admin/tokens/:tokenId` | x | | |
| `POST /v1/admin/tokens/:tokenId/rotate` | x | | |
| `GET /v1/admin/projects` | x | | |
| `POST /v1/admin/projects` | x | | |
| `GET /v1/admin/stats` | x | | |
| `GET /v1/admin/jobs` | x | | |
| `GET /v1/admin/crawls` | x | | |
| `GET /v1/admin/audit-logs` | x | | |
| `GET /v1/admin/rate-limits` | x | | |
| `PUT /v1/admin/rate-limits` | x | | |
| `POST /v1/jobs` | x | x | |
| `GET /v1/jobs/:id` | x | x | x |
| `POST /v1/jobs/:id/cancel` | x | x | |
| `GET /v1/jobs/:id/logs` | x | x | x |
| `GET /v1/jobs/:id/artifacts` | x | x | x |
| `GET /v1/artifacts/:id` | x | x | x |
| `POST /v1/extract` | x | x | |
| `POST /v1/search` | x | x | x |
| `POST /v1/crawl` | x | x | |
| `GET /v1/crawl/:id` | x | x | x |
| `GET /v1/recipes` | x | x | x |
| `POST /v1/recipes` | x | | |
| `POST /v1/recipes/validate` | x | x | |
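A role check in the spirit of `requireRole` might look like the sketch below. It is a Python illustration rather than the service's own code: the `AuthForbidden` class and the `ROUTE_ROLES` dict are hypothetical names, and only three rows of the matrix are copied in.

```python
class AuthForbidden(Exception):
    """Hypothetical error type carrying the AUTH_FORBIDDEN code."""
    code = "AUTH_FORBIDDEN"


# A few rows copied from the permission matrix; not the full route table.
ROUTE_ROLES = {
    ("POST", "/v1/admin/tokens"): ["admin"],
    ("POST", "/v1/jobs"): ["admin", "developer"],
    ("POST", "/v1/search"): ["admin", "developer", "readonly"],
}


def require_role(auth: dict, allowed_roles: list) -> dict:
    """Return the auth context when its role is allowed, otherwise raise."""
    if auth.get("role") not in allowed_roles:
        raise AuthForbidden(f"role {auth.get('role')!r} not in {allowed_roles}")
    return auth
```

Returning the auth context lets a handler check the role and read `projectId` in a single call.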
Every token belongs to a project (via projectId). Jobs, crawls, artifacts, and recipes are project-scoped — a token can only access resources within its own project. Cross-project access is rejected with 403 AUTH_FORBIDDEN.
- Tokens are generated as 24 random bytes encoded as hex (48 chars).
- Only the SHA-256 hash is stored in the database — the raw token is returned once at creation and never stored or logged.
- Log output redacts `req.headers.authorization`, `*.token`, `*.secret`, `*.password`, and `*.apiKey` fields.
- Revoked tokens (`revoked_at IS NOT NULL`) are excluded from lookups.
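As an illustration of the generation scheme described in this list, here is a minimal Python sketch. The function name is hypothetical, and hashing the hex string's UTF-8 bytes (rather than the raw random bytes) is an assumption.

```python
import hashlib
import secrets


def generate_token() -> tuple:
    """Return (raw_token, token_hash); only the hash would be persisted."""
    raw_token = secrets.token_hex(24)  # 24 random bytes -> 48 hex characters
    token_hash = hashlib.sha256(raw_token.encode("utf-8")).hexdigest()
    return raw_token, token_hash
```

The raw value is handed to the caller once; anything stored or logged afterwards would use `token_hash` only.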
On startup, the API inserts a bootstrap admin token for the default project if one doesn't already exist. Set via:
KRYFTO_BOOTSTRAP_ADMIN_TOKEN=<your-initial-token>
# or
KRYFTO_API_TOKEN=<your-initial-token>
This bootstrap token gets admin role on the default project and is idempotent — re-running the API won't duplicate it.
Tokens can have an optional expiresAt timestamp. Expired tokens are rejected during authentication — the resolveAuth() function checks expiresAt against the current time before granting access.
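The expiry comparison can be sketched as below. This is a hedged Python illustration, not the body of `resolveAuth()`: the helper name is hypothetical, and treating a token as expired exactly at `expiresAt` is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def is_expired(expires_at: Optional[str], now: Optional[datetime] = None) -> bool:
    """True when an ISO 8601 expiresAt timestamp is at or before `now`."""
    if expires_at is None:  # no expiry configured
        return False
    now = now or datetime.now(timezone.utc)
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    if expiry.tzinfo is None:  # assume UTC when no offset is given
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry <= now
```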
Requests are rate-limited per token_hash:ip pair with per-role defaults:
| Role | Default RPM |
|---|---|
| `admin` | 500 |
| `developer` | 120 |
| `readonly` | 60 |
Per-role limits are stored in the rate_limit_config database table and can be updated via the Admin Rate Limits API. The global fallback is also configurable via:
KRYFTO_RATE_LIMIT_RPM=120 # fallback requests per minute (default: 120)
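To make the `token_hash:ip` keying concrete, here is a small fixed-window limiter in Python. The windowing algorithm is an assumption (the text does not say which one the service uses), and the class and constant names are hypothetical; only the key format and the RPM defaults come from the description above.

```python
import time

DEFAULT_RPM = {"admin": 500, "developer": 120, "readonly": 60}
FALLBACK_RPM = 120  # mirrors the KRYFTO_RATE_LIMIT_RPM default


class FixedWindowLimiter:
    """Counts requests per token_hash:ip key in 60-second windows."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._windows = {}  # key -> (window_start, request_count)

    def allow(self, token_hash: str, ip: str, role: str) -> bool:
        key = f"{token_hash}:{ip}"
        limit = DEFAULT_RPM.get(role, FALLBACK_RPM)
        now = self._clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= 60:  # window elapsed, start a new one
            start, count = now, 0
        if count >= limit:
            self._windows[key] = (start, count)
            return False
        self._windows[key] = (start, count + 1)
        return True
```

Because the key includes the IP, the same token used from two addresses gets two independent budgets in this sketch.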
All mutating operations are recorded in the audit_logs table with:
| Field | Description |
|---|---|
| `project_id` | Scoped project |
| `token_id` | Token that performed the action |
| `actor_role` | Role at time of action |
| `action` | e.g., `job.create`, `admin.token.create`, `artifact.download` |
| `resource_type` | e.g., `job`, `token`, `artifact`, `recipe`, `crawl` |
| `resource_id` | UUID of affected resource |
| `request_id` | Correlation ID from `x-request-id` header |
| `ip_address` | Client IP |
| `details` | JSON object with action-specific metadata |
All errors follow a consistent shape:
{
"error": {
"code": "NOT_FOUND",
"message": "Job not found",
"requestId": "6301784a-5e1b-42f4-bb2c-62a707da8c7d"
}
}
Every response includes an x-request-id header for tracing.
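A client could turn this error envelope into an exception along these lines. The class and function names are hypothetical, and the fallback values for missing fields are assumptions.

```python
import json


class KryftoApiError(Exception):
    """Hypothetical client-side wrapper around the documented error shape."""

    def __init__(self, status: int, code: str, message: str, request_id):
        super().__init__(f"{status} {code}: {message} (requestId={request_id})")
        self.status = status
        self.code = code
        self.request_id = request_id


def parse_error(status: int, body: str) -> KryftoApiError:
    """Build an exception from an HTTP status and a JSON error body."""
    err = json.loads(body).get("error", {})
    return KryftoApiError(
        status,
        err.get("code", "UNKNOWN"),
        err.get("message", ""),
        err.get("requestId"),
    )
```

Keeping `request_id` on the exception makes it easy to quote the correlation ID when reporting a failure.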
GET /v1/healthz
Returns 200 if the API process is alive.
{ "ok": true, "service": "collector-api" }
GET /v1/readyz
Checks database and Redis connectivity. Returns 200 when ready, 503 otherwise.
{ "ok": true }
GET /v1/metrics
Returns metrics in OpenMetrics format (application/openmetrics-text).
GET /docs/openapi.yaml
Returns the raw OpenAPI 3.1 YAML specification.
Required role: `admin`
POST /v1/admin/tokens
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | Yes | Token display name (1–255 chars) |
| `role` | string | Yes | `admin`, `developer`, or `readonly` |
| `projectId` | string | Yes | Project to scope the token to (1–255 chars) |
{
"name": "ci-pipeline",
"role": "developer",
"projectId": "default"
}
Response (201):
{
"token": "kryfto_abc123...",
"tokenId": "550e8400-e29b-41d4-a716-446655440000"
}
Store the `token` value immediately; it is not retrievable after creation.
GET /v1/admin/tokens
Returns all tokens (without raw values) for the authenticated project.
GET /v1/admin/tokens/:tokenId
Returns token metadata (name, role, project, creation date, expiration, revocation status).
DELETE /v1/admin/tokens/:tokenId
Sets revoked_at on the token, immediately invalidating it.
PATCH /v1/admin/tokens/:tokenId
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | No | New display name |
| `role` | string | No | New role (`admin`, `developer`, `readonly`) |
| `expiresAt` | string (ISO 8601) | No | Token expiration timestamp |
POST /v1/admin/tokens/:tokenId/rotate
Revokes the existing token and generates a new one with the same name, role, and project.
Response (200):
{
"token": "kryfto_new_token...",
"tokenId": "new-uuid"
}
Required role: `admin`
GET /v1/admin/projects
POST /v1/admin/projects
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | Yes | Project name |
Required role: `admin`
GET /v1/admin/stats
Returns aggregate counts for jobs, tokens, artifacts, and crawls.
GET /v1/admin/jobs
Query Parameters:
| Param | Type | Default | Description |
|---|---|---|---|
| `page` | integer | `1` | Page number |
| `limit` | integer | `20` | Results per page |
| `state` | string | - | Filter by job state |
GET /v1/admin/crawls
Query Parameters:
| Param | Type | Default | Description |
|---|---|---|---|
| `page` | integer | `1` | Page number |
| `limit` | integer | `20` | Results per page |
GET /v1/admin/audit-logs
Query Parameters:
| Param | Type | Default | Description |
|---|---|---|---|
| `page` | integer | `1` | Page number |
| `limit` | integer | `50` | Results per page |
| `action` | string | - | Filter by action type |
GET /v1/admin/rate-limits
Returns the per-role RPM configuration.
PUT /v1/admin/rate-limits
Request Body:
{
"limits": [
{ "role": "admin", "rpm": 500 },
{ "role": "developer", "rpm": 120 },
{ "role": "readonly", "rpm": 60 }
]
}
Required role: `admin` or `developer`
POST /v1/jobs
Creates a background task to navigate to a URL, execute browser steps, and extract data.
Headers:
| Header | Required | Description |
|---|---|---|
| `Idempotency-Key` | No | UUID to prevent duplicate job creation. Returns `409` if the same key is reused with a different payload. |
Request Body:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | string (URL) | Yes | - | Target URL to collect |
| `recipeId` | string | No | - | Apply a pre-defined recipe |
| `options` | object | No | `{}` | See Job Options |
| `steps` | array | No | - | Browser automation steps (max 500). See Step Types |
| `extract` | object | No | - | Extraction config. See Extraction Config |
| `privacy_mode` | string | No | `"normal"` | `"normal"` or `"zero_trace"` (bypasses database persistence) |
| `freshness_mode` | string | No | `"preferred"` | `"always"`, `"preferred"`, `"fallback"`, or `"never"` |
Example:
{
"url": "https://example.com",
"options": {
"browserEngine": "chromium",
"requiresBrowser": true,
"timeoutMs": 30000
},
"steps": [
{ "type": "waitForNetworkIdle", "args": { "timeoutMs": 15000 } }
],
"extract": {
"mode": "selectors",
"selectors": {
"title": "title",
"main_heading": "h1"
}
}
}
Response (202):
{
"jobId": "e10c6c92-d85a-40fa-be36-ee240f687927",
"state": "queued",
"requestId": "6301784a-5e1b-42f4-bb2c-62a707da8c7d",
"idempotencyKey": "demo-example-1"
}
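A client-side sketch of a job submission with an `Idempotency-Key` header follows. It only builds the request object; the helper name is hypothetical, and generating a fresh UUID when no key is supplied is an assumption about how a caller might use the header.

```python
import json
import uuid
from typing import Optional
from urllib.request import Request

BASE_URL = "http://localhost:8080/v1"


def build_create_job_request(api_token: str, payload: dict,
                             idempotency_key: Optional[str] = None) -> Request:
    """Build (but do not send) a POST /v1/jobs request."""
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
        # Reuse the same key when retrying the same payload.
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }
    return Request(
        f"{BASE_URL}/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; a retry after a network failure should pass the original `idempotency_key` rather than letting a new one be generated.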
Required role: `admin`, `developer`, or `readonly`
GET /v1/jobs/:jobId
Response:
{
"id": "e10c6c92-d85a-40fa-be36-ee240f687927",
"projectId": "default",
"state": "succeeded",
"url": "https://example.com",
"requestId": "6301784a-...",
"attempts": 1,
"maxAttempts": 3,
"createdAt": "2026-03-08T10:00:00.000Z",
"updatedAt": "2026-03-08T10:00:05.000Z",
"resultSummary": {
"title": "Example Domain",
"main_heading": "Example Domain"
}
}
Job states: queued, running, succeeded, failed, cancelled, expired.
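Since jobs run in the background, a caller typically polls the status endpoint until the job settles. The sketch below assumes that `succeeded`, `failed`, `cancelled`, and `expired` are terminal states, and takes a hypothetical `fetch_job` callable so that the HTTP layer stays out of the example.

```python
import time

# Assumed terminal states; queued and running are treated as in-progress.
TERMINAL_STATES = {"succeeded", "failed", "cancelled", "expired"}


def wait_for_job(fetch_job, job_id: str, interval_s: float = 1.0,
                 timeout_s: float = 120.0, sleep=time.sleep, clock=time.monotonic) -> dict:
    """Poll fetch_job(job_id) until the job reaches a terminal state."""
    deadline = clock() + timeout_s
    while True:
        job = fetch_job(job_id)
        if job["state"] in TERMINAL_STATES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['state']} after {timeout_s}s")
        sleep(interval_s)
```

In practice `fetch_job` would wrap `GET /v1/jobs/:jobId` and return the parsed JSON body.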
Required role: `admin` or `developer`
POST /v1/jobs/:jobId/cancel
Cancels a queued or running job.
Response (202):
{
"jobId": "e10c6c92-...",
"state": "cancelled"
}
Required role: `admin`, `developer`, or `readonly`
GET /v1/jobs/:jobId/logs
Returns a real-time Server-Sent Events stream of log entries. The connection polls every 1 second for new log lines.
Event data shape:
{
"id": 42,
"level": "info",
"message": "Navigating to https://example.com",
"meta": {},
"createdAt": "2026-03-08T10:00:01.000Z"
}
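Consuming the log stream comes down to parsing Server-Sent Events frames. The parser below assumes standard SSE framing (`data:` lines terminated by a blank line) with one JSON object per event; the exact wire format is not spelled out above, so treat this as a sketch.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per SSE event from an iterable of text lines."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # blank line ends the current event
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
        elif line.startswith(":"):  # comment / keep-alive line
            continue
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data_lines.append(value[1:] if value.startswith(" ") else value)
```

Any line-oriented source works as input, for example a decoded HTTP response body read line by line.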
Required role: `admin`, `developer`, or `readonly`
GET /v1/jobs/:jobId/artifacts
Response:
{
"items": [
{
"id": "a92b3c4d-...",
"jobId": "e10c6c92-...",
"projectId": "default",
"type": "screenshot",
"fileName": "page.png",
"byteSize": 245760,
"sha256": "abc123...",
"contentType": "image/png",
"createdAt": "2026-03-08T10:00:05.000Z",
"downloadToken": "temp-uuid-token",
"downloadTokenExpiresAt": "2026-03-08T10:05:05.000Z",
"signedUrl": "https://minio.example.com/..."
}
]
}
Required role: `admin`, `developer`, or `readonly` (or use a `downloadToken`)
GET /v1/artifacts/:artifactId
Query Parameters:
| Param | Required | Description |
|---|---|---|
| `downloadToken` | No | Time-limited token (useful for `<img>` tags or unauthenticated contexts) |
Returns the binary file with appropriate Content-Type and Content-Disposition headers.
GET /v1/artifacts/a92b3c4d-...?downloadToken=temp-uuid-token
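Building such a link from an artifact listing entry could look like this. The helper name is hypothetical; the `id` and `downloadToken` fields are the ones shown in the artifact listing response.

```python
from urllib.parse import quote, urlencode


def artifact_download_url(base_url: str, artifact: dict) -> str:
    """Compose a token-authenticated download URL for one artifact item."""
    artifact_id = quote(artifact["id"], safe="")
    query = urlencode({"downloadToken": artifact["downloadToken"]})
    return f"{base_url}/artifacts/{artifact_id}?{query}"
```

The resulting URL carries its own credential, so it should be treated as short-lived and checked against `downloadTokenExpiresAt` before reuse.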
Required role: `admin` or `developer`
POST /v1/extract
Extracts structured data from HTML, text, or a previously-stored artifact using CSS selectors, JSON schemas, or custom plugins.
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| `mode` | string | Yes | `"selectors"`, `"schema"`, or `"plugin"` |
| `html` | string | One of | Raw HTML content |
| `text` | string | One of | Plain text content |
| `artifactId` | string | One of | ID of a stored artifact |
| `selectors` | object | If `mode=selectors` | Map of name → CSS selector |
| `jsonSchema` | object | If `mode=schema` | JSON Schema to extract against |
| `plugin` | string | If `mode=plugin` | Plugin path or name |
Exactly one of `html`, `text`, or `artifactId` must be provided.
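These constraints are easy to check on the client before sending a request. The following is a hypothetical pre-flight validator, not the server's own validation logic; the error wording is invented for illustration.

```python
# Which extra field each mode requires, per the request body table.
MODE_FIELD = {"selectors": "selectors", "schema": "jsonSchema", "plugin": "plugin"}
SOURCE_FIELDS = ("html", "text", "artifactId")


def validate_extract_request(body: dict) -> list:
    """Return a list of problems; an empty list means the body looks well-formed."""
    errors = []
    mode = body.get("mode")
    if mode not in MODE_FIELD:
        errors.append('mode must be "selectors", "schema", or "plugin"')
    elif MODE_FIELD[mode] not in body:
        errors.append(f"{MODE_FIELD[mode]} is required when mode={mode}")
    sources = [name for name in SOURCE_FIELDS if body.get(name) is not None]
    if len(sources) != 1:
        errors.append(f"exactly one of html, text, artifactId is required, got {len(sources)}")
    return errors
```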
Example (selectors):
{
"mode": "selectors",
"html": "<html><head><title>Test</title></head><body><h1>Hello</h1></body></html>",
"selectors": {
"title": "title",
"heading": "h1"
}
}
Response:
{
"data": {
"title": "Test",
"heading": "Hello"
},
"mode": "selectors"
}
Required role: `admin`, `developer`, or `readonly`
POST /v1/search
Executes a live search query across native search engine HTML interfaces with stealth headers and domain-authority boosting. Google searches use a Playwright browser with full anti-bot evasion; other engines use optimized HTTP scraping.
Request Body:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `query` | string | Yes | - | Search query (1–512 chars) |
| `limit` | integer | No | `10` | Results to return (1–20) |
| `engine` | string | No | `"duckduckgo"` | `"duckduckgo"`, `"bing"`, `"yahoo"`, `"google"`, or `"brave"` |
| `safeSearch` | string | No | `"moderate"` | `"strict"`, `"moderate"`, or `"off"` |
| `locale` | string | No | `"us-en"` | Locale code (2–16 chars) |
| `topic` | string | No | `"general"` | `"general"`, `"news"`, or `"finance"` |
| `include_images` | boolean | No | `false` | Include image results |
| `include_image_descriptions` | boolean | No | `false` | Include image alt text |
| `privacy_mode` | string | No | `"normal"` | `"normal"` or `"zero_trace"` |
| `freshness_mode` | string | No | `"preferred"` | `"always"`, `"preferred"`, `"fallback"`, or `"never"` |
| `location` | string | No | - | Granular geolocation (e.g., `"us-ny"`) |
| `proxy_profile` | string | No | - | Proxy rotation profile |
| `country` | string | No | - | Country code |
| `session_affinity` | boolean | No | - | Reuse browser session |
| `rotation_strategy` | string | No | - | `"per_request"`, `"sticky"`, or `"random"` |
Example:
{
"query": "playwright browser automation",
"limit": 5,
"engine": "duckduckgo",
"safeSearch": "moderate",
"topic": "general"
}
Response:
{
"query": "playwright browser automation",
"limit": 5,
"engine": "duckduckgo",
"safeSearch": "moderate",
"locale": "us-en",
"results": [
{
"title": "Fast and reliable end-to-end testing for modern web apps",
"url": "https://playwright.dev/",
"snippet": "Playwright enables reliable end-to-end testing for modern web apps.",
"rank": 1
}
],
"requestId": "6301784a-..."
}
Required role: `admin` or `developer`
POST /v1/crawl
Initiates a site-wide crawl starting from a seed URL.
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| `seed` | string (URL) | Yes | Starting URL |
| `rules` | object | No | See Crawl Rules |
| `recipeId` | string | No | Recipe to apply to each crawled page |
| `extract` | object | No | Extraction config. See Extraction Config |
Crawl Rules:
| Field | Type | Default | Description |
|---|---|---|---|
| `allowPatterns` | string[] | `[]` | URL patterns to include |
| `denyPatterns` | string[] | `[]` | URL patterns to exclude |
| `maxDepth` | integer | `1` | Maximum link-follow depth (0–5) |
| `maxPages` | integer | `20` | Maximum pages to crawl (1–500) |
| `sameDomainOnly` | boolean | `true` | Restrict to same domain |
| `politenessDelayMs` | integer | `500` | Delay between requests in ms (0–30000) |
Example:
{
"seed": "https://docs.example.com",
"rules": {
"maxDepth": 2,
"maxPages": 50,
"sameDomainOnly": true,
"politenessDelayMs": 1000
}
}
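The defaults and ranges from the Crawl Rules table can be applied client-side before submitting a crawl. The helper names below are hypothetical, and the server may well validate differently; this is only a sketch of the documented bounds.

```python
# Inclusive bounds taken from the Crawl Rules table.
RANGES = {"maxDepth": (0, 5), "maxPages": (1, 500), "politenessDelayMs": (0, 30000)}


def default_crawl_rules() -> dict:
    """Defaults from the Crawl Rules table (fresh lists on every call)."""
    return {
        "allowPatterns": [],
        "denyPatterns": [],
        "maxDepth": 1,
        "maxPages": 20,
        "sameDomainOnly": True,
        "politenessDelayMs": 500,
    }


def normalize_crawl_rules(rules=None) -> dict:
    """Merge user rules over the defaults and enforce the numeric ranges."""
    merged = {**default_crawl_rules(), **(rules or {})}
    for field, (low, high) in RANGES.items():
        value = merged[field]
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            raise ValueError(f"{field} must be an integer in [{low}, {high}], got {value!r}")
    return merged
```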
Response (202):
{
"crawlId": "c8f2b...",
"state": "queued",
"requestId": "6301784a-..."
}
Required role: `admin`, `developer`, or `readonly`
GET /v1/crawl/:crawlId
Response:
{
"id": "c8f2b...",
"projectId": "default",
"state": "running",
"seed": "https://docs.example.com",
"stats": {
"queued": 12,
"running": 3,
"succeeded": 35,
"failed": 0
},
"createdAt": "2026-03-08T10:00:00.000Z",
"updatedAt": "2026-03-08T10:02:15.000Z"
}
Crawl states: queued, running, succeeded, failed, cancelled.
Recipes are reusable extraction templates that auto-match URLs by pattern.
Required role: `admin`, `developer`, or `readonly`
GET /v1/recipes
Returns both built-in and custom recipes.
Response:
{
"items": [
{
"id": "hackernews",
"name": "Hacker News Front Page",
"version": "1.0.0",
"description": "Extracts top stories from HN",
"match": { "patterns": ["*://news.ycombinator.com/*"] },
"requiresBrowser": false,
"extraction": {
"mode": "selectors",
"selectors": { "stories": ".titleline > a" }
}
}
]
}
Required role: `admin`
POST /v1/recipes
Request Body:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `id` | string | Yes | - | Unique recipe ID (1–128 chars) |
| `name` | string | Yes | - | Display name (1–255 chars) |
| `version` | string | Yes | - | Version string (1–64 chars) |
| `description` | string | No | - | Human-readable description |
| `match` | object | Yes | - | `{ "patterns": ["glob1", "glob2"] }` (min 1 pattern) |
| `requiresBrowser` | boolean | No | `false` | Whether extraction needs Playwright |
| `steps` | array | No | - | Browser steps (max 500) |
| `extraction` | object | No | - | Extraction config |
| `throttling` | object | No | - | `{ "minDelayMs": number, "concurrencyHint": number }` |
| `pluginPath` | string | No | - | Custom plugin module path |
Response (201):
{ "id": "hackernews" }
Required role: `admin` or `developer`
POST /v1/recipes/validate
Validates a recipe schema without persisting it.
Request Body: Same as Create Recipe (can optionally be wrapped in { "recipe": { ... } }).
Response:
{
"valid": true,
"recipe": { ... }
}
If invalid:
{
"valid": false,
"issues": [
{
"code": "invalid_type",
"path": ["match", "patterns"],
"message": "Expected array, received undefined"
}
]
}
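For display in a CI log or CLI, the `issues` array can be flattened into one line per problem. The function name and the output format here are illustrative choices, not part of the API.

```python
def format_issues(result: dict) -> list:
    """Turn a validation response into human-readable lines (empty when valid)."""
    if result.get("valid"):
        return []
    lines = []
    for issue in result.get("issues", []):
        path = ".".join(str(part) for part in issue["path"])
        lines.append(f'{path}: {issue["message"]} ({issue["code"]})')
    return lines
```

Converting each path segment with `str` keeps the helper working if a path contains array indexes as well as field names.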
| Field | Type | Default | Description |
|---|---|---|---|
| `requiresBrowser` | boolean | - | Force browser-based extraction |
| `browserEngine` | string | `"chromium"` | `"chromium"`, `"firefox"`, or `"webkit"` |
| `respectRobotsTxt` | boolean | `true` | Honor robots.txt directives |
| `timeoutMs` | integer | `60000` | Job timeout in ms (1000–300000) |
| `interactiveLogin` | boolean | `false` | Enable interactive login flow |
| `proxy_profile` | string | - | Proxy rotation profile name |
| `country` | string | - | Geolocation country code |
| `session_affinity` | boolean | - | Reuse browser session for domain |
| `rotation_strategy` | string | - | `"per_request"`, `"sticky"`, or `"random"` |
Browser automation steps are defined as { "type": "<name>", "args": { ... } }.
| Step Type | Args | Description |
|---|---|---|
| `goto` | `{ url: string }` | Navigate to URL |
| `setHeaders` | `{ headers: { [key]: string } }` | Set custom request headers |
| `setCookies` | `{ cookies: CookieInput[] }` | Inject cookies into browser context |
| `exportCookies` | `{}` | Export current cookies as artifact |
| `waitForSelector` | `{ selector: string, timeoutMs?: number }` | Wait for CSS selector to appear |
| `click` | `{ selector: string }` | Click an element |
| `type` | `{ selector: string, text: string, secret?: bool }` | Type text into an input |
| `scroll` | `{ direction: "up" \| "down", amount: number }` | Scroll the page |
| `wait` | `{ ms: number }` | Wait a fixed duration |
| `waitForNetworkIdle` | `{ timeoutMs?: number }` | Wait for network activity to settle |
| `paginate` | `{ nextSelector: string, maxPages?: number (1–100, default 10), stopCondition?: string }` | Auto-paginate through pages |
| `screenshot` | `{ name: string }` | Capture a screenshot artifact |
| `extract` | `ExtractionConfig` | Run extraction mid-flow |
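Steps in the `{ "type": ..., "args": ... }` shape can be assembled with a tiny builder, sketched here in Python. The helper names, the login URL, and the CSS selectors are all hypothetical; only the step types, their argument names, and the 500-step limit come from the tables in this document.

```python
MAX_STEPS = 500  # limit stated for the steps array


def step(step_type: str, **args) -> dict:
    """Build one step in the documented {type, args} shape."""
    return {"type": step_type, "args": args}


def build_steps(*steps: dict) -> list:
    """Collect steps into a list, rejecting sequences over the documented limit."""
    if len(steps) > MAX_STEPS:
        raise ValueError(f"at most {MAX_STEPS} steps are allowed, got {len(steps)}")
    return list(steps)


# Illustrative login flow; URL and selectors are placeholders.
login_flow = build_steps(
    step("goto", url="https://example.com/login"),
    step("type", selector="#password", text="hunter2", secret=True),
    step("click", selector="button[type=submit]"),
    step("waitForNetworkIdle", timeoutMs=15000),
    step("screenshot", name="after-login"),
)
```

The resulting list can be placed directly under the `steps` key of a job or recipe payload.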
{
"name": "session_id",
"value": "abc123",
"domain": ".example.com",
"path": "/",
"expires": 1735689600,
"httpOnly": true,
"secure": true,
"sameSite": "Lax"
}
Only name and value are required; all other fields are optional.
| Field | Type | Required | Description |
|---|---|---|---|
| `mode` | string | Yes | `"selectors"`, `"schema"`, or `"plugin"` |
| `selectors` | object | If `mode=selectors` | Map of `{ name: "css-selector" }` |
| `jsonSchema` | object | If `mode=schema` | JSON Schema to extract structured data |
| `plugin` | string | If `mode=plugin` | Plugin module path |
| Variable | Default | Description |
|---|---|---|
| `KRYFTO_PORT` | `8080` | API server port |
| `KRYFTO_LOG_LEVEL` | `"info"` | Log verbosity |
| `KRYFTO_API_TOKEN` | - | Bootstrap API token |
| `KRYFTO_PROJECT_ID` | `"default"` | Default project ID |
| `KRYFTO_JOB_MAX_ATTEMPTS` | `3` | Max job retry attempts |
| `KRYFTO_SSRF_BLOCK_PRIVATE_RANGES` | `"true"` | Block SSRF to private IPs |
| `KRYFTO_ALLOWED_HOSTS` | - | Comma-separated allowlist |
| `KRYFTO_STEALTH_MODE` | - | Enable stealth headers |
| `KRYFTO_ROTATE_USER_AGENT` | - | Rotate UA per request |
| `KRYFTO_PROXY_URLS` | - | Comma-separated proxy list |
| `KRYFTO_HUMANIZE` | `"true"` | Humanized mouse/keyboard |
| `KRYFTO_BROWSER_POOL` | `"true"` | Per-domain session reuse |
| `REDIS_HOST` | `"redis"` | Redis hostname |
| `REDIS_PORT` | `6379` | Redis port |
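Reading a subset of these variables with their documented defaults might look like the sketch below. The function and key names are hypothetical, and treating only the string `"true"` (case-insensitive) as enabled is an assumption about how the flags are parsed.

```python
import os


def _flag(value: str) -> bool:
    """Interpret a string flag; only "true" (any case) counts as enabled here."""
    return value.strip().lower() == "true"


def load_config(env=None) -> dict:
    """Read a few Kryfto settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    hosts = env.get("KRYFTO_ALLOWED_HOSTS", "")
    return {
        "port": int(env.get("KRYFTO_PORT", "8080")),
        "log_level": env.get("KRYFTO_LOG_LEVEL", "info"),
        "project_id": env.get("KRYFTO_PROJECT_ID", "default"),
        "job_max_attempts": int(env.get("KRYFTO_JOB_MAX_ATTEMPTS", "3")),
        "ssrf_block_private_ranges": _flag(env.get("KRYFTO_SSRF_BLOCK_PRIVATE_RANGES", "true")),
        "allowed_hosts": [h.strip() for h in hosts.split(",") if h.strip()],
        "redis_host": env.get("REDIS_HOST", "redis"),
        "redis_port": int(env.get("REDIS_PORT", "6379")),
    }
```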
| Method | Path | Auth Role | Description |
|---|---|---|---|
| GET | `/v1/healthz` | Public | Health check |
| GET | `/v1/readyz` | Public | Readiness probe |
| GET | `/v1/metrics` | Public | Prometheus metrics |
| GET | `/docs/openapi.yaml` | Public | OpenAPI spec |
| POST | `/v1/admin/tokens` | admin | Create API token |
| GET | `/v1/admin/tokens` | admin | List all tokens |
| GET | `/v1/admin/tokens/:tokenId` | admin | Get token details |
| DELETE | `/v1/admin/tokens/:tokenId` | admin | Revoke token |
| PATCH | `/v1/admin/tokens/:tokenId` | admin | Update token |
| POST | `/v1/admin/tokens/:tokenId/rotate` | admin | Rotate token |
| GET | `/v1/admin/projects` | admin | List projects |
| POST | `/v1/admin/projects` | admin | Create project |
| GET | `/v1/admin/stats` | admin | Dashboard stats |
| GET | `/v1/admin/jobs` | admin | List all jobs (paginated) |
| GET | `/v1/admin/crawls` | admin | List all crawls (paginated) |
| GET | `/v1/admin/audit-logs` | admin | Query audit logs |
| GET | `/v1/admin/rate-limits` | admin | Get rate limit config |
| PUT | `/v1/admin/rate-limits` | admin | Update rate limits |
| POST | `/v1/jobs` | admin, developer | Create job |
| GET | `/v1/jobs/:jobId` | admin, developer, readonly | Get job status |
| POST | `/v1/jobs/:jobId/cancel` | admin, developer | Cancel job |
| GET | `/v1/jobs/:jobId/logs` | admin, developer, readonly | Stream logs (SSE) |
| GET | `/v1/jobs/:jobId/artifacts` | admin, developer, readonly | List job artifacts |
| GET | `/v1/artifacts/:artifactId` | admin, developer, readonly (or `downloadToken`) | Download artifact |
| POST | `/v1/extract` | admin, developer | Extract from content |
| POST | `/v1/search` | admin, developer, readonly | Federated search |
| POST | `/v1/crawl` | admin, developer | Start crawl |
| GET | `/v1/crawl/:crawlId` | admin, developer, readonly | Get crawl status |
| GET | `/v1/recipes` | admin, developer, readonly | List recipes |
| POST | `/v1/recipes` | admin | Create/update recipe |
| POST | `/v1/recipes/validate` | admin, developer | Validate recipe |