Configuration Guide

CiberWebScan uses a flexible configuration system that allows customization of various aspects of the application behavior.

Configuration Sources
- Environment Variable Limitations
Configuration File
Configuration Sections
- HTTP Client
- User Agent
- Scraping
- Analysis
- Attack
- Export
- Cache
- Logging
CLI Configuration Commands
Persistent configuration vs CLI / runtime options
Programmatic Access
Validation
Migration
Development Notes

Configuration Sources

Configuration values are loaded from multiple sources in order of precedence:

Environment variables (highest priority)
User configuration file (~/.ciberwebscan/config.yaml)
Default values (lowest priority)
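
The precedence order above can be pictured as a layered merge. The following is a minimal sketch, not the project's loader: `deep_merge` and the three sample dictionaries are hypothetical names introduced only for illustration, and the real ConfigLoader may combine sources differently.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` win over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are nested sections: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical sample data for the three sources.
defaults = {"http": {"timeout": {"connect": 10.0, "read": 30.0}}}
from_file = {"http": {"timeout": {"read": 45.0}}}
from_env = {"http": {"timeout": {"connect": 15}}}

# Apply sources from lowest to highest priority.
effective = deep_merge(deep_merge(defaults, from_file), from_env)
print(effective)
```

Settings that a higher-priority source does not mention keep the value from the layer below.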

Environment variable overrides (prefix & mapping)

Environment overrides use the prefix CIBERWEBSCAN_ by default (see ConfigLoader.env_prefix).
After the prefix the name is lowercased and underscores are converted to dots to form the config key. Example:
- CIBERWEBSCAN_HTTP_TIMEOUT_CONNECT -> http.timeout.connect
Parsing rules used by ConfigLoader._load_env (src/ciberwebscan/config/loader.py):
- Booleans: true|yes|1 → true, false|no|0 → false
- Numbers: values containing . → float, otherwise int
- Lists: comma-separated strings → parsed as arrays
Examples:
- CIBERWEBSCAN_HTTP_TIMEOUT_CONNECT=15 → http.timeout.connect: 15
- CIBERWEBSCAN_SCRAPING_DYNAMIC_HEADLESS=false → scraping.dynamic.headless: false
- CIBERWEBSCAN_ATTACK_WHITELIST="localhost,testhost" → attack.whitelist: ["localhost","testhost"]

See implementation: ConfigLoader._load_env (src/ciberwebscan/config/loader.py).
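
As a rough illustration of the mapping and parsing rules listed above, here is a minimal sketch. It is not the code in loader.py: the function names are hypothetical, and the order in which the rules are tried (booleans, then lists, then numbers, then plain strings) is an assumption.

```python
from typing import Any, Optional

PREFIX = "CIBERWEBSCAN_"


def env_name_to_key(name: str, prefix: str = PREFIX) -> Optional[str]:
    """Map an environment variable name to a dotted config key."""
    if not name.startswith(prefix):
        return None
    # Lowercase the remainder and turn every underscore into a dot.
    return name[len(prefix):].lower().replace("_", ".")


def parse_env_value(raw: str) -> Any:
    """Parse a raw string using the documented rules (rule order assumed)."""
    text = raw.strip()
    lowered = text.lower()
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    if "," in text:
        return [item.strip() for item in text.split(",") if item.strip()]
    try:
        return float(text) if "." in text else int(text)
    except ValueError:
        return text  # anything else stays a plain string


print(env_name_to_key("CIBERWEBSCAN_HTTP_TIMEOUT_CONNECT"))
print(parse_env_value("15"), parse_env_value("false"), parse_env_value("a,b"))
```

Note that under this ordering the strings "1" and "0" become booleans rather than integers, which follows the boolean rule as documented.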

Environment Variable Limitations

Our current ConfigLoader maps every underscore (_) in the environment variable name to a dot (.) when building the config path. That works for many simple keys (for example CIBERWEBSCAN_HTTP_TIMEOUT_CONNECT → http.timeout.connect), but it prevents overriding model fields that themselves contain underscores (for example user_agent, rate_limit, include_screenshots).

What this means in practice:

Supported via CIBERWEBSCAN_ envs (examples):
- CIBERWEBSCAN_HTTP_TIMEOUT_CONNECT → http.timeout.connect
- CIBERWEBSCAN_HTTP_TIMEOUT_READ → http.timeout.read
- CIBERWEBSCAN_HTTP_PROXY_ROTATE → http.proxy.rotate
- CIBERWEBSCAN_SCRAPING_DYNAMIC_ENABLED → scraping.dynamic.enabled
- CIBERWEBSCAN_SCRAPING_DYNAMIC_HEADLESS → scraping.dynamic.headless
- CIBERWEBSCAN_ATTACK_ENABLED → attack.enabled
- CIBERWEBSCAN_ATTACK_XSS → attack.xss
- CIBERWEBSCAN_CACHE_ENABLED → cache.enabled
- NVD_API_KEY, VULNERS_API_KEY (unprefixed; read directly by the CVE clients)

NOT supported via CIBERWEBSCAN_ envs (must use config.yaml or change loader):
- CIBERWEBSCAN_USER_AGENT_AGENTS / CIBERWEBSCAN_USER_AGENT_MODE → user_agent.*
- CIBERWEBSCAN_HTTP_RATE_LIMIT_REQUESTS_PER_SECOND → http.rate_limit.requests_per_second
- CIBERWEBSCAN_EXPORT_INCLUDE_SCREENSHOTS → export.include_screenshots
- CIBERWEBSCAN_ANALYSIS_CVE_NVD_API_KEY → analysis.cve.nvd_api_key
- CIBERWEBSCAN_ATTACK_USER_CONSENT → attack.user_consent

Recommendation: set fields whose names contain underscores in ~/.ciberwebscan/config.yaml. If you prefer env-based overrides for those fields, we can update ConfigLoader to support a double-underscore convention (e.g. CIBERWEBSCAN_HTTP__RATE_LIMIT__REQUESTS_PER_SECOND); tell us if you want that behavior added.
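
To make the limitation and the double-underscore idea concrete, the sketch below compares the two mappings. Both functions are hypothetical illustrations: `current_mapping` restates the documented behavior, and `double_underscore_mapping` shows one possible shape of the proposed convention, which is not implemented.

```python
PREFIX = "CIBERWEBSCAN_"


def current_mapping(name: str) -> str:
    """Documented behavior: every underscore becomes a dot."""
    return name[len(PREFIX):].lower().replace("_", ".")


def double_underscore_mapping(name: str) -> str:
    """Proposed idea: '__' separates path segments, '_' is kept as-is."""
    return ".".join(name[len(PREFIX):].lower().split("__"))


# The current mapping cannot express fields that contain underscores.
print(current_mapping("CIBERWEBSCAN_HTTP_RATE_LIMIT_REQUESTS_PER_SECOND"))
# A double-underscore convention would keep them intact.
print(double_underscore_mapping("CIBERWEBSCAN_HTTP__RATE_LIMIT__REQUESTS_PER_SECOND"))
```

One design consequence of this sketch: names that use single underscores today (such as CIBERWEBSCAN_HTTP_TIMEOUT_CONNECT) would resolve to a single segment under the new rule, so a real change would need a compatibility fallback or a migration of existing names.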

Note: Command-line options are specific to individual commands and do not override global configuration. They are used to customize behavior for that particular command execution.

Configuration File

The configuration file is automatically created in your user directory when you first run CiberWebScan. You can also create it manually.

Location

Linux/macOS: ~/.ciberwebscan/config.yaml
Windows: %USERPROFILE%\.ciberwebscan\config.yaml

Format

Configuration files can be in JSON or YAML format.

JSON Example:

{
  "http": {
    "timeout": {
      "connect": 15.0,
      "read": 45.0
    },
    "rate_limit": {
      "requests_per_second": 3.0
    }
  },
  "scraping": {
    "dynamic": {
      "enabled": true,
      "headless": false
    }
  }
}

YAML Example:

http:
  timeout:
    connect: 15.0
    read: 45.0
  rate_limit:
    requests_per_second: 3.0

scraping:
  dynamic:
    enabled: true
    headless: false

Configuration Sections

HTTP Client

Configure HTTP request behavior.

{
  "http": {
    "timeout": {
      "connect": 10.0,
      "read": 30.0,
      "write": 30.0,
      "pool": 10.0
    },
    "retry": {
      "max_attempts": 3,
      "backoff_factor": 0.5,
      "retryable_status_codes": [429, 500, 502, 503, 504]
    },
    "rate_limit": {
      "requests_per_second": 5.0,
      "per_domain": true
    },
    "proxy": {
      "http": null,
      "https": null,
      "socks5": null,
      "rotate": false,
      "rotation_interval": 10,
      "proxy_list": null
    },
    "http2": true,
    "follow_redirects": true,
    "max_redirects": 10,
    "verify_ssl": true
  }
}

Default values (quick reference)

Key	Default	Description
`http.timeout.connect`	`10.0`	Connection timeout (seconds)
`http.timeout.read`	`30.0`	Read timeout (seconds)
`http.timeout.write`	`30.0`	Write timeout (seconds)
`http.timeout.pool`	`10.0`	Connection pool timeout (seconds)
`http.retry.max_attempts`	`3`	Retry attempts
`http.retry.backoff_factor`	`0.5`	Exponential backoff factor
`http.rate_limit.requests_per_second`	`5.0`	Requests per second
`http.rate_limit.per_domain`	`true`	Rate limit per domain
`http.proxy.rotate`	`false`	Proxy rotation disabled by default
`http.proxy.rotation_interval`	`10`	Requests per proxy when rotating
`http.http2`	`true`	Enable HTTP/2 by default
`http.follow_redirects`	`true`	Follow redirects
`http.max_redirects`	`10`	Max redirects to follow
`http.verify_ssl`	`true`	Verify TLS certificates
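
To give a feel for how the retry defaults interact, the sketch below computes wait times between attempts. The formula `backoff_factor * 2 ** n` is a common exponential backoff scheme and is an assumption here, as is treating `max_attempts` as the total number of attempts; the HTTP client may compute delays differently. The helper names are hypothetical.

```python
from typing import List

RETRYABLE_STATUS_CODES = [429, 500, 502, 503, 504]


def backoff_delays(max_attempts: int = 3, backoff_factor: float = 0.5) -> List[float]:
    """Delays (seconds) before each attempt after the first (assumed formula)."""
    return [backoff_factor * (2 ** n) for n in range(max_attempts - 1)]


def is_retryable(status_code: int) -> bool:
    """True when the status code is in the configured retryable list."""
    return status_code in RETRYABLE_STATUS_CODES


print(backoff_delays())    # waits between the three default attempts
print(is_retryable(503), is_retryable(404))
```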

Proxy Rotation

When rotate is true, CiberWebScan cycles through available proxies using a round-robin strategy. The proxy changes every rotation_interval requests. Proxies can be supplied through proxy_list (recommended) or will be collected from the individual http, https, and socks5 fields.

proxy_list accepts either:

A JSON array of proxy URLs: ["http://p1:8080", "http://p2:8080"]
A comma/newline-separated string: "http://p1:8080, http://p2:8080"

Field	Type	Default	Description
`http`	string | null	null	Single HTTP proxy URL
`https`	string | null	null	Single HTTPS proxy URL
`socks5`	string | null	null	Single SOCKS5 proxy URL
`rotate`	bool	false	Enable proxy rotation
`rotation_interval`	int (≥ 1)	10	Number of requests before switching proxy
`proxy_list`	list/string/null	null	List of proxy URLs for rotation

Example with rotation enabled:

{
  "http": {
    "proxy": {
      "rotate": true,
      "rotation_interval": 5,
      "proxy_list": [
        "http://proxy1.example.com:8080",
        "http://proxy2.example.com:8080",
        "socks5://proxy3.example.com:1080"
      ]
    }
  }
}
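
The rotation behavior described in this section can be sketched as follows. This is an illustrative model only: `normalize_proxy_list` and `ProxyRotator` are hypothetical names, and the real client may track requests differently (for example per domain).

```python
import re
from typing import List, Optional, Union


def normalize_proxy_list(value: Union[List[str], str, None]) -> List[str]:
    """Accept a list, or a comma/newline-separated string, and return a clean list."""
    if value is None:
        return []
    parts = re.split(r"[,\n]", value) if isinstance(value, str) else value
    return [p.strip() for p in parts if p and p.strip()]


class ProxyRotator:
    """Round-robin over proxies, switching every `rotation_interval` requests."""

    def __init__(self, proxies: List[str], rotation_interval: int = 10) -> None:
        if rotation_interval < 1:
            raise ValueError("rotation_interval must be >= 1")
        self._proxies = list(proxies)
        self._interval = rotation_interval
        self._count = 0  # number of requests served so far

    def next_proxy(self) -> Optional[str]:
        if not self._proxies:
            return None
        index = (self._count // self._interval) % len(self._proxies)
        self._count += 1
        return self._proxies[index]


rotator = ProxyRotator(normalize_proxy_list("http://p1:8080, http://p2:8080"), rotation_interval=2)
print([rotator.next_proxy() for _ in range(5)])
```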

User Agent

Configure user agent rotation.

{
  "user_agent": {
    "mode": "rotate",
    "custom": null,
    "rotate_interval": 10,
    "agents": [
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0"
    ]
  }
}

Default values (quick reference)

Key	Default	Description
`user_agent.mode`	`rotate`	Default rotation mode
`user_agent.custom`	`null`	No custom UA by default
`user_agent.rotate_interval`	`10`	Requests before rotating UA
`user_agent.agents`	`default list (6 agents)`	Default UA list used for rotation

Scraping

Configure web scraping behavior.

{
  "scraping": {
    "dynamic": {
      "enabled": false,
      "wait_timeout": 10.0,
      "wait_selector": null,
      "headless": true,
      "browser_type": "chromium"
    },
    "pagination": {
      "max_pages": 10,
      "next_selector": null,
      "page_param": null
    },
    "extract_links": true,
    "extract_images": true,
    "extract_scripts": true,
    "extract_forms": true,
    "max_content_length": 10485760
  }
}

Default values (quick reference)

Key	Default	Description
`scraping.dynamic.enabled`	`false`	Dynamic (browser) scraping disabled by default
`scraping.dynamic.wait_timeout`	`10.0`	Wait timeout for dynamic scraping (s)
`scraping.dynamic.headless`	`true`	Playwright runs headless by default
`scraping.dynamic.browser_type`	`chromium`	Default browser engine
`scraping.pagination.max_pages`	`10`	Max pages to follow in pagination
`scraping.extract_links`	`true`	Extract links by default
`scraping.extract_images`	`true`	Extract images by default
`scraping.extract_scripts`	`true`	Extract scripts by default
`scraping.extract_forms`	`true`	Extract forms by default
`scraping.max_content_length`	`10485760 (10 MB)`	Max response size handled by scrapers (model default)

Implementation status — scraping options

scraping.max_content_length: present in the config model but not enforced consistently across all scrapers (see src/ciberwebscan/core/scraping/static.py and src/ciberwebscan/core/scraping/dynamic.py).

scraping.extract_* flags (extract_links, extract_images, extract_scripts, extract_forms) exist in the model but are only partially applied by some scrapers.

See the Development Notes section for recommended fixes and test coverage.
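
For orientation, one way a scraper could honor scraping.max_content_length is sketched below. This is not existing project code: the helper name and the `truncate` switch are hypothetical, and whether oversized responses should be rejected or truncated is still an open decision.

```python
DEFAULT_MAX_CONTENT_LENGTH = 10 * 1024 * 1024  # 10485760 bytes, the model default


def enforce_max_content_length(
    body: bytes,
    limit: int = DEFAULT_MAX_CONTENT_LENGTH,
    truncate: bool = False,
) -> bytes:
    """Return the body if it fits; otherwise truncate it or raise ValueError."""
    if len(body) <= limit:
        return body
    if truncate:
        return body[:limit]
    raise ValueError(f"response of {len(body)} bytes exceeds limit of {limit} bytes")


print(len(enforce_max_content_length(b"x" * 20, limit=8, truncate=True)))
```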

Analysis

Configure security analysis settings.

{
  "analysis": {
    "ssl": {
      "enabled": true,
      "check_expiry": true,
      "check_chain": true,
      "check_revocation": true,
      "warning_days": 30
    },
    "fingerprint": {
      "enabled": true,
      "check_headers": true,
      "check_cookies": true,
      "check_html": true,
      "check_scripts": true,
      "check_dns": false
    },
    "cve": {
      "enabled": true,
      "api": "all",
      "nvd_api_key": null,
      "vulners_api_key": null,
      "cache_ttl": 86400
    },
    "headers": {
      "enabled": true,
      "required_headers": [
        "Strict-Transport-Security",
        "X-Content-Type-Options",
        "X-Frame-Options",
        "Content-Security-Policy"
      ]
    }
  }
}

Default values (quick reference)

Key	Default	Description
`analysis.ssl.enabled`	`true`	SSL/TLS analysis enabled
`analysis.ssl.warning_days`	`30`	Days before expiry to warn
`analysis.fingerprint.enabled`	`true`	Technology fingerprinting enabled
`analysis.fingerprint.check_dns`	`false`	DNS checks disabled by default
`analysis.cve.api`	`all`	CVE data sources used by default
`analysis.cve.cache_ttl`	`86400`	CVE cache TTL (seconds)
`analysis.headers.required_headers`	default list	Security headers checked by default
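
The required_headers list lends itself to a simple presence check. The sketch below shows the idea under the assumption that header names are compared case-insensitively; the function name is hypothetical and the actual analyzer may also inspect header values.

```python
from typing import List, Mapping

REQUIRED_HEADERS = [
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Content-Security-Policy",
]


def missing_security_headers(
    response_headers: Mapping[str, str],
    required: List[str] = REQUIRED_HEADERS,
) -> List[str]:
    """Return the required headers that are absent from the response."""
    present = {name.lower() for name in response_headers}
    return [header for header in required if header.lower() not in present]


sample = {"x-frame-options": "DENY", "Content-Security-Policy": "default-src 'self'"}
print(missing_security_headers(sample))
```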

Implementation status — analysis.fingerprint.deep_scan

analysis.fingerprint.deep_scan is proposed but not available in the persistent configuration model (FingerprintConfig).

A runtime option deep_scan exists on AnalyzeOptions (see src/ciberwebscan/services/analyze_service.py) and can be passed via CLI, but there is no analysis.fingerprint.deep_scan field to persist that behavior in the config file.

Attack

Configure attack simulation settings.

{
  "attack": {
    "enabled": false,
    "user_consent": false,
    "whitelist": ["127.0.0.1", "localhost"],
    "xss": true,
    "sqli": true,
    "traversal": true,
    "enumeration": true,
    "max_payloads": 50
  }
}

Default values (quick reference)

Key	Default	Description
`attack.enabled`	`false`	Attack simulation disabled by default
`attack.user_consent`	`false`	User consent required to run attacks
`attack.whitelist`	`["127.0.0.1","localhost"]`	Default allowed targets for attack testing
`attack.xss`	`true`	Run XSS checks by default
`attack.sqli`	`true`	Run SQLi checks by default
`attack.traversal`	`true`	Run path traversal checks by default
`attack.enumeration`	`true`	Run enumeration by default
`attack.max_payloads`	`50`	Default max payloads per target
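
The three safety-related settings (enabled, user_consent, whitelist) suggest a gate like the one sketched below. This is an assumption about how the settings combine, not a description of the attack service: the function name is hypothetical and the real checks may be stricter.

```python
from urllib.parse import urlparse


def attack_allowed(target_url: str, attack_config: dict) -> bool:
    """Allow an attack run only if it is enabled, consented to, and whitelisted."""
    if not attack_config.get("enabled", False):
        return False
    if not attack_config.get("user_consent", False):
        return False
    host = urlparse(target_url).hostname
    return host is not None and host in attack_config.get("whitelist", [])


defaults = {"enabled": False, "user_consent": False, "whitelist": ["127.0.0.1", "localhost"]}
opted_in = {**defaults, "enabled": True, "user_consent": True}

print(attack_allowed("http://localhost:8080/login", defaults))
print(attack_allowed("http://localhost:8080/login", opted_in))
print(attack_allowed("https://example.com/", opted_in))
```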

Export

Configure export behavior.

{
  "export": {
    "format": "jsonl",
    "output_dir": "exports",
    "include_raw_html": false,
    "include_screenshots": false,
    "streaming": true,
    "buffer_size": 100,
    "pretty": true
  }
}

Default values (quick reference)

Key	Default	Description
`export.format`	`jsonl`	Default export format
`export.output_dir`	`exports`	Default export directory
`export.include_raw_html`	`false`	Do not include raw HTML by default
`export.include_screenshots`	`false`	Screenshots not included by default (not implemented)
`export.streaming`	`true`	Use streaming exporter by default
`export.buffer_size`	`100`	Export buffer size
`export.pretty`	`true`	Pretty-print JSON by default

Implementation status — include_screenshots

include_screenshots is defined in ExportConfig (src/ciberwebscan/config/models.py) and exposed in API models, but it is not implemented by the export pipeline (unused by BaseService._export_result and exporter classes).

Cache

Configure caching behavior.

{
  "cache": {
    "enabled": true,
    "directory": ".cache",
    "ttl": 3600,
    "max_size_mb": 100
  }
}

Default values (quick reference)

Key	Default	Description
`cache.enabled`	`true`	Caching enabled by default
`cache.directory`	`.cache`	Default cache directory
`cache.ttl`	`3600`	Cache TTL (seconds)
`cache.max_size_mb`	`100`	Max cache size (MB)

Logging

Configure logging behavior.

{
  "logging": {
    "level": "INFO",
    "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    "file": null,
    "max_size": 10485760,
    "backup_count": 5
  }
}

Default values (quick reference)

Key	Default	Description
`logging.level`	`INFO`	Default log level
`logging.format`	`%(asctime)s - %(name)s - %(levelname)s - %(message)s`	Default log format
`logging.file`	`null`	No log file by default
`logging.max_size`	`10485760`	Max size for rotated log file (bytes)
`logging.backup_count`	`5`	Number of rotated log files to keep
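
The logging keys map naturally onto Python's standard logging handlers. The sketch below assumes that max_size and backup_count correspond to the `maxBytes` and `backupCount` arguments of `RotatingFileHandler`; `build_handler` is a hypothetical helper, not the application's setup code.

```python
import logging
from logging.handlers import RotatingFileHandler

LOGGING_DEFAULTS = {
    "level": "INFO",
    "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    "file": None,
    "max_size": 10485760,
    "backup_count": 5,
}


def build_handler(cfg: dict) -> logging.Handler:
    """Create a rotating file handler if a file is set, else a stream handler."""
    if cfg.get("file"):
        handler: logging.Handler = RotatingFileHandler(
            cfg["file"], maxBytes=cfg["max_size"], backupCount=cfg["backup_count"]
        )
    else:
        handler = logging.StreamHandler()
    handler.setLevel(cfg["level"])
    handler.setFormatter(logging.Formatter(cfg["format"]))
    return handler


handler = build_handler(LOGGING_DEFAULTS)
print(type(handler).__name__, logging.getLevelName(handler.level))
```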

CLI Configuration Commands

View Current Configuration

ciberwebscan config show

View Specific Section

ciberwebscan config show http

Set Configuration Value

ciberwebscan config set http.timeout.connect 15.0

Reset Configuration

ciberwebscan config reset
ciberwebscan config reset http

Get Configuration Value

ciberwebscan config get http.timeout.connect

List Configuration Keys

ciberwebscan config keys
ciberwebscan config keys --section http

Export Configuration

ciberwebscan config export config.yaml  # Exports to YAML (default format)
ciberwebscan config export config.json --format json

Load Configuration

ciberwebscan config load config.yaml
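
The dotted keys used by config get and config set address values inside nested sections. A minimal sketch of that addressing on a plain dictionary follows; the helper names are hypothetical and the CLI itself operates on the validated configuration model rather than a raw dict.

```python
from typing import Any


def get_by_path(config: dict, dotted_key: str) -> Any:
    """Walk the nested dict one segment at a time; raises KeyError if missing."""
    node: Any = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def set_by_path(config: dict, dotted_key: str, value: Any) -> None:
    """Set a nested value in place, creating intermediate sections as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


config = {"http": {"timeout": {"connect": 10.0}}}
set_by_path(config, "http.timeout.connect", 15.0)
print(get_by_path(config, "http.timeout.connect"))
```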

Persistent configuration vs CLI / runtime options

Persistent configuration (config.*) is stored in the user config file (~/.ciberwebscan/config.yaml) and loaded by ConfigLoader at startup (or via get_config()). Environment variables with the CIBERWEBSCAN_ prefix and the config file are merged; environment variables have higher precedence.
CLI/runtime options (for example AttackOptions, AnalyzeOptions) are dataclasses used only for the current execution. CLI flags are converted into these option objects and override behavior for that run but do not modify the persistent configuration file.
When an options field is omitted (or set to None), the service may fall back to the value from get_config() — see AttackOptions.__post_init__ (src/ciberwebscan/services/attack_service.py) and AnalyzeOptions handling (src/ciberwebscan/services/analyze_service.py).
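
The fallback from runtime options to persistent configuration can be sketched with a dataclass. Everything below is a stand-in: `AttackOptionsSketch` is not the real AttackOptions, its single field is chosen for illustration, and `get_config` here is a local stub returning a plain dict.

```python
from dataclasses import dataclass
from typing import Optional


def get_config() -> dict:
    """Stub standing in for the real get_config(); returns sample defaults."""
    return {"attack": {"max_payloads": 50}}


@dataclass
class AttackOptionsSketch:
    # None means "not given on the command line".
    max_payloads: Optional[int] = None

    def __post_init__(self) -> None:
        # Fall back to the persistent configuration when the option is omitted.
        if self.max_payloads is None:
            self.max_payloads = get_config()["attack"]["max_payloads"]


print(AttackOptionsSketch().max_payloads)
print(AttackOptionsSketch(max_payloads=5).max_payloads)
```

The explicit value only affects the options object for that run; nothing is written back to the configuration.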

Programmatic Access

You can access configuration in your code:

from ciberwebscan.config.loader import get_config

config = get_config()
timeout = config.http.timeout.connect

Validation

Configuration values are validated by Pydantic when loaded by the ConfigLoader.

Invalid values in the user configuration file are reported as a Pydantic ValidationError. When this happens, ConfigLoader logs the validation error and falls back to the default configuration — the process continues running with defaults (the invalid file is not applied).
CLI configuration commands surface user-friendly error messages and will exit with a non-zero status when an operation fails (for example, ciberwebscan config load will print the validation error and return a non-zero exit code).

Example (logged Pydantic validation error):

ERROR ciberwebscan.config.loader: Invalid configuration: 1 validation error for AppConfig http -> timeout -> connect ensure this value is greater than or equal to 0.1 (type=value_error.number.not_ge; limit_value=0.1)

Example (CLI):

$ ciberwebscan config load bad-config.yaml
Error: Invalid configuration: 1 validation error for AppConfig
http -> timeout -> connect
  ensure this value is greater than or equal to 0.1 (type=value_error.number.not_ge; limit_value=0.1)
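
The log-and-fall-back behavior described above follows a familiar pattern, sketched here without Pydantic so that it runs on the standard library alone. A plain ValueError stands in for Pydantic's ValidationError, and `validate`, `load_or_default`, and `DEFAULTS` are hypothetical names; the 0.1 lower bound is taken from the example error message.

```python
import logging

logger = logging.getLogger("config_sketch")

DEFAULTS = {"http": {"timeout": {"connect": 10.0}}}


def validate(candidate: dict) -> dict:
    """Minimal stand-in for model validation: check a single field."""
    connect = candidate["http"]["timeout"]["connect"]
    if connect < 0.1:
        raise ValueError("http -> timeout -> connect must be >= 0.1")
    return candidate


def load_or_default(candidate: dict) -> dict:
    """Return the validated config, or the defaults if validation fails."""
    try:
        return validate(candidate)
    except (ValueError, KeyError, TypeError) as exc:
        logger.error("Invalid configuration: %s", exc)
        return DEFAULTS


good = {"http": {"timeout": {"connect": 15.0}}}
bad = {"http": {"timeout": {"connect": 0.0}}}
print(load_or_default(good)["http"]["timeout"]["connect"])
print(load_or_default(bad)["http"]["timeout"]["connect"])
```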

Troubleshooting tips:

Run ciberwebscan config show --config <path> to inspect the file the CLI is loading.
Set LOG_LEVEL=DEBUG (or check application logs) to see the full validation details and stack trace.
The Pydantic error includes the dotted path to the offending field and a short explanation — fix that field in your config.yaml and retry.

Migration

When upgrading CiberWebScan, your existing configuration will be preserved. New default values will be used for any missing settings.

Development Notes

[PROPOSED · NOT IMPLEMENTED] analysis.fingerprint.deep_scan: Runtime option deep_scan exists on AnalyzeOptions but there is no persistent analysis.fingerprint.deep_scan field in the config model. If required, add the field to FingerprintConfig and wire it into the fingerprinter initialization in AnalyzeService.
[PARTIAL] scraping.max_content_length: Present in ScrapingConfig but not enforced consistently across scrapers. Suggested action: enforce the limit (reject or truncate oversized responses) in src/ciberwebscan/core/scraping/static.py and src/ciberwebscan/core/scraping/dynamic.py, add unit and integration tests, and document whether responses are rejected or truncated.
[PARTIAL] scraping.extract_* (extract_links, extract_images, extract_scripts, extract_forms): Flags exist in the config model but are only partially applied by some scrapers; implement conditional extraction where applicable and add tests.
[NOT IMPLEMENTED] include_screenshots: Defined in ExportConfig and API models but not implemented by the export pipeline (BaseService._export_result / exporter classes). Implement screenshot capture/storage and wire into exporters if this feature is desired.
[PROPOSED] cache: CacheConfig exists but its practical usage (e.g., CVE caching) is limited in places; add integration points and tests where caching is expected.
