CiberWebScan uses a flexible configuration system that allows customization of various aspects of the application behavior.
- Configuration Sources
- Configuration File
- Configuration Sections
- CLI Commands
- Validation & Troubleshooting
- Development Roadmap
Configuration values are loaded from multiple sources in order of precedence:
- Environment variables (highest priority)
- User configuration file (`~/.ciberwebscan/config.yaml`)
- Default values (lowest priority)
Environment variable overrides (prefix & mapping)

- Environment overrides use the prefix `CIBERWEBSCAN_` by default (see `ConfigLoader.env_prefix`).
- After the prefix, the name is lowercased and underscores are converted to dots to form the config key. Example: `CIBERWEBSCAN_HTTP_TIMEOUT_CONNECT` → `http.timeout.connect`
- Parsing rules used by `ConfigLoader._load_env` (`src/ciberwebscan/config/loader.py`):
  - Booleans: `true|yes|1` → true, `false|no|0` → false
  - Numbers: values containing `.` → float, otherwise int
  - Lists: comma-separated strings → parsed as arrays
- Examples:
  - `CIBERWEBSCAN_HTTP_TIMEOUT_CONNECT=15` → `http.timeout.connect: 15`
  - `CIBERWEBSCAN_SCRAPING_DYNAMIC_HEADLESS=false` → `scraping.dynamic.headless: false`
  - `CIBERWEBSCAN_USER_AGENT_AGENTS="a,b"` → `user_agent.agents: ["a", "b"]`
See implementation: ConfigLoader._load_env (src/ciberwebscan/config/loader.py).
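The mapping and parsing rules above can be sketched in a few lines of Python. This is a simplified illustration of the documented behavior, not the actual `ConfigLoader._load_env` code:

```python
def parse_env_value(raw: str):
    """Parse an environment variable string using the documented rules."""
    lowered = raw.strip().lower()
    # Booleans: true|yes|1 -> True, false|no|0 -> False
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    # Lists: comma-separated strings -> arrays
    if "," in raw:
        return [item.strip() for item in raw.split(",")]
    # Numbers: values containing "." -> float, otherwise int
    try:
        return float(raw) if "." in raw else int(raw)
    except ValueError:
        return raw  # fall back to the raw string


def env_to_key(name: str, prefix: str = "CIBERWEBSCAN_") -> str:
    """CIBERWEBSCAN_HTTP_TIMEOUT_CONNECT -> http.timeout.connect"""
    return name[len(prefix):].lower().replace("_", ".")
```

For example, `parse_env_value("15")` yields the integer `15`, and `env_to_key("CIBERWEBSCAN_HTTP_TIMEOUT_CONNECT")` yields `http.timeout.connect`.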
Our current `ConfigLoader` maps every underscore (`_`) in the environment variable name to a dot (`.`) when building the config path. That works for many simple keys (for example `CIBERWEBSCAN_HTTP_TIMEOUT_CONNECT` → `http.timeout.connect`), but it prevents overriding model fields that themselves contain underscores (for example `user_agent`, `rate_limit`, `include_screenshots`).
What this means in practice:

- Supported via `CIBERWEBSCAN_` envs (examples):
  - `CIBERWEBSCAN_HTTP_TIMEOUT_CONNECT` → `http.timeout.connect`
  - `CIBERWEBSCAN_HTTP_TIMEOUT_READ` → `http.timeout.read`
  - `CIBERWEBSCAN_HTTP_PROXY_ROTATE` → `http.proxy.rotate`
  - `CIBERWEBSCAN_SCRAPING_DYNAMIC_ENABLED` → `scraping.dynamic.enabled`
  - `CIBERWEBSCAN_SCRAPING_DYNAMIC_HEADLESS` → `scraping.dynamic.headless`
  - `CIBERWEBSCAN_ATTACK_ENABLED` → `attack.enabled`
  - `CIBERWEBSCAN_ATTACK_XSS` → `attack.xss`
  - `CIBERWEBSCAN_CACHE_ENABLED` → `cache.enabled`
  - `NVD_API_KEY`, `VULNERS_API_KEY` (read directly by CVE clients)
- NOT supported via `CIBERWEBSCAN_` envs (must use `config.yaml` or change the loader):
  - `CIBERWEBSCAN_USER_AGENT_AGENTS` / `CIBERWEBSCAN_USER_AGENT_MODE` → `user_agent.*`
  - `CIBERWEBSCAN_HTTP_RATE_LIMIT_REQUESTS_PER_SECOND` → `http.rate_limit.requests_per_second`
  - `CIBERWEBSCAN_EXPORT_INCLUDE_SCREENSHOTS` → `export.include_screenshots`
  - `CIBERWEBSCAN_ANALYSIS_CVE_NVD_API_KEY` → `analysis.cve.nvd_api_key`
  - `CIBERWEBSCAN_ATTACK_USER_CONSENT` → `attack.user_consent`
Recommendation: for complex or underscore-containing fields, set them in `~/.ciberwebscan/config.yaml`. If you prefer env-based overrides for those fields, we can update `ConfigLoader` to support a double-underscore convention (e.g. `CIBERWEBSCAN_HTTP__RATE_LIMIT__REQUESTS_PER_SECOND`); tell us if you want that behavior added.
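Under the proposed convention, `__` would separate path segments while a single `_` stays inside a segment. A minimal sketch of that hypothetical mapping (not part of the current loader):

```python
def env_to_key_dunder(name: str, prefix: str = "CIBERWEBSCAN_") -> str:
    """Hypothetical double-underscore convention:
    '__' separates config path segments, single '_' stays inside a segment.
    CIBERWEBSCAN_HTTP__RATE_LIMIT__REQUESTS_PER_SECOND
      -> http.rate_limit.requests_per_second
    """
    body = name[len(prefix):].lower()
    return ".".join(body.split("__"))
```

This would let fields like `rate_limit` or `include_screenshots` be addressed unambiguously from the environment.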
Note: Command-line options are specific to individual commands and do not override global configuration. They are used to customize behavior for that particular command execution.
The configuration file is automatically created in your user directory when you first run CiberWebScan. You can also create it manually.
- Linux/macOS: `~/.ciberwebscan/config.yaml`
- Windows: `%USERPROFILE%\.ciberwebscan\config.yaml`
Configuration files can be in JSON or YAML format.
JSON Example:

```json
{
  "http": {
    "timeout": {
      "connect": 15.0,
      "read": 45.0
    },
    "rate_limit": {
      "requests_per_second": 3.0
    }
  },
  "scraping": {
    "dynamic": {
      "enabled": true,
      "headless": false
    }
  }
}
```

YAML Example:
```yaml
http:
  timeout:
    connect: 15.0
    read: 45.0
  rate_limit:
    requests_per_second: 3.0
scraping:
  dynamic:
    headless: false
```

Configure HTTP request behavior.
```json
{
  "http": {
    "timeout": {
      "connect": 10.0,
      "read": 30.0,
      "write": 30.0,
      "pool": 10.0
    },
    "retry": {
      "max_attempts": 3,
      "backoff_factor": 0.5,
      "retryable_status_codes": [429, 500, 502, 503, 504]
    },
    "rate_limit": {
      "requests_per_second": 5.0,
      "per_domain": true
    },
    "proxy": {
      "http": null,
      "https": null,
      "socks5": null,
      "rotate": false,
      "rotation_interval": 10,
      "proxy_list": null
    },
    "http2": true,
    "follow_redirects": true,
    "max_redirects": 10,
    "verify_ssl": true
  }
}
```

| Key | Default | Description |
|---|---|---|
| `http.timeout.connect` | 10.0 | Connection timeout (seconds) |
| `http.timeout.read` | 30.0 | Read timeout (seconds) |
| `http.timeout.write` | 30.0 | Write timeout (seconds) |
| `http.timeout.pool` | 10.0 | Connection pool timeout (seconds) |
| `http.retry.max_attempts` | 3 | Retry attempts |
| `http.retry.backoff_factor` | 0.5 | Exponential backoff factor |
| `http.rate_limit.requests_per_second` | 5.0 | Requests per second |
| `http.rate_limit.per_domain` | true | Rate limit per domain |
| `http.proxy.rotate` | false | Proxy rotation disabled by default |
| `http.proxy.rotation_interval` | 10 | Requests per proxy when rotating |
| `http.http2` | true | Enable HTTP/2 by default |
| `http.follow_redirects` | true | Follow redirects |
| `http.max_redirects` | 10 | Max redirects to follow |
| `http.verify_ssl` | true | Verify TLS certificates |
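As an illustration of how `backoff_factor` shapes the retry schedule, a common exponential-backoff interpretation multiplies the factor by a power of two per attempt. The exact formula used by the project's HTTP client is not shown in this document, so treat this as a sketch under that assumption:

```python
def backoff_delays(max_attempts: int = 3, backoff_factor: float = 0.5) -> list[float]:
    """Sleep durations before each retry, assuming
    delay = backoff_factor * 2**attempt (attempt starts at 0)."""
    return [backoff_factor * (2 ** attempt) for attempt in range(max_attempts)]
```

With the defaults (`max_attempts: 3`, `backoff_factor: 0.5`) this yields delays of 0.5 s, 1.0 s, and 2.0 s.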
When `rotate` is true, CiberWebScan cycles through available proxies using a round-robin strategy. The proxy changes every `rotation_interval` requests. Proxies can be supplied through `proxy_list` (recommended) or will be collected from the individual `http`, `https`, and `socks5` fields.

`proxy_list` accepts either:

- A JSON array of proxy URLs: `["http://p1:8080", "http://p2:8080"]`
- A comma/newline-separated string: `"http://p1:8080, http://p2:8080"`
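Normalizing those two accepted `proxy_list` forms into a plain list can be sketched like this (an illustrative helper, not the project's parser):

```python
import json


def normalize_proxy_list(value) -> list:
    """Accept a list of proxy URLs, a JSON array string,
    or a comma/newline-separated string; return a list of URLs."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    text = value.strip()
    if text.startswith("["):  # JSON array form
        return json.loads(text)
    # comma/newline-separated form
    parts = text.replace("\n", ",").split(",")
    return [p.strip() for p in parts if p.strip()]
```

Both `'["http://p1:8080", "http://p2:8080"]'` and `"http://p1:8080, http://p2:8080"` produce the same two-element list.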
| Field | Type | Default | Description |
|---|---|---|---|
| `http` | string \| null | null | Single HTTP proxy URL |
| `https` | string \| null | null | Single HTTPS proxy URL |
| `socks5` | string \| null | null | Single SOCKS5 proxy URL |
| `rotate` | bool | false | Enable proxy rotation |
| `rotation_interval` | int (≥ 1) | 10 | Number of requests before switching proxy |
| `proxy_list` | list \| string \| null | null | List of proxy URLs for rotation |
Example with rotation enabled:
```json
{
  "http": {
    "proxy": {
      "rotate": true,
      "rotation_interval": 5,
      "proxy_list": [
        "http://proxy1.example.com:8080",
        "http://proxy2.example.com:8080",
        "socks5://proxy3.example.com:1080"
      ]
    }
  }
}
```

Configure user agent rotation.
```json
{
  "user_agent": {
    "mode": "rotate",
    "custom": null,
    "rotate_interval": 10,
    "agents": [
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0"
    ]
  }
}
```

| Key | Default | Description |
|---|---|---|
| `user_agent.mode` | rotate | Default rotation mode |
| `user_agent.custom` | null | No custom UA by default |
| `user_agent.rotate_interval` | 10 | Requests before rotating UA |
| `user_agent.agents` | default list (6 agents) | Default UA list used for rotation |
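The documented behavior (switch to the next agent every `rotate_interval` requests, cycling round-robin) can be sketched as follows; this is an illustration of the semantics, not the project's rotator class:

```python
class UserAgentRotator:
    """Round-robin UA rotation: advance to the next agent
    every `rotate_interval` requests (illustrative sketch)."""

    def __init__(self, agents: list, rotate_interval: int = 10):
        self.agents = agents
        self.rotate_interval = rotate_interval
        self.count = 0  # total requests issued so far

    def next(self) -> str:
        index = (self.count // self.rotate_interval) % len(self.agents)
        self.count += 1
        return self.agents[index]
```

With two agents and `rotate_interval=2`, successive calls return `a, a, b, b, a, …`.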
Configure web scraping behavior.
```json
{
  "scraping": {
    "dynamic": {
      "enabled": false,
      "wait_timeout": 10.0,
      "wait_selector": null,
      "headless": true,
      "browser_type": "chromium"
    },
    "pagination": {
      "max_pages": 10,
      "next_selector": null,
      "page_param": null
    },
    "extract_links": true,
    "extract_images": true,
    "extract_scripts": true,
    "extract_forms": true,
    "max_content_length": 10485760
  }
}
```

| Key | Default | Description |
|---|---|---|
| `scraping.dynamic.enabled` | false | Dynamic (browser) scraping disabled by default |
| `scraping.dynamic.wait_timeout` | 10.0 | Wait timeout for dynamic scraping (s) |
| `scraping.dynamic.headless` | true | Playwright runs headless by default |
| `scraping.dynamic.browser_type` | chromium | Default browser engine |
| `scraping.pagination.max_pages` | 10 | Max pages to follow in pagination |
| `scraping.extract_links` | true | Extract links by default |
| `scraping.extract_images` | true | Extract images by default |
| `scraping.extract_scripts` | true | Extract scripts by default |
| `scraping.extract_forms` | true | Extract forms by default |
| `scraping.max_content_length` | 10485760 (10 MB) | Max response size handled by scrapers (model default) |
Implementation status: scraping options

- `scraping.max_content_length`: present in the config model but not enforced consistently across all scrapers (see `src/ciberwebscan/core/scraping/static.py` and `src/ciberwebscan/core/scraping/dynamic.py`).
- `scraping.extract_*` flags (`extract_links`, `extract_images`, `extract_scripts`, `extract_forms`): exist in the model but are only partially applied by some scrapers.
- See the Development Roadmap section for recommended fixes and test coverage.
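One way the missing `max_content_length` enforcement could look is a simple size cap applied to response bodies. Note that truncating (rather than rejecting) oversized responses is an assumption here; the project has not yet documented which policy it will adopt:

```python
MAX_CONTENT_LENGTH = 10 * 1024 * 1024  # scraping.max_content_length default (10 MB)


def cap_body(body: bytes, limit: int = MAX_CONTENT_LENGTH) -> bytes:
    """Truncate a response body to the configured limit.
    Truncation is an assumed policy; rejection is the alternative."""
    return body[:limit] if len(body) > limit else body
```

Whichever policy is chosen, it should be applied uniformly in both the static and dynamic scrapers and covered by tests.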
Configure security analysis settings.
```json
{
  "analysis": {
    "ssl": {
      "enabled": true,
      "check_expiry": true,
      "check_chain": true,
      "check_revocation": true,
      "warning_days": 30
    },
    "fingerprint": {
      "enabled": true,
      "check_headers": true,
      "check_cookies": true,
      "check_html": true,
      "check_scripts": true,
      "check_dns": false
    },
    "cve": {
      "enabled": true,
      "api": "all",
      "nvd_api_key": null,
      "vulners_api_key": null,
      "cache_ttl": 86400
    },
    "headers": {
      "enabled": true,
      "required_headers": [
        "Strict-Transport-Security",
        "X-Content-Type-Options",
        "X-Frame-Options",
        "Content-Security-Policy"
      ]
    }
  }
}
```

| Key | Default | Description |
|---|---|---|
| `analysis.ssl.enabled` | true | SSL/TLS analysis enabled |
| `analysis.ssl.warning_days` | 30 | Days before expiry to warn |
| `analysis.fingerprint.enabled` | true | Technology fingerprinting enabled |
| `analysis.fingerprint.check_dns` | false | DNS checks disabled by default |
| `analysis.cve.api` | all | CVE data sources used by default |
| `analysis.cve.cache_ttl` | 86400 | CVE cache TTL (seconds) |
| `analysis.headers.required_headers` | default list | Security headers checked by default |
Implementation status: `analysis.fingerprint.deep_scan`

- `analysis.fingerprint.deep_scan` is proposed but not available in the persistent configuration model (`FingerprintConfig`).
- A runtime option `deep_scan` exists on `AnalyzeOptions` (see `src/ciberwebscan/services/analyze_service.py`) and can be passed via CLI, but there is no `analysis.fingerprint.deep_scan` field to persist that behavior in the config file.
Configure attack simulation settings.
```json
{
  "attack": {
    "enabled": false,
    "user_consent": false,
    "whitelist": ["127.0.0.1", "localhost"],
    "xss": true,
    "sqli": true,
    "traversal": true,
    "enumeration": true,
    "max_payloads": 50
  }
}
```

| Key | Default | Description |
|---|---|---|
| `attack.enabled` | false | Attack simulation disabled by default |
| `attack.user_consent` | false | User consent required to run attacks |
| `attack.whitelist` | ["127.0.0.1", "localhost"] | Default allowed targets for attack testing |
| `attack.xss` | true | Run XSS checks by default |
| `attack.sqli` | true | Run SQLi checks by default |
| `attack.traversal` | true | Run path traversal checks by default |
| `attack.enumeration` | true | Run enumeration by default |
| `attack.max_payloads` | 50 | Default max payloads per target |
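The safety model implied by these settings (attacks run only when enabled, consented to, and aimed at a whitelisted host) can be sketched as a simple gate. This is an illustration of the settings' intent, not the project's actual enforcement code:

```python
from urllib.parse import urlparse


def attack_allowed(url: str, enabled: bool, user_consent: bool, whitelist: list) -> bool:
    """Return True only when attack simulation is enabled, the user has
    consented, and the target host appears in the whitelist (sketch)."""
    host = urlparse(url).hostname or ""
    return enabled and user_consent and host in whitelist
```

With the defaults (`enabled: false`, `user_consent: false`) no target passes the gate, which matches the safe-by-default design.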
Configure export behavior.
```json
{
  "export": {
    "format": "jsonl",
    "output_dir": "exports",
    "include_raw_html": false,
    "include_screenshots": false,
    "streaming": true,
    "buffer_size": 100,
    "pretty": true
  }
}
```

| Key | Default | Description |
|---|---|---|
| `export.format` | jsonl | Default export format |
| `export.output_dir` | exports | Default export directory |
| `export.include_raw_html` | false | Do not include raw HTML by default |
| `export.include_screenshots` | false | Screenshots not included by default (not implemented) |
| `export.streaming` | true | Use streaming exporter by default |
| `export.buffer_size` | 100 | Export buffer size |
| `export.pretty` | true | Pretty-print JSON by default |
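To make the `streaming`/`buffer_size` pair concrete: a streaming JSONL exporter typically buffers records and flushes in batches. A minimal sketch, not the project's exporter classes:

```python
import json


class BufferedJsonlWriter:
    """Buffer records and flush every `buffer_size` items (sketch of the
    export.streaming / export.buffer_size behavior)."""

    def __init__(self, fh, buffer_size: int = 100):
        self.fh = fh
        self.buffer_size = buffer_size
        self.buffer = []

    def write(self, record: dict) -> None:
        self.buffer.append(json.dumps(record))
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.fh.write("\n".join(self.buffer) + "\n")
            self.buffer = []
```

Buffering keeps memory bounded during long scans while avoiding a filesystem write per record.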
Implementation status: `include_screenshots`

- `include_screenshots` is defined in `ExportConfig` (`src/ciberwebscan/config/models.py`) and exposed in API models, but it is not implemented by the export pipeline (unused by `BaseService._export_result` and the exporter classes).
Configure caching behavior.
```json
{
  "cache": {
    "enabled": true,
    "directory": ".cache",
    "ttl": 3600,
    "max_size_mb": 100
  }
}
```

| Key | Default | Description |
|---|---|---|
| `cache.enabled` | true | Caching enabled by default |
| `cache.directory` | .cache | Default cache directory |
| `cache.ttl` | 3600 | Cache TTL (seconds) |
| `cache.max_size_mb` | 100 | Max cache size (MB) |
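The `cache.ttl` semantics (an entry is valid until it is `ttl` seconds old) can be illustrated with a small freshness check; `cache_entry_fresh` is a hypothetical helper, not part of the project's `CacheConfig`:

```python
import time


def cache_entry_fresh(created_at: float, ttl: int = 3600, now=None) -> bool:
    """True while a cache entry is younger than `ttl` seconds.
    `now` is injectable to make the check testable."""
    now = time.time() if now is None else now
    return (now - created_at) < ttl
```

For example, with `ttl: 3600` an entry written 10 seconds ago is served from cache, while one written 4000 seconds ago would be refetched.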
Configure logging behavior.
```json
{
  "logging": {
    "level": "INFO",
    "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    "file": null,
    "max_size": 10485760,
    "backup_count": 5
  }
}
```

| Key | Default | Description |
|---|---|---|
| `logging.level` | INFO | Default log level |
| `logging.format` | `%(asctime)s - %(name)s - %(levelname)s - %(message)s` | Default log format |
| `logging.file` | null | No log file by default |
| `logging.max_size` | 10485760 | Max size for rotated log file (bytes) |
| `logging.backup_count` | 5 | Number of rotated log files to keep |
```shell
ciberwebscan config show
ciberwebscan config show http
ciberwebscan config set http.timeout.connect 15.0
ciberwebscan config reset
ciberwebscan config reset http
ciberwebscan config get http.timeout.connect
ciberwebscan config keys
ciberwebscan config keys --section http
ciberwebscan config export config.yaml               # Exports to YAML (default format)
ciberwebscan config export config.json --format json
ciberwebscan config load config.yaml
```

- Persistent configuration (`config.*`) is stored in the user config file (`~/.ciberwebscan/config.yaml`) and loaded by `ConfigLoader` at startup (or via `get_config()`). Environment variables with the `CIBERWEBSCAN_` prefix and the config file are merged; environment variables have higher precedence.
- CLI/runtime options (for example `AttackOptions`, `AnalyzeOptions`) are dataclasses used only for the current execution. CLI flags are converted into these option objects and override behavior for that run but do not modify the persistent configuration file.
- When an options field is omitted (or set to `None`), the service may fall back to the value from `get_config()`; see `AttackOptions.__post_init__` (`src/ciberwebscan/services/attack_service.py`) and the `AnalyzeOptions` handling (`src/ciberwebscan/services/analyze_service.py`).
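The merge of defaults, config file, and environment overrides described above can be sketched as a recursive dictionary merge where the later source wins on conflicts (a simplified illustration, not the actual `ConfigLoader` merge code):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Merge `override` into `base` recursively; `override` wins on conflicts.
    Precedence is built by applying: defaults <- config file <- env vars."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged
```

For example, a config file that only sets `http.timeout.connect` overrides that one leaf while every other default survives.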
You can access configuration in your code:
```python
from ciberwebscan.config.loader import get_config

config = get_config()
timeout = config.http.timeout.connect
```

Configuration values are validated by Pydantic when loaded by the `ConfigLoader`.
- Invalid values in the user configuration file are reported as a Pydantic `ValidationError`. When this happens, `ConfigLoader` logs the validation error and falls back to the default configuration; the process continues running with defaults (the invalid file is not applied).
- CLI configuration commands surface user-friendly error messages and exit with a non-zero status when an operation fails (for example, `ciberwebscan config load` will print the validation error and return a non-zero exit code).
Example (logged Pydantic validation error):

```
ERROR ciberwebscan.config.loader: Invalid configuration: 1 validation error for AppConfig
http -> timeout -> connect
  ensure this value is greater than or equal to 0.1 (type=value_error.number.not_ge; limit_value=0.1)
```
Example (CLI):

```
$ ciberwebscan config load bad-config.yaml
Error: Invalid configuration: 1 validation error for AppConfig
http -> timeout -> connect
  ensure this value is greater than or equal to 0.1 (type=value_error.number.not_ge; limit_value=0.1)
```

Troubleshooting tips:
- Run `ciberwebscan config show --config <path>` to inspect the file the CLI is loading.
- Set `LOG_LEVEL=DEBUG` (or check application logs) to see the full validation details and stack trace.
- The Pydantic error includes the dotted path to the offending field and a short explanation; fix that field in your `config.yaml` and retry.
When upgrading CiberWebScan, your existing configuration will be preserved. New default values will be used for any missing settings.
- [PROPOSED · NOT IMPLEMENTED] `analysis.fingerprint.deep_scan`: Runtime option `deep_scan` exists on `AnalyzeOptions` but there is no persistent `analysis.fingerprint.deep_scan` field in the config model. If required, add the field to `FingerprintConfig` and wire it into the fingerprinter initialization in `AnalyzeService`.
- [PARTIAL] `scraping.max_content_length`: Present in `ScrapingConfig` but not enforced consistently across scrapers. Suggested action: enforce/truncate responses in `src/ciberwebscan/core/scraping/static.py` and `src/ciberwebscan/core/scraping/dynamic.py`, add unit and integration tests, and document whether oversized responses are rejected or truncated.
- [PARTIAL] `scraping.extract_*` (`extract_links`, `extract_images`, `extract_scripts`, `extract_forms`): Flags exist in the config model but are only partially applied by some scrapers; implement conditional extraction where applicable and add tests.
- [NOT IMPLEMENTED] `include_screenshots`: Defined in `ExportConfig` and API models but not implemented by the export pipeline (`BaseService._export_result` / exporter classes). Implement screenshot capture/storage and wire it into exporters if this feature is desired.
- [PROPOSED] `cache`: `CacheConfig` exists but its practical usage (e.g., CVE caching) is limited in places; add integration points and tests where caching is expected.