This guide covers all configuration options for UPMEX, including configuration files, environment variables, and programmatic configuration.
UPMEX can be configured through multiple methods (in order of precedence):
- Command-line arguments (highest priority)
- Environment variables
- Configuration file
- Default values (lowest priority)
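The precedence order above can be pictured as layers merged from lowest to highest priority. The following is a minimal sketch, not UPMEX's actual implementation: `deep_merge` and `resolve` are hypothetical helpers, and it assumes each source has already been parsed into a nested dictionary shaped like the JSON configuration file.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` win over `base`.

    Nested dicts are merged key by key; any other value is replaced wholesale.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve(defaults: dict, file_cfg: dict, env_cfg: dict, cli_cfg: dict) -> dict:
    """Apply the layers from lowest to highest priority."""
    result = defaults
    for layer in (file_cfg, env_cfg, cli_cfg):
        result = deep_merge(result, layer)
    return result


# Hypothetical layer contents, for illustration only
defaults = {"output": {"format": "json", "pretty_print": False}, "cache": {"enabled": True}}
file_cfg = {"output": {"pretty_print": True}}   # from config.json
env_cfg = {"cache": {"enabled": False}}         # from PME_CACHE_ENABLED=false
cli_cfg = {"output": {"format": "text"}}        # from --format text

effective = resolve(defaults, file_cfg, env_cfg, cli_cfg)
print(effective)
# {'output': {'format': 'text', 'pretty_print': True}, 'cache': {'enabled': False}}
```

Merging nested dictionaries key by key is what lets a partial layer change one setting without discarding its siblings.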
UPMEX looks for configuration files in the following order:
- Path specified via the `--config` CLI option
- `$UPMEX_CONFIG_PATH` environment variable
- `~/.upmex/config.json`
- `/etc/upmex/config.json`
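The lookup order above amounts to "first existing file wins". The sketch below illustrates that idea only; `find_config_file` is a hypothetical helper rather than part of UPMEX, and it assumes a missing candidate is skipped silently (a real tool might instead report an error when an explicitly requested `--config` path does not exist).

```python
import os
from pathlib import Path
from typing import Optional


def find_config_file(cli_path: Optional[str] = None) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None."""
    candidates = [
        cli_path,                             # value of the --config option
        os.environ.get("UPMEX_CONFIG_PATH"),  # environment variable
        "~/.upmex/config.json",               # per-user file
        "/etc/upmex/config.json",             # system-wide file
    ]
    for candidate in candidates:
        if not candidate:
            continue  # option or variable not set
        path = Path(candidate).expanduser()
        if path.is_file():
            return path
    return None


print(find_config_file())
```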
Configuration files use JSON format:
{
  "api": {
    "clearlydefined": {
      "enabled": true,
      "api_key": null,
      "base_url": "https://api.clearlydefined.io",
      "timeout": 30,
      "retry_count": 3
    },
    "ecosystems": {
      "enabled": true,
      "api_key": null,
      "base_url": "https://ecosyste.ms/api/v1",
      "timeout": 30
    },
    "purldb": {
      "enabled": false,
      "api_key": null,
      "base_url": "https://purldb.com/api",
      "timeout": 30
    },
    "vulnerablecode": {
      "enabled": false,
      "api_key": null,
      "base_url": "https://vulnerablecode.io/api",
      "timeout": 30
    }
  },
  "cache": {
    "enabled": true,
    "directory": "~/.cache/upmex",
    "ttl": 86400,
    "max_size": 1073741824
  },
  "output": {
    "format": "json",
    "pretty_print": false,
    "include_files": false,
    "include_provenance": true
  },
  "processing": {
    "max_file_size": 524288000,
    "timeout": 60,
    "enable_registry": false,
    "parallel_workers": 4
  },
  "logging": {
    "level": "INFO",
    "file": null,
    "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  }
}

All configuration options can be set via environment variables using the prefix `PME_` (Package Metadata Extractor).
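Environment variables are always strings, so a value such as `PME_CACHE_ENABLED=false` has to be converted before use. The sketch below shows one way to do that conversion; `env_bool` and `env_int` are hypothetical helpers, and the accepted boolean spellings are an assumption rather than documented UPMEX behavior.

```python
import os
from typing import Mapping


def env_bool(name: str, default: bool, environ: Mapping[str, str] = os.environ) -> bool:
    """Interpret common truthy spellings; fall back to `default` if unset."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_int(name: str, default: int, environ: Mapping[str, str] = os.environ) -> int:
    """Parse an integer; fall back to `default` if unset or empty."""
    raw = environ.get(name)
    if raw is None or not raw.strip():
        return default
    return int(raw)  # raises ValueError for non-numeric input


# A stand-in for the real environment, so the example is reproducible
demo_env = {"PME_CACHE_ENABLED": "false", "PME_CACHE_TTL": "3600"}

print(env_bool("PME_CACHE_ENABLED", True, demo_env))   # False
print(env_int("PME_CACHE_TTL", 86400, demo_env))       # 3600
print(env_int("PME_PARALLEL_WORKERS", 4, demo_env))    # 4 (unset, default used)
```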
# ClearlyDefined
export PME_CLEARLYDEFINED_ENABLED=true
export PME_CLEARLYDEFINED_API_KEY=your-api-key
export PME_CLEARLYDEFINED_BASE_URL=https://api.clearlydefined.io
export PME_CLEARLYDEFINED_TIMEOUT=30
# Ecosyste.ms
export PME_ECOSYSTEMS_ENABLED=true
export PME_ECOSYSTEMS_API_KEY=your-api-key
export PME_ECOSYSTEMS_BASE_URL=https://ecosyste.ms/api/v1
# PurlDB
export PME_PURLDB_ENABLED=false
export PME_PURLDB_API_KEY=your-api-key
# VulnerableCode
export PME_VULNERABLECODE_ENABLED=false
export PME_VULNERABLECODE_API_KEY=your-api-key

# Cache
export PME_CACHE_ENABLED=true
export PME_CACHE_DIR=~/.cache/upmex
export PME_CACHE_TTL=86400 # 24 hours in seconds
export PME_CACHE_MAX_SIZE=1073741824 # 1GB in bytes

# Output
export PME_OUTPUT_FORMAT=json # or "text"
export PME_PRETTY_PRINT=true
export PME_INCLUDE_FILES=false
export PME_INCLUDE_PROVENANCE=true

# Processing
export PME_MAX_FILE_SIZE=524288000 # 500MB in bytes
export PME_PROCESSING_TIMEOUT=60 # seconds
export PME_ENABLE_REGISTRY=false
export PME_PARALLEL_WORKERS=4

# Logging
export PME_LOG_LEVEL=DEBUG # DEBUG, INFO, WARNING, ERROR, CRITICAL
export PME_LOG_FILE=/var/log/upmex.log
export PME_LOG_FORMAT="%(asctime)s - %(levelname)s - %(message)s"

ClearlyDefined (`api.clearlydefined`):

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | true | Enable ClearlyDefined API enrichment |
| `api_key` | string | null | API key for authentication |
| `base_url` | string | https://api.clearlydefined.io | API base URL |
| `timeout` | integer | 30 | Request timeout in seconds |
| `retry_count` | integer | 3 | Number of retry attempts |

Ecosyste.ms (`api.ecosystems`):

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | true | Enable Ecosyste.ms API enrichment |
| `api_key` | string | null | API key for authentication |
| `base_url` | string | https://ecosyste.ms/api/v1 | API base URL |
| `timeout` | integer | 30 | Request timeout in seconds |

PurlDB (`api.purldb`):

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | false | Enable PurlDB API enrichment |
| `api_key` | string | null | API key for authentication |
| `base_url` | string | https://purldb.com/api | API base URL |
| `timeout` | integer | 30 | Request timeout in seconds |

VulnerableCode (`api.vulnerablecode`):

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | false | Enable VulnerableCode API |
| `api_key` | string | null | API key for authentication |
| `base_url` | string | https://vulnerablecode.io/api | API base URL |
| `timeout` | integer | 30 | Request timeout in seconds |

Cache (`cache`):

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | true | Enable response caching |
| `directory` | string | ~/.cache/upmex | Cache directory path |
| `ttl` | integer | 86400 | Cache time-to-live in seconds |
| `max_size` | integer | 1073741824 | Maximum cache size in bytes |

Output (`output`):

| Option | Type | Default | Description |
|---|---|---|---|
| `format` | string | json | Output format (json, text) |
| `pretty_print` | boolean | false | Pretty print JSON output |
| `include_files` | boolean | false | Include file listings in output |
| `include_provenance` | boolean | true | Include data provenance information |

Processing (`processing`):

| Option | Type | Default | Description |
|---|---|---|---|
| `max_file_size` | integer | 524288000 | Maximum file size in bytes (500MB) |
| `timeout` | integer | 60 | Processing timeout in seconds |
| `enable_registry` | boolean | false | Enable package registry lookups |
| `parallel_workers` | integer | 4 | Number of parallel workers |
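
The byte and second values in these tables are easier to audit when written as arithmetic. This sketch uses binary units (1 MB = 1024 × 1024 bytes), which is what the documented defaults correspond to; the constant names are illustrative only.

```python
# Binary size units, in bytes
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Time units, in seconds
HOUR = 60 * 60
DAY = 24 * HOUR

print(1 * GIB)    # 1073741824 -> default cache max_size
print(500 * MIB)  # 524288000  -> default processing max_file_size
print(1 * DAY)    # 86400      -> default cache ttl (24 hours)
print(7 * DAY)    # 604800     -> a one-week ttl
```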

Logging (`logging`):

| Option | Type | Default | Description |
|---|---|---|---|
| `level` | string | INFO | Log level (DEBUG, INFO, WARNING, ERROR) |
| `file` | string | null | Log file path (null for stdout) |
| `format` | string | %(asctime)s... | Log message format |

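
The `format` value uses the placeholder syntax of Python's standard `logging` module. The sketch below applies the default format string from the JSON example to a plain `logging` handler to show what a resulting line looks like; the logger name `upmex.demo` is made up for illustration, and this is not a claim about how UPMEX wires up its own loggers.

```python
import logging
import sys

LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

# Write to stdout, mirroring the documented behavior when no log file is set
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter(LOG_FORMAT))

logger = logging.getLogger("upmex.demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate lines via the root logger

logger.debug("not shown: below the INFO threshold")
logger.info("cache hit")  # printed as "<timestamp> - upmex.demo - INFO - cache hit"
```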
import os

from upmex.config import Config

# Create config
config = Config()

# Modify settings
config.api.clearlydefined.enabled = True
# Read the API key from the environment rather than hard-coding it
config.api.clearlydefined.api_key = os.getenv('PME_CLEARLYDEFINED_API_KEY')
config.cache.directory = "/custom/cache"
config.output.pretty_print = True

# Use with extractor
from upmex import PackageExtractor
extractor = PackageExtractor(config=config)

from upmex.config import Config
# Load from file
config = Config()
config.load_from_file("~/.upmex/config.json")
# Modify and save
config.api.ecosystems.enabled = False
config.save("~/.upmex/config.json")

from upmex.config import Config
# Load from environment variables
config = Config()
config.load_from_env()
# Environment variables override file config
config.load_from_file("config.json")
config.load_from_env() # Overrides file settings

For offline-only extraction:
{
  "api": {
    "clearlydefined": {"enabled": false},
    "ecosystems": {"enabled": false},
    "purldb": {"enabled": false},
    "vulnerablecode": {"enabled": false}
  },
  "processing": {
    "enable_registry": false
  }
}

For maximum performance:
{
  "cache": {
    "enabled": true,
    "ttl": 604800,
    "max_size": 5368709120
  },
  "processing": {
    "parallel_workers": 8,
    "timeout": 120
  },
  "api": {
    "clearlydefined": {"timeout": 10, "retry_count": 1},
    "ecosystems": {"timeout": 10}
  }
}

For security scanning:
{
  "api": {
    "vulnerablecode": {
      "enabled": true,
      "api_key": "your-key"
    },
    "clearlydefined": {
      "enabled": true,
      "api_key": "your-key"
    }
  },
  "output": {
    "include_provenance": true
  }
}

For continuous integration:
{
  "output": {
    "format": "json",
    "pretty_print": false,
    "include_provenance": false
  },
  "logging": {
    "level": "WARNING",
    "file": "/var/log/upmex-ci.log"
  },
  "processing": {
    "timeout": 30,
    "max_file_size": 104857600
  }
}

Command-line arguments override all other configuration:
# Override output format
upmex extract --format text package.whl
# Override API settings
upmex extract --no-api package.jar
# Override cache
upmex extract --no-cache package.gem
# Use custom config file
upmex extract --config /path/to/config.json package.whl

UPMEX validates configuration on startup:
from upmex.config import Config, validate_config
config = Config()
errors = validate_config(config)
if errors:
    for error in errors:
        print(f"Configuration error: {error}")

- Store API keys securely: Use environment variables or secure vaults

      export PME_CLEARLYDEFINED_API_KEY=$(vault read secret/api-key)

- Restrict config file permissions:

      chmod 600 ~/.upmex/config.json

- Don't commit secrets:

      # .gitignore
      config.json
      *.key

- Enable caching for repeated extractions:

      { "cache": { "enabled": true, "ttl": 604800 } }

- Adjust parallel workers based on CPU:

      import multiprocessing
      config.processing.parallel_workers = multiprocessing.cpu_count()

- Set appropriate timeouts:

      { "processing": { "timeout": 60 }, "api": { "clearlydefined": { "timeout": 10 } } }

- Configure retry logic:

      { "api": { "clearlydefined": { "retry_count": 3 } } }

- Set file size limits:

      { "processing": { "max_file_size": 104857600 } }

- Enable comprehensive logging:

      { "logging": { "level": "INFO", "file": "/var/log/upmex.log" } }

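
Two of the practices above, restricting file permissions and sizing the worker pool, can be checked from Python. This is a rough sketch under stated assumptions: `is_private` and `suggested_workers` are hypothetical helpers, not part of UPMEX, the permission check is only meaningful on POSIX systems, and the cap of 8 workers is an arbitrary illustration.

```python
import os
import stat
from pathlib import Path


def is_private(path: Path) -> bool:
    """True if neither group nor others have any access (for example mode 600)."""
    mode = path.stat().st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def suggested_workers(limit: int = 8) -> int:
    """Use the CPU count, capped at `limit`; fall back to 1 if it is unknown."""
    return max(1, min(os.cpu_count() or 1, limit))


config_path = Path("~/.upmex/config.json").expanduser()
if config_path.is_file() and not is_private(config_path):
    print(f"Warning: {config_path} is readable by other users; consider chmod 600")

print(f"Suggested parallel_workers: {suggested_workers()}")
```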
import logging
logging.basicConfig(level=logging.DEBUG)
from upmex.config import Config
config = Config()
config.load_from_file("config.json") # Will log loading details

# Check effective configuration
upmex config show
# Validate configuration file
upmex config validate config.json
# Test API connectivity
upmex config test-api

# Check if key is loaded
echo $PME_CLEARLYDEFINED_API_KEY
# Test API directly
curl -H "Authorization: Bearer $PME_CLEARLYDEFINED_API_KEY" \
  https://api.clearlydefined.io/definitions

# Check cache directory permissions
ls -la ~/.cache/upmex
# Clear corrupted cache
rm -rf ~/.cache/upmex/*

# Debug config file path
from upmex.config import Config
import os
config = Config()
print(f"Looking for config at: {config.get_config_path()}")
print(f"File exists: {os.path.exists(config.get_config_path())}")