NGINX Caching Proxy for Owlery

A caching proxy that sits in front of Owlery (OWL reasoning services) to speed up repeated queries: a cached response typically returns in <10ms instead of ~200ms from the backend. Built on NGINX Alpine with a 6-month cache TTL, a stale-while-revalidate pattern, and 5-year disk retention, so a response that has been cached once can still be served after its TTL expires.

The proxy also includes security guardrails that refuse common scanner/probing requests before they reach Owlery, with optional IP blocklist and whitelist files under /logs.

Usage Examples

Basic Usage

# Start the proxy (both ports 80 and 8080 are available)
docker run -d --name owl-cache -p 80:80 -p 8080:8080 virtualflybrain/owl_cache:latest

# Make a query on port 80 (will be slow first time)
curl "http://localhost/kbs/vfb/instances?object=<http://purl.obolibrary.org/obo/FBbt_00005106>"

# Same query on port 8080 (will be fast from cache)
curl "http://localhost:8080/kbs/vfb/instances?object=<http://purl.obolibrary.org/obo/FBbt_00005106>"

With Docker Compose

version: '3.8'
services:
  owl-cache:
    image: virtualflybrain/owl_cache:latest
    ports:
      - "80:80"
      - "8080:8080"
    volumes:
      - /cache:/var/cache/nginx
      - /logs:/logs
    environment:
      - UPSTREAM_SERVER=owl:8080  # For production with owl service
      - CACHE_MAX_SIZE=1t         # 1TB cache size for high-traffic deployments
      - DNS_RESOLVER=169.254.169.250  # Rancher internal DNS (check /etc/resolv.conf)

Health Check

curl http://localhost/health
# Returns: OK

/health is a lightweight liveness check for NGINX itself. Use /status for upstream reachability, cache totals, and connection counters.

Status Endpoint

curl http://localhost/status

Example response:

{
  "updated_at": "2026-03-24T12:00:00Z",
  "health": {
    "nginx": true,
    "upstream": true
  },
  "upstream": {
    "host": "owl.virtualflybrain.org",
    "port": 80
  },
  "cache": {
    "source": "access_log",
    "total": 120,
    "hit": 113,
    "miss": 7
  },
  "connections": {
    "active": 3,
    "reading": 0,
    "writing": 1,
    "waiting": 2
  }
}

/status is refreshed by a background monitor that reads /var/log/nginx/cache-access.log for cache totals and samples NGINX stub_status for connection counters.
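
As a rough illustration of how the cache totals can be used, the shell sketch below derives a hit ratio from a /status document. It assumes the field names shown in the example response (total and hit under cache). The helper names json_number and hit_ratio are hypothetical and are not part of the image.

```shell
# Hypothetical helpers, not shipped with the image.

# Print the first integer stored under key $1 in the JSON read from stdin.
# This is a plain-text match, adequate only for the flat counters in /status.
json_number() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Read a /status document on stdin and print the hit ratio as a percentage.
hit_ratio() {
  status_json=$(cat)
  total=$(printf '%s\n' "$status_json" | json_number total)
  hit=$(printf '%s\n' "$status_json" | json_number hit)
  if [ "${total:-0}" -gt 0 ]; then
    awk -v h="${hit:-0}" -v t="$total" 'BEGIN { printf "%.1f\n", 100 * h / t }'
  else
    echo "n/a"   # no requests recorded yet
  fi
}
```

Piping the endpoint into the helper, for example curl -s http://localhost/status | hit_ratio, would print 94.2 for the example response above (113 hits out of 120 requests).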

Health Monitoring: A background process logs warnings when the upstream server becomes unreachable, but the container continues running to serve cached content.

Configuration

Environment Variables

UPSTREAM_SERVER: Backend server as host:port (default: owl.virtualflybrain.org:80)
CACHE_MAX_SIZE: Maximum cache size on disk (default: 20g, accepts NGINX size units like 1t for 1TB)
CACHE_STALE_TIME: How long a cached response is considered fresh (default: 6M). After this time the entry is served stale while being refreshed in the background. Accepts NGINX time units: s, m, h, d, w, M (30 days), y (365 days).
DNS_RESOLVER: DNS resolver servers (default: 8.8.8.8, space-separated list). Run cat /etc/resolv.conf inside your container to find the correct value for your environment.
STATUS_POLL_INTERVAL: Seconds between /status refreshes (default: 5)
HEALTH_LOG_INTERVAL: Seconds between periodic upstream health log lines when state is unchanged (default: 300)
AUTO_BLOCK_SCANNERS: Automatically append probe-source IPs from /logs/hacks/probes.log to /logs/blocked.txt and live-reload NGINX maps (default: true)
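
To make the time units used by CACHE_STALE_TIME concrete, the sketch below converts a single-unit value into seconds using the multipliers listed above (M = 30 days, y = 365 days). The function name nginx_time_to_seconds is hypothetical, and the sketch deliberately ignores forms NGINX also accepts, such as combined values ("1h 30m") or milliseconds.

```shell
# Hypothetical helper: convert a single-unit NGINX time value to seconds.
nginx_time_to_seconds() {
  value=${1%?}          # everything except the last character, e.g. "6"
  unit=${1#"$value"}    # the last character, e.g. "M"
  case $value in
    ''|*[!0-9]*) echo "unsupported value: $1" >&2; return 1 ;;
  esac
  case $unit in
    s) mult=1 ;;
    m) mult=60 ;;
    h) mult=3600 ;;
    d) mult=86400 ;;
    w) mult=604800 ;;
    M) mult=2592000 ;;    # 30 days
    y) mult=31536000 ;;   # 365 days
    *) echo "unsupported unit: $1" >&2; return 1 ;;
  esac
  echo $((value * mult))
}
```

With these multipliers, the default of 6M works out to 15552000 seconds (180 days), and the 5y retention period to 157680000 seconds.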

Security Filtering and Blocking

Probe filtering: Requests matching common probing signatures (for example *.php, WordPress probe paths like wp-login.php, xmlrpc.php, wlwmanifest.xml, wp-includes/*, .env, phpmyadmin, path traversal payloads) are immediately refused with HTTP 403 and are not forwarded upstream.
Probe log output: Refused probe requests are logged to /logs/hacks/probes.log, including both raw X-Forwarded-For and the extracted left-most client IP.
Automatic scanner blocking: When AUTO_BLOCK_SCANNERS=true, newly detected client_ip values in /logs/hacks/probes.log are appended to /logs/blocked.txt (unless already present or whitelisted), and NGINX is reloaded so the block takes effect without container restart.
Manual IP blocklist: Add one IPv4/IPv6 address per line in /logs/blocked.txt (comments allowed with #).
Manual IP whitelist: Add one IPv4/IPv6 address per line in /logs/whitelist.txt (comments allowed with #).

Example /logs/blocked.txt:

203.0.113.10
# office VPN egress
2001:db8::1234

Example /logs/whitelist.txt:

203.0.113.50
# trusted monitoring source
2001:db8::beef

Blocked IP requests return HTTP 403 and are logged to /logs/hacks/blocked.log.

Whitelist entries take precedence over both the blocklist and probe filter.
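
The precedence can be illustrated with a small shell sketch. The real filtering happens inside NGINX, so this is only a model of the documented order (whitelist first, then blocklist, then probe filter). The function names are hypothetical, and the pattern list is a partial, assumed approximation of the signatures listed above.

```shell
# Hypothetical model of the documented precedence; not the proxy's actual code.

# Succeed if the path looks like one of the probe signatures (approximation).
is_probe_path() {
  path=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $path in
    *.php|*.php\?*|*/wp-includes/*|*wlwmanifest.xml*|*/.env*|*phpmyadmin*|*../*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeed if IP $1 appears on a line of its own in file $2.
listed_in() {
  [ -f "$2" ] && grep -Fxq -- "$1" "$2"
}

# Usage: request_verdict <ip> <path> <blocklist-file> <whitelist-file>
request_verdict() {
  if listed_in "$1" "$4"; then
    echo "allow"          # whitelist wins over everything else
  elif listed_in "$1" "$3"; then
    echo "403 blocked"
  elif is_probe_path "$2"; then
    echo "403 probe"
  else
    echo "allow"
  fi
}
```

In this model a whitelisted address requesting /wp-login.php is still allowed through, while the same request from an unlisted address is refused.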

Blocklist/whitelist entries are watched continuously by the runtime monitor. Updates to /logs/blocked.txt or /logs/whitelist.txt are converted into map files and applied via nginx -s reload within a few seconds.
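
The append rule used for automatic blocking (skip addresses that are already listed or whitelisted) can be sketched in shell as follows. The function name block_ip is hypothetical. The real monitor also regenerates the map files and reloads NGINX, which this sketch does not attempt.

```shell
# Hypothetical helper: append an IP to the blocklist unless it is already
# present there or appears in the whitelist.
# Usage: block_ip <ip> [blocklist-file] [whitelist-file]
block_ip() {
  ip=$1
  blocklist=${2:-/logs/blocked.txt}
  whitelist=${3:-/logs/whitelist.txt}
  # grep flags: -F fixed string, -x whole-line match, -q quiet
  if [ -f "$whitelist" ] && grep -Fxq -- "$ip" "$whitelist"; then
    echo "skipped (whitelisted): $ip"
  elif [ -f "$blocklist" ] && grep -Fxq -- "$ip" "$blocklist"; then
    echo "skipped (already blocked): $ip"
  else
    printf '%s\n' "$ip" >> "$blocklist"
    echo "blocked: $ip"
  fi
}
```

Because the files are watched by the runtime monitor, a manual call such as block_ip 203.0.113.10 should take effect after the next reload without restarting the container.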

Cache Headers

The proxy adds helpful headers to responses:

X-Cache-Status: HIT, MISS, EXPIRED, STALE, UPDATING, or REVALIDATED
X-Cache-Key: The cache key used for the request
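
To check how a particular request was served, the X-Cache-Status header can be pulled out of the response headers. The sketch below is one way to do that; cache_status is a hypothetical helper name.

```shell
# Hypothetical helper: print the X-Cache-Status value from HTTP response
# headers read on stdin. Header names are matched case-insensitively and
# trailing carriage returns are stripped.
cache_status() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "x-cache-status" { print $2; exit }'
}
```

A typical call is curl -s -o /dev/null -D - "http://localhost/kbs/vfb/instances?object=<http://purl.obolibrary.org/obo/FBbt_00005106>" | cache_status. A GET with -D - is used here instead of curl -I because the documented cache key includes the request method, so a HEAD request would be looked up under a different key.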

Performance

Cache TTL: 6 months for successful responses (configurable via CACHE_STALE_TIME)
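Disk retention: 5 years (inactive=5y). Entries are removed only if they go unrequested for 5 years or if the cache reaches CACHE_MAX_SIZE, in which case NGINX evicts the least recently used entries first.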
First request: ~200ms (backend query)
Cached requests: <10ms (from cache)
Cache size: Up to 20GB on disk (configurable via CACHE_MAX_SIZE)
Memory usage: ~100MB for cache metadata

Technical Details

Architecture

Base image: nginx:1.26-alpine
Cache storage: /var/cache/nginx/owlery with 1:2 directory levels
Cache zone: 100MB in-memory metadata zone
Max cache size: 20GB on disk (configurable via CACHE_MAX_SIZE environment variable)
Status monitoring: Background process updates /var/run/nginx/status.json from the access log and NGINX stub_status
Health monitoring: Background process checks upstream connectivity and logs warnings without taking the cache offline

Caching Behavior

Cache TTL: 6 months for HTTP 200/400, 10 minutes for 404 (TTL configurable via CACHE_STALE_TIME)
Always serve stale: proxy_cache_use_stale expired updating — expired entries are served immediately while refreshed in the background (prevents MISSes after TTL)
Disk retention: 5 years — cache files are kept on disk even after TTL expires
Retry on errors: Automatically retries failed requests (502, 503, 504, timeouts) up to 2 times
Cache lock: Prevents stampede with proxy_cache_lock on
Cache key: $request_method$request_uri
Ignores backend headers: Cache-Control, Expires, Set-Cookie
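
NGINX names each cache file after the MD5 hash of the cache key. With 1:2 directory levels it places the file under a one-character directory (the last hex digit of the hash) and a two-character subdirectory (the two digits before it). The sketch below applies that rule to locate an entry on disk. cache_file_path is a hypothetical helper, md5sum is assumed to be installed, and the default root is the cache storage path given under Architecture.

```shell
# Hypothetical helper: print the on-disk path NGINX would use for a cache key
# with levels=1:2. Requires md5sum.
# Usage: cache_file_path <cache-key> [cache-root]
cache_file_path() {
  key=$1
  root=${2:-/var/cache/nginx/owlery}
  hash=$(printf '%s' "$key" | md5sum | cut -d' ' -f1)
  level1=$(printf '%s' "$hash" | cut -c32)      # last hex digit
  level2=$(printf '%s' "$hash" | cut -c30-31)   # the two digits before it
  printf '%s/%s/%s/%s\n' "$root" "$level1" "$level2" "$hash"
}
```

For a GET request the key would be the method followed directly by the request URI, for example cache_file_path 'GET/kbs/vfb/instances?object=<http://purl.obolibrary.org/obo/FBbt_00005106>'. The exact key depends on how the client encodes the URI, so compare against the X-Cache-Key response header when in doubt.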

Networking

Listen ports: 80 and 8080 (both ports handle requests identically)
Status endpoints: /health for liveness, /status for JSON metrics, and internal-only /__nginx_status for raw NGINX counters
DNS resolver: Configurable via DNS_RESOLVER (default: Google Public DNS 8.8.8.8, with a 30s TTL so upstream IP changes are picked up quickly). Run cat /etc/resolv.conf inside your container to find the correct value.
Host-agnostic: Ignores Host header for routing
Connection pooling: 16 keep-alive connections to backend
Timeouts: 90s connect/read/send, 3s for health checks

Build and Deployment

Deployment Process

# Pull image
docker pull virtualflybrain/owl_cache:latest

# Create cache directory
mkdir -p /cache
chown -R 101:101 /cache

# Create the persistent log directory plus the blocklist and whitelist files
mkdir -p /logs/hacks
touch /logs/blocked.txt
touch /logs/whitelist.txt

# Deploy with compose
docker-compose up -d

# Verify
curl -I http://localhost/health

Configuration Files

Dockerfile: Image build instructions
nginx.conf.template: NGINX configuration template
docker-compose.yml: Example deployment configuration
.github/workflows/docker.yml: GitHub Actions CI/CD pipeline

CI/CD

This repository includes a GitHub Actions workflow (.github/workflows/docker.yml) that:

Tests NGINX configuration syntax on every push
Builds and tests the Docker image
Pushes the image to Docker Hub on every push to the main branch and on every release

Required Secrets

Set these in your GitHub repository secrets:

DOCKER_HUB_USER: Your Docker Hub username
DOCKER_HUB_PASSWORD: Your Docker Hub password or access token

Expected Behavior

First Request: Cache MISS → Query backend (~200ms) → Cache result → Return with X-Cache-Status: MISS
Subsequent Requests: Cache HIT → Return cached result (<10ms) with X-Cache-Status: HIT
Expired Cache: Return stale content immediately with X-Cache-Status: UPDATING + background refresh
Backend Errors: Forward errors to client without caching, allowing retries to succeed
Status Reporting: /status shows current hit/miss/total counts from the access log plus sampled connection counters
