Academic research project for classifying automated HTTP clients (bots, LLMs, crawlers) vs real browsers using transport-level fingerprinting.
Live: Dashboard · Bot detector
Version: 1.5.1 | Changelog | Methodology
Localhost (10s, 50 concurrent connections; Go server only, no nginx in front):
| Mode | RPS | RPM | Latency avg |
|---|---|---|---|
| HTTP (no TLS) | ~11,550 | ~693K | ~4.3 ms |
| HTTPS (TLS fingerprinting, JA3/JA4/JA4H) | ~8,210 | ~493K | ~6.1 ms |
Over the network (patched nginx + TLS termination + X-FP-* at edge, 10s, 50 concurrent, HTTPS):
| Endpoint | RPS | RPM | Latency avg |
|---|---|---|---|
| GET / (classify) | ~2,640 | ~158K | ~18.9 ms |
| GET /health | ~4,242 | ~255K | ~11.8 ms |
Summary (10s, c=50, same host): Localhost: 8–11K RPS, 4–6 ms avg. Over the network (nginx/TLS): classify ~2.6K RPS, ~18.9 ms; health ~4.2K RPS, ~11.8 ms — so classification adds ~7 ms vs health. With Redis + on-the-fly request metrics on the same host: ~1,620 RPS, 30.9 ms (vs classify without Redis: −39% RPS, +12 ms per request). Takeaways: classification cost ~7 ms over transport; Redis + stats add ~12 ms and ~40% throughput drop on endpoints that build request_metrics.
Create a single HTTP endpoint that classifies clients as browser or bot based exclusively on:
- TLS handshake patterns (JA3/JA4 fingerprinting)
- HTTP/2 negotiation behavior
- Header structure and semantics
- Request patterns
No JavaScript challenges, no rate limiting — pure network fingerprinting.
Phase 1 [COMPLETED] — TLS + HTTP fingerprinting: ClientHello capture, JA3/JA4/JA4H, TLS and HTTP signals in scoring, JA4H↔HTTP consistency (evasion detection), HTTPS server mode.
Phase 2 — HTTP/2: H2 fingerprint consumed from proxy (X-FP-H2); SETTINGS/PRIORITY/window come from nginx modules at the edge (e.g. nginx-http2-fingerprint) and are used in classification when present. No low-level H2 parsing in Go. Planned: H2/H3 ratio tracking. See docs/nginx.md and Methodology → Phase 2.
Phase 3 — Inconsistency detection: spatial (JA4H vs HTTP, TLS/HTTP version mismatch) in place. Planned: temporal inconsistency (same client, changing FPs), header–UA validation. See Methodology → Phase 3.
Redis & behavioural (Appendix L, M) — Optional Redis (REDIS_URL): (1) Challenge store — nonce→User-Agent stored in Redis so multiple instances share state; (2) Behavioural metrics — request counts and timestamps per IP and per __ch_nonce (sliding window). When behavioral_edges are set in scoring config, the classifier adds bot score for rate and inter-arrival conditions (Appendix M). /debug returns request_metrics for the current request. See docs/deploy/README.md, Methodology Appendix L, config/README.md.
See CHANGELOG.md for detailed release notes.
Direct TLS (Go terminates HTTPS):
client → TLS listener (Go) → fingerprint collector → classifier → response
Via nginx (TLS termination at edge, fingerprint via headers):
client → nginx (TLS + JA3 + H2 fingerprint) → proxy_pass → Go (HTTP :8080, X-FP-* headers) → collector → classifier → response
See docs/nginx.md and Methodology Appendix F.
- Core: Go (HTTP/2 server, TLS fingerprinting, classification)
- Analytics: Python (log analysis, pattern extraction). Request log statistics — tools/python/request_log_stats.py aggregates JSONL logs: top-N by path, method, IP, user agent, JA3/JA4/JA4H, headers; bot/browser breakdown; scoring-signal prevalence; optional significance filter (√N). Dashboard payload — tools/python/build_dashboard_payload.py builds the JSON for the web dashboard (windows, timeline, transport + behavioural signals). See tools/python/README.md and Methodology Appendix J.
- Dashboard: React (Vite, TypeScript) in tools/ts/dashboard — terminal-style UI: time windows, timeline (60 bars, auto-clustering by 10 s / 1 min / 10 min), signal activation table (sortable), auto-refresh. Consumes JSON produced by build_dashboard_payload.py. See tools/ts/dashboard/README.md and Methodology Appendix N.
- Logging: Structured JSON logs per day (logs/requests_YYYYMMDD.jsonl) for research analysis.
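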
├── cmd/
│ └── server/ # HTTP server entry point
├── internal/
│ ├── config/ # Scoring config loader (JSON → classifier + fingerprint)
│ ├── fingerprint/ # TLS/HTTP signal collection
│ ├── classifier/ # Rule-based classification
│ ├── logger/ # Structured JSON logging
│ ├── metrics/ # Behavioral metrics (Redis: per IP, per __ch_nonce; Appendix L)
│ └── server/ # HTTP handlers (challenge store, Redis wiring)
├── config/ # Scoring config (scoring.json, scoring.default.json, README)
├── tests/
│ ├── integration/ # Automated client tests
│ ├── unit/ # Unit tests
│ └── testdata/ # Test stubs (e.g. ja4db_fixture.json, reference_*.json)
├── tools/
│ ├── benchmark/ # HTTP benchmark tool
│ ├── python/ # Analytics tools (request_log_stats, build_dashboard_payload, behavioral_bars, …)
│ ├── ts/
│ │ └── dashboard/ # React dashboard (time windows, timeline, signals; consumes dashboard.json)
│ └── shell/ # Integration test scripts
├── internal/fingerprint/data/ # JA4 DB path (ja4db.json downloaded on first start if missing)
├── logs/ # JSON traffic logs (requests_YYYYMMDD.jsonl per day)
└── docs/ # Research documentation
- Full ClientHello capture via custom TLS listener
- JA3/JA4 fingerprint hashing
- ALPN negotiation (h2, http/1.1)
- Cipher suite count and complexity (15+ suggests browser)
- TLS extensions count (10+ suggests browser)
- Supported versions, signature schemes, elliptic curve groups
- Session ticket and early data support
- HTTP/2 vs HTTP/1.1; HTTP/2 fingerprint (SETTINGS, PRIORITY, window) when provided by proxy
- JA4H fingerprinting (HTTP fingerprint from JA4+ family)
- Header order and structure; browser-specific headers (sec-fetch-*, accept-language); header count and entropy
- Cross-signal consistency: JA4H vs HTTP; TLS vs User-Agent (known library/browser JA3/JA4); H2 vs JA4 (ALPN); TLS ALPN vs HTTP version (direct TLS)
- Absence signals (direct TLS only): missing SNI or ALPN when TLS is available scores toward bot; optional browser bonus when no smoking-gun bot signals fire (see config/README.md)
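For concreteness, JA3 is defined as the MD5 of a comma-separated string built from ClientHello fields, with GREASE values excluded. A sketch of that canonical string construction (not the project's actual implementation):

```go
package main

import (
	"crypto/md5"
	"fmt"
	"strconv"
	"strings"
)

// ja3String builds the canonical JA3 input string:
// TLSVersion,Ciphers,Extensions,EllipticCurves,ECPointFormats
// with each list joined by "-". GREASE values (0x?a?a with equal bytes)
// are skipped per the JA3 spec. The JA3 fingerprint is the MD5 hex digest
// of this string.
func ja3String(version uint16, ciphers, exts, curves, pointFmts []uint16) string {
	join := func(vals []uint16) string {
		parts := make([]string, 0, len(vals))
		for _, v := range vals {
			if byte(v>>8) == byte(v) && v&0x0f == 0x0a {
				continue // GREASE: both bytes equal, low nibble 0xA (0x0a0a, 0x1a1a, …)
			}
			parts = append(parts, strconv.Itoa(int(v)))
		}
		return strings.Join(parts, "-")
	}
	return fmt.Sprintf("%d,%s,%s,%s,%s",
		version, join(ciphers), join(exts), join(curves), join(pointFmts))
}

func main() {
	s := ja3String(771, // TLS 1.2 in the ClientHello legacy_version field
		[]uint16{0x1a1a, 4865, 4866, 4867}, // leading GREASE cipher is dropped
		[]uint16{0, 23, 65281},
		[]uint16{0x2a2a, 29, 23, 24},
		[]uint16{0})
	fmt.Println(s)
	fmt.Printf("ja3: %x\n", md5.Sum([]byte(s)))
}
```

JA4 follows the same spirit but sorts some fields and uses truncated SHA-256 segments, which is why the two hashes are reported side by side.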
- Collect: Run server, generate traffic (curl, browsers, LLM tools)
- Log: All requests logged as structured JSON to daily files (logs/requests_YYYYMMDD.jsonl)
- Analyze: Run request_log_stats.py on JSONL logs for top-N by path/method/IP/fingerprint and scoring-signal prevalence; see Methodology Appendix J
- Dashboard (optional): Build dashboard JSON with build_dashboard_payload.py and serve the React dashboard (tools/ts/dashboard) for live views of windows, timeline, and signal activation
- Iterate: Update classification heuristics based on findings
- Test: Automated integration tests validate behavior
- Go 1.22+ — Download installers for Windows, macOS, Linux; or install via a package manager (e.g. `winget install GoLang.Go`, `brew install go`, `apt install golang-go`). Ensure `go` is on your PATH.
- Go tools directory in PATH — add `$HOME/go/bin` (the default when Go is installed in the usual way). Required so `task` and `golangci-lint` are found after `go install`. Using this explicit path avoids errors when `go` cannot read the current directory (e.g. after `sudo su`). Do not install the `task` or `taskwarrior` apt/snap packages (they are different programs).
- TLS certificate and key (for HTTPS mode)
# Clone repository
git clone https://github.com/muliwe/go-client-classifier.git
cd go-client-classifier
# Install dependencies and dev tools
go mod tidy
go install github.com/go-task/task/v3/cmd/task@latest
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
# Ensure Go bin is on PATH (required for `task` and `golangci-lint`)
# Use explicit path so it works even when current directory has permission issues (e.g. after sudo su)
export PATH=$PATH:$HOME/go/bin
# To make it permanent, add the same line to your shell profile and reload:
echo 'export PATH=$PATH:$HOME/go/bin' >> ~/.bashrc && source ~/.bashrc

For TLS fingerprinting to work, the server must run in HTTPS mode. Place your certificate and key in the certs/ directory:
certs/
├── server.crt
└── server.key
Note: The certs/ directory is in .gitignore — certificates are not committed to the repository.
To generate a self-signed certificate for local development:
# Create certs directory
mkdir certs
# Generate self-signed certificate (valid for 1 year)
openssl req -x509 -newkey rsa:4096 -keyout certs/server.key -out certs/server.crt \
  -days 365 -nodes -subj "/CN=localhost"

Add the certificate to your system's trusted certificates for browser testing without warnings.
Using Let's Encrypt (certbot) — for a public hostname with HTTPS:
# Install certbot (Ubuntu/Debian)
sudo apt install certbot
# Obtain a certificate (standalone mode: port 80 must be free for the challenge)
sudo certbot certonly --standalone -d your-domain.example.com
# Certbot stores certs under /etc/letsencrypt/live/<domain>/
# Point the server at them via env or symlink into certs/:
# TLS_CERT=/etc/letsencrypt/live/your-domain.example.com/fullchain.pem
# TLS_KEY=/etc/letsencrypt/live/your-domain.example.com/privkey.pem
# Or copy/symlink into project certs/ (ensure deploy user can read; certbot files are root-readable):
sudo cp /etc/letsencrypt/live/your-domain.example.com/fullchain.pem certs/server.crt
sudo cp /etc/letsencrypt/live/your-domain.example.com/privkey.pem certs/server.key
sudo chown $(whoami) certs/server.crt certs/server.key

Renewal: certbot can renew via `sudo certbot renew` (e.g. from cron or a systemd timer). After renewal, restart the Go server so it reloads the certificates.
# Build binary
task build
# Run server (HTTP mode, no TLS fingerprinting)
task run
# Run server with HTTPS (required for TLS fingerprinting)
task run:tls
# Run tests
task test
# Run linter
task lint
# Format code
task fmt
# Run all checks (fmt, lint, test)
task check
# List all available tasks
task --list

# Build binary to bin/server
task build
# Or manually
go build -o bin/server ./cmd/server
# Run the binary
./bin/server

The server uses a JA4 fingerprint database (ja4db.com) for TLS vs User-Agent consistency checks. If the file is absent, the server downloads it automatically on first use (saved to internal/fingerprint/data/ja4db.json when running from the repo root). No manual step is required for basic runs.
For deployment, you can optionally download the dictionary manually (e.g. to avoid first-request latency or when the host has no outbound HTTPS):
# From repo root; creates internal/fingerprint/data/ja4db.json
curl -o internal/fingerprint/data/ja4db.json "https://ja4db.com/api/read/"

Or with PowerShell:

Invoke-WebRequest -Uri "https://ja4db.com/api/read/" -OutFile "internal/fingerprint/data/ja4db.json" -UseBasicParsing

Ensure the directory exists (`mkdir -p internal/fingerprint/data` or `New-Item -ItemType Directory -Force -Path internal/fingerprint/data`). Override the path with the JA4DB_PATH environment variable if you place the file elsewhere.
# Run all tests
task test
# Run tests (short mode)
task test:short
# Test with curl (HTTP mode)
curl http://localhost:8080/
# Test with curl (HTTPS mode)
curl https://localhost:8443/
# Test health endpoint
curl http://localhost:8080/health
curl https://localhost:8443/health

Run integration tests against a running server using curl:
# HTTP mode
task run # Start server (terminal 1)
task integration # Run tests (terminal 2)
# HTTPS mode (TLS fingerprinting)
task run:tls # Start HTTPS server (terminal 1)
task integration:tls # Run tests with --insecure (terminal 2)
# Custom base URL
task integration BASE_URL=http://localhost:3000
task integration:tls BASE_URL=https://localhost:8443

Run the HTTP performance benchmark against a running server. You can pass a URL to test different routes (e.g. /, /health, /debug).
# Start server
task run:tls # HTTPS mode (terminal 1)
# Run benchmark (terminal 2)
task bench:tls # Default URL: https://localhost:8443/, 10s, 10 concurrent
# Pass URL to test a specific path (variable or positional after --)
task bench:tls URL=https://localhost:8443/debug
task bench:tls -- https://localhost:8443/health
# Custom duration and concurrency
task bench:tls DURATION=30s CONCURRENCY=50
# HTTP mode (default URL: http://localhost:8080/)
task bench
task bench URL=http://localhost:8080/health DURATION=10s CONCURRENCY=10
task bench -- http://localhost:8080/

Benchmark output includes RPS, RPM, and latency statistics (avg/min/max).
The integration tests automatically detect the OS and use:
- tools/shell/integration_test.ps1 for Windows (PowerShell)
- tools/shell/integration_test.sh for Unix (Linux/macOS)
Tests verify:
- GET /health — health check endpoint returns {"status":"ok"}
- GET / — classify endpoint returns classification
- GET /debug — debug endpoint returns fingerprint data
- curl is correctly detected as bot
| Endpoint | Description |
|---|---|
| `GET /` | Classify client as browser or bot |
| `GET /health` | Health check |
| `GET /debug` | Debug info with full fingerprint (dev only) |
Example API response (GET /):
{
"classification": "browser",
"confidence": "0.95",
"message": "You appear to be using a browser",
"request_id": "uuid",
"timestamp": "2026-02-18T12:00:00Z",
"version": "1.5.1"
}

(confidence is a string with 2 decimal places to avoid float instability.)
Each request is logged as one JSON line (JSONL) with full fingerprint data. Log files are written by day in UTC: logs/requests_YYYYMMDD.jsonl (e.g. logs/requests_20260217.jsonl). The server rotates to a new file automatically when the date changes.
{
"timestamp": "2026-02-12T12:40:35Z",
"request_id": "uuid",
"classification": "browser",
"confidence": 0.99,
"fingerprint": {
"tls": {
"version": "TLS 1.3",
"cipher_suites_count": 16,
"extensions_count": 18,
"ja3_hash": "9b0d79d10808bc0e509b4789f870a650",
"ja4_hash": "t13d1516h2_8daaf6152771_d8a2da3f94cd",
"supported_groups": ["GREASE", "x25519", "secp256r1", "secp384r1"]
},
"http": {
"version": "HTTP/2.0",
"header_count": 14
}
},
"signals": {
"browser_score": 18,
"bot_score": 0,
"score_breakdown": "BROWSER[http2(+2) sec-fetch(+3) ...] BOT[]"
},
"score": 18
}

You can run the service on Ubuntu as a systemd unit: one process listens on both HTTP and HTTPS, and restarts on failure or after a reboot.
1. Build the Linux binary
On your dev machine or in CI:
task build:prod

The binary will be at bin/server. Copy it to the server (e.g. /opt/go-client-classifier/).
2. Certificates
Place the certificate and key in the app directory, for example:
/opt/go-client-classifier/
├── bin/
│   └── server         # binary (matches ExecStart below)
├── certs/
│ ├── server.crt
│ └── server.key
└── logs/ # created automatically
3. systemd unit file
Create /etc/systemd/system/go-client-classifier.service:
[Unit]
Description=Go Client Classifier (bot detector)
After=network.target
[Service]
Type=simple
User=deploy
Group=deploy
WorkingDirectory=/opt/go-client-classifier
ExecStart=/opt/go-client-classifier/bin/server
Restart=always
RestartSec=5
# HTTP :8080, HTTPS :8443
Environment=PORT=8080
Environment=TLS_PORT=8443
Environment=TLS_CERT=/opt/go-client-classifier/certs/server.crt
Environment=TLS_KEY=/opt/go-client-classifier/certs/server.key
# Optional: Redis — challenge store + behavioural metrics (Appendix L). If unset, challenge store is in-memory and metrics are not collected.
# Environment=REDIS_URL=redis://127.0.0.1:6379/0
# Environment=CHALLENGE_TTL_SEC=120
# Optional: enable PROXY protocol on TLS port (when nginx stream uses proxy_protocol on → real client IP in logs)
# Environment=PROXY_PROTOCOL=1
# Raise open-file limit (default 1024 can cause SSL/connection failures under load)
LimitNOFILE=65535
# Optional: disable request logging, only health/debug
# Environment=DEBUG=false
[Install]
WantedBy=multi-user.target

Redis (optional) — If you use REDIS_URL, ensure Redis is running. Check and install:
# Check: expect PONG
redis-cli ping
# Install if missing (Debian/Ubuntu)
command -v redis-server >/dev/null 2>&1 || { sudo apt-get update && sudo apt-get install -y redis-server; }
# Or RHEL/Rocky/Fedora
command -v redis-server >/dev/null 2>&1 || { sudo dnf install -y redis && sudo systemctl enable --now redis; }
# After install: ensure it runs and responds (service name: redis on RHEL/Ubuntu 24+, redis-server on older Debian/Ubuntu)
sudo systemctl enable --now redis 2>/dev/null || sudo systemctl enable --now redis-server
redis-cli ping

Replace User=deploy and Group=deploy with the user and group that should run the service. Ensure that user can read the binary and certs/, and can write to logs/ (e.g. chown -R deploy:deploy /opt/go-client-classifier).
Alternatively, put variables in a file: create /opt/go-client-classifier/.env (or environment.conf) and add EnvironmentFile=/opt/go-client-classifier/.env to the unit.
4. Enable and start
sudo systemctl daemon-reload
sudo systemctl enable go-client-classifier
sudo systemctl start go-client-classifier
sudo systemctl status go-client-classifier

Verify: curl http://localhost:8080/health and curl -k https://localhost:8443/health.
Viewing logs in real time
- Service output (stdout/stderr: startup message, per-request console line, errors):
journalctl -u go-client-classifier -f
- Request log file (JSONL, one line per classify request):
tail -f /opt/go-client-classifier/logs/requests_$(date +%Y%m%d).jsonl

Or from the app directory: `tail -f logs/requests_*.jsonl` (today’s file).
Note: Any request that hits the classify handler (including non-root paths like /not-known) is classified and written to the JSONL and console logs; only GET / returns 200 JSON, other paths return 404. GET /health and GET /debug are handled by other handlers and are not logged. If the log stays empty, check journalctl -u go-client-classifier -f for the "Logs:" path at startup and any "Error logging result" messages.
Environment variables
| Variable | Description | Example |
|---|---|---|
| `PORT` | HTTP port | `8080` |
| `TLS_PORT` | HTTPS port (when using TLS) | `8443` |
| `PROXY_PROTOCOL` | PROXY protocol on TLS (if nginx has `proxy_protocol on`) | `true` |
| `TLS_CERT` | Path to certificate file | `certs/server.crt` |
| `TLS_KEY` | Path to key file | `certs/server.key` |
| `DEBUG` | Enable `/debug` endpoint | `true` / `false` |
| `SCORING_CONFIG` | Path to scoring JSON (points, thresholds, classifier) | `config/scoring.json` |
If only TLS_CERT and TLS_KEY are set (no TLS_PORT), the service runs in HTTPS-only mode on PORT.
Scoring config — All scoring points, thresholds, classifier weight and confidence parameters are read from a single JSON file at startup. Path: SCORING_CONFIG or default config/scoring.json. If the file is missing or invalid, built-in defaults are used. Tuning (e.g. reducing false bots for incognito) is done via the config without code changes. See config/README.md for the schema, smoking guns (+3), strong/weak bot signals, and zero-point (easily spoofable) signals; config/scoring.default.json is the reference default.
Dashboard deployment — To use the web dashboard in production:
1. Build the frontend: From tools/ts/dashboard, run `npm run build`. Serve the contents of `dist/` as static files (e.g. under /dashboard/ or a dedicated subdomain).
2. Expose the payload in nginx: Configure nginx so the dashboard can load the JSON, e.g.:
   - Alias a location (e.g. /dashboard.json or /data/dashboard.json) to the path where the payload is written (e.g. /var/www/dashboard/dashboard.json).
   - Or proxy that path to an upstream that serves the file.
   - The dashboard uses /dashboard.json by default, or the URL set at build time via VITE_DASHBOARD_JSON_URL.
3. Cron for statistics: Run build_dashboard_payload.py on a schedule so the JSON is updated (e.g. every 1–5 minutes). Example:

   */5 * * * * cd /opt/go-client-classifier/tools/python && /home/deploy/.local/bin/poetry run python build_dashboard_payload.py "../../logs/requests_*.jsonl" --out ../../tools/ts/dashboard/dashboard.json --config ../../config/scoring.json >> /var/log/dashboard_build.log 2>&1

   Adjust --log-glob, --out, and the config path (e.g. --config config/scoring.json) to match your layout. See tools/python/README.md for all options.
Without (1)–(3), the dashboard UI may load but show stale or missing data; nginx (or equivalent) must serve both the built frontend and the payload URL, and cron (or another scheduler) must keep the payload up to date. Dashboard functionality and JSON contract are described in Methodology Appendix N.
- Can transport-level signals reliably distinguish browsers from automation?
- Which signals are most predictive?
- How do sophisticated bots (headless Chrome) behave?
- What are the false positive/negative rates?
Project uses git pre-commit hooks for code quality:
- Format check (`go fmt`)
- Linter (`golangci-lint`)
- Tests (`go test`)
Hooks are automatically run before each commit.
- CHANGELOG.md — version history and release notes
- config/README.md — scoring config schema, smoking guns, weak/zero signals, thresholds
- docs/METHODOLOGY.md — research methodology, signals, scoring algorithm, references; Appendix J — request log statistics; Appendix N — dashboard functionality
- docs/nginx.md — nginx setup for TLS termination, HTTP/2 fingerprint (X-FP-H2), JA3 (X-FP-JA3); Go consumes headers and uses H2/JA3 in cross-validation (Appendix G)
- tools/python/README.md — Python tools: build_dashboard_payload, request_log_stats, request_log_stats_by_class, behavioral_bars, antibot_test
- tools/ts/dashboard/README.md — React dashboard: setup, JSON contract, auto-refresh, sortable signals table
MIT (Academic Research)
Research project for academic purposes.