From ade64386a624c517c43f5b721b528fc39ec18858 Mon Sep 17 00:00:00 2001 From: Andrew Nesbitt Date: Thu, 12 Mar 2026 12:18:27 +0000 Subject: [PATCH] Document web UI, monitoring, database schema, and cooldown support Add web interface section to README describing all pages (dashboard, package browser, source browser, version diff). Add monitoring section with the full Prometheus metrics table and scrape config. Add cooldown column to the registry support table. Update architecture doc with accurate database schema including all columns and indexes, and add entries for metrics, cooldown, and enrichment packages. --- README.md | 95 ++++++++++++++++++++++-------- docs/architecture.md | 137 ++++++++++++++++++++++++++++++++----------- 2 files changed, 172 insertions(+), 60 deletions(-) diff --git a/README.md b/README.md index 930ee61..8e56851 100644 --- a/README.md +++ b/README.md @@ -24,31 +24,33 @@ Currently works with npm, PyPI, pub.dev, and Composer, which all include publish ## Supported Registries -| Registry | Language/Platform | URL Resolution | Handler | Completed | -|----------|-------------------|:--------------:|:-------:|:---------:| -| npm | JavaScript | Yes | Yes | ✓ | -| Cargo | Rust | Yes | Yes | ✓ | -| RubyGems | Ruby | Yes | Yes | ✓ | -| Go proxy | Go | Yes | Yes | ✓ | -| Hex | Elixir | Yes | Yes | ✓ | -| pub.dev | Dart | Yes | Yes | ✓ | -| PyPI | Python | Yes | Yes | ✓ | -| Maven | Java | Yes | Yes | ✓ | -| NuGet | .NET | Yes | Yes | ✓ | -| Composer | PHP | Yes | Yes | ✓ | -| Conan | C/C++ | Yes | Yes | ✓ | -| Conda | Python/R | Yes | Yes | ✓ | -| CRAN | R | Yes | Yes | ✓ | -| Container | Docker/OCI | Yes | Yes | ✓ | -| Debian | Debian/Ubuntu | Yes | Yes | ✓ | -| RPM | RHEL/Fedora | Yes | Yes | ✓ | -| Alpine | Alpine Linux | No | No | ✗ | -| Arch | Arch Linux | No | No | ✗ | -| Chef | Chef | No | No | ✗ | -| Generic | Any | No | No | ✗ | -| Helm | Kubernetes | No | No | ✗ | -| Swift | Swift | No | No | ✗ | -| Vagrant | Vagrant | No | No | ✗ | +| 
Registry | Language/Platform | Cooldown | Completed | +|----------|-------------------|:--------:|:---------:| +| npm | JavaScript | Yes | ✓ | +| Cargo | Rust | | ✓ | +| RubyGems | Ruby | | ✓ | +| Go proxy | Go | | ✓ | +| Hex | Elixir | | ✓ | +| pub.dev | Dart | Yes | ✓ | +| PyPI | Python | Yes | ✓ | +| Maven | Java | | ✓ | +| NuGet | .NET | | ✓ | +| Composer | PHP | Yes | ✓ | +| Conan | C/C++ | | ✓ | +| Conda | Python/R | | ✓ | +| CRAN | R | | ✓ | +| Container | Docker/OCI | | ✓ | +| Debian | Debian/Ubuntu | | ✓ | +| RPM | RHEL/Fedora | | ✓ | +| Alpine | Alpine Linux | | ✗ | +| Arch | Arch Linux | | ✗ | +| Chef | Chef | | ✗ | +| Generic | Any | | ✗ | +| Helm | Kubernetes | | ✗ | +| Swift | Swift | | ✗ | +| Vagrant | Vagrant | | ✗ | + +Cooldown requires publish timestamps in metadata. Registries without a "Yes" in the cooldown column either don't expose timestamps or haven't been wired up yet. ## Quick Start @@ -465,9 +467,10 @@ Recently cached: | Endpoint | Description | |----------|-------------| -| `GET /` | Welcome message and endpoint list | +| `GET /` | Dashboard (web UI) | | `GET /health` | Health check (returns "ok" if healthy) | | `GET /stats` | Cache statistics (JSON) | +| `GET /metrics` | Prometheus metrics | | `GET /npm/*` | npm registry protocol | | `GET /cargo/*` | Cargo sparse index protocol | | `GET /gem/*` | RubyGems protocol | @@ -667,6 +670,46 @@ Response: └─────────┘ ``` +## Web Interface + +The proxy serves a web UI at the root URL. No separate frontend build is needed -- templates and assets are embedded in the binary. + +- **Dashboard** (`/`) -- cache stats, popular packages, recently cached artifacts, and vulnerability overview. +- **Install guide** (`/install`) -- per-ecosystem configuration instructions, so you don't have to look them up here. +- **Package browser** (`/packages`) -- browse all cached packages with filtering by ecosystem and sorting by hits, size, name, or vulnerability count. 
+- **Search** (`/search?q=...`) -- search cached packages by name. +- **Package detail** (`/package/{ecosystem}/{name}`) -- metadata, license, vulnerabilities, and version list for a package. You can select two versions to compare. +- **Version detail** (`/package/{ecosystem}/{name}/{version}`) -- per-version metadata, integrity hash, artifact cache status, and hit counts. +- **Source browser** (`/package/{ecosystem}/{name}/{version}/browse`) -- browse files inside cached archives with syntax highlighting for text files and image previews. +- **Version diff** (`/package/{ecosystem}/{name}/compare/{v1}...{v2}`) -- side-by-side diff of two cached versions showing added, removed, and changed files. + +## Monitoring + +The proxy exposes Prometheus metrics at `GET /metrics`. All metric names are prefixed with `proxy_`. + +| Metric | Type | Labels | Description | +|--------|------|--------|-------------| +| `proxy_cache_hits_total` | counter | `ecosystem` | Cache hits | +| `proxy_cache_misses_total` | counter | `ecosystem` | Cache misses | +| `proxy_cache_size_bytes` | gauge | | Total size of cached artifacts | +| `proxy_cached_artifacts_total` | gauge | | Number of cached artifacts | +| `proxy_upstream_fetch_duration_seconds` | histogram | `ecosystem` | Time spent fetching from upstream | +| `proxy_upstream_errors_total` | counter | `ecosystem`, `error_type` | Upstream fetch failures | +| `proxy_storage_operation_duration_seconds` | histogram | `operation` | Storage read/write latency | +| `proxy_storage_errors_total` | counter | `operation` | Storage read/write failures | +| `proxy_active_requests` | gauge | | In-flight requests | + +Cache size and artifact count are refreshed every 60 seconds. The remaining metrics update on each request. 
+ +Scrape config for Prometheus: + +```yaml +scrape_configs: + - job_name: git-pkgs-proxy + static_configs: + - targets: ["localhost:8080"] +``` + ## Production Deployment ### Systemd Service diff --git a/docs/architecture.md b/docs/architecture.md index be9a6a6..8b207bd 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -7,29 +7,24 @@ This document describes the internal architecture of the git-pkgs proxy. The proxy is a caching HTTP server that sits between package manager clients and upstream registries. It intercepts requests, checks a local cache, and either serves cached content or fetches from upstream. ``` -┌─────────────────────────────────────────────────────────────────┐ -│ HTTP Server │ -│ ┌─────────────────────────────────────────────────────────┐ │ -│ │ Router (ServeMux) │ │ -│ │ /npm/* -> NPMHandler │ │ -│ │ /cargo/* -> CargoHandler │ │ -│ │ /health -> healthHandler │ │ -│ │ /stats -> statsHandler │ │ -│ └─────────────────────────────────────────────────────────┘ │ -│ │ │ -│ ▼ │ -│ ┌─────────────────────────────────────────────────────────┐ │ -│ │ Proxy │ │ -│ │ - GetOrFetchArtifact() │ │ -│ │ - Coordinates DB, Storage, Fetcher │ │ -│ └─────────────────────────────────────────────────────────┘ │ -│ │ │ │ │ -│ ▼ ▼ ▼ │ +┌──────────────────────────────────────────────────────────────────┐ +│ HTTP Server │ +│ ┌──────────────────────────────────────────────────────────┐ │ +│ │ Router (Chi) │ │ +│ │ /npm/* -> NPMHandler /health -> healthHandler │ │ +│ │ /cargo/* -> CargoHandler /stats -> statsHandler │ │ +│ │ /gem/* -> GemHandler /metrics -> prometheus │ │ +│ │ ...16 ecosystems /api/* -> APIHandler │ │ +│ │ / -> Web UI │ │ +│ └──────────────────────────────────────────────────────────┘ │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ │ ┌───────────┐ ┌─────────────┐ ┌─────────────┐ │ -│ │ Database │ │ Storage │ │ Upstream │ │ -│ │ (SQLite) │ │ (Filesystem)│ │ (Fetcher) │ │ +│ │ Database │ │ Storage │ │ Upstream │ │ +│ │ SQLite or │ │ Filesystem │ │ Registries │ │ +│ │ 
Postgres │ │ or S3 │ │ (Fetcher) │ │ │ └───────────┘ └─────────────┘ └─────────────┘ │ -└─────────────────────────────────────────────────────────────────┘ +└──────────────────────────────────────────────────────────────────┘ ``` ## Request Flow @@ -91,29 +86,87 @@ Metadata is not cached - always fetched fresh. This ensures clients see new vers ### `internal/database` -SQLite database for cache metadata. Uses `modernc.org/sqlite` (pure Go, no CGO). +SQLite or PostgreSQL database for cache metadata. SQLite uses `modernc.org/sqlite` (pure Go, no CGO). PostgreSQL uses `lib/pq`. + +The schema is compatible with [git-pkgs](https://github.com/git-pkgs) databases. The proxy adds the `artifacts` and `vulnerabilities` tables on top of the shared `packages` and `versions` tables, so both tools can point at the same database. **Tables:** ```sql packages ( - id, purl, ecosystem, name, namespace, latest_version, - license, description, homepage, repository_url, upstream_url, - metadata_fetched_at, created_at, updated_at + id INTEGER PRIMARY KEY, -- SERIAL on Postgres + purl TEXT NOT NULL, -- unique, e.g. pkg:npm/lodash + ecosystem TEXT NOT NULL, + name TEXT NOT NULL, + latest_version TEXT, + license TEXT, + description TEXT, + homepage TEXT, + repository_url TEXT, + registry_url TEXT, + supplier_name TEXT, + supplier_type TEXT, + source TEXT, + enriched_at DATETIME, + vulns_synced_at DATETIME, + created_at DATETIME, + updated_at DATETIME ) +-- indexes: purl (unique), (ecosystem, name) versions ( - id, purl, package_id, version, license, integrity, - published_at, yanked, metadata_fetched_at, created_at, updated_at + id INTEGER PRIMARY KEY, + purl TEXT NOT NULL, -- unique, e.g. 
pkg:npm/lodash@4.17.21 + package_purl TEXT NOT NULL, -- FK to packages.purl + license TEXT, + published_at DATETIME, + integrity TEXT, -- subresource integrity hash + yanked INTEGER DEFAULT 0, -- BOOLEAN on Postgres + source TEXT, + enriched_at DATETIME, + created_at DATETIME, + updated_at DATETIME ) +-- indexes: purl (unique), package_purl artifacts ( - id, version_id, filename, upstream_url, storage_path, - content_hash, size, content_type, fetched_at, - hit_count, last_accessed_at, created_at, updated_at + id INTEGER PRIMARY KEY, + version_purl TEXT NOT NULL, + filename TEXT NOT NULL, + upstream_url TEXT NOT NULL, + storage_path TEXT, -- null until cached + content_hash TEXT, -- SHA-256 + size INTEGER, -- BIGINT on Postgres + content_type TEXT, + fetched_at DATETIME, + hit_count INTEGER DEFAULT 0, -- BIGINT on Postgres + last_accessed_at DATETIME, + created_at DATETIME, + updated_at DATETIME +) +-- indexes: (version_purl, filename) unique, storage_path, last_accessed_at + +vulnerabilities ( + id INTEGER PRIMARY KEY, + vuln_id TEXT NOT NULL, -- e.g. CVE-2021-1234 + ecosystem TEXT NOT NULL, + package_name TEXT NOT NULL, + severity TEXT, + summary TEXT, + fixed_version TEXT, + cvss_score REAL, + "references" TEXT, -- JSON array + fetched_at DATETIME, + created_at DATETIME, + updated_at DATETIME ) +-- indexes: (vuln_id, ecosystem, package_name) unique, (ecosystem, package_name) ``` +On PostgreSQL, `INTEGER PRIMARY KEY` becomes `SERIAL`, `DATETIME` becomes `TIMESTAMP`, `INTEGER DEFAULT 0` booleans become `BOOLEAN DEFAULT FALSE`, and size/count columns use `BIGINT`. + +The `MigrateSchema()` function handles backward compatibility with older git-pkgs databases by adding missing columns via `ALTER TABLE` as needed. 
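The "add missing columns" step can be illustrated with a small Go sketch. It is not the real `MigrateSchema()` implementation: the `column` type and the `missingColumnStatements` helper are hypothetical, and the sketch only builds the `ALTER TABLE` statements from a list of existing column names. It does not open a database or inspect a live schema.

```go
package main

import (
	"fmt"
	"strings"
)

// column is a hypothetical description of one column the current
// schema expects to find on a table.
type column struct {
	Name string
	Type string
}

// missingColumnStatements returns one ALTER TABLE statement for each
// wanted column that is absent from existing. Names are compared
// case-insensitively. Table and column names are assumed to come from
// trusted, hard-coded schema definitions, never from user input.
func missingColumnStatements(table string, existing []string, want []column) []string {
	have := make(map[string]bool, len(existing))
	for _, name := range existing {
		have[strings.ToLower(name)] = true
	}
	var stmts []string
	for _, c := range want {
		if have[strings.ToLower(c.Name)] {
			continue // already present, nothing to migrate
		}
		stmts = append(stmts, fmt.Sprintf("ALTER TABLE %s ADD COLUMN %s %s", table, c.Name, c.Type))
	}
	return stmts
}

func main() {
	// Columns found in an older database (illustrative values).
	existing := []string{"id", "purl", "ecosystem", "name"}
	// Columns the newer schema expects, taken from the packages table above.
	want := []column{
		{"purl", "TEXT"},
		{"enriched_at", "DATETIME"},
		{"vulns_synced_at", "DATETIME"},
	}
	for _, stmt := range missingColumnStatements("packages", existing, want) {
		fmt.Println(stmt)
	}
}
```

Because columns that already exist are skipped, running the same check twice produces no statements the second time, which is what makes this style of migration safe to repeat on every startup.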
+ **Key operations:** - `GetPackageByPURL()` - Look up package by PURL - `GetVersionByPURL()` - Look up version by PURL @@ -121,6 +174,7 @@ artifacts ( - `UpsertPackage/Version/Artifact()` - Insert or update records - `RecordArtifactHit()` - Increment hit counter, update access time - `GetLeastRecentlyUsedArtifacts()` - For cache eviction +- `SearchPackages()` - Full-text search across cached packages ### `internal/storage` @@ -201,12 +255,27 @@ HTTP protocol handlers for each registry type. ### `internal/server` -HTTP server setup. +HTTP server setup, web UI, and API handlers. - Creates and wires together all components -- Mounts handlers at appropriate paths -- Adds logging middleware -- Health and stats endpoints +- Mounts protocol handlers at ecosystem-specific paths +- Middleware: request ID, real IP, logging, panic recovery, active request tracking +- Web UI: dashboard, package browser, source browser, version comparison +- Templates are embedded in the binary via `//go:embed` +- Enrichment API for package metadata, vulnerability scanning, and outdated detection +- Health, stats, and Prometheus metrics endpoints + +### `internal/metrics` + +Prometheus metrics for cache performance, upstream latency, storage operations, and active requests. See the Monitoring section of the README for the full metric list. + +### `internal/cooldown` + +Version age filtering for supply chain attack mitigation. Configurable at global, ecosystem, and per-package levels. Supported by npm, PyPI, pub.dev, and Composer handlers. + +### `internal/enrichment` + +Package metadata enrichment. Fetches license, description, homepage, repository URL, and vulnerability data from upstream registries. Powers the `/api/` endpoints and the web UI's package detail pages. ### `internal/config`