Commit c46535b

Merge pull request #27 from git-pkgs/docs-improvements
Document web UI, monitoring, and database schema
2 parents 9b321ea + ade6438 commit c46535b

2 files changed

Lines changed: 172 additions & 60 deletions

File tree

README.md

Lines changed: 69 additions & 26 deletions
Original file line number | Diff line number | Diff line change
@@ -24,31 +24,33 @@ Currently works with npm, PyPI, pub.dev, and Composer, which all include publish
2424

2525
## Supported Registries
2626

27-
| Registry | Language/Platform | URL Resolution | Handler | Completed |
28-
|----------|-------------------|:--------------:|:-------:|:---------:|
29-
| npm | JavaScript | Yes | Yes | ✓ |
30-
| Cargo | Rust | Yes | Yes | ✓ |
31-
| RubyGems | Ruby | Yes | Yes | ✓ |
32-
| Go proxy | Go | Yes | Yes | ✓ |
33-
| Hex | Elixir | Yes | Yes | ✓ |
34-
| pub.dev | Dart | Yes | Yes | ✓ |
35-
| PyPI | Python | Yes | Yes | ✓ |
36-
| Maven | Java | Yes | Yes | ✓ |
37-
| NuGet | .NET | Yes | Yes | ✓ |
38-
| Composer | PHP | Yes | Yes | ✓ |
39-
| Conan | C/C++ | Yes | Yes | ✓ |
40-
| Conda | Python/R | Yes | Yes | ✓ |
41-
| CRAN | R | Yes | Yes | ✓ |
42-
| Container | Docker/OCI | Yes | Yes | ✓ |
43-
| Debian | Debian/Ubuntu | Yes | Yes | ✓ |
44-
| RPM | RHEL/Fedora | Yes | Yes | ✓ |
45-
| Alpine | Alpine Linux | No | No | ✗ |
46-
| Arch | Arch Linux | No | No | ✗ |
47-
| Chef | Chef | No | No | ✗ |
48-
| Generic | Any | No | No | ✗ |
49-
| Helm | Kubernetes | No | No | ✗ |
50-
| Swift | Swift | No | No | ✗ |
51-
| Vagrant | Vagrant | No | No | ✗ |
27+
| Registry | Language/Platform | Cooldown | Completed |
28+
|----------|-------------------|:--------:|:---------:|
29+
| npm | JavaScript | Yes | ✓ |
30+
| Cargo | Rust | | ✓ |
31+
| RubyGems | Ruby | | ✓ |
32+
| Go proxy | Go | | ✓ |
33+
| Hex | Elixir | | ✓ |
34+
| pub.dev | Dart | Yes | ✓ |
35+
| PyPI | Python | Yes | ✓ |
36+
| Maven | Java | | ✓ |
37+
| NuGet | .NET | | ✓ |
38+
| Composer | PHP | Yes | ✓ |
39+
| Conan | C/C++ | | ✓ |
40+
| Conda | Python/R | | ✓ |
41+
| CRAN | R | | ✓ |
42+
| Container | Docker/OCI | | ✓ |
43+
| Debian | Debian/Ubuntu | | ✓ |
44+
| RPM | RHEL/Fedora | | ✓ |
45+
| Alpine | Alpine Linux | | ✗ |
46+
| Arch | Arch Linux | | ✗ |
47+
| Chef | Chef | | ✗ |
48+
| Generic | Any | | ✗ |
49+
| Helm | Kubernetes | | ✗ |
50+
| Swift | Swift | | ✗ |
51+
| Vagrant | Vagrant | | ✗ |
52+
53+
Cooldown requires publish timestamps in metadata. Registries without a "Yes" in the cooldown column either don't expose timestamps or haven't been wired up yet.
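
As a rough illustration of why publish timestamps are required, the sketch below filters a version list by age. The `Version` type, the `filterByCooldown` function, and the choice to pass through versions that carry no timestamp are all assumptions made for this example, not the proxy's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// Version is a hypothetical, simplified view of one registry version entry.
type Version struct {
	Number      string
	PublishedAt time.Time // zero value means the registry gave no timestamp
}

// filterByCooldown keeps versions published at least minAge before now.
// Versions without a timestamp are kept, because their age cannot be judged
// (an assumption of this sketch).
func filterByCooldown(versions []Version, minAge time.Duration, now time.Time) []Version {
	kept := make([]Version, 0, len(versions))
	for _, v := range versions {
		if v.PublishedAt.IsZero() || now.Sub(v.PublishedAt) >= minAge {
			kept = append(kept, v)
		}
	}
	return kept
}

// demoCooldown applies a 72-hour cooldown to two made-up versions and
// returns the version numbers that survive.
func demoCooldown() []string {
	now := time.Date(2024, 1, 10, 0, 0, 0, 0, time.UTC)
	versions := []Version{
		{Number: "1.0.0", PublishedAt: now.Add(-30 * 24 * time.Hour)},
		{Number: "1.0.1", PublishedAt: now.Add(-2 * time.Hour)},
	}
	var numbers []string
	for _, v := range filterByCooldown(versions, 72*time.Hour, now) {
		numbers = append(numbers, v.Number)
	}
	return numbers
}

func main() {
	fmt.Println(demoCooldown()) // prints [1.0.0]
}
```

Without a `PublishedAt` value there is nothing to compare against the cooldown window, which is why registries that omit timestamps cannot take part.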
5254

5355
## Quick Start
5456

@@ -465,9 +467,10 @@ Recently cached:
465467

466468
| Endpoint | Description |
467469
|----------|-------------|
468-
| `GET /` | Welcome message and endpoint list |
470+
| `GET /` | Dashboard (web UI) |
469471
| `GET /health` | Health check (returns "ok" if healthy) |
470472
| `GET /stats` | Cache statistics (JSON) |
473+
| `GET /metrics` | Prometheus metrics |
471474
| `GET /npm/*` | npm registry protocol |
472475
| `GET /cargo/*` | Cargo sparse index protocol |
473476
| `GET /gem/*` | RubyGems protocol |
@@ -667,6 +670,46 @@ Response:
667670
└─────────┘
668671
```
669672

673+
## Web Interface
674+
675+
The proxy serves a web UI at the root URL. No separate frontend build is needed -- templates and assets are embedded in the binary.
676+
677+
- **Dashboard** (`/`) -- cache stats, popular packages, recently cached artifacts, and vulnerability overview.
678+
- **Install guide** (`/install`) -- per-ecosystem configuration instructions, so you don't have to look them up here.
679+
- **Package browser** (`/packages`) -- browse all cached packages with filtering by ecosystem and sorting by hits, size, name, or vulnerability count.
680+
- **Search** (`/search?q=...`) -- search cached packages by name.
681+
- **Package detail** (`/package/{ecosystem}/{name}`) -- metadata, license, vulnerabilities, and version list for a package. You can select two versions to compare.
682+
- **Version detail** (`/package/{ecosystem}/{name}/{version}`) -- per-version metadata, integrity hash, artifact cache status, and hit counts.
683+
- **Source browser** (`/package/{ecosystem}/{name}/{version}/browse`) -- browse files inside cached archives with syntax highlighting for text files and image previews.
684+
- **Version diff** (`/package/{ecosystem}/{name}/compare/{v1}...{v2}`) -- side-by-side diff of two cached versions showing added, removed, and changed files.
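
The version diff route encodes both versions in one path segment separated by `...`. A minimal sketch of building and splitting that segment follows; the `comparePath` and `splitCompare` helpers are hypothetical, and the sketch does not escape names that contain a slash.

```go
package main

import (
	"fmt"
	"strings"
)

// comparePath builds the version diff URL path from the route pattern
// /package/{ecosystem}/{name}/compare/{v1}...{v2}.
func comparePath(ecosystem, name, v1, v2 string) string {
	return fmt.Sprintf("/package/%s/%s/compare/%s...%s", ecosystem, name, v1, v2)
}

// splitCompare splits a segment such as "4.17.20...4.17.21" into its two
// versions. The final result is false when the "..." separator is missing
// or either side is empty.
func splitCompare(spec string) (string, string, bool) {
	v1, v2, found := strings.Cut(spec, "...")
	if !found || v1 == "" || v2 == "" {
		return "", "", false
	}
	return v1, v2, true
}

func main() {
	p := comparePath("npm", "lodash", "4.17.20", "4.17.21")
	fmt.Println(p)
	v1, v2, ok := splitCompare(p[strings.LastIndex(p, "/")+1:])
	fmt.Println(v1, v2, ok)
}
```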
685+
686+
## Monitoring
687+
688+
The proxy exposes Prometheus metrics at `GET /metrics`. All metric names are prefixed with `proxy_`.
689+
690+
| Metric | Type | Labels | Description |
691+
|--------|------|--------|-------------|
692+
| `proxy_cache_hits_total` | counter | `ecosystem` | Cache hits |
693+
| `proxy_cache_misses_total` | counter | `ecosystem` | Cache misses |
694+
| `proxy_cache_size_bytes` | gauge | | Total size of cached artifacts |
695+
| `proxy_cached_artifacts_total` | gauge | | Number of cached artifacts |
696+
| `proxy_upstream_fetch_duration_seconds` | histogram | `ecosystem` | Time spent fetching from upstream |
697+
| `proxy_upstream_errors_total` | counter | `ecosystem`, `error_type` | Upstream fetch failures |
698+
| `proxy_storage_operation_duration_seconds` | histogram | `operation` | Storage read/write latency |
699+
| `proxy_storage_errors_total` | counter | `operation` | Storage read/write failures |
700+
| `proxy_active_requests` | gauge | | In-flight requests |
701+
702+
Cache size and artifact count are refreshed every 60 seconds. The remaining metrics update on each request.
703+
704+
Scrape config for Prometheus:
705+
706+
```yaml
707+
scrape_configs:
708+
  - job_name: git-pkgs-proxy
709+
    static_configs:
710+
      - targets: ["localhost:8080"]
711+
```
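
To show how the counters in the table can be combined, here is a small sketch that computes a cache hit ratio from a `/metrics` payload in the Prometheus text format. The sample payload and its numbers are made up, and `sumMetric` is a deliberately simplified parser written for this example; it is not a substitute for a real Prometheus client.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// samplePayload is made-up scrape output; the numbers are illustrative only.
const samplePayload = `# TYPE proxy_cache_hits_total counter
proxy_cache_hits_total{ecosystem="npm"} 90
proxy_cache_hits_total{ecosystem="cargo"} 30
# TYPE proxy_cache_misses_total counter
proxy_cache_misses_total{ecosystem="npm"} 10
proxy_cache_misses_total{ecosystem="cargo"} 30
proxy_active_requests 2
`

// sumMetric adds up every sample of the named metric across all label sets.
func sumMetric(payload, name string) float64 {
	var total float64
	sc := bufio.NewScanner(strings.NewReader(payload))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		// The metric name ends at the first '{' (labels) or space (no labels).
		end := strings.IndexAny(line, "{ ")
		if end < 0 || line[:end] != name {
			continue
		}
		rest := line[end:]
		if strings.HasPrefix(rest, "{") {
			closing := strings.LastIndex(rest, "}")
			if closing < 0 {
				continue
			}
			rest = rest[closing+1:]
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		if v, err := strconv.ParseFloat(fields[0], 64); err == nil {
			total += v
		}
	}
	return total
}

// hitRatio returns hits / (hits + misses), or 0 when there is no traffic yet.
func hitRatio(payload string) float64 {
	hits := sumMetric(payload, "proxy_cache_hits_total")
	misses := sumMetric(payload, "proxy_cache_misses_total")
	if hits+misses == 0 {
		return 0
	}
	return hits / (hits + misses)
}

func main() {
	fmt.Printf("hit ratio: %.2f\n", hitRatio(samplePayload)) // prints hit ratio: 0.75
}
```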
712+
670713
## Production Deployment
671714
672715
### Systemd Service

docs/architecture.md

Lines changed: 103 additions & 34 deletions
Original file line number | Diff line number | Diff line change
@@ -7,29 +7,24 @@ This document describes the internal architecture of the git-pkgs proxy.
77
The proxy is a caching HTTP server that sits between package manager clients and upstream registries. It intercepts requests, checks a local cache, and either serves cached content or fetches from upstream.
88

99
```
10-
┌─────────────────────────────────────────────────────────────────┐
11-
│ HTTP Server │
12-
│ ┌─────────────────────────────────────────────────────────┐ │
13-
│ │ Router (ServeMux) │ │
14-
│ │ /npm/* -> NPMHandler │ │
15-
│ │ /cargo/* -> CargoHandler │ │
16-
│ │ /health -> healthHandler │ │
17-
│ │ /stats -> statsHandler │ │
18-
│ └─────────────────────────────────────────────────────────┘ │
19-
│ │ │
20-
│ ▼ │
21-
│ ┌─────────────────────────────────────────────────────────┐ │
22-
│ │ Proxy │ │
23-
│ │ - GetOrFetchArtifact() │ │
24-
│ │ - Coordinates DB, Storage, Fetcher │ │
25-
│ └─────────────────────────────────────────────────────────┘ │
26-
│ │ │ │ │
27-
│ ▼ ▼ ▼ │
10+
┌──────────────────────────────────────────────────────────────────┐
11+
│ HTTP Server │
12+
│ ┌──────────────────────────────────────────────────────────┐ │
13+
│ │ Router (Chi) │ │
14+
│ │ /npm/* -> NPMHandler /health -> healthHandler │ │
15+
│ │ /cargo/* -> CargoHandler /stats -> statsHandler │ │
16+
│ │ /gem/* -> GemHandler /metrics -> prometheus │ │
17+
│ │ ...16 ecosystems /api/* -> APIHandler │ │
18+
│ │ / -> Web UI │ │
19+
│ └──────────────────────────────────────────────────────────┘ │
20+
│ │ │ │ │
21+
│ ▼ ▼ ▼ │
2822
│ ┌───────────┐ ┌─────────────┐ ┌─────────────┐ │
29-
│ │ Database │ │ Storage │ │ Upstream │ │
30-
│ │ (SQLite) │ │ (Filesystem)│ │ (Fetcher) │ │
23+
│ │ Database │ │ Storage │ │ Upstream │ │
24+
│ │ SQLite or │ │ Filesystem │ │ Registries │ │
25+
│ │ Postgres │ │ or S3 │ │ (Fetcher) │ │
3126
│ └───────────┘ └─────────────┘ └─────────────┘ │
32-
└─────────────────────────────────────────────────────────────────┘
27+
└──────────────────────────────────────────────────────────────────┘
3328
```
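
The check-cache-then-fetch behaviour described at the top of this section can be sketched as follows. This is a minimal illustration under stated assumptions: the `cache` type is an in-memory stand-in for the database and storage layers, `getOrFetch` is a hypothetical name, and the sketch is not safe for concurrent use.

```go
package main

import "fmt"

// cache is an in-memory stand-in for the proxy's database and storage layers.
type cache struct {
	blobs   map[string][]byte
	fetches int // number of upstream fetches, kept for demonstration
}

// getOrFetch returns cached content for key, or calls fetch and stores the
// result so that later requests are served locally.
func (c *cache) getOrFetch(key string, fetch func(string) ([]byte, error)) ([]byte, error) {
	if b, ok := c.blobs[key]; ok {
		return b, nil // cache hit
	}
	b, err := fetch(key) // cache miss: go upstream
	if err != nil {
		return nil, fmt.Errorf("fetch %s: %w", key, err)
	}
	c.fetches++
	c.blobs[key] = b
	return b, nil
}

// demoFetches requests the same artifact twice and reports how many upstream
// fetches happened and what the second request returned.
func demoFetches() (int, string) {
	c := &cache{blobs: map[string][]byte{}}
	upstream := func(key string) ([]byte, error) {
		return []byte("contents of " + key), nil
	}
	if _, err := c.getOrFetch("npm/lodash-4.17.21.tgz", upstream); err != nil {
		return c.fetches, ""
	}
	body, err := c.getOrFetch("npm/lodash-4.17.21.tgz", upstream)
	if err != nil {
		return c.fetches, ""
	}
	return c.fetches, string(body)
}

func main() {
	n, body := demoFetches()
	fmt.Println(n, body)
}
```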
3429

3530
## Request Flow
@@ -91,36 +86,95 @@ Metadata is not cached - always fetched fresh. This ensures clients see new vers
9186

9287
### `internal/database`
9388

94-
SQLite database for cache metadata. Uses `modernc.org/sqlite` (pure Go, no CGO).
89+
SQLite or PostgreSQL database for cache metadata. SQLite uses `modernc.org/sqlite` (pure Go, no CGO). PostgreSQL uses `lib/pq`.
90+
91+
The schema is compatible with [git-pkgs](https://github.com/git-pkgs) databases. The proxy adds the `artifacts` and `vulnerabilities` tables on top of the shared `packages` and `versions` tables, so both tools can point at the same database.
9592

9693
**Tables:**
9794

9895
```sql
9996
packages (
100-
id, purl, ecosystem, name, namespace, latest_version,
101-
license, description, homepage, repository_url, upstream_url,
102-
metadata_fetched_at, created_at, updated_at
97+
id INTEGER PRIMARY KEY, -- SERIAL on Postgres
98+
purl TEXT NOT NULL, -- unique, e.g. pkg:npm/lodash
99+
ecosystem TEXT NOT NULL,
100+
name TEXT NOT NULL,
101+
latest_version TEXT,
102+
license TEXT,
103+
description TEXT,
104+
homepage TEXT,
105+
repository_url TEXT,
106+
registry_url TEXT,
107+
supplier_name TEXT,
108+
supplier_type TEXT,
109+
source TEXT,
110+
enriched_at DATETIME,
111+
vulns_synced_at DATETIME,
112+
created_at DATETIME,
113+
updated_at DATETIME
103114
)
115+
-- indexes: purl (unique), (ecosystem, name)
104116

105117
versions (
106-
id, purl, package_id, version, license, integrity,
107-
published_at, yanked, metadata_fetched_at, created_at, updated_at
118+
id INTEGER PRIMARY KEY,
119+
purl TEXT NOT NULL, -- unique, e.g. pkg:npm/lodash@4.17.21
120+
package_purl TEXT NOT NULL, -- FK to packages.purl
121+
license TEXT,
122+
published_at DATETIME,
123+
integrity TEXT, -- subresource integrity hash
124+
yanked INTEGER DEFAULT 0, -- BOOLEAN on Postgres
125+
source TEXT,
126+
enriched_at DATETIME,
127+
created_at DATETIME,
128+
updated_at DATETIME
108129
)
130+
-- indexes: purl (unique), package_purl
109131

110132
artifacts (
111-
id, version_id, filename, upstream_url, storage_path,
112-
content_hash, size, content_type, fetched_at,
113-
hit_count, last_accessed_at, created_at, updated_at
133+
id INTEGER PRIMARY KEY,
134+
version_purl TEXT NOT NULL,
135+
filename TEXT NOT NULL,
136+
upstream_url TEXT NOT NULL,
137+
storage_path TEXT, -- null until cached
138+
content_hash TEXT, -- SHA-256
139+
size INTEGER, -- BIGINT on Postgres
140+
content_type TEXT,
141+
fetched_at DATETIME,
142+
hit_count INTEGER DEFAULT 0, -- BIGINT on Postgres
143+
last_accessed_at DATETIME,
144+
created_at DATETIME,
145+
updated_at DATETIME
146+
)
147+
-- indexes: (version_purl, filename) unique, storage_path, last_accessed_at
148+
149+
vulnerabilities (
150+
id INTEGER PRIMARY KEY,
151+
vuln_id TEXT NOT NULL, -- e.g. CVE-2021-1234
152+
ecosystem TEXT NOT NULL,
153+
package_name TEXT NOT NULL,
154+
severity TEXT,
155+
summary TEXT,
156+
fixed_version TEXT,
157+
cvss_score REAL,
158+
"references" TEXT, -- JSON array
159+
fetched_at DATETIME,
160+
created_at DATETIME,
161+
updated_at DATETIME
114162
)
163+
-- indexes: (vuln_id, ecosystem, package_name) unique, (ecosystem, package_name)
115164
```
116165

166+
On PostgreSQL, `INTEGER PRIMARY KEY` becomes `SERIAL`, `DATETIME` becomes `TIMESTAMP`, `INTEGER DEFAULT 0` booleans become `BOOLEAN DEFAULT FALSE`, and size/count columns use `BIGINT`.
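
The type mapping in the previous paragraph can be written down as a small function. The `Column` struct and `postgresType` are hypothetical names used only for this sketch; the mapping rules are the ones stated in the text.

```go
package main

import "fmt"

// Column is a hypothetical description of one column from the schema listing.
type Column struct {
	Name       string
	SQLiteType string // e.g. "INTEGER", "TEXT", "DATETIME", "REAL"
	PrimaryKey bool
	Boolean    bool // INTEGER DEFAULT 0 used as a flag, e.g. yanked
	Wide       bool // size or counter column, e.g. size, hit_count
}

// postgresType returns the PostgreSQL column type for c, including the
// default where the text specifies one.
func postgresType(c Column) string {
	switch {
	case c.PrimaryKey && c.SQLiteType == "INTEGER":
		return "SERIAL"
	case c.Boolean:
		return "BOOLEAN DEFAULT FALSE"
	case c.Wide:
		return "BIGINT"
	case c.SQLiteType == "DATETIME":
		return "TIMESTAMP"
	default:
		return c.SQLiteType // TEXT and REAL carry over unchanged
	}
}

func main() {
	cols := []Column{
		{Name: "id", SQLiteType: "INTEGER", PrimaryKey: true},
		{Name: "yanked", SQLiteType: "INTEGER", Boolean: true},
		{Name: "size", SQLiteType: "INTEGER", Wide: true},
		{Name: "created_at", SQLiteType: "DATETIME"},
		{Name: "purl", SQLiteType: "TEXT"},
	}
	for _, c := range cols {
		fmt.Printf("%-12s %s\n", c.Name, postgresType(c))
	}
}
```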
167+
168+
The `MigrateSchema()` function handles backward compatibility with older git-pkgs databases by adding missing columns via `ALTER TABLE` as needed.
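
One way such an additive migration could work is to compare the columns a table already has against the columns the current schema expects, then emit one `ALTER TABLE` per gap. The sketch below is an assumption about the general technique, not the body of `MigrateSchema()`; `ColumnDef` and `missingColumnStatements` are hypothetical names. In practice the set of existing columns would come from `PRAGMA table_info` on SQLite or `information_schema.columns` on PostgreSQL.

```go
package main

import "fmt"

// ColumnDef names a column the current schema expects and its type.
type ColumnDef struct {
	Name string
	Type string
}

// missingColumnStatements returns one ALTER TABLE statement for each expected
// column that is absent from existing. Columns already present are left alone,
// so running the migration twice is harmless.
func missingColumnStatements(table string, existing map[string]bool, expected []ColumnDef) []string {
	var stmts []string
	for _, c := range expected {
		if !existing[c.Name] {
			stmts = append(stmts, fmt.Sprintf("ALTER TABLE %s ADD COLUMN %s %s", table, c.Name, c.Type))
		}
	}
	return stmts
}

func main() {
	// An older database that predates the enrichment columns (made-up example).
	existing := map[string]bool{"id": true, "purl": true, "ecosystem": true, "name": true}
	expected := []ColumnDef{
		{"id", "INTEGER"}, {"purl", "TEXT"}, {"ecosystem", "TEXT"}, {"name", "TEXT"},
		{"enriched_at", "DATETIME"}, {"vulns_synced_at", "DATETIME"},
	}
	for _, s := range missingColumnStatements("packages", existing, expected) {
		fmt.Println(s)
	}
}
```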
169+
117170
**Key operations:**
118171
- `GetPackageByPURL()` - Look up package by PURL
119172
- `GetVersionByPURL()` - Look up version by PURL
120173
- `GetArtifact()` - Look up artifact by version + filename
121174
- `UpsertPackage/Version/Artifact()` - Insert or update records
122175
- `RecordArtifactHit()` - Increment hit counter, update access time
123176
- `GetLeastRecentlyUsedArtifacts()` - For cache eviction
177+
- `SearchPackages()` - Full-text search across cached packages
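
For the eviction case, a least-recently-used selection can be sketched in memory as below. The `Artifact` struct mirrors three columns of the `artifacts` table; `pickEvictions` and the byte budget are assumptions for illustration, and the real `GetLeastRecentlyUsedArtifacts()` presumably does its ordering in SQL using the `last_accessed_at` index.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Artifact mirrors the storage_path, size, and last_accessed_at columns.
type Artifact struct {
	StoragePath    string
	Size           int64
	LastAccessedAt time.Time
}

// pickEvictions returns the least recently used artifacts whose combined size
// reaches bytesToFree. The input slice is not modified.
func pickEvictions(artifacts []Artifact, bytesToFree int64) []Artifact {
	sorted := append([]Artifact(nil), artifacts...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].LastAccessedAt.Before(sorted[j].LastAccessedAt)
	})
	var picked []Artifact
	var freed int64
	for _, a := range sorted {
		if freed >= bytesToFree {
			break
		}
		picked = append(picked, a)
		freed += a.Size
	}
	return picked
}

// demoEvictionPaths asks for 120 bytes to be freed from three made-up
// artifacts and returns the storage paths chosen, oldest access first.
func demoEvictionPaths() []string {
	now := time.Date(2024, 1, 10, 12, 0, 0, 0, time.UTC)
	artifacts := []Artifact{
		{StoragePath: "a", Size: 100, LastAccessedAt: now.Add(-3 * time.Hour)},
		{StoragePath: "b", Size: 200, LastAccessedAt: now.Add(-1 * time.Hour)},
		{StoragePath: "c", Size: 50, LastAccessedAt: now.Add(-2 * time.Hour)},
	}
	var paths []string
	for _, a := range pickEvictions(artifacts, 120) {
		paths = append(paths, a.StoragePath)
	}
	return paths
}

func main() {
	fmt.Println(demoEvictionPaths()) // prints [a c]
}
```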
124178

125179
### `internal/storage`
126180

@@ -201,12 +255,27 @@ HTTP protocol handlers for each registry type.
201255

202256
### `internal/server`
203257

204-
HTTP server setup.
258+
HTTP server setup, web UI, and API handlers.
205259

206260
- Creates and wires together all components
207-
- Mounts handlers at appropriate paths
208-
- Adds logging middleware
209-
- Health and stats endpoints
261+
- Mounts protocol handlers at ecosystem-specific paths
262+
- Middleware: request ID, real IP, logging, panic recovery, active request tracking
263+
- Web UI: dashboard, package browser, source browser, version comparison
264+
- Templates are embedded in the binary via `//go:embed`
265+
- Enrichment API for package metadata, vulnerability scanning, and outdated detection
266+
- Health, stats, and Prometheus metrics endpoints
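
Active request tracking is usually a thin middleware around the handler chain. The sketch below uses only the standard library and the `func(http.Handler) http.Handler` shape that Chi middleware also uses; the `active` counter and `trackActive` name are assumptions, and a real server would export the value as the `proxy_active_requests` gauge rather than a package variable.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// active counts in-flight requests (requires Go 1.19+ for atomic.Int64).
var active atomic.Int64

// trackActive increments the counter for the duration of each request.
func trackActive(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		active.Add(1)
		defer active.Add(-1) // runs even if the handler panics
		next.ServeHTTP(w, r)
	})
}

// observeOneRequest runs a single request through the middleware and reports
// the counter value seen inside the handler and after it returned.
func observeOneRequest() (during, after int64) {
	h := trackActive(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		during = active.Load()
		fmt.Fprint(w, "ok")
	}))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/health", nil))
	return during, active.Load()
}

func main() {
	during, after := observeOneRequest()
	fmt.Println(during, after) // prints 1 0
}
```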
267+
268+
### `internal/metrics`
269+
270+
Prometheus metrics for cache performance, upstream latency, storage operations, and active requests. See the Monitoring section of the README for the full metric list.
271+
272+
### `internal/cooldown`
273+
274+
Version age filtering for supply chain attack mitigation. Configurable at global, ecosystem, and per-package levels. Supported by npm, PyPI, pub.dev, and Composer handlers.
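
With three configuration levels, something has to decide which one applies to a given package. The sketch below assumes the most specific setting wins (package, then ecosystem, then global); that precedence, the `CooldownConfig` struct, and the `"ecosystem/name"` key format are assumptions for illustration and may not match the proxy's configuration.

```go
package main

import (
	"fmt"
	"time"
)

// CooldownConfig is a hypothetical three-level cooldown configuration.
type CooldownConfig struct {
	Global    time.Duration
	Ecosystem map[string]time.Duration // keyed by ecosystem, e.g. "npm"
	Package   map[string]time.Duration // keyed by "ecosystem/name"
}

// For returns the cooldown that applies to one package, preferring the most
// specific level that has a value.
func (c CooldownConfig) For(ecosystem, name string) time.Duration {
	if d, ok := c.Package[ecosystem+"/"+name]; ok {
		return d
	}
	if d, ok := c.Ecosystem[ecosystem]; ok {
		return d
	}
	return c.Global
}

// demoCooldownHours resolves three made-up packages and returns the cooldown
// for each in whole hours.
func demoCooldownHours() [3]int {
	cfg := CooldownConfig{
		Global:    72 * time.Hour,
		Ecosystem: map[string]time.Duration{"npm": 24 * time.Hour},
		Package:   map[string]time.Duration{"npm/lodash": 0}, // exempt one package
	}
	return [3]int{
		int(cfg.For("npm", "lodash").Hours()),
		int(cfg.For("npm", "express").Hours()),
		int(cfg.For("pypi", "requests").Hours()),
	}
}

func main() {
	fmt.Println(demoCooldownHours()) // prints [0 24 72]
}
```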
275+
276+
### `internal/enrichment`
277+
278+
Package metadata enrichment. Fetches license, description, homepage, repository URL, and vulnerability data from upstream registries. Powers the `/api/` endpoints and the web UI's package detail pages.
210279

211280
### `internal/config`
212281
