95 changes: 69 additions & 26 deletions README.md
@@ -24,31 +24,33 @@ Currently works with npm, PyPI, pub.dev, and Composer, which all include publish

## Supported Registries

| Registry | Language/Platform | Cooldown | Completed |
|----------|-------------------|:--------:|:---------:|
| npm | JavaScript | Yes | ✓ |
| Cargo | Rust | | ✓ |
| RubyGems | Ruby | | ✓ |
| Go proxy | Go | | ✓ |
| Hex | Elixir | | ✓ |
| pub.dev | Dart | Yes | ✓ |
| PyPI | Python | Yes | ✓ |
| Maven | Java | | ✓ |
| NuGet | .NET | | ✓ |
| Composer | PHP | Yes | ✓ |
| Conan | C/C++ | | ✓ |
| Conda | Python/R | | ✓ |
| CRAN | R | | ✓ |
| Container | Docker/OCI | | ✓ |
| Debian | Debian/Ubuntu | | ✓ |
| RPM | RHEL/Fedora | | ✓ |
| Alpine | Alpine Linux | | ✗ |
| Arch | Arch Linux | | ✗ |
| Chef | Chef | | ✗ |
| Generic | Any | | ✗ |
| Helm | Kubernetes | | ✗ |
| Swift | Swift | | ✗ |
| Vagrant | Vagrant | | ✗ |

Cooldown requires publish timestamps in metadata. Registries without a "Yes" in the cooldown column either don't expose timestamps or haven't been wired up yet.

## Quick Start

@@ -465,9 +467,10 @@ Recently cached:

| Endpoint | Description |
|----------|-------------|
| `GET /` | Dashboard (web UI) |
| `GET /health` | Health check (returns "ok" if healthy) |
| `GET /stats` | Cache statistics (JSON) |
| `GET /metrics` | Prometheus metrics |
| `GET /npm/*` | npm registry protocol |
| `GET /cargo/*` | Cargo sparse index protocol |
| `GET /gem/*` | RubyGems protocol |
@@ -667,6 +670,46 @@ Response:
└─────────┘
```

## Web Interface

The proxy serves a web UI at the root URL. No separate frontend build is needed -- templates and assets are embedded in the binary.

- **Dashboard** (`/`) -- cache stats, popular packages, recently cached artifacts, and vulnerability overview.
- **Install guide** (`/install`) -- per-ecosystem configuration instructions, so you don't have to look them up here.
- **Package browser** (`/packages`) -- browse all cached packages with filtering by ecosystem and sorting by hits, size, name, or vulnerability count.
- **Search** (`/search?q=...`) -- search cached packages by name.
- **Package detail** (`/package/{ecosystem}/{name}`) -- metadata, license, vulnerabilities, and version list for a package. You can select two versions to compare.
- **Version detail** (`/package/{ecosystem}/{name}/{version}`) -- per-version metadata, integrity hash, artifact cache status, and hit counts.
- **Source browser** (`/package/{ecosystem}/{name}/{version}/browse`) -- browse files inside cached archives with syntax highlighting for text files and image previews.
- **Version diff** (`/package/{ecosystem}/{name}/compare/{v1}...{v2}`) -- side-by-side diff of two cached versions showing added, removed, and changed files.

## Monitoring

The proxy exposes Prometheus metrics at `GET /metrics`. All metric names are prefixed with `proxy_`.

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `proxy_cache_hits_total` | counter | `ecosystem` | Cache hits |
| `proxy_cache_misses_total` | counter | `ecosystem` | Cache misses |
| `proxy_cache_size_bytes` | gauge | | Total size of cached artifacts |
| `proxy_cached_artifacts_total` | gauge | | Number of cached artifacts |
| `proxy_upstream_fetch_duration_seconds` | histogram | `ecosystem` | Time spent fetching from upstream |
| `proxy_upstream_errors_total` | counter | `ecosystem`, `error_type` | Upstream fetch failures |
| `proxy_storage_operation_duration_seconds` | histogram | `operation` | Storage read/write latency |
| `proxy_storage_errors_total` | counter | `operation` | Storage read/write failures |
| `proxy_active_requests` | gauge | | In-flight requests |

Cache size and artifact count are refreshed every 60 seconds. The remaining metrics update on each request.

Scrape config for Prometheus:

```yaml
scrape_configs:
  - job_name: git-pkgs-proxy
    static_configs:
      - targets: ["localhost:8080"]
```

## Production Deployment

### Systemd Service
137 changes: 103 additions & 34 deletions docs/architecture.md
@@ -7,29 +7,24 @@ This document describes the internal architecture of the git-pkgs proxy.
The proxy is a caching HTTP server that sits between package manager clients and upstream registries. It intercepts requests, checks a local cache, and either serves cached content or fetches from upstream.

```
┌──────────────────────────────────────────────────────────────────┐
│                           HTTP Server                            │
│  ┌──────────────────────────────────────────────────────────┐    │
│  │                       Router (Chi)                       │    │
│  │  /npm/*   -> NPMHandler      /health  -> healthHandler   │    │
│  │  /cargo/* -> CargoHandler    /stats   -> statsHandler    │    │
│  │  /gem/*   -> GemHandler      /metrics -> prometheus      │    │
│  │  ...16 ecosystems            /api/*   -> APIHandler      │    │
│  │                              /        -> Web UI          │    │
│  └──────────────────────────────────────────────────────────┘    │
│          │                   │                    │              │
│          ▼                   ▼                    ▼              │
│    ┌───────────┐      ┌─────────────┐      ┌─────────────┐       │
│    │ Database  │      │   Storage   │      │  Upstream   │       │
│    │ SQLite or │      │ Filesystem  │      │ Registries  │       │
│    │ Postgres  │      │    or S3    │      │  (Fetcher)  │       │
│    └───────────┘      └─────────────┘      └─────────────┘       │
└──────────────────────────────────────────────────────────────────┘
```

## Request Flow
@@ -91,36 +86,95 @@ Metadata is not cached - always fetched fresh. This ensures clients see new vers

### `internal/database`

SQLite or PostgreSQL database for cache metadata. SQLite uses `modernc.org/sqlite` (pure Go, no CGO). PostgreSQL uses `lib/pq`.
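Both drivers plug into Go's `database/sql`, so switching backends mostly means passing a different driver name to `sql.Open`. lib/pq registers the name `postgres` and modernc.org/sqlite registers `sqlite`. This section does not show how the proxy makes that choice. The hypothetical `driverFor` helper below guesses from the connection string, which is an assumption made for illustration. The sketch skips the driver imports so it runs on the standard library alone. A real program would blank-import the drivers (e.g. `_ "modernc.org/sqlite"`) so they register.

```go
package main

import (
	"fmt"
	"strings"
)

// driverFor returns the database/sql driver name for a connection string.
// Assumption: anything that is not a Postgres URL is treated as a SQLite path.
func driverFor(dsn string) string {
	if strings.HasPrefix(dsn, "postgres://") || strings.HasPrefix(dsn, "postgresql://") {
		return "postgres" // name registered by lib/pq
	}
	return "sqlite" // name registered by modernc.org/sqlite
}

func main() {
	fmt.Println(driverFor("postgres://proxy@localhost/cache?sslmode=disable"))
	fmt.Println(driverFor("./cache.db"))
}
```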

The schema is compatible with [git-pkgs](https://github.com/git-pkgs) databases. The proxy adds the `artifacts` and `vulnerabilities` tables on top of the shared `packages` and `versions` tables, so both tools can point at the same database.

**Tables:**

```sql
packages (
id INTEGER PRIMARY KEY, -- SERIAL on Postgres
purl TEXT NOT NULL, -- unique, e.g. pkg:npm/lodash
ecosystem TEXT NOT NULL,
name TEXT NOT NULL,
latest_version TEXT,
license TEXT,
description TEXT,
homepage TEXT,
repository_url TEXT,
registry_url TEXT,
supplier_name TEXT,
supplier_type TEXT,
source TEXT,
enriched_at DATETIME,
vulns_synced_at DATETIME,
created_at DATETIME,
updated_at DATETIME
)
-- indexes: purl (unique), (ecosystem, name)

versions (
id INTEGER PRIMARY KEY,
purl TEXT NOT NULL, -- unique, e.g. pkg:npm/lodash@4.17.21
package_purl TEXT NOT NULL, -- FK to packages.purl
license TEXT,
published_at DATETIME,
integrity TEXT, -- subresource integrity hash
yanked INTEGER DEFAULT 0, -- BOOLEAN on Postgres
source TEXT,
enriched_at DATETIME,
created_at DATETIME,
updated_at DATETIME
)
-- indexes: purl (unique), package_purl

artifacts (
id INTEGER PRIMARY KEY,
version_purl TEXT NOT NULL,
filename TEXT NOT NULL,
upstream_url TEXT NOT NULL,
storage_path TEXT, -- null until cached
content_hash TEXT, -- SHA-256
size INTEGER, -- BIGINT on Postgres
content_type TEXT,
fetched_at DATETIME,
hit_count INTEGER DEFAULT 0, -- BIGINT on Postgres
last_accessed_at DATETIME,
created_at DATETIME,
updated_at DATETIME
)
-- indexes: (version_purl, filename) unique, storage_path, last_accessed_at

vulnerabilities (
id INTEGER PRIMARY KEY,
vuln_id TEXT NOT NULL, -- e.g. CVE-2021-1234
ecosystem TEXT NOT NULL,
package_name TEXT NOT NULL,
severity TEXT,
summary TEXT,
fixed_version TEXT,
cvss_score REAL,
"references" TEXT, -- JSON array
fetched_at DATETIME,
created_at DATETIME,
updated_at DATETIME
)
-- indexes: (vuln_id, ecosystem, package_name) unique, (ecosystem, package_name)
```
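The tables are linked by PURL strings rather than integer IDs. `versions.package_purl` points at `packages.purl`, and `artifacts.version_purl` presumably points at `versions.purl`. The Go sketch below shows that relationship with two hypothetical helpers. It assumes PURLs without qualifiers (`?...`) or subpaths (`#...`), and relies on the PURL specification writing an npm scope's `@` as `%40`.

```go
package main

import (
	"fmt"
	"strings"
)

// versionPURL appends a version to a package PURL,
// e.g. pkg:npm/lodash + 4.17.21 -> pkg:npm/lodash@4.17.21.
func versionPURL(packagePURL, version string) string {
	return packagePURL + "@" + version
}

// packagePURLOf recovers the value stored in versions.package_purl by cutting
// at the last "@". Scoped names are safe because their "@" is encoded as %40.
func packagePURLOf(vp string) string {
	if i := strings.LastIndex(vp, "@"); i > 0 {
		return vp[:i]
	}
	return vp // no version component
}

func main() {
	vp := versionPURL("pkg:npm/%40babel/core", "7.24.0")
	fmt.Println(vp)
	fmt.Println(packagePURLOf(vp))
}
```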

On PostgreSQL, `INTEGER PRIMARY KEY` becomes `SERIAL`, `DATETIME` becomes `TIMESTAMP`, `INTEGER DEFAULT 0` booleans become `BOOLEAN DEFAULT FALSE`, and size/count columns use `BIGINT`.
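The substitutions above can be written as a small lookup, sketched here in Go. `postgresType` is hypothetical. The sets of boolean and wide-integer columns are assumptions read off the schema listing (`yanked`, `size`, `hit_count`).

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical column sets, taken from the schema listing.
var (
	booleanColumns = map[string]bool{"yanked": true}
	bigintColumns  = map[string]bool{"size": true, "hit_count": true}
)

// postgresType maps a column's SQLite type to its PostgreSQL counterpart.
func postgresType(column, sqliteType string) string {
	switch {
	case sqliteType == "INTEGER PRIMARY KEY":
		return "SERIAL PRIMARY KEY"
	case sqliteType == "DATETIME":
		return "TIMESTAMP"
	case booleanColumns[column]:
		return "BOOLEAN DEFAULT FALSE"
	case bigintColumns[column]:
		// Widen the integer type but keep any DEFAULT clause.
		return strings.Replace(sqliteType, "INTEGER", "BIGINT", 1)
	default:
		return sqliteType // TEXT, REAL, etc. carry over unchanged
	}
}

func main() {
	for _, c := range [][2]string{
		{"id", "INTEGER PRIMARY KEY"},
		{"published_at", "DATETIME"},
		{"yanked", "INTEGER DEFAULT 0"},
		{"hit_count", "INTEGER DEFAULT 0"},
		{"license", "TEXT"},
	} {
		fmt.Printf("%-12s %s\n", c[0], postgresType(c[0], c[1]))
	}
}
```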

The `MigrateSchema()` function handles backward compatibility with older git-pkgs databases by adding missing columns via `ALTER TABLE` as needed.

**Key operations:**
- `GetPackageByPURL()` - Look up package by PURL
- `GetVersionByPURL()` - Look up version by PURL
- `GetArtifact()` - Look up artifact by version + filename
- `UpsertPackage/Version/Artifact()` - Insert or update records
- `RecordArtifactHit()` - Increment hit counter, update access time
- `GetLeastRecentlyUsedArtifacts()` - For cache eviction
- `SearchPackages()` - Full-text search across cached packages

### `internal/storage`

@@ -201,12 +255,27 @@ HTTP protocol handlers for each registry type.

### `internal/server`

HTTP server setup, web UI, and API handlers.

- Creates and wires together all components
- Mounts protocol handlers at ecosystem-specific paths
- Middleware: request ID, real IP, logging, panic recovery, active request tracking
- Web UI: dashboard, package browser, source browser, version comparison
- Templates are embedded in the binary via `//go:embed`
- Enrichment API for package metadata, vulnerability scanning, and outdated detection
- Health, stats, and Prometheus metrics endpoints

### `internal/metrics`

Prometheus metrics for cache performance, upstream latency, storage operations, and active requests. See the Monitoring section of the README for the full metric list.

### `internal/cooldown`

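Version age filtering for supply chain attack mitigation: versions published more recently than the configured cooldown period are filtered out. The cooldown is configurable at global, ecosystem, and per-package levels. Supported by the npm, PyPI, pub.dev, and Composer handlers.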

### `internal/enrichment`

Package metadata enrichment. Fetches license, description, homepage, repository URL, and vulnerability data from upstream registries. Powers the `/api/` endpoints and the web UI's package detail pages.

### `internal/config`
