A modern Python 3.13+ microservices platform for transforming the complete Discogs music database into powerful, queryable knowledge graphs and analytics engines.
Quick Start | Documentation | Features | Community
Discogsography transforms monthly Discogs data dumps (50GB+ compressed XML) into:
- Neo4j Graph Database: Navigate complex music industry relationships
- PostgreSQL Database: High-performance queries and full-text search
- Interactive Explorer: Graph visualisation, trends, and path discovery
- Real-time Dashboard: Monitor system health and processing metrics
Perfect for music researchers, data scientists, developers, and music enthusiasts who want to explore the world's largest music database.
| Service | Purpose | Key Technologies |
|---|---|---|
| API | User accounts and JWT authentication | FastAPI, psycopg3, redis, Discogs OAuth 1.0 |
| Curator | Background collection & wantlist sync | FastAPI, psycopg3, neo4j-driver |
| Dashboard | Real-time system monitoring | FastAPI, WebSocket, reactive UI |
| Explore | Serves graph exploration frontend (static files) | FastAPI, D3.js, Plotly.js |
| Extractor | High-performance Rust-based extractor | tokio, quick-xml, lapin |
| Graphinator | Builds Neo4j knowledge graphs | neo4j-driver, graph algorithms |
| Schema-Init | One-shot database schema initializer | neo4j-driver, psycopg3 |
| Tableinator | Creates PostgreSQL analytics tables | psycopg3, JSONB, full-text search |
graph TD
S3[("Discogs S3<br/>Monthly Data Dumps<br/>~50GB XML")]
SCHEMA[["Schema-Init<br/>One-shot DDL<br/>Initialiser"]]
EXT[["Extractor<br/>High-Performance<br/>XML Processing"]]
RMQ{{"RabbitMQ 4.x<br/>Message Broker<br/>8 Queues + DLQs"}}
NEO4J[("Neo4j 2026<br/>Graph Database<br/>Relationships")]
PG[("PostgreSQL 18<br/>Analytics DB<br/>Full-text Search")]
REDIS[("Redis<br/>Cache Layer<br/>Query Cache")]
GRAPH[["Graphinator<br/>Graph Builder"]]
TABLE[["Tableinator<br/>Table Builder"]]
DASH[["Dashboard<br/>Real-time Monitor<br/>WebSocket"]]
EXPLORE[["Explore<br/>Graph Explorer<br/>Trends & Paths"]]
API[["API<br/>User Auth<br/>JWT & OAuth"]]
CURATOR[["Curator<br/>Collection<br/>Sync"]]
SCHEMA -->|0. Create Indexes & Constraints| NEO4J
SCHEMA -->|0. Create Tables & Indexes| PG
S3 -->|1. Download & Parse| EXT
EXT -->|2. Publish Messages| RMQ
RMQ -->|3a. Artists/Labels/Releases/Masters| GRAPH
RMQ -->|3b. Artists/Labels/Releases/Masters| TABLE
GRAPH -->|4a. Build Graph| NEO4J
TABLE -->|4b. Store Data| PG
EXPLORE -.->|Health Check| NEO4J
API -.->|User Accounts| PG
API -.->|Graph Queries| NEO4J
API -.->|OAuth State| REDIS
CURATOR -.->|Sync Collections| NEO4J
CURATOR -.->|Sync History| PG
DASH -.->|Monitor| EXT
DASH -.->|Monitor| GRAPH
DASH -.->|Monitor| TABLE
DASH -.->|Monitor| EXPLORE
DASH -.->|Cache| REDIS
DASH -.->|Stats| RMQ
DASH -.->|Stats| NEO4J
DASH -.->|Stats| PG
style S3 fill:#e1f5fe,stroke:#01579b,stroke-width:2px
style SCHEMA fill:#f9fbe7,stroke:#827717,stroke-width:2px
style EXT fill:#ffccbc,stroke:#d84315,stroke-width:2px
style RMQ fill:#fff3e0,stroke:#e65100,stroke-width:2px
style NEO4J fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
style PG fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px
style REDIS fill:#ffebee,stroke:#b71c1c,stroke-width:2px
style GRAPH fill:#e0f2f1,stroke:#004d40,stroke-width:2px
style TABLE fill:#fce4ec,stroke:#880e4f,stroke-width:2px
style DASH fill:#fce4ec,stroke:#880e4f,stroke-width:2px
style EXPLORE fill:#e8eaf6,stroke:#283593,stroke-width:2px
style API fill:#e3f2fd,stroke:#0d47a1,stroke-width:2px
style CURATOR fill:#fff8e1,stroke:#f57f17,stroke-width:2px
- High-Speed Processing: 5,000–10,000 records/second XML parsing with Rust-based extractor
- Smart Deduplication: SHA256 hash-based change detection prevents reprocessing
- Handles Big Data: Processes 15M+ releases, 2M+ artists across ~50GB compressed XML
- Auto-Recovery: Automatic retries with exponential backoff and dead letter queues
- Container Security: Non-root users, read-only filesystems, dropped capabilities
- Type Safety: Full type hints with strict mypy validation and Bandit security scanning
- Comprehensive Testing: Unit, integration, and E2E tests with Playwright
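The hash-based change detection mentioned in the list above can be illustrated with a minimal sketch. It assumes records are plain dictionaries and that the last-seen digest per record ID is kept in memory; the `record_hash` and `ChangeDetector` names are hypothetical, and where the project actually stores its hashes is not stated here.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Return a SHA-256 hex digest that does not depend on key order."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ChangeDetector:
    """Remembers the last digest per record ID (in memory, for illustration only)."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}

    def needs_processing(self, record_id: str, record: dict) -> bool:
        """True for new or changed records; False when the content is unchanged."""
        digest = record_hash(record)
        if self._seen.get(record_id) == digest:
            return False
        self._seen[record_id] = digest
        return True
```

Serialising with sorted keys keeps the digest stable when the same record arrives with its fields in a different order, so only a real content change triggers reprocessing.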
# Clone and start all services
git clone https://github.com/SimplicityGuy/discogsography.git
cd discogsography
docker-compose up -d
# Access the dashboard
open http://localhost:8003

| Service | URL | Default Credentials |
|---|---|---|
| API | http://localhost:8004 | Register via /api/auth/register |
| Dashboard | http://localhost:8003 | None |
| Neo4j | http://localhost:7474 | neo4j / discogsography |
| PostgreSQL | localhost:5433 | discogsography / discogsography |
| RabbitMQ | http://localhost:15672 | discogsography / discogsography |
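To confirm that the HTTP endpoints in the table respond after startup, a small probe along the following lines may help. It is a sketch rather than a project utility: the `is_reachable` and `report` helpers are hypothetical, only the root URL of each service is requested, and PostgreSQL is left out because it does not speak HTTP.

```python
import http.client
import urllib.error
import urllib.request

# HTTP endpoints copied from the table above.
ENDPOINTS = {
    "API": "http://localhost:8004",
    "Dashboard": "http://localhost:8003",
    "Neo4j": "http://localhost:7474",
    "RabbitMQ": "http://localhost:15672",
}


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a server answers at all, even with an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded (for example 401 or 404), so it is up.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or a malformed response.
        return False


def report(endpoints: dict[str, str] = ENDPOINTS, probe=is_reachable) -> dict[str, bool]:
    """Map each service name to whether its endpoint responded."""
    return {name: probe(url) for name, url in endpoints.items()}
```

Calling `report()` once the containers are running returns a dictionary such as `{"API": True, ...}`; a `False` entry usually means the container is still starting or the port is mapped differently on your machine.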
See the Quick Start Guide for prerequisites, local development setup, and environment configuration.
| Document | Purpose |
|---|---|
| Quick Start Guide | Get Discogsography running in minutes |
| Configuration Guide | Complete environment variable and settings reference |
| Architecture Overview | System architecture, components, data flow, and scale |
| CLAUDE.md | Claude Code integration guide & development standards |
| Document | Purpose |
|---|---|
| Usage Examples | Neo4j Cypher and PostgreSQL query examples |
| Database Schema | Complete Neo4j graph model and PostgreSQL schema |
| Monitoring Guide | Real-time dashboard, metrics, and debug utilities |
| Document | Purpose |
|---|---|
| Development Guide | Project structure, tooling, and developer workflow |
| Testing Guide | Unit, integration, and E2E testing with Playwright |
| Logging Guide | Structured logging standards and emoji conventions |
| Contributing Guide | How to contribute: process, standards, and PR flow |
| Python Version Management | Managing Python 3.13+ across the project |
| Document | Purpose |
|---|---|
| Troubleshooting Guide | Common issues, solutions, and debugging steps |
| Maintenance Guide | Package upgrades, dependency management |
| Performance Guide | Database tuning, hardware specs, optimization |
| Performance Benchmarks | Processing rates and tuning results |
| Database Resilience | Database connection patterns & error handling |
| Document | Purpose |
|---|---|
| Dockerfile Standards | Best practices for writing Dockerfiles |
| Docker Security | Container hardening & security practices |
| GitHub Actions Guide | CI/CD workflows, automation & best practices |
| Task Automation | Complete `just` and `uv run task` command reference |
| Monorepo Guide | Managing Python monorepo with shared dependencies |
| Document | Purpose |
|---|---|
| State Marker System | Extraction progress tracking & safe restart system |
| State Marker Periodic Updates | Periodic state saves and crash recovery |
| Consumer Cancellation | File completion and consumer lifecycle management |
| File Completion Tracking | Intelligent completion tracking and stall detection |
| Neo4j Indexing | Advanced Neo4j indexing strategies |
| Platform Targeting | Cross-platform compatibility guidelines |
| Emoji Guide | Standardized emoji usage across the project |
| Recent Improvements | Latest platform enhancements and changelog |
- Bug Reports: GitHub Issues
- Feature Requests: GitHub Discussions
- Questions: Discussions Q&A
- Full Documentation: docs/README.md
This project is licensed under the MIT License; see the LICENSE file for details.
- Discogs for providing the monthly data dumps
- uv for blazing-fast package management
- Ruff for lightning-fast linting
- The Python community for excellent libraries and tools