A local-first Tauri 2.0 desktop app for evaluating Ollama models across three modes: Arena (model vs model debates with Elo ratings), Benchmark (custom test suites with manual + auto-judge scoring), and Sparring Ring (structured human vs AI debates with scorecards). All modes feed a unified leaderboard backed by SQLite. macOS-only, dark theme, arena/colosseum aesthetic.
- Runtime: Tauri 2.x (Rust backend + webview frontend)
- Frontend: React 19 + TypeScript 5.x strict mode
- Build: Vite 6.x with `@tauri-apps/vite-plugin`
- Styling: Tailwind CSS 4.x (dark theme, gold/amber accents)
- State: Zustand 5.x
- Routing: React Router 7.x
- Charts: Recharts 2.x
- Database: SQLite via `rusqlite` 0.31+ (bundled, WAL mode)
- HTTP: `reqwest` 0.12+ (async streaming)
- Async: `tokio` 1.x
- System info: `sysinfo` 0.31+
- LLM: Ollama REST API (localhost:11434)
React frontend communicates with Rust backend via Tauri IPC (invoke for commands, listen for streaming events). Rust backend owns all Ollama communication, SQLite access, and Elo calculations. Frontend is purely presentational + state management.
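Because the Rust backend consumes Ollama's streamed output, it has to cope with network chunks that split a JSON line in half. The sketch below buffers raw bytes and releases only complete lines; `NdjsonLineBuffer` is a hypothetical name and is not taken from `ollama.rs`. Each returned line would then be deserialized (for example with serde_json) and forwarded to the frontend as a Tauri event.

```rust
/// Hypothetical helper: accumulates byte chunks from a streaming HTTP body
/// and yields only complete NDJSON lines. Network chunk boundaries do not
/// line up with newlines, so a partial trailing line is kept until a later
/// chunk completes it.
#[derive(Default)]
pub struct NdjsonLineBuffer {
    pending: Vec<u8>,
}

impl NdjsonLineBuffer {
    /// Appends `chunk` and returns every complete, non-empty line it closes.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut lines = Vec::new();
        // Drain everything up to and including each '\n'.
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            let line: Vec<u8> = self.pending.drain(..=pos).collect();
            // Drop the trailing '\n' (and a '\r' if present) before keeping the line.
            let text = String::from_utf8_lossy(&line[..line.len() - 1]);
            let trimmed = text.trim_end_matches('\r');
            if !trimmed.trim().is_empty() {
                lines.push(trimmed.to_string());
            }
        }
        lines
    }

    /// Returns leftover bytes when the stream ends without a final '\n'.
    pub fn finish(&mut self) -> Option<String> {
        let rest = String::from_utf8_lossy(&self.pending).trim().to_string();
        self.pending.clear();
        if rest.is_empty() {
            None
        } else {
            Some(rest)
        }
    }
}
```

Splitting on the raw `\n` byte before UTF-8 decoding is safe because that byte never occurs inside a multi-byte UTF-8 sequence, so a character split across two chunks is reassembled in the buffer before it is decoded.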
Key modules:
- `src-tauri/src/db.rs`: SQLite connection, migrations, schema (13 tables), seed data
- `src-tauri/src/ollama.rs`: Ollama REST client with streaming (reads configurable URL from settings)
- `src-tauri/src/lib.rs`: All Tauri commands, Model/Setting structs, settings key whitelist
- `src-tauri/src/debate.rs`: Arena (3 formats) + Sparring debate engine, vote + Elo, scorecards
- `src-tauri/src/benchmark.rs`: CRUD, runner, auto-judge, blind comparison, hardware metrics, import/export
- `src-tauri/src/elo.rs`: Elo rating calculations (67 tests)
- `src-tauri/src/prompts.rs`: System prompt templates (arena, formal, socratic, sparring, scorecard judge)
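The settings key whitelist in `lib.rs` can be pictured as a fixed list checked before any write. This is a minimal sketch: the key names below are hypothetical placeholders rather than the real keys, and a `HashMap` stands in for the SQLite settings table.

```rust
use std::collections::HashMap;

/// Hypothetical whitelist; the real key names live in lib.rs and may differ.
const ALLOWED_SETTING_KEYS: &[&str] = &["ollama_url", "judge_model", "default_debate_rounds"];

/// Rejects any key that is not explicitly whitelisted.
pub fn validate_setting_key(key: &str) -> Result<(), String> {
    if ALLOWED_SETTING_KEYS.contains(&key) {
        Ok(())
    } else {
        Err(format!("unknown setting key: {key}"))
    }
}

/// Writes a setting only after the key passes validation.
/// The HashMap is a stand-in for the settings table.
pub fn set_setting(
    store: &mut HashMap<String, String>,
    key: &str,
    value: &str,
) -> Result<(), String> {
    validate_setting_key(key)?;
    store.insert(key.to_string(), value.to_string());
    Ok(())
}
```

Validating against a closed list means the frontend cannot create arbitrary rows in the settings table, even if an IPC call is crafted by hand.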
- TypeScript strict mode. No `any` types.
- React: Functional components with hooks only. No class components.
- Rust: `clippy` clean. `cargo fmt` on save.
- File naming: `snake_case.rs` for Rust, `PascalCase.tsx` for React components, `camelCase.ts` for utilities
- Git commits: conventional commits (`feat:`, `fix:`, `refactor:`, `chore:`)
- All Tauri commands return `Result<T, String>`; handle errors in Rust, display them in the frontend
- Database writes wrapped in explicit transactions
- No `unwrap()` in production Rust code; use the `?` operator or proper error handling
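The `Result<T, String>` and no-`unwrap()` conventions combine naturally: convert each error to a `String` with `map_err`, then propagate it with `?`. The sketch below uses hypothetical helpers that parse manual benchmark scores (1-5); in the app, a function like this would sit behind a `#[tauri::command]`.

```rust
/// Hypothetical helper: parses one manual benchmark score (1-5).
/// Parse errors are mapped to String so `?` can propagate them.
pub fn parse_manual_score(raw: &str) -> Result<u8, String> {
    let score: u8 = raw
        .trim()
        .parse()
        .map_err(|e| format!("invalid score {raw:?}: {e}"))?;
    if (1..=5).contains(&score) {
        Ok(score)
    } else {
        Err(format!("score {score} out of range 1-5"))
    }
}

/// Averages several raw scores and returns the first error instead of panicking.
pub fn average_scores(raw: &[&str]) -> Result<f64, String> {
    if raw.is_empty() {
        return Err("no scores provided".to_string());
    }
    let mut total = 0u32;
    for r in raw {
        // `?` returns early with the String error; no unwrap() needed.
        total += u32::from(parse_manual_score(r)?);
    }
    Ok(f64::from(total) / raw.len() as f64)
}
```

The `String` error reaches the frontend as the rejected value of `invoke`, where it can be displayed directly.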
v1.0.0 — Feature Complete (all phases done, audit remediation applied)
- Phase 0: Foundation — Tauri 2.0 scaffold, SQLite (13 tables, WAL), Ollama REST client, Elo module
- Phase 1: Arena Mode — Debate engine (freestyle/formal/socratic), vote + Elo, leaderboard, history
- Phase 2: Benchmark — CRUD suites/prompts, runner with TTFT/TPS metrics, manual + auto-judge scoring, blind comparison, hardware metrics, import/export
- Phase 3: Sparring Ring — Human vs AI debates, 3 difficulty levels, 4-phase structure, scorecards, user Elo
- Phase 4: Polish — 3 debate formats, topic suggestions, settings page, blind test, animations, skeleton loading, export (Markdown/CSV/JSON)
- Audit — Security hardening (configurable Ollama URL, query limit caps, settings key whitelist), accessibility (ARIA attributes), error handling, 67 Rust tests
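A query limit cap can be as small as a clamp applied before a `LIMIT` value is bound into SQL. This is a sketch only: `DEFAULT_LIMIT` and `MAX_LIMIT` are hypothetical values, and `i64` is used because SQLite integers map to `i64` in `rusqlite`.

```rust
/// Hypothetical values; the real caps are defined in the backend.
const DEFAULT_LIMIT: i64 = 50;
const MAX_LIMIT: i64 = 500;

/// Turns an optional client-supplied limit into a safe value in 1..=MAX_LIMIT.
pub fn cap_limit(requested: Option<i64>) -> i64 {
    match requested {
        None => DEFAULT_LIMIT,
        Some(n) => n.clamp(1, MAX_LIMIT),
    }
}
```

Clamping rather than rejecting keeps list views working when the frontend asks for too much, while still bounding query cost.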
| Decision | Choice | Rationale |
|---|---|---|
| Concurrent streaming | Concurrent, with automatic sequential fallback when the two models exceed 40B parameters combined | Side-by-side streaming gives the intended dramatic visual; the fallback prevents out-of-memory (OOM) failures. |
| Database access | rusqlite directly, not tauri-plugin-sql | More control over WAL mode, migrations, concurrent access |
| Elo parameters | Start at 1500; K-factor steps down 40 → 32 → 24 as game count grows | Standard chess Elo; the shrinking K-factor stabilizes ratings over time |
| Benchmark scoring | 1-5 manual, 1-10 auto-judge normalized | Fast manual scoring, more granular auto-judge |
| App modes | Arena → Benchmark → Sparring (build order) | Arena builds all shared infra, others plug in |
| DB location | `~/.model-colosseum/colosseum.db` | Single, predictable dotfile directory (macOS's conventional app data location is `~/Library/Application Support`) |
| Ollama streaming | NDJSON line-by-line parsing, not SSE | Ollama streams newline-delimited JSON objects, not Server-Sent Events |
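The Elo row can be made concrete with the standard Elo formulas: expected score E_A = 1 / (1 + 10^((R_B - R_A) / 400)) and update R_A' = R_A + K(S_A - E_A). The sketch below is not `elo.rs`; in particular the game-count cut-offs for the 40 → 32 → 24 K-factor schedule (10 and 30 games) are assumptions.

```rust
pub const INITIAL_RATING: f64 = 1500.0;

/// K-factor by games played. Cut-offs (10, 30) are hypothetical.
pub fn k_factor(games_played: u32) -> f64 {
    match games_played {
        0..=9 => 40.0,
        10..=29 => 32.0,
        _ => 24.0,
    }
}

/// Expected score of a player rated `r_a` against one rated `r_b`.
pub fn expected_score(r_a: f64, r_b: f64) -> f64 {
    1.0 / (1.0 + 10f64.powf((r_b - r_a) / 400.0))
}

/// Result from player A's perspective.
#[derive(Clone, Copy, Debug)]
pub enum Outcome {
    AWins,
    BWins,
    Draw,
}

/// Returns updated (r_a, r_b); each side uses its own K-factor.
pub fn update_ratings(
    r_a: f64,
    games_a: u32,
    r_b: f64,
    games_b: u32,
    outcome: Outcome,
) -> (f64, f64) {
    let s_a = match outcome {
        Outcome::AWins => 1.0,
        Outcome::BWins => 0.0,
        Outcome::Draw => 0.5,
    };
    let e_a = expected_score(r_a, r_b);
    let new_a = r_a + k_factor(games_a) * (s_a - e_a);
    // B's actual and expected scores are the complements of A's.
    let new_b = r_b + k_factor(games_b) * ((1.0 - s_a) - (1.0 - e_a));
    (new_a, new_b)
}
```

For two new models at 1500, a win moves the ratings to 1520 and 1480. Because each side uses its own K-factor, a veteran losing to a newcomer drops fewer points than the newcomer gains.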
- Do not scaffold the entire project in one session — follow the phased plan strictly
- Do not use Tauri v1 APIs or import paths; this is Tauri 2.x (`@tauri-apps/api` v2)
- Do not use `tauri-plugin-sql`; we use `rusqlite` directly
- Do not use `unwrap()` in Rust production code; use `?` or proper error handling
- Do not make any network calls except to localhost Ollama (no telemetry, no cloud)
- Do not use class components in React; hooks only
- Do not store any data outside `~/.model-colosseum/`, the single source of truth
- Do not assume Ollama is running; always health check first and handle absence gracefully
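Since the Ollama URL is configurable, the localhost-only rule needs a guard before the URL is saved or used. The sketch below uses a hypothetical `is_local_ollama_url` function and a deliberately simple string check that accepts only `localhost`, `127.0.0.1`, and `[::1]`. It does not replace the runtime health check.

```rust
/// Hypothetical guard: accepts an Ollama base URL only if it targets the local machine.
pub fn is_local_ollama_url(url: &str) -> bool {
    // Require an explicit http:// or https:// scheme.
    let rest = match url
        .strip_prefix("http://")
        .or_else(|| url.strip_prefix("https://"))
    {
        Some(r) => r,
        None => return false,
    };
    // The authority ends at the first '/', '?' or '#'.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    // Reject userinfo such as http://localhost@evil.example (the real host follows '@').
    if authority.contains('@') {
        return false;
    }
    let host = if let Some(v6) = authority.strip_prefix('[') {
        // IPv6 literal, e.g. [::1]:11434; only an optional :port may follow ']'.
        match v6.split_once(']') {
            Some((h, tail)) if tail.is_empty() || tail.starts_with(':') => h,
            _ => return false,
        }
    } else {
        // Strip an optional :port.
        authority.split(':').next().unwrap_or("")
    };
    matches!(
        host.to_ascii_lowercase().as_str(),
        "localhost" | "127.0.0.1" | "::1"
    )
}
```

Comparing exact host names (not prefixes) blocks look-alikes such as `localhost.evil.example`.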