Model Colosseum

Project Overview

A local-first Tauri 2.0 desktop app for evaluating Ollama models across three modes: Arena (model vs model debates with Elo ratings), Benchmark (custom test suites with manual + auto-judge scoring), and Sparring Ring (structured human vs AI debates with scorecards). All modes feed a unified leaderboard backed by SQLite. macOS-only, dark theme, arena/colosseum aesthetic.

Tech Stack

Runtime: Tauri 2.x (Rust backend + webview frontend)
Frontend: React 19 + TypeScript 5.x strict mode
Build: Vite 6.x with @tauri-apps/vite-plugin
Styling: Tailwind CSS 4.x (dark theme, gold/amber accents)
State: Zustand 5.x
Routing: React Router 7.x
Charts: Recharts 2.x
Database: SQLite via rusqlite 0.31+ (bundled, WAL mode)
HTTP: reqwest 0.12+ (async streaming)
Async: tokio 1.x
System info: sysinfo 0.31+
LLM: Ollama REST API (localhost:11434)

Architecture

React frontend communicates with Rust backend via Tauri IPC (invoke for commands, listen for streaming events). Rust backend owns all Ollama communication, SQLite access, and Elo calculations. Frontend is purely presentational + state management.

Key modules:

src-tauri/src/db.rs — SQLite connection, migrations, schema (13 tables), seed data
src-tauri/src/ollama.rs — Ollama REST client with streaming (reads configurable URL from settings)
src-tauri/src/lib.rs — All Tauri commands, Model/Setting structs, settings key whitelist
src-tauri/src/debate.rs — Arena (3 formats) + Sparring debate engine, vote + Elo, scorecards
src-tauri/src/benchmark.rs — CRUD, runner, auto-judge, blind comparison, hardware metrics, import/export
src-tauri/src/elo.rs — Elo rating calculations (67 tests)
src-tauri/src/prompts.rs — System prompt templates (arena, formal, socratic, sparring, scorecard judge)
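
Ollama streams its responses as newline-delimited JSON, and a network chunk can end in the middle of a line. The sketch below shows one way the streaming client could reassemble complete lines before handing them to a JSON parser. The type name NdjsonLineBuffer and its push method are hypothetical and are not taken from ollama.rs; JSON decoding itself is left out.

```rust
/// Hypothetical helper (not from the project): buffers raw byte chunks and
/// returns only the lines that are complete, keeping any partial tail.
#[derive(Default)]
pub struct NdjsonLineBuffer {
    pending: Vec<u8>,
}

impl NdjsonLineBuffer {
    /// Appends a chunk and returns every complete, non-empty line it closes.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            // Remove the line, including its trailing newline, from the buffer.
            let raw: Vec<u8> = self.pending.drain(..=pos).collect();
            // Drop the newline (and any '\r') before returning the text.
            let line = String::from_utf8_lossy(&raw[..pos]).trim().to_string();
            if !line.is_empty() {
                lines.push(line);
            }
        }
        lines
    }
}
```

Each returned string would then be parsed as one JSON object. Bytes after the last newline stay in the buffer until a later chunk completes the line.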

Development Conventions

TypeScript strict mode. No any types.
React: Functional components with hooks only. No class components.
Rust: clippy clean. cargo fmt on save.
File naming: snake_case.rs for Rust, PascalCase.tsx for React components, camelCase.ts for utilities
Git commits: conventional commits (feat:, fix:, refactor:, chore:)
All Tauri commands return Result<T, String> — handle errors in Rust, display in frontend
Database writes wrapped in explicit transactions
No unwrap() in production Rust code — use ? operator or proper error handling
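
As an illustration of the Result<T, String> and no-unwrap() conventions, here is a minimal sketch of a helper that validates a user-supplied query limit. The function name parse_query_limit, its parameters, and the error messages are assumptions made for this example and do not come from lib.rs. A real command would also carry the #[tauri::command] attribute, which is omitted so the snippet compiles with the standard library alone.

```rust
use std::fmt::Display;

/// Converts any displayable error into the String error type sent over IPC.
fn to_ipc_error<E: Display>(context: &str, err: E) -> String {
    format!("{context}: {err}")
}

/// Hypothetical helper: parses a limit and caps it at `max`.
/// Errors propagate with `?` instead of panicking through unwrap().
pub fn parse_query_limit(raw: &str, max: u32) -> Result<u32, String> {
    let requested = raw
        .trim()
        .parse::<u32>()
        .map_err(|e| to_ipc_error("invalid limit", e))?;
    if requested == 0 {
        return Err("limit must be at least 1".to_string());
    }
    // Cap rather than reject, so oversized requests still return data.
    Ok(requested.min(max))
}
```

The frontend receives the Err string as a rejected invoke promise and can display it directly.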

Current Phase

v1.0.0 — Feature Complete (all phases done, audit remediation applied)

Phase 0: Foundation — Tauri 2.0 scaffold, SQLite (13 tables, WAL), Ollama REST client, Elo module
Phase 1: Arena Mode — Debate engine (freestyle/formal/socratic), vote + Elo, leaderboard, history
Phase 2: Benchmark — CRUD suites/prompts, runner with TTFT/TPS metrics, manual + auto-judge scoring, blind comparison, hardware metrics, import/export
Phase 3: Sparring Ring — Human vs AI debates, 3 difficulty levels, 4-phase structure, scorecards, user Elo
Phase 4: Polish — 3 debate formats, topic suggestions, settings page, blind test, animations, skeleton loading, export (Markdown/CSV/JSON)
Audit — Security hardening (configurable Ollama URL, query limit caps, settings key whitelist), accessibility (ARIA attributes), error handling, 67 Rust tests

Key Decisions Made

Decision	Choice	Rationale
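Concurrent streaming	Concurrent by default, with automatic sequential fallback when the two models total more than 40B parameters	User wants the dramatic visual of both models streaming at once. The fallback prevents out-of-memory failures.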
Database access	rusqlite directly, not tauri-plugin-sql	More control over WAL mode, migrations, concurrent access
Elo parameters	Start at 1500; K-factor steps down 40 → 32 → 24 as game count grows	Standard chess-style Elo; the decaying K-factor stabilizes ratings over time
Benchmark scoring	1-5 manual, 1-10 auto-judge normalized	Fast manual scoring, more granular auto-judge
App modes	Arena → Benchmark → Sparring (build order)	Arena builds all shared infra, others plug in
DB location	~/.model-colosseum/colosseum.db	Single dot-directory in the user's home folder; keeps all app data in one place
Ollama streaming	NDJSON line-by-line parsing, not SSE	That's what Ollama returns
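
The Elo parameters in the table can be sketched as follows, using the standard Elo expected-score formula. The starting rating and the K values 40, 32, and 24 come from the table. The game-count thresholds (10 and 30) are placeholders chosen for illustration, and the function names are hypothetical rather than taken from elo.rs.

```rust
/// Starting rating from the decisions table.
pub const START_RATING: f64 = 1500.0;

/// K-factor that steps down as a player accumulates games.
/// ASSUMPTION: the thresholds 10 and 30 are illustrative, not the project's.
pub fn k_factor(games_played: u32) -> f64 {
    match games_played {
        0..=9 => 40.0,
        10..=29 => 32.0,
        _ => 24.0,
    }
}

/// Standard Elo expected score of `rating` against `opponent`, in (0, 1).
pub fn expected_score(rating: f64, opponent: f64) -> f64 {
    1.0 / (1.0 + 10.0_f64.powf((opponent - rating) / 400.0))
}

/// New rating after one game; `score` is 1.0 for a win, 0.5 draw, 0.0 loss.
pub fn updated_rating(rating: f64, opponent: f64, score: f64, games_played: u32) -> f64 {
    rating + k_factor(games_played) * (score - expected_score(rating, opponent))
}
```

With these placeholder thresholds, a new model that beats an equally rated opponent moves from 1500 to 1520, while a model with many games played gains only 12 points from the same result.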

Do NOT

Do not scaffold the entire project in one session — follow the phased plan strictly
Do not use Tauri v1 APIs or import paths — this is Tauri 2.x (@tauri-apps/api v2)
Do not use tauri-plugin-sql — we use rusqlite directly
Do not use unwrap() in Rust production code — use ? or proper error handling
Do not make any network calls except to localhost Ollama (no telemetry, no cloud)
Do not use class components in React — hooks only
Do not store any data outside ~/.model-colosseum/ — single source of truth
Do not assume Ollama is running — always health check first and handle absence gracefully
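
The last rule can be approximated with a quick reachability probe before any request is sent. The sketch below only checks that something accepts TCP connections on the configured address, which is a simplification: a fuller health check would issue an HTTP request through the project's reqwest client. The function name ollama_port_open and the error wording are assumptions for this example.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical probe: returns Ok(()) if a TCP connection to `addr` succeeds
/// within `timeout`, otherwise a message suitable for showing in the UI.
/// `addr` must be an IP:port pair such as "127.0.0.1:11434".
pub fn ollama_port_open(addr: &str, timeout: Duration) -> Result<(), String> {
    let socket: SocketAddr = addr
        .parse()
        .map_err(|e| format!("invalid Ollama address {addr}: {e}"))?;
    TcpStream::connect_timeout(&socket, timeout)
        .map(|_| ())
        .map_err(|e| format!("Ollama not reachable at {addr}: {e}"))
}
```

A caller would run this probe first and, on Err, show an "Ollama is not running" state in the UI instead of starting a debate or benchmark.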
