Reference document for reimplementation. Describes what the app does, not how it's currently built.
MelodAI is an AI-powered karaoke web app. Users search for songs, the backend downloads them from Deezer, splits audio into vocals + instrumental using AI (Demucs), extracts word-level timed lyrics with speaker diarization (WhisperX), and presents a synchronized karaoke player.
- Two independent audio streams: vocals and instrumental
- Each has its own volume slider (0-100%, default 50%)
- Streams are kept in perfect sync (seek, play, pause affect both)
- Web Audio API with GainNodes for volume control
- Audio routing: each source -> gain node -> speakers
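The slider-to-gain step can be sketched as a pure function. This is a minimal sketch that assumes a linear mapping from the 0-100% slider to a gain of 0.0-1.0 (the mapping curve is not specified here); `sliderToGain` and `DEFAULT_VOLUME_PERCENT` are hypothetical names.

```typescript
// Default slider position (50%).
const DEFAULT_VOLUME_PERCENT = 50;

// Convert a 0-100 slider value into a gain in [0, 1].
// Assumption: linear mapping; non-finite input falls back to the default.
function sliderToGain(percent: number): number {
  if (!Number.isFinite(percent)) return DEFAULT_VOLUME_PERCENT / 100;
  const clamped = Math.min(100, Math.max(0, percent));
  return clamped / 100;
}
```

In the browser, the result would typically be assigned to the `gain.value` of the stream's `GainNode`, once for vocals and once for instrumental.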
- Play/Pause toggle button
- Seek by clicking anywhere on the progress bar
- Jump to word - clicking any word in the lyrics seeks to that word's timestamp
- Previous/Next song navigation (cycles through queue)
- Fullscreen toggle (browser fullscreen API)
- No playback-speed control, repeat mode, or shuffle-play mode currently
- Visual fill showing current position as percentage of duration
- Time display: `current / total` in `MM:SS` format
- Click-to-seek anywhere on the bar
- Shimmer animation effect on the fill
- Hover: slight scale-up, thumb indicator appears at fill edge
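The progress bar arithmetic (fill percentage, `MM:SS` formatting, click-to-seek) can be sketched as three small helpers. The function names and the parameters `clickX`, `barLeft`, and `barWidth` are illustrative assumptions, not names taken from the app.

```typescript
// Format a time in seconds as MM:SS (minutes are not capped at 99).
function formatTime(seconds: number): string {
  const total = Number.isFinite(seconds) && seconds > 0 ? Math.floor(seconds) : 0;
  const minutes = Math.floor(total / 60);
  const secs = total % 60;
  return (minutes < 10 ? "0" : "") + minutes + ":" + (secs < 10 ? "0" : "") + secs;
}

// Fill width as a percentage of the duration, clamped to [0, 100].
function fillPercent(current: number, duration: number): number {
  if (!(duration > 0)) return 0;
  return Math.min(100, Math.max(0, (current / duration) * 100));
}

// Translate a click's x coordinate on the bar into a seek time in seconds.
function seekTimeFromClick(
  clickX: number,
  barLeft: number,
  barWidth: number,
  duration: number
): number {
  if (!(barWidth > 0) || !(duration > 0)) return 0;
  const ratio = Math.min(1, Math.max(0, (clickX - barLeft) / barWidth));
  return ratio * duration;
}
```

In a click handler, `barLeft` and `barWidth` would usually come from the bar element's `getBoundingClientRect()`.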
The synchronized lyrics display is the central feature of the app.
Data format - lyrics are an array of segments, each containing:
- `start` / `end` timestamps (seconds)
- Array of `words`, each with: `word`, `start`, `end`, `speaker`
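The data format above could be typed roughly as follows. This is a sketch under assumptions: the interface names are hypothetical, and `speaker` is assumed to be a string label (for example a diarization label such as `"SPEAKER_00"`), since the exact type is not specified here.

```typescript
// One timed word. Assumption: `speaker` is a string label.
interface LyricWord {
  word: string;
  start: number; // seconds
  end: number;   // seconds
  speaker: string;
}

// One segment (rendered as one line).
interface LyricSegment {
  start: number; // seconds
  end: number;   // seconds
  words: LyricWord[];
}

function isLyricWord(v: unknown): v is LyricWord {
  if (typeof v !== "object" || v === null) return false;
  const w = v as Record<string, unknown>;
  return (
    typeof w.word === "string" &&
    typeof w.start === "number" &&
    typeof w.end === "number" &&
    typeof w.speaker === "string"
  );
}

// Runtime check for a parsed lyrics payload before rendering it.
function isLyrics(v: unknown): v is LyricSegment[] {
  return (
    Array.isArray(v) &&
    v.every(
      (seg) =>
        typeof seg === "object" &&
        seg !== null &&
        typeof seg.start === "number" &&
        typeof seg.end === "number" &&
        Array.isArray(seg.words) &&
        seg.words.every(isLyricWord)
    )
  );
}

// Illustrative data only, not taken from a real song.
const exampleLyrics: LyricSegment[] = [
  {
    start: 12.0,
    end: 13.1,
    words: [
      { word: "Hello", start: 12.0, end: 12.4, speaker: "SPEAKER_00" },
      { word: "world", start: 12.5, end: 13.1, speaker: "SPEAKER_00" },
    ],
  },
];
```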
Rendering:
- Lyrics displayed as lines (one segment = one line)
- Each word is a clickable span with timestamp data
- Lines are color-coded by speaker via left border
Real-time sync (runs every animation frame):
- Current word gets highlighted (yellow/gold background)
- Current line gets highlighted (blue background, full opacity)
- Lines fade based on temporal distance from current playback position
- Nearby lines: higher opacity
- Distant lines: lower opacity (gradual fade)
- Active line auto-scrolls to center of the lyrics container
- Smooth scroll behavior
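The per-frame sync step can be sketched as two pure functions: one that finds the active line for the current playback time, and one that derives a line's opacity from its temporal distance. The sketch assumes lines are sorted by `start` and do not overlap; the fade window (`fadeSeconds`) and the opacity floor (`minOpacity`) are invented defaults, not values from the app.

```typescript
interface TimedSpan {
  start: number; // seconds
  end: number;   // seconds
}

// Binary search for the span containing time t; returns -1 if none.
// Assumes spans are sorted by start time and do not overlap.
function findActiveIndex(spans: TimedSpan[], t: number): number {
  let lo = 0;
  let hi = spans.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (spans[mid].start <= t) {
      found = mid; // last span seen that starts at or before t
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found !== -1 && t < spans[found].end ? found : -1;
}

// Opacity falls off linearly with distance in seconds from the span.
function lineOpacity(
  line: TimedSpan,
  t: number,
  fadeSeconds = 10, // assumed fade window
  minOpacity = 0.3  // assumed floor for distant lines
): number {
  const distance = t < line.start ? line.start - t : t > line.end ? t - line.end : 0;
  return Math.max(minOpacity, 1 - distance / fadeSeconds);
}
```

The same `findActiveIndex` works for the words within the active line, since words carry the same `start`/`end` fields. It would typically be called from a `requestAnimationFrame` loop with the audio's current time.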
Speaker identification:
- Up to 6 speakers supported with distinct colors:
- Cyan, Pink, Green, Amber, Indigo, Rose
- Each line gets a colored left border matching its speaker
- Speaker detected via WhisperX diarization in the backend
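One way to assign the six colors is by order of first appearance, as in this sketch. Wrapping around after the sixth speaker is an assumption (behavior beyond six speakers is not specified), and the color names stand in for whatever concrete color values the theme defines.

```typescript
const SPEAKER_COLORS = ["cyan", "pink", "green", "amber", "indigo", "rose"] as const;
type SpeakerColor = (typeof SPEAKER_COLORS)[number];

// Map each distinct speaker label to a color, in order of first appearance.
// Assumption: a seventh speaker reuses the first color.
function assignSpeakerColors(speakers: string[]): Map<string, SpeakerColor> {
  const colors = new Map<string, SpeakerColor>();
  for (const speaker of speakers) {
    if (!colors.has(speaker)) {
      colors.set(speaker, SPEAKER_COLORS[colors.size % SPEAKER_COLORS.length]);
    }
  }
  return colors;
}
```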
- Album art thumbnail
- Song title
- Artist name
- Updated when a new song loads
- Skeleton shimmer lines shown while lyrics are being fetched
- Empty state with prompt when no song is selected
Each queue item has: id, title, artist, thumbnail, vocalsUrl, musicUrl, lyricsUrl, ready, progress, status, error
- Processing - song is being downloaded/split/transcribed (shows progress %)
- Ready - fully processed, clickable to play
- Error - processing failed, shows retry button
- Active - currently playing (highlighted)
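The queue item fields and the four display states might be modeled like this. The field types and the precedence order (active, then error, then ready, then processing) are assumptions; only the field names and state names come from the description above.

```typescript
// Field names from the queue item description; the types are assumed.
interface QueueItem {
  id: string;
  title: string;
  artist: string;
  thumbnail: string;
  vocalsUrl: string;
  musicUrl: string;
  lyricsUrl: string;
  ready: boolean;
  progress: number; // 0-100
  status: string;
  error: string | null;
}

type DisplayState = "processing" | "ready" | "error" | "active";

// Derive the state shown in the queue list for one item.
function displayState(item: QueueItem, currentId: string | null): DisplayState {
  if (item.id === currentId) return "active";
  if (item.error !== null) return "error";
  if (item.ready) return "ready";
  return "processing";
}
```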
- Click a ready song to play it immediately
- Drag and drop to reorder (grip handle on each item)
- Remove any song except the currently playing one
- Retry errored songs (re-triggers backend processing)
- Shuffle randomizes order (keeps current song in place)
- Clear removes all except currently playing song (confirmation required)
- Random song button fetches a random pre-processed song from the library (excludes songs already in queue)
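The shuffle action can be sketched as a Fisher-Yates shuffle over every item except the current one, which is then re-inserted at its original index. Reading "keeps current song in place" as "keeps the same index" is an assumption, and the injectable `random` parameter exists only to make the sketch testable.

```typescript
// Shuffle a queue without moving the currently playing item.
// Returns a new array; the input is not mutated.
function shuffleKeepingCurrent<T extends { id: string }>(
  queue: T[],
  currentId: string | null,
  random: () => number = Math.random
): T[] {
  const currentIndex = queue.findIndex((item) => item.id === currentId);
  const others = queue.filter((_, i) => i !== currentIndex);

  // Fisher-Yates shuffle of the remaining items.
  for (let i = others.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [others[i], others[j]] = [others[j], others[i]];
  }

  if (currentIndex === -1) return others;
  others.splice(currentIndex, 0, queue[currentIndex]);
  return others;
}
```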
- Every 5 seconds, polls backend for processing status of non-ready songs
- Updates progress percentage and status text in real-time
- Marks songs as ready when processing completes (progress = 100%)
- Marks songs as error if processing fails
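The state update applied on each poll can be sketched as a pure function. The `StatusUpdate` payload shape is hypothetical (the backend's response format is not described here), as are the function names.

```typescript
interface PollableItem {
  id: string;
  ready: boolean;
  progress: number; // 0-100
  status: string;
  error: string | null;
}

// Hypothetical shape of one status response from the backend.
interface StatusUpdate {
  progress: number;
  status: string;
  error?: string;
}

const POLL_INTERVAL_MS = 5000; // poll every 5 seconds

// Only non-ready songs need to be polled.
function idsToPoll(queue: PollableItem[]): string[] {
  return queue.filter((item) => !item.ready).map((item) => item.id);
}

// Merge one status response into a queue item, returning a new object.
function applyStatus(item: PollableItem, update: StatusUpdate): PollableItem {
  if (update.error !== undefined) {
    return { ...item, status: update.status, error: update.error, ready: false };
  }
  const progress = Math.min(100, Math.max(0, update.progress));
  return { ...item, progress, status: update.status, error: null, ready: progress >= 100 };
}
```

A timer such as `setInterval(poll, POLL_INTERVAL_MS)` would call the backend for each id from `idsToPoll` and feed the responses through `applyStatus`.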
- Queue is session-only - lost on page refresh
- No local storage persistence currently
The library is the catalog of all previously processed songs on the server. Unlike the queue, it's persistent.
- Lazy-loaded (only fetches when Library tab is first opened)
- Each song shows: cover art, title, artist, completion status indicator
- Search/filter by title or artist (client-side, case-insensitive)
- Add to queue button (skips processing since already complete)
- Play now button (adds to queue and immediately plays)
- Add all button (adds all ready songs to queue, skips duplicates)
- Refresh button to re-fetch from server
- Unfinished songs shown with orange indicator (can't be added to queue)
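The client-side filter and the "add all" rule can be sketched as follows. The `LibrarySong` shape, including the `complete` flag, is an assumed minimal model of a library entry.

```typescript
// Assumed minimal shape of a library entry.
interface LibrarySong {
  id: string;
  title: string;
  artist: string;
  complete: boolean;
}

// Case-insensitive match on title or artist; an empty query matches everything.
function filterLibrary(songs: LibrarySong[], query: string): LibrarySong[] {
  const q = query.trim().toLowerCase();
  if (q === "") return songs;
  return songs.filter(
    (s) => s.title.toLowerCase().includes(q) || s.artist.toLowerCase().includes(q)
  );
}

// "Add all": complete songs only, skipping ids already in the queue.
function songsToAdd(library: LibrarySong[], queuedIds: Set<string>): LibrarySong[] {
  const seen = new Set(queuedIds);
  const result: LibrarySong[] = [];
  for (const song of library) {
    if (song.complete && !seen.has(song.id)) {
      seen.add(song.id);
      result.push(song);
    }
  }
  return result;
}
```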
- Search input with 300ms debounce
- Minimum 2 characters to trigger
- Queries Deezer API via backend
- Results shown in dropdown: thumbnail, title, artist
- Click result to add to queue (triggers full processing pipeline)
- Clear button to reset search
- Loading spinner during search
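The debounce and minimum-length rules can be combined in a small wrapper, sketched below. `debounceSearch` and its parameters are hypothetical names; trimming the query before checking its length is an assumption.

```typescript
// Wrap a search callback so it runs only after `delayMs` of inactivity
// and only for queries of at least `minChars` characters.
function debounceSearch(
  run: (query: string) => void,
  delayMs = 300,
  minChars = 2
): (raw: string) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (raw: string) => {
    // Any new keystroke cancels the pending search.
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
    const query = raw.trim();
    if (query.length < minChars) return;
    timer = setTimeout(() => {
      timer = undefined;
      run(query);
    }, delayMs);
  };
}
```

The returned function would be attached to the input's change handler, with `run` issuing the request to the backend search endpoint.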
Three download options per song:
- Vocals only (`vocals.mp3`)
- Instrumental only (`no_vocals.mp3`)
- Full original song (`song.mp3`)
Downloads use a blob fetch followed by a programmatic anchor click. Filenames are formatted as `Artist - Title (type).mp3`.
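The filename rule might be implemented as below. The `(type)` labels (`vocals`, `instrumental`, `full`) and the replacement of characters that are unsafe in filenames are assumptions; only the server-side file names and the overall `Artist - Title (type).mp3` pattern come from the description above.

```typescript
// Assumption: these labels are what appears in the "(type)" slot.
type DownloadType = "vocals" | "instrumental" | "full";

// Server-side file for each download option.
const SOURCE_FILES: Record<DownloadType, string> = {
  vocals: "vocals.mp3",
  instrumental: "no_vocals.mp3",
  full: "song.mp3",
};

// Replace characters that are invalid in filenames on common platforms.
// Assumption: sanitizing is desired; it is not described above.
function sanitize(part: string): string {
  return part.replace(/[\\/:*?"<>|]/g, "_").trim();
}

function downloadFilename(artist: string, title: string, type: DownloadType): string {
  return `${sanitize(artist)} - ${sanitize(title)} (${type}).mp3`;
}
```

In the browser, the fetched blob would be wrapped with `URL.createObjectURL`, assigned to a temporary anchor's `href` with `download` set to this filename, and clicked programmatically.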
- Generates clean URL: `/song/{id}`
- Copies to clipboard via Clipboard API
- URL updates automatically when current track changes (via React Router)
- On page load, reads track ID from URL path and adds to queue
- Legacy hash URLs (`/#song={id}`) are redirected to clean URLs
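Reading the track ID on page load, including the legacy hash form, can be sketched independently of the router. `songPath` and `trackFromLocation` are hypothetical helpers; the two URL shapes are the ones listed above.

```typescript
// Build the clean URL path for a track.
function songPath(id: string): string {
  return `/song/${encodeURIComponent(id)}`;
}

interface TrackLocation {
  id: string;
  redirectTo?: string; // set only for legacy hash URLs
}

// Extract the track ID from a pathname and hash, as found on window.location.
// Returns null when the URL does not reference a song.
function trackFromLocation(pathname: string, hash: string): TrackLocation | null {
  const clean = /^\/song\/([^/]+)\/?$/.exec(pathname);
  if (clean) return { id: decodeURIComponent(clean[1]) };

  const legacy = /^#song=(.+)$/.exec(hash);
  if (legacy) {
    const id = decodeURIComponent(legacy[1]);
    return { id, redirectTo: songPath(id) };
  }
  return null;
}
```

When `redirectTo` is present, the app would replace the current history entry with the clean URL before adding the track to the queue.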
- Light/dark mode toggle
- Persisted in localStorage
- CSS custom properties for all colors
- Smooth 300ms transitions on theme change
- Both modes have full color palettes (primary, secondary, success, danger, warning, background, surface, text, border, hover)
- Three types: success (green), error (red), warning (orange)
- Slide-in from right, auto-dismiss after 3 seconds, fade-out
- Used for: download status, queue actions, errors, clipboard operations
- Collapsible sidebar (queue/library panel)
- Toggle button collapses/expands sidebar
- Mobile adjustments at 768px breakpoint
- Touch-friendly control sizes
- Session-based auth with optional "Remember Me" (30-day token cookie)
- Login, register, forgot password forms on `/login` page
- Registration requires invite key OR admin approval
- First registered user auto-becomes admin
- Password reset via email (SMTP, optional)
- User - search, play, download, queue management
- Admin - all user features + admin panel access
- Profile dropdown in player header
- Shows username, links to About page, theme toggle, admin panel (if admin), logout
- View all users (username, creation date, status, role, last online, activity count)
- Approve pending users
- Promote/demote admin role
- Delete users
- Generate and manage invite keys
- Stats: total users, downloads, plays, searches, most active user
- Filterable usage logs (by username, action type)
- Paginated log table
- List all songs with metadata, file sizes, completion status
- Search/filter songs
- Shows which components exist (audio, vocals, instrumental, lyrics)
- Delete songs
- Health checks: Database, Deezer API, File System (disk space), Replicate API, Processing Queue, OpenRouter API
- Processing queue view with real-time progress (3s polling)
- Unfinished tracks view with failure counts (5s polling)
- Reprocess failed tracks
- Delete tracks with 2+ failures
- Status history log
- Metadata fetch (0-10%) - fetch song info from Deezer
- Download (10-20%) - download + decrypt audio from Deezer
- Audio splitting (20-50%) - Demucs via Replicate separates vocals from instrumental
- Lyrics extraction (50-85%) - WhisperX via Replicate produces word-level timestamps + speaker diarization
- Lyrics post-processing (85-90%) - LLM (via OpenRouter) splits long lines naturally
- Complete (100%)
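The percentage ranges above can be turned into a single overall progress value by interpolating within the current stage's range, as in this sketch. The stage keys and the `fraction` parameter (how far the current stage has advanced, from 0 to 1) are hypothetical; the ranges are the ones listed above.

```typescript
// Overall progress range [from, to] covered by each pipeline stage.
const STAGE_RANGES = {
  metadata: [0, 10],
  download: [10, 20],
  splitting: [20, 50],
  lyrics: [50, 85],
  postprocess: [85, 90],
  complete: [100, 100],
} as const;

type Stage = keyof typeof STAGE_RANGES;

// Linear interpolation inside the stage's range; fraction is clamped to [0, 1].
function overallProgress(stage: Stage, fraction: number): number {
  const [from, to] = STAGE_RANGES[stage];
  const f = Math.min(1, Math.max(0, fraction));
  return from + (to - from) * f;
}
```

For example, audio splitting that is half done reports `overallProgress("splitting", 0.5)`, which is 35.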
- Failures tracked in database with count and error message
- Manual reprocess from admin panel
- Auto-reprocess of unfinished tracks on server startup
| Service | Purpose |
|---|---|
| Deezer | Song search, download, metadata, cover art |
| Replicate (Demucs) | Vocal/instrumental separation |
| Replicate (WhisperX) | Lyrics extraction with timing + diarization |
| OpenRouter (LLM) | Intelligent lyrics line splitting |
| SMTP (optional) | Password reset emails |
- `users` - accounts, roles, approval status
- `usage_logs` - action tracking (search, download, play)
- `invite_keys` - registration invite system
- `password_resets` - reset tokens
- `auth_tokens` - remember-me tokens
- `system_status` - health check history
- `processing_failures` - track error tracking
songs/{track_id}/
metadata.json # title, artist, album, duration, cover
song.mp3 # original audio
vocals.mp3 # extracted vocals
no_vocals.mp3 # instrumental
lyrics.json # final word-level timed lyrics with speakers
lyrics_raw.json # raw WhisperX output
Areas where the current implementation has known gaps:
- No queue persistence - queue lost on refresh
- No keyboard shortcuts - no hotkeys for play/pause, skip, volume, seek
- No playback rate control
- No repeat modes (repeat-one, repeat-all)
- No pre-loading of next song in queue
- No offline/PWA support
- No WebSocket for processing status - uses polling
- No EQ or audio effects beyond volume
- No lyrics editing capability
- No playlist saving/loading
- Mobile UX could be improved (sidebar behavior, touch gestures)
- Accessibility - ARIA attributes exist but keyboard interaction is minimal