BF667-IDLE/VCTrain


VCTrain

Enhanced RVC Training System with 20 optimizers, full TypeScript WebUI, Python FastAPI backend, WebSocket real-time monitoring, and Google Colab support.

Colab · Install · WebUI · Optimizers · Workflow

Built on PolTrain · Side project of RVC Starter


Architecture

┌────────────────────────────────────────┐
│         TypeScript Frontend            │
│         Next.js 16 · Port 3000         │
│   Dashboard · Config · Monitor · Guide │
├────────────────────────────────────────┤
│         WebSocket Bridge               │
│         Bun · Port 3003                │
│ REST Proxy + WS Relay + Auto-Reconnect │
├────────────────────────────────────────┤
│         Python Backend                 │
│         FastAPI · Port 7861            │
│   Training · GPU · System · WebSocket  │
└────────────────────────────────────────┘
| Layer | Technology | Port | Purpose |
|---|---|---|---|
| Frontend | Next.js 16, React 19, TypeScript 5, Tailwind CSS 4, shadcn/ui | 3000 | WebUI with 4 tabs |
| Bridge | Bun, native WebSocket + ws library | 3003 | REST proxy + WebSocket relay |
| Backend | Python 3.8+, FastAPI, Uvicorn, PyTorch 2.0+ | 7861 | Training pipeline + GPU management |

Google Colab (Free GPU)

Open In Colab

Open colab_webui.ipynb in Google Colab and run all cells. It automatically handles everything:

| Step | What happens |
|---|---|
| GPU Check | Detects GPU name, VRAM, and temperature |
| Install Deps | Installs PyTorch (CUDA), FastAPI, Node.js, and all dependencies |
| Download Models | Fetches RMVPE + ContentVec pre-trained models |
| Upload Dataset | Connect Google Drive or upload audio files directly |
| Start Backend | Launches Python FastAPI server on port 7861 |
| Build Frontend | Installs npm packages and starts Next.js on port 3000 |
| ngrok Tunnels | Creates public URLs for remote access from any device |

The entire process is idempotent — safe to re-run any cell. Works with Colab's free T4 GPU (16GB VRAM).

Colab Tips

  • Use T4 GPU (free) for models up to batch size 8
  • Set optimizer to AdamW or Ranger for best results on T4
  • Use Adafactor if you hit OOM errors (lowest VRAM usage)
  • Connect Google Drive for persistent storage across sessions
  • 300 epochs take roughly 1-2 hours on a T4, depending on dataset size

Local Installation

Prerequisites

  • Python 3.8+
  • Node.js 18+ / Bun 1.0+
  • PyTorch 2.0+ with CUDA support (optional, CPU/MPS also works)
  • GPU: NVIDIA with CUDA 11.7+ (optional)
  • RAM: 8GB+ recommended

Step 1: Clone & Install Python Dependencies

git clone https://github.com/BF667-IDLE/VCTrain.git
cd VCTrain

# Install Python ML dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

# Install Python backend dependencies
pip install fastapi uvicorn[standard] websockets

Step 2: Install Frontend Dependencies

# Using bun (recommended)
bun install

# Or using npm
npm install

Step 3: Install WebSocket Bridge

cd mini-services/ws-bridge
bun install
cd ../..

Step 4: Start All Services

Open 3 terminal windows:

# Terminal 1 — Python Backend (port 7861)
python -m webui.server

# Terminal 2 — WebSocket Bridge (port 3003)
cd mini-services/ws-bridge && bun run dev

# Terminal 3 — Next.js Frontend (port 3000)
bun run dev

Step 5: Open WebUI

Navigate to http://localhost:3000 in your browser.

Note: The WebUI works in demo mode even without the Python backend running. When the backend is offline, it shows mock data with a "Backend Not Connected" banner. Start the backend to switch to live training mode.
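The fallback logic behind that banner can be sketched as a health check with a mock fallback. The `/api/health` endpoint is listed under Backend API; the shape of `MOCK_STATUS` here is illustrative, not the actual payload the WebUI uses:

```python
import json
import urllib.request
import urllib.error

# Illustrative demo-mode payload; the real WebUI serves richer mock data.
MOCK_STATUS = {"backend": "offline", "jobs": [], "demo_mode": True}

def fetch_backend_status(base_url: str = "http://localhost:7861",
                         timeout: float = 2.0) -> dict:
    """Query the FastAPI health endpoint; fall back to mock data (demo mode)
    when the backend is unreachable, mirroring the offline banner."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
            return {**json.loads(resp.read()), "demo_mode": False}
    except (urllib.error.URLError, OSError):
        return dict(MOCK_STATUS)
```

Calling `fetch_backend_status()` with the backend stopped returns the mock payload; once the server is up on port 7861, the same call switches to live data.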


WebUI Features

Dashboard Tab

  • Real-time experiment list fetched from the filesystem
  • Active training jobs with live status (running / completed / failed)
  • GPU monitoring — GPU name, memory usage, CUDA version
  • Connection indicator — green dot when backend is online, gray when offline
  • Quick action buttons — New Training, Compare Models, GPU Monitor

Training Config Tab

  • Complete form matching all train.py CLI arguments:
    • Experiment directory, model name, total epochs, save interval, batch size
    • Sample rate (32k / 40k / 48k), vocoder (HiFi-GAN / MRF / RefineGAN)
    • All 20 optimizers selectable, with a live info panel
    • Pretrained model paths, GPU device IDs, save-to-ZIP toggle
  • Live CLI command preview — see the exact command that will run, with copy button
  • Starts real training via FastAPI backend when connected
  • Shows job ID on success and auto-switches to Monitor tab

Training Monitor Tab

  • Real-time WebSocket metrics — losses, mel similarity, gradient norms, learning rate
  • 4 interactive charts (Recharts):
    • Loss Curves (discriminator, generator, mel, KL)
    • Mel Spectrogram Similarity (%) over epochs
    • Gradient Norms (generator vs discriminator)
    • Learning Rate Schedule with cosine decay visualization
  • Scrollable training log viewer — see raw training output in real-time
  • Demo mode — shows realistic mock data when backend is offline
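For reference, the cosine decay drawn in the Learning Rate chart follows the standard form below. This is a sketch; the exact schedule implemented in `train.py` may differ:

```python
import math

def cosine_decay_lr(step: int, total_steps: int,
                    base_lr: float = 3e-4, min_lr: float = 0.0) -> float:
    """Cosine-decay schedule: base_lr at step 0, min_lr at the final step,
    following 0.5 * (1 + cos(pi * progress)) interpolation."""
    progress = min(step / max(total_steps, 1), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```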

Optimizer Guide Tab

  • 6 quick recommendation cards: Best Overall, Fastest, Memory Efficient, Zero LR Tuning, Maximum Quality, Large Batch
  • All 20 optimizers organized by category with expandable detail cards
  • Star ratings for Speed, Quality, Memory Efficiency, and Stability
  • Click any optimizer card to see full details: description, recommended LR range, key feature, best use case

📁 Project Structure

VCTrain/
├── rvc/                          # Core training code (Python)
│   ├── train/
│   │   ├── train.py              # Main training script (20 optimizers)
│   │   ├── utils/
│   │   │   ├── optimizers/       # 20 optimizer implementations
│   │   │   │   ├── Adam.py
│   │   │   │   ├── AdamW.py
│   │   │   │   ├── AdamP.py
│   │   │   │   ├── AdaBelief.py
│   │   │   │   ├── AdaBeliefV2.py
│   │   │   │   ├── Adafactor.py
│   │   │   │   ├── AMSGrad.py
│   │   │   │   ├── Apollo.py
│   │   │   │   ├── CAME.py
│   │   │   │   ├── DAdaptAdam.py
│   │   │   │   ├── LAMB.py
│   │   │   │   ├── Lion.py
│   │   │   │   ├── Lookahead.py
│   │   │   │   ├── NovoGrad.py
│   │   │   │   ├── Prodigy.py
│   │   │   │   ├── RAdam.py
│   │   │   │   ├── Ranger.py
│   │   │   │   ├── SignSGD.py
│   │   │   │   ├── SGD.py
│   │   │   │   └── Sophia.py
│   │   │   ├── train_utils.py
│   │   │   └── data_utils.py
│   │   ├── preprocess/           # Audio preprocessing
│   │   ├── losses.py             # GAN loss functions
│   │   ├── mel_processing.py     # Mel spectrogram processing
│   │   └── visualization.py      # TensorBoard logging
│   ├── lib/                      # Model architectures
│   │   ├── algorithm/            # Synthesizer, discriminator, generator
│   │   └── configs/              # Sample rate configs (32k/40k/48k)
│   └── configs/                  # JSON config templates
│
├── webui/                        # Python Backend (FastAPI)
│   ├── __init__.py
│   ├── server.py                 # FastAPI server (port 7861)
│   └── requirements.txt          # fastapi, uvicorn, websockets
│
├── src/                          # TypeScript Frontend (Next.js)
│   ├── app/
│   │   ├── page.tsx              # Main page with tab navigation
│   │   ├── layout.tsx            # Root layout with QueryProvider
│   │   ├── globals.css           # Theme colors (amber/orange)
│   │   └── api/training/route.ts # CLI command generator API
│   ├── components/
│   │   ├── vctrain/              # Main tab components
│   │   │   ├── dashboard-tab.tsx
│   │   │   ├── training-config-tab.tsx
│   │   │   ├── training-monitor-tab.tsx
│   │   │   └── optimizer-guide-tab.tsx
│   │   └── ui/                   # shadcn/ui components
│   ├── lib/
│   │   ├── api.ts                # REST client + WebSocket hook
│   │   ├── store.ts              # Zustand state management
│   │   ├── query-provider.tsx    # React Query configuration
│   │   └── training-data.ts      # Optimizer definitions + mock data
│   └── types/
│       └── vctrain.ts            # TypeScript interfaces
│
├── mini-services/                # WebSocket Bridge
│   └── ws-bridge/
│       ├── index.ts              # Bridge server (port 3003)
│       ├── package.json
│       └── tsconfig.json
│
├── colab.ipynb                   # Original Colab notebook (CLI)
├── colab_webui.ipynb             # New Colab notebook (WebUI)
├── package.json                  # Frontend dependencies
├── requirements.txt              # Python ML dependencies
└── download_files.py             # Pre-trained model downloader

🔧 Usage

Command Line Training

# Default training with AdamW
python rvc/train/train.py \
  --experiment_dir "experiments" \
  --model_name "my_voice" \
  --optimizer "AdamW" \
  --total_epoch 300 \
  --batch_size 8 \
  --sample_rate 48000 \
  --gpus "0"

# With Ranger for best generalization
python rvc/train/train.py \
  --experiment_dir "experiments" \
  --model_name "my_voice" \
  --optimizer "Ranger" \
  --total_epoch 300 \
  --batch_size 8

# With Prodigy (no LR tuning needed!)
python rvc/train/train.py \
  --experiment_dir "experiments" \
  --model_name "my_voice" \
  --optimizer "Prodigy" \
  --total_epoch 300

# Memory-efficient with Adafactor
python rvc/train/train.py \
  --experiment_dir "experiments" \
  --model_name "my_voice" \
  --optimizer "Adafactor" \
  --total_epoch 300

# Multi-GPU training (GPUs 0 and 1)
python rvc/train/train.py \
  --experiment_dir "experiments" \
  --model_name "my_voice" \
  --optimizer "Sophia" \
  --gpus "0-1"
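Assuming the hyphen-separated format shown in the multi-GPU example above ("0" for one device, "0-1" for two), the `--gpus` value can be parsed like this (an illustrative helper, not VCTrain code):

```python
from typing import List

def parse_gpus(spec: str) -> List[int]:
    """Parse a --gpus argument: "0" -> [0], "0-1" -> [0, 1].
    Assumes hyphen-separated device IDs, per the examples above."""
    return [int(part) for part in spec.split("-") if part != ""]
```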

WebUI Training

  1. Open http://localhost:3000 (or the Colab ngrok URL)
  2. Go to Training Config tab
  3. Fill in model name and adjust parameters
  4. Select your preferred optimizer from the dropdown
  5. Click Start Training — it switches to the Monitor tab automatically
  6. Watch real-time charts and logs update live

Backend API

| Method | Endpoint | Description |
|---|---|---|
| GET | /api/health | Backend health check |
| POST | /api/training/start | Start a new training job |
| GET | /api/training/status | Get all job statuses |
| POST | /api/training/stop/{job_id} | Stop a running job |
| DELETE | /api/training/job/{job_id} | Delete a job record |
| GET | /api/experiments | List filesystem experiments |
| GET | /api/system/info | GPU and system info |
| GET | /api/optimizers | Available optimizers list |
| WS | /ws/training/{job_id} | Real-time training metrics stream |
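A client consuming `/ws/training/{job_id}` would parse each frame defensively. The field names below (`epoch`, `losses`, `lr`) are assumptions for illustration; check `webui/server.py` for the actual message schema:

```python
import json
from typing import Optional

def parse_metrics_message(raw: str) -> Optional[dict]:
    """Parse one JSON frame from the training metrics stream.
    Returns None for malformed or non-object frames instead of raising,
    so a bad frame never kills the monitor loop."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(msg, dict):
        return None
    return {
        "epoch": msg.get("epoch"),     # assumed field name
        "losses": msg.get("losses", {}),
        "lr": msg.get("lr"),
    }
```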

🎯 20 Optimizers with Gradient Centralization

All custom optimizers support:

  • torch._foreach acceleration for fast vectorized operations
  • Optional Gradient Centralization (GC) for improved GAN training stability
  • Decoupled weight decay following Loshchilov & Hutter (2019)
  • Both single-tensor and foreach step implementations
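Gradient Centralization itself is simple: for multi-dimensional weights, subtract the mean of each output slice from its gradient (Yong et al., 2020). A plain-Python sketch for a 2-D gradient; the optimizers here apply the same idea via torch tensor ops:

```python
from typing import List

def centralize_gradient(grad: List[List[float]]) -> List[List[float]]:
    """Gradient Centralization on a 2-D gradient: subtract each output
    row's mean, so every centralized row sums to zero."""
    return [[g - sum(row) / len(row) for g in row] for row in grad]
```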

Adaptive Methods

| Optimizer | Description | LR Range | Key Feature |
|---|---|---|---|
| AdamW | Adam + decoupled weight decay + GC | 1e-4 to 3e-4 | Custom impl with GC |
| Adam | Classic adaptive optimizer + GC | 1e-4 to 3e-4 | Fast convergence |
| AMSGrad | Adam with max variance tracking + GC | 1e-4 to 3e-4 | Prevents oscillations |
| RAdam | Rectified Adam + GC | 1e-4 to 3e-4 | Stable early training |
| AdaBelief | Belief-based adaptive LR + GC | 1e-4 to 3e-4 | Better generalization |
| AdaBeliefV2 | AdaBelief + AMSGrad | 1e-4 to 3e-4 | Very stable, long training |
| Adafactor | Factored moments, memory efficient | Auto (relative step) | Lowest VRAM usage |
| NovoGrad | Normalized gradient, per-layer LR | 1e-4 to 3e-4 | Naturally per-layer adaptive |
| LAMB | Layer-wise Adaptive Moments | 1e-4 to 3e-4 | Large-batch training |
| DAdaptAdam | D-Adaptation for automatic LR | Auto (set lr=1.0) | No LR tuning needed |

Sign-Based

| Optimizer | Description | LR Range | Key Feature |
|---|---|---|---|
| Lion | Evolved Sign Momentum + GC | 1e-5 to 5e-5 | Only stores momentum |
| SignSGD | Sign of momentum + GC | 1e-5 to 5e-5 | Ultra memory-efficient |

Second-Order / Clipped

| Optimizer | Description | LR Range | Key Feature |
|---|---|---|---|
| Sophia | Second-order clipping (Sophia-G) + GC | 5e-5 to 2e-4 | Curvature-aware |
| CAME | Clipped Absolute Moment Estimation + GC | 5e-4 to 1e-3 | Dual variance estimates |
| Apollo | Curvature-aware near-optimal + GC | 1e-3 to 1e-2 | Approx. second-order |

Projection & Hybrid

| Optimizer | Description | LR Range | Key Feature |
|---|---|---|---|
| AdamP | Adam with perturbation projection + GC | 1e-4 to 3e-4 | Anti-filter-noise |
| Ranger | RAdam + Lookahead + GC | 1e-4 to 3e-4 | Best generalization |
| SGD | Nesterov momentum + GC | 1e-3 to 1e-2 | Strong regularization |
| Lookahead | Wrapper for any base optimizer | N/A | Enhances any optimizer |

Auto-LR Methods

| Optimizer | Description | LR Range | Key Feature |
|---|---|---|---|
| Prodigy | Automatic LR via D-Adaptation + GC | Auto (set lr=1.0) | Zero-tuning |
| DAdaptAdam | D-Adaptation for Adam | Auto (set lr=1.0) | Self-adjusting |

⚙️ Optimizer Guide

Quick Recommendations

| Use Case | Optimizer | Why |
|---|---|---|
| Default / General | AdamW or Ranger | Best overall for RVC |
| Low VRAM | Adafactor | Factored moments, least memory |
| Best Quality | Sophia or CAME | Fast convergence, stable |
| No LR Tuning | Prodigy or DAdaptAdam | Auto-finds optimal LR |
| Large Batch | LAMB | Trust ratio prevents divergence |
| Fast Training | Lion or SignSGD | Minimal memory, fast per-step |
| GAN Stability | Ranger or AdamW + GC | Lookahead + GC |
| Quick Test | SGD Nesterov | Simple, strong regularization |

Performance Comparison

| Optimizer | Speed | Quality | Memory | Stability |
|---|---|---|---|---|
| AdamW | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ |
| Adam | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ |
| AMSGrad | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
| RAdam | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
| Ranger | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| AdaBelief | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ |
| AdaBeliefV2 | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
| Adafactor | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Apollo | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| CAME | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐⭐ |
| DAdaptAdam | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
| LAMB | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ |
| NovoGrad | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Prodigy | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
| Lion | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| SignSGD | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Sophia | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |

📊 Training Workflow

  1. Prepare Data — Collect clean audio files (WAV, 32kHz+). Minimum 10 minutes of speech recommended.
  2. Preprocess — Slice audio, extract features, build filelist. Use command line or WebUI.
  3. Configure — Set parameters in Training Config tab. Choose from 20 optimizers, set epochs, batch size, sample rate, vocoder.
  4. Train — Click Start Training. The backend launches as a subprocess and streams metrics via WebSocket.
  5. Monitor — Watch real-time loss curves, mel similarity, gradient norms, and learning rate in the Monitor tab.
  6. Export — Download trained model weights for inference with RVC.

💡 Tips

Dataset Quality

  • Use clean audio without background noise
  • Minimum 10 minutes of speech recommended
  • Consistent volume levels across samples
  • Remove silence and breaths for best results

Training

  • Start with 100 epochs for quick testing
  • Use 300+ epochs for production quality
  • Monitor mel similarity (target: 70%+)
  • Save checkpoints regularly (every 25 epochs by default)

VRAM Optimization

| VRAM | Batch Size | Recommended Optimizer |
|---|---|---|
| 4 GB | 2-4 | Adafactor or SignSGD |
| 8 GB | 4-8 | AdamW or Lion |
| 12 GB | 8-16 | Any optimizer |
| 16+ GB | 16-32 | Sophia or CAME |
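The table above can be expressed as a small lookup (an illustrative helper, not part of the VCTrain API):

```python
def recommend_setup(vram_gb: float) -> dict:
    """Map available VRAM to the batch-size / optimizer pairings from
    the VRAM Optimization table above."""
    if vram_gb < 8:
        return {"batch_size": 4, "optimizers": ["Adafactor", "SignSGD"]}
    if vram_gb < 12:
        return {"batch_size": 8, "optimizers": ["AdamW", "Lion"]}
    if vram_gb < 16:
        return {"batch_size": 16, "optimizers": ["any"]}
    return {"batch_size": 32, "optimizers": ["Sophia", "CAME"]}
```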

Optimizer-Specific Tips

  • Prodigy / DAdaptAdam: Set lr=1.0, the optimizer auto-adjusts
  • Lion / SignSGD: Use lower LR than Adam (10x lower typically)
  • Sophia: Update period of 2-3 steps works best
  • Ranger: Good default choice, no tuning needed
  • Adafactor: Uses relative_step=True for automatic LR
  • CAME: Higher LR (10x base) works best due to clipping
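These tips can be condensed into starting learning rates. The values are illustrative starting points within the ranges quoted in the optimizer tables, not tuned settings:

```python
def default_lr(optimizer: str) -> float:
    """Starting LR implied by the tips above: auto-LR methods take
    lr=1.0, sign-based methods run ~10x lower than Adam, CAME ~10x
    higher thanks to update clipping."""
    if optimizer in ("Prodigy", "DAdaptAdam"):
        return 1.0    # the optimizer rescales internally
    if optimizer in ("Lion", "SignSGD"):
        return 3e-5   # ~10x lower than the Adam-family default
    if optimizer == "CAME":
        return 1e-3   # higher LR tolerated due to clipping
    return 3e-4       # Adam/AdamW-family default
```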

🛠️ Tech Stack

| Component | Technology |
|---|---|
| Frontend Framework | Next.js 16, React 19, TypeScript 5 |
| Styling | Tailwind CSS 4, shadcn/ui (New York) |
| Charts | Recharts |
| Animations | Framer Motion |
| State Management | Zustand, React Query (TanStack Query) |
| WebSocket Bridge | Bun, native WebSocket + ws library |
| Python Backend | FastAPI, Uvicorn, PyTorch 2.0+ |
| ML Training | PyTorch DDP, TensorBoard |

📚 Documentation

| Resource | Description |
|---|---|
| colab_webui.ipynb | Full WebUI on Google Colab with free GPU |
| colab.ipynb | Original CLI-based Colab notebook |
| Optimizers README | Technical optimizer guide |
| Backend API | REST + WebSocket API reference |

🙏 Acknowledgments

  • PolTrain — Base project
  • RVC — Voice conversion technology
  • PyTorch — Deep learning framework
  • Next.js — React framework
  • FastAPI — Python web framework
  • AdamW (Loshchilov & Hutter, 2019)
  • Lion (Chen et al., 2023)
  • Sophia (Liu et al., 2023)
  • RAdam (Liu et al., 2020)
  • Ranger (Less Wright, 2020)
  • AdaBelief (Zhuang et al., 2020)
  • Lookahead (Zhang et al., 2019)
  • Prodigy (Defazio & Jelassi, 2023)
  • D-Adapt (Defazio, 2023)
  • CAME (Luo et al., 2023)
  • Apollo (Shi et al., 2022)
  • LAMB (You et al., 2019)
  • NovoGrad (Golovneva et al., 2019)
  • SignSGD (Bernstein et al., 2018)

📝 License

Same license as the original PolTrain project.


Happy Training! 🎤
