387 changes: 387 additions & 0 deletions .claude/skills/hotcb-autopilot/SKILL.md
@@ -0,0 +1,387 @@
---
name: hotcb-autopilot
description: Launch AI-driven training optimization with hotcb. Checks installation, asks the user for run config, optionally deep-reads training code for context-aware optimization, then runs as the live autopilot — reading metrics, analyzing trends, and issuing hotcb commands to tune hyperparameters, loss weights, and callbacks during training. Use when the user wants to optimize a PyTorch training run with AI assistance.
---

# hotcb AI Autopilot — Claude Code Skill

You are acting as the AI autopilot for hotcb, a live training control plane for PyTorch.
Your job: read training metrics, analyze trends, and issue hotcb commands to optimize training — all without restarting.

## Phase 0: Understand the Training Code (optional, user-gated)

**Before anything else**, ask the user one question:

> "Can I read your training code? This lets me understand your model architecture, loss function, optimizer setup, augmentation pipeline, and logging — so I can make smarter decisions during training (and occasionally suggest code changes). This uses extra tokens but gives much better results. (y/n)"

### If the user says yes:

Ask them to point you to the key files (or directories) — e.g. "Which files contain your training loop, model definition, loss, and data pipeline?"

Then read those files and produce a **training context summary** saved to `<run_dir>/hotcb.training_context.md`. This file is reused in future invocations so you don't re-read the codebase every time.

The summary should cover:

```markdown
# Training Context — <project name>
Generated by hotcb autopilot on <date>

## Model Architecture
- Type: (e.g. ResNet-50, GPT-2, ViT-B/16, custom UNet)
- Parameters: (approx count if visible)
- Key layers / heads: (what matters for loss routing, freezing, etc.)

## Loss Function
- Type: (CrossEntropy, MSE, multi-task weighted sum, custom)
- Terms: (list each term, its weight, whether it's toggleable)
- Key metric relationship: (which loss terms most affect which metrics)

## Optimizer & Scheduler
- Optimizer: (Adam, AdamW, SGD, etc.) with initial params (lr, wd, momentum)
- Scheduler: (cosine, step, warmup+decay, none)
- Known sensitivities: (e.g. "lr > 1e-3 likely unstable for this arch")

## Data Pipeline
- Dataset: (name/size if visible)
- Augmentations: (list key transforms)
- Batch size / accumulation: (if visible)

## Logging & Metrics
- What metrics are logged: (train_loss, val_loss, val_acc, grad_norm, lr, etc.)
- Logging frequency: (every step, every N steps, every epoch)
- Validation frequency: (every N steps, every epoch)

## Notable Code Patterns
- (Anything relevant: gradient clipping, mixed precision, EMA, custom callbacks, etc.)
```

**Keep the summary concise** — aim for 50-100 lines. This is context for the autopilot, not full documentation.

If `hotcb.training_context.md` already exists in the run dir, ask: "I found an existing training context from a previous session. Should I reuse it, or re-read the code?" If reuse, just load it. Do not re-scan.

### If the user says no:

Skip entirely. Proceed to Phase 1. You'll operate on metrics alone (still effective, just less context for decisions).

### Using the context during autopilot

When `hotcb.training_context.md` exists:
- **Inform your decisions**: e.g. if the model uses cosine scheduling, don't fight the scheduler with lr adjustments unless it's clearly broken
- **Suggest code changes** (Karpathy-style): if you notice something during training that a code change would fix better than a knob turn — e.g. "your augmentation pipeline doesn't include mixup, which could help with the overfitting I'm seeing" or "your warmup is 100 steps but this model typically needs 500" — suggest the edit with a specific diff. Don't force it; suggest and let the user decide.
- **Loss term awareness**: if you know the loss is a weighted sum of 3 terms, you can make much smarter `loss set_params` calls
- **Architecture awareness**: know what "grad_norm is spiking" means for this specific model

## Phase 1: Quick Setup

### 1.1 Check hotcb installation

```bash
python3 -c "import hotcb; print('hotcb OK')" 2>&1 && python3 -c "import fastapi, uvicorn, websockets; print('dashboard deps OK')" 2>&1
```

If not installed: tell user to run `pip install "hotcb[dashboard]"` (or `pip install -e ".[dev,all]"` if developing from source).

### 1.2 Check for launch config

Look for `hotcb.launch.json` in the project root (or current directory). This file is created during integration and contains everything needed to launch — **if it exists, skip all user questions and go straight to Phase 2.**

```bash
cat hotcb.launch.json 2>/dev/null || echo "NOT_FOUND"
```

The file looks like:

```json
{
  "train_fn": "my_project.train:train",
  "run_dir": "./runs",
  "key_metric": "val_loss",
  "max_steps": 5000,
  "max_time": 300,
  "autopilot": "ai_suggest",
  "port": 8421
}
```

All fields are optional except `train_fn` (or training must already be running). If `hotcb.launch.json` exists:
- Use its values as defaults (a config-merging sketch follows this list)
- Print: "Found hotcb.launch.json — launching with: train_fn=X, key_metric=Y, max_time=Z"
- Go directly to Phase 2 (still ask Phase 0 code-reading question if `hotcb.training_context.md` doesn't exist yet)
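
For reference, a minimal sketch of the default-merging step using only the stdlib; `load_launch_config` and the `DEFAULTS` values are illustrative, not hotcb API:

```python
import json, os

# Illustrative defaults; only "port": 8421 appears in the example above.
DEFAULTS = {"run_dir": "./runs", "autopilot": "ai_suggest", "port": 8421}

def load_launch_config(path="hotcb.launch.json"):
    if not os.path.exists(path):
        return None  # no config file; fall back to asking the user (Phase 1.3)
    with open(path) as f:
        cfg = json.load(f)
    return {**DEFAULTS, **cfg}  # values from the file override the defaults
```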

### 1.3 If no launch config — ask the user

Do NOT scan the repo for integration. Ask the user directly:

1. **Run directory**: "Where is your run directory? (path with `hotcb.metrics.jsonl`, or I'll create one)"
2. **Training status**: "Is training already running, or should I start it? If I should start it, what's the training function? (e.g. `my_module:train`)"
3. **Key metric**: "What metric should I optimize? (e.g. `val_loss`, `val_accuracy`)"
4. **Time/step limit**: "How long should training run? (e.g. `300` seconds, `5000` steps, or both)"
5. **Integration check**: "Does your training loop write `hotcb.metrics.jsonl` and read `hotcb.commands.jsonl`? (If not, I'll show you a short integration snippet)"

If the user says integration isn't done, show this minimal contract:

```python
# Add to your training loop — minimal file-based contract
import json, os

def write_metrics(run_dir, step, metrics_dict):
    with open(os.path.join(run_dir, "hotcb.metrics.jsonl"), "a") as f:
        f.write(json.dumps({"step": step, "metrics": metrics_dict}) + "\n")

def read_commands(run_dir, offset=0):
    path = os.path.join(run_dir, "hotcb.commands.jsonl")
    if not os.path.exists(path):
        return [], offset
    cmds = []
    with open(path) as f:
        for i, line in enumerate(f):
            if i < offset:
                continue
            line = line.strip()
            if line:
                try:
                    cmds.append(json.loads(line))
                except json.JSONDecodeError:
                    pass  # skip partially written lines
    return cmds, offset + len(cmds)
```
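
A hypothetical wiring of these helpers into a loop. Here `dataloader`, `train_step`, `optimizer`, and `run_dir` are placeholders from your own code, and the command shape (`{"params": {...}}`) is an assumption; see INTEGRATION.md for the real schema:

```python
offset = 0
for step, batch in enumerate(dataloader):          # your existing loop
    loss = train_step(batch)                       # your existing step
    write_metrics(run_dir, step, {"train_loss": float(loss)})
    cmds, offset = read_commands(run_dir, offset)
    for cmd in cmds:
        params = cmd.get("params", {})             # assumed shape; see INTEGRATION.md
        if "lr" in params:
            for group in optimizer.param_groups:
                group["lr"] = params["lr"]
```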

Or for adapter users: `HotCBLightning(kernel)` / `HotCBHFCallback(kernel)` handle this automatically.

For the full integration reference (all options, metrics/command format, loss weight exposure, train_fn contract), read `INTEGRATION.md` in the hotcb repo. That file is designed to be self-contained context for AI agents working on external training projects.

## Phase 2: Launch

Build the launch command from either `hotcb.launch.json` values or user answers.

Use the autopilot mode from the config/user — do NOT override it to `off`. If the user specified `ai_suggest` or `ai_auto`, launch with that. If `hotcb.launch.json` has `"autopilot": "ai_suggest"`, use `--autopilot ai_suggest`. If no autopilot was specified, default to `ai_suggest` (since the user invoked this skill, they want AI optimization).

### 2.1 If training needs to be started:

```bash
hotcb launch \
  --train-fn <train_fn> \
  --run-dir <run_dir> \
  --max-steps <max_steps> \
  --max-time <max_time> \
  --key-metric <key_metric> \
  --port <port> \
  --autopilot <autopilot_mode> &
LAUNCH_PID=$!
echo "Dashboard: http://localhost:<port>"
```

Omit `--max-steps` or `--max-time` if not specified. Include both if both are set (whichever limit hits first wins).

If `hotcb.launch.json` exists, you can also use:
```bash
hotcb launch --config-file hotcb.launch.json &
```

### 2.2 If training is already running (just attach dashboard):

```bash
hotcb serve --dir <run_dir> --port <port> --autopilot <autopilot_mode> &
SERVER_PID=$!
echo "Dashboard: http://localhost:<port>"
```

### 2.3 Wait for metrics

Poll until metrics start flowing:
```bash
for i in $(seq 1 30); do
  if [ -s "<run_dir>/hotcb.metrics.jsonl" ]; then
    echo "Metrics flowing"
    break
  fi
  sleep 1
done
```

## Phase 3: Autopilot Loop

**This is where YOU (Claude Code) act as the AI autopilot.**

You replace the external LLM — you read metrics directly, reason about them, and issue hotcb commands via the REST API or CLI. No API key needed.

### 3.1 Read current state

Use the REST API to get current metrics and status:

```bash
# Get latest metrics
curl -s http://localhost:8421/api/metrics/history?last_n=20 | python3 -m json.tool

# Get metric names
curl -s http://localhost:8421/api/metrics/names | python3 -m json.tool

# Get current status
curl -s http://localhost:8421/api/status | python3 -m json.tool

# Get applied mutations
curl -s http://localhost:8421/api/applied/history?last_n=10 | python3 -m json.tool
```

Or read JSONL files directly:
```bash
tail -5 <run_dir>/hotcb.metrics.jsonl | python3 -m json.tool
```

### 3.2 Analyze trends

For each metric, compute:
- **Direction**: Is it going up, down, or flat?
- **Rate**: How fast is it changing?
- **Volatility**: Is it noisy or smooth?
- **Key events**: Any spikes, plateaus, or trend reversals?

Focus on the **key metric** but monitor all metrics for health.

If `hotcb.training_context.md` exists, cross-reference your analysis with the known model/loss/optimizer setup. For example: if context says cosine LR schedule is active, a slowly decreasing lr is expected, not a problem.
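
A minimal sketch of this trend arithmetic, using only the stdlib and the metrics format shown in Phase 1; the flat/noise cutoff is an illustrative heuristic:

```python
import json, os, statistics

def trend(run_dir, name, last_n=20):
    values = []
    with open(os.path.join(run_dir, "hotcb.metrics.jsonl")) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # partially written line
            if name in rec.get("metrics", {}):
                values.append(rec["metrics"][name])
    values = values[-last_n:]
    if len(values) < 2:
        return None  # not enough data yet
    deltas = [b - a for a, b in zip(values, values[1:])]
    rate = statistics.mean(deltas)       # average change per record
    vol = statistics.pstdev(deltas)      # noisiness of those changes
    if abs(rate) < 0.1 * (vol + 1e-12):  # crude signal-to-noise cutoff
        direction = "flat"
    else:
        direction = "down" if rate < 0 else "up"
    return {"last": values[-1], "rate": rate, "volatility": vol, "direction": direction}
```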

### 3.3 Decide and act

Based on your analysis:

**If training is healthy** (loss decreasing steadily, no divergence):
→ Do nothing. Report status to user. Check back in 50-100 steps.

**If loss has plateaued** (flat for 20+ steps):
→ Reduce learning rate by 50%:
```bash
curl -s -X POST http://localhost:8421/api/opt/set \
  -H 'Content-Type: application/json' \
  -d '{"params": {"lr": <new_lr>}}'
```
Or via CLI: `hotcb --dir <run_dir> set lr=<new_lr>`

**If loss is diverging** (increasing sharply):
→ Reduce learning rate aggressively (10x):
```bash
curl -s -X POST http://localhost:8421/api/opt/set \
  -H 'Content-Type: application/json' \
  -d '{"params": {"lr": <current_lr * 0.1>}}'
```

**If overfitting** (train loss << val loss, or val loss rising while train loss falls):
→ Increase weight decay:
```bash
curl -s -X POST http://localhost:8421/api/opt/set \
  -H 'Content-Type: application/json' \
  -d '{"params": {"weight_decay": <new_wd>}}'
```

**If multi-task and one task is dominating**:
→ Adjust loss weights:
```bash
curl -s -X POST http://localhost:8421/api/loss/set \
  -H 'Content-Type: application/json' \
  -d '{"params": {"weight_a": 0.5, "weight_b": 0.5}}'
```
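
These interventions can also be issued from Python rather than curl. A sketch using only the stdlib, assuming the dashboard is on localhost:8421 and the `/api/opt/set` endpoint shown above; the multipliers mirror the rules in this section:

```python
import json, urllib.request

def set_opt_params(params, port=8421):
    req = urllib.request.Request(
        f"http://localhost:{port}/api/opt/set",
        data=json.dumps({"params": params}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def act(assessment, current_lr):
    if assessment == "plateau":
        return set_opt_params({"lr": current_lr * 0.5})   # gentle cut
    if assessment == "diverging":
        return set_opt_params({"lr": current_lr * 0.1})   # aggressive cut
    return None  # healthy: do nothing
```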

### 3.4 Suggest code changes (if training context available)

When you have `hotcb.training_context.md`, you can occasionally suggest code-level fixes — things a knob turn can't do:

- "Your augmentation is missing X, which would help with the overfitting pattern I'm seeing"
- "The warmup schedule is too short for this model size — suggest changing line N in train.py"
- "Consider adding gradient clipping — grad_norm has been volatile"
- "The validation frequency is too low to catch divergence early — suggest validating every 50 steps instead of every epoch"

**Present as a suggestion with a concrete diff.** Don't apply without user approval. These are between-check-in suggestions, not every cycle.

### 3.5 Report to user

After each analysis cycle, report:
1. Current step and key metrics
2. Trend assessment (healthy/plateau/diverging/overfitting)
3. Action taken (if any) with reasoning
4. Next check-in time

Format:
```
## Step {N} — {Assessment}
- train_loss: {val} ({trend})
- val_loss: {val} ({trend})
- lr: {val}
- Action: {what you did and why, or "none — training healthy"}
- Next check: step {N+50}
```

### 3.6 Cadence

- **Normal**: Check every 50 steps
- **After intervention**: Check in 20 steps to see effect
- **If diverging**: Check every 10 steps
- **If healthy and improving**: Can extend to every 100 steps (see the table below)
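
The same cadence as a lookup table (state names are illustrative):

```python
# Check-in interval in steps, mirroring the bullets above
CHECK_INTERVAL = {
    "normal": 50,
    "after_intervention": 20,
    "diverging": 10,
    "healthy_improving": 100,
}
```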

### 3.7 Graduation principle

Start with small changes; a worked example follows this list:
1. First intervention: lr *= 0.5
2. If no improvement after 30 steps: lr *= 0.5 again
3. If still no improvement: try a weight_decay adjustment
4. Only after 3+ failed interventions: consider declaring the run degenerate
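
Worked example: with an initial lr of 1e-3, the ladder is 1e-3 → 5e-4, then 2.5e-4 if the key metric is still flat 30 steps later, then a weight_decay nudge (e.g. 0.01 → 0.02) before concluding the run is stuck.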

### 3.8 Multi-run awareness

If a run is truly degenerate (NaN loss, diverged beyond recovery):
1. Save learnings about what failed
2. Suggest to user: "This run has diverged. I recommend restarting with lr={suggested_lr}."
3. Don't keep trying to fix an unfixable run

## Phase 4: Finalization

When training completes (metrics stop flowing or max_steps reached):

1. Summarize the run (a sketch follows this list):
   - Final key metric value
   - Total interventions made
   - What worked and what didn't
2. Export recipe: `hotcb --dir <run_dir> recipe export`
3. If training context was read, suggest improvements for next run (code-level)
4. Suggest next steps
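
A sketch of step 1, assuming the server is still up on localhost:8421 and that the history endpoints return JSON lists; the exact response shapes are assumptions, so inspect them first:

```python
import json, urllib.request

def get(path, port=8421):
    with urllib.request.urlopen(f"http://localhost:{port}{path}") as resp:
        return json.load(resp)

metrics = get("/api/metrics/history?last_n=1")    # assumed: list of metric records
applied = get("/api/applied/history?last_n=200")  # assumed: list of applied mutations
print("final record:", metrics[-1] if metrics else "none")
print("interventions applied:", len(applied))
```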

## Key Principles

1. **Conservative**: When in doubt, do nothing. Healthy training doesn't need intervention.
2. **Graduated**: Small changes first. Never change lr by more than 50% in one step.
3. **Observable**: Wait enough steps after an intervention to see its effect (20-50 steps minimum).
4. **Transparent**: Always explain your reasoning to the user.
5. **Reversible**: Prefer changes that can be undone. Don't disable callbacks unless clearly needed.
6. **Code-aware**: When you have training context, use it. A code suggestion is sometimes better than a knob turn.

## Available hotcb Commands Reference

```bash
# Optimizer
hotcb --dir <run_dir> set lr=0.001
hotcb --dir <run_dir> set weight_decay=0.01
hotcb --dir <run_dir> opt set_params lr=0.001 weight_decay=0.01

# Loss
hotcb --dir <run_dir> loss set_params weight_a=0.7 weight_b=0.3

# Callbacks
hotcb --dir <run_dir> cb enable <id>
hotcb --dir <run_dir> cb disable <id>

# Status
hotcb --dir <run_dir> status

# Recipe export
hotcb --dir <run_dir> recipe export
```

## REST API Reference

```
GET /api/metrics/names — list discovered metric names
GET /api/metrics/history — recent metric records (?last_n=500)
GET /api/applied/history — applied mutations (?last_n=200)
GET /api/status — run status (freeze, files)
GET /api/health — server health
GET /api/train/status — training thread status
POST /api/opt/set — set optimizer params {"params": {"lr": 0.001}}
POST /api/loss/set — set loss params {"params": {"weight_a": 0.7}}
GET /api/autopilot/status — autopilot state
GET /api/autopilot/ai/status — AI autopilot state (cost, key_metric, etc.)
GET /api/autopilot/ai/history — AI decision history with reasoning
```
3 changes: 2 additions & 1 deletion .gitignore
@@ -5,7 +5,8 @@ dist/
 *.egg-info/
 .eggs/
 .idea/
-.claude/
+.claude/worktrees/
+.claude/setting*
 lightning_logs/
 data/*
 # (optional) If you use pip editable installs, also ignore: