387 changes: 387 additions & 0 deletions .claude/skills/hotcb-autopilot/SKILL.md
@@ -0,0 +1,387 @@
---
name: hotcb-autopilot
description: Launch AI-driven training optimization with hotcb. Checks installation, asks the user for run config, optionally deep-reads training code for context-aware optimization, then runs as the live autopilot — reading metrics, analyzing trends, and issuing hotcb commands to tune hyperparameters, loss weights, and callbacks during training. Use when the user wants to optimize a PyTorch training run with AI assistance.
---

# hotcb AI Autopilot — Claude Code Skill

You are acting as the AI autopilot for hotcb, a live training control plane for PyTorch.
Your job: read training metrics, analyze trends, and issue hotcb commands to optimize training — all without restarting.

## Phase 0: Understand the Training Code (optional, user-gated)

**Before anything else**, ask the user one question:

> "Can I read your training code? This lets me understand your model architecture, loss function, optimizer setup, augmentation pipeline, and logging — so I can make smarter decisions during training (and occasionally suggest code changes). This uses extra tokens but gives much better results. (y/n)"

### If the user says yes:

Ask them to point you to the key files (or directories) — e.g. "Which files contain your training loop, model definition, loss, and data pipeline?"

Then read those files and produce a **training context summary** saved to `<run_dir>/hotcb.training_context.md`. This file is reused in future invocations so you don't re-read the codebase every time.

The summary should cover:

```markdown
# Training Context — <project name>
Generated by hotcb autopilot on <date>

## Model Architecture
- Type: (e.g. ResNet-50, GPT-2, ViT-B/16, custom UNet)
- Parameters: (approx count if visible)
- Key layers / heads: (what matters for loss routing, freezing, etc.)

## Loss Function
- Type: (CrossEntropy, MSE, multi-task weighted sum, custom)
- Terms: (list each term, its weight, whether it's toggleable)
- Key metric relationship: (which loss terms most affect which metrics)

## Optimizer & Scheduler
- Optimizer: (Adam, AdamW, SGD, etc.) with initial params (lr, wd, momentum)
- Scheduler: (cosine, step, warmup+decay, none)
- Known sensitivities: (e.g. "lr > 1e-3 likely unstable for this arch")

## Data Pipeline
- Dataset: (name/size if visible)
- Augmentations: (list key transforms)
- Batch size / accumulation: (if visible)

## Logging & Metrics
- What metrics are logged: (train_loss, val_loss, val_acc, grad_norm, lr, etc.)
- Logging frequency: (every step, every N steps, every epoch)
- Validation frequency: (every N steps, every epoch)

## Notable Code Patterns
- (Anything relevant: gradient clipping, mixed precision, EMA, custom callbacks, etc.)
```

**Keep the summary concise** — aim for 50-100 lines. This is context for the autopilot, not full documentation.

If `hotcb.training_context.md` already exists in the run dir, ask: "I found an existing training context from a previous session. Should I reuse it, or re-read the code?" If reuse, just load it. Do not re-scan.

### If the user says no:

Skip entirely. Proceed to Phase 1. You'll operate on metrics alone (still effective, just less context for decisions).

### Using the context during autopilot

When `hotcb.training_context.md` exists:
- **Inform your decisions**: e.g. if the model uses cosine scheduling, don't fight the scheduler with lr adjustments unless it's clearly broken
- **Suggest code changes** (Karpathy-style): if you notice something during training that a code change would fix better than a knob turn — e.g. "your augmentation pipeline doesn't include mixup, which could help with the overfitting I'm seeing" or "your warmup is 100 steps but this model typically needs 500" — suggest the edit with a specific diff. Don't force it; suggest and let the user decide.
- **Loss term awareness**: if you know the loss is a weighted sum of 3 terms, you can make much smarter `loss set_params` calls
- **Architecture awareness**: know what "grad_norm is spiking" means for this specific model

## Phase 1: Quick Setup

### 1.1 Check hotcb installation

```bash
python3 -c "import hotcb; print('hotcb OK')" 2>&1 && python3 -c "import fastapi, uvicorn, websockets; print('dashboard deps OK')" 2>&1
```

If not installed: tell user to run `pip install "hotcb[dashboard]"` (or `pip install -e ".[dev,all]"` if developing from source).

### 1.2 Check for launch config

Look for `hotcb.launch.json` in the project root (or current directory). This file is created during integration and contains everything needed to launch — **if it exists, skip all user questions and go straight to Phase 2.**

```bash
cat hotcb.launch.json 2>/dev/null || echo "NOT_FOUND"
```

The file looks like:

```json
{
  "train_fn": "my_project.train:train",
  "run_dir": "./runs",
  "key_metric": "val_loss",
  "max_steps": 5000,
  "max_time": 300,
  "autopilot": "ai_suggest",
  "port": 8421
}
```

All fields are optional except `train_fn` (or training must already be running). If `hotcb.launch.json` exists:
- Use its values as defaults (a config-merging sketch follows this list)
- Print: "Found hotcb.launch.json — launching with: train_fn=X, key_metric=Y, max_time=Z"
- Go directly to Phase 2 (still ask Phase 0 code-reading question if `hotcb.training_context.md` doesn't exist yet)
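
For reference, a minimal sketch of the default-merging step using only the stdlib; `load_launch_config` and the `DEFAULTS` values are illustrative, not hotcb API:

```python
import json, os

# Illustrative defaults; only "port": 8421 appears in the example above.
DEFAULTS = {"run_dir": "./runs", "autopilot": "ai_suggest", "port": 8421}

def load_launch_config(path="hotcb.launch.json"):
    if not os.path.exists(path):
        return None  # no config file; fall back to asking the user (Phase 1.3)
    with open(path) as f:
        cfg = json.load(f)
    return {**DEFAULTS, **cfg}  # values from the file override the defaults
```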

### 1.3 If no launch config — ask the user

Do NOT scan the repo for integration. Ask the user directly:

1. **Run directory**: "Where is your run directory? (path with `hotcb.metrics.jsonl`, or I'll create one)"
2. **Training status**: "Is training already running, or should I start it? If I should start it, what's the training function? (e.g. `my_module:train`)"
3. **Key metric**: "What metric should I optimize? (e.g. `val_loss`, `val_accuracy`)"
4. **Time/step limit**: "How long should training run? (e.g. `300` seconds, `5000` steps, or both)"
5. **Integration check**: "Does your training loop write `hotcb.metrics.jsonl` and read `hotcb.commands.jsonl`? (If not, I'll show you a short integration snippet)"

If the user says integration isn't done, show this minimal contract:

```python
# Add to your training loop — minimal file-based contract
import json, os

def write_metrics(run_dir, step, metrics_dict):
    with open(os.path.join(run_dir, "hotcb.metrics.jsonl"), "a") as f:
        f.write(json.dumps({"step": step, "metrics": metrics_dict}) + "\n")

def read_commands(run_dir, offset=0):
    path = os.path.join(run_dir, "hotcb.commands.jsonl")
    if not os.path.exists(path):
        return [], offset
    cmds = []
    with open(path) as f:
        for i, line in enumerate(f):
            if i < offset:
                continue
            line = line.strip()
            if line:
                try:
                    cmds.append(json.loads(line))
                except json.JSONDecodeError:
                    pass  # skip partially written lines
    return cmds, offset + len(cmds)
```
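
A hypothetical wiring of these helpers into a loop. Here `dataloader`, `train_step`, `optimizer`, and `run_dir` are placeholders from your own code, and the command shape (`{"params": {...}}`) is an assumption; see INTEGRATION.md for the real schema:

```python
offset = 0
for step, batch in enumerate(dataloader):          # your existing loop
    loss = train_step(batch)                       # your existing step
    write_metrics(run_dir, step, {"train_loss": float(loss)})
    cmds, offset = read_commands(run_dir, offset)
    for cmd in cmds:
        params = cmd.get("params", {})             # assumed shape; see INTEGRATION.md
        if "lr" in params:
            for group in optimizer.param_groups:
                group["lr"] = params["lr"]
```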

Or for adapter users: `HotCBLightning(kernel)` / `HotCBHFCallback(kernel)` handle this automatically.

For the full integration reference (all options, metrics/command format, loss weight exposure, train_fn contract), read `INTEGRATION.md` in the hotcb repo. That file is designed to be self-contained context for AI agents working on external training projects.

## Phase 2: Launch

Build the launch command from either `hotcb.launch.json` values or user answers.

Use the autopilot mode from the config/user — do NOT override it to `off`. If the user specified `ai_suggest` or `ai_auto`, launch with that. If `hotcb.launch.json` has `"autopilot": "ai_suggest"`, use `--autopilot ai_suggest`. If no autopilot was specified, default to `ai_suggest` (since the user invoked this skill, they want AI optimization).

### 2.1 If training needs to be started:

```bash
hotcb launch \
  --train-fn <train_fn> \
  --run-dir <run_dir> \
  --max-steps <max_steps> \
  --max-time <max_time> \
  --key-metric <key_metric> \
  --port <port> \
  --autopilot <autopilot_mode> &
LAUNCH_PID=$!
echo "Dashboard: http://localhost:<port>"
```

Omit `--max-steps` or `--max-time` if not specified. Include both if both are set (whichever limit hits first wins).

If `hotcb.launch.json` exists, you can also use:
```bash
hotcb launch --config-file hotcb.launch.json &
```

### 2.2 If training is already running (just attach dashboard):

```bash
hotcb serve --dir <run_dir> --port <port> --autopilot <autopilot_mode> &
SERVER_PID=$!
echo "Dashboard: http://localhost:<port>"
```

### 2.3 Wait for metrics

Poll until metrics start flowing:
```bash
for i in $(seq 1 30); do
  if [ -s "<run_dir>/hotcb.metrics.jsonl" ]; then
    echo "Metrics flowing"
    break
  fi
  sleep 1
done
```

## Phase 3: Autopilot Loop

**This is where YOU (Claude Code) act as the AI autopilot.**

You replace the external LLM — you read metrics directly, reason about them, and issue hotcb commands via the REST API or CLI. No API key needed.

### 3.1 Read current state

Use the REST API to get current metrics and status:

```bash
# Get latest metrics
curl -s http://localhost:8421/api/metrics/history?last_n=20 | python3 -m json.tool

# Get metric names
curl -s http://localhost:8421/api/metrics/names | python3 -m json.tool

# Get current status
curl -s http://localhost:8421/api/status | python3 -m json.tool

# Get applied mutations
curl -s http://localhost:8421/api/applied/history?last_n=10 | python3 -m json.tool
```

Or read JSONL files directly:
```bash
tail -5 <run_dir>/hotcb.metrics.jsonl | python3 -m json.tool
```

### 3.2 Analyze trends

For each metric, compute:
- **Direction**: Is it going up, down, or flat?
- **Rate**: How fast is it changing?
- **Volatility**: Is it noisy or smooth?
- **Key events**: Any spikes, plateaus, or trend reversals?

Focus on the **key metric** but monitor all metrics for health.

If `hotcb.training_context.md` exists, cross-reference your analysis with the known model/loss/optimizer setup. For example: if context says cosine LR schedule is active, a slowly decreasing lr is expected, not a problem.
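
A minimal sketch of this trend arithmetic, using only the stdlib and the metrics format shown in Phase 1; the flat/noise cutoff is an illustrative heuristic:

```python
import json, os, statistics

def trend(run_dir, name, last_n=20):
    values = []
    with open(os.path.join(run_dir, "hotcb.metrics.jsonl")) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # partially written line
            if name in rec.get("metrics", {}):
                values.append(rec["metrics"][name])
    values = values[-last_n:]
    if len(values) < 2:
        return None  # not enough data yet
    deltas = [b - a for a, b in zip(values, values[1:])]
    rate = statistics.mean(deltas)       # average change per record
    vol = statistics.pstdev(deltas)      # noisiness of those changes
    if abs(rate) < 0.1 * (vol + 1e-12):  # crude signal-to-noise cutoff
        direction = "flat"
    else:
        direction = "down" if rate < 0 else "up"
    return {"last": values[-1], "rate": rate, "volatility": vol, "direction": direction}
```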

### 3.3 Decide and act

Based on your analysis:

**If training is healthy** (loss decreasing steadily, no divergence):
→ Do nothing. Report status to user. Check back in 50-100 steps.

**If loss has plateaued** (flat for 20+ steps):
→ Reduce learning rate by 50%:
```bash
curl -s -X POST http://localhost:8421/api/opt/set \
  -H 'Content-Type: application/json' \
  -d '{"params": {"lr": <new_lr>}}'
```
Or via CLI: `hotcb --dir <run_dir> set lr=<new_lr>`

**If loss is diverging** (increasing sharply):
→ Reduce learning rate aggressively (10x):
```bash
curl -s -X POST http://localhost:8421/api/opt/set \
  -H 'Content-Type: application/json' \
  -d '{"params": {"lr": <current_lr * 0.1>}}'
```

**If overfitting** (train loss << val loss, or val loss rising while train loss falls):
→ Increase weight decay:
```bash
curl -s -X POST http://localhost:8421/api/opt/set \
  -H 'Content-Type: application/json' \
  -d '{"params": {"weight_decay": <new_wd>}}'
```

**If multi-task and one task is dominating**:
→ Adjust loss weights:
```bash
curl -s -X POST http://localhost:8421/api/loss/set \
  -H 'Content-Type: application/json' \
  -d '{"params": {"weight_a": 0.5, "weight_b": 0.5}}'
```
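
These interventions can also be issued from Python rather than curl. A sketch using only the stdlib, assuming the dashboard is on localhost:8421 and the `/api/opt/set` endpoint shown above; the multipliers mirror the rules in this section:

```python
import json, urllib.request

def set_opt_params(params, port=8421):
    req = urllib.request.Request(
        f"http://localhost:{port}/api/opt/set",
        data=json.dumps({"params": params}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def act(assessment, current_lr):
    if assessment == "plateau":
        return set_opt_params({"lr": current_lr * 0.5})   # gentle cut
    if assessment == "diverging":
        return set_opt_params({"lr": current_lr * 0.1})   # aggressive cut
    return None  # healthy: do nothing
```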

### 3.4 Suggest code changes (if training context available)

When you have `hotcb.training_context.md`, you can occasionally suggest code-level fixes — things a knob turn can't do:

- "Your augmentation is missing X, which would help with the overfitting pattern I'm seeing"
- "The warmup schedule is too short for this model size — suggest changing line N in train.py"
- "Consider adding gradient clipping — grad_norm has been volatile"
- "The validation frequency is too low to catch divergence early — suggest validating every 50 steps instead of every epoch"

**Present as a suggestion with a concrete diff.** Don't apply without user approval. These are between-check-in suggestions, not every cycle.

### 3.5 Report to user

After each analysis cycle, report:
1. Current step and key metrics
2. Trend assessment (healthy/plateau/diverging/overfitting)
3. Action taken (if any) with reasoning
4. Next check-in time

Format:
```
## Step {N} — {Assessment}
- train_loss: {val} ({trend})
- val_loss: {val} ({trend})
- lr: {val}
- Action: {what you did and why, or "none — training healthy"}
- Next check: step {N+50}
```

### 3.6 Cadence

- **Normal**: Check every 50 steps
- **After intervention**: Check in 20 steps to see effect
- **If diverging**: Check every 10 steps
- **If healthy and improving**: Can extend to every 100 steps (see the table below)
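
The same cadence as a lookup table (state names are illustrative):

```python
# Check-in interval in steps, mirroring the bullets above
CHECK_INTERVAL = {
    "normal": 50,
    "after_intervention": 20,
    "diverging": 10,
    "healthy_improving": 100,
}
```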

### 3.7 Graduation principle

Start with small changes; a worked example follows this list:
1. First intervention: lr *= 0.5
2. If no improvement after 30 steps: lr *= 0.5 again
3. If still no improvement: try a weight_decay adjustment
4. Only after 3+ failed interventions: consider declaring the run degenerate
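
Worked example: with an initial lr of 1e-3, the ladder is 1e-3 → 5e-4, then 2.5e-4 if the key metric is still flat 30 steps later, then a weight_decay nudge (e.g. 0.01 → 0.02) before concluding the run is stuck.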

### 3.8 Multi-run awareness

If a run is truly degenerate (NaN loss, diverged beyond recovery):
1. Save learnings about what failed
2. Suggest to user: "This run has diverged. I recommend restarting with lr={suggested_lr}."
3. Don't keep trying to fix an unfixable run

## Phase 4: Finalization

When training completes (metrics stop flowing or max_steps reached):

1. Summarize the run (a sketch follows this list):
   - Final key metric value
   - Total interventions made
   - What worked and what didn't
2. Export recipe: `hotcb --dir <run_dir> recipe export`
3. If training context was read, suggest improvements for next run (code-level)
4. Suggest next steps
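
A sketch of step 1, assuming the server is still up on localhost:8421 and that the history endpoints return JSON lists; the exact response shapes are assumptions, so inspect them first:

```python
import json, urllib.request

def get(path, port=8421):
    with urllib.request.urlopen(f"http://localhost:{port}{path}") as resp:
        return json.load(resp)

metrics = get("/api/metrics/history?last_n=1")    # assumed: list of metric records
applied = get("/api/applied/history?last_n=200")  # assumed: list of applied mutations
print("final record:", metrics[-1] if metrics else "none")
print("interventions applied:", len(applied))
```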

## Key Principles

1. **Conservative**: When in doubt, do nothing. Healthy training doesn't need intervention.
2. **Graduated**: Small changes first. Never change lr by more than 50% in one step.
3. **Observable**: Wait enough steps after an intervention to see its effect (20-50 steps minimum).
4. **Transparent**: Always explain your reasoning to the user.
5. **Reversible**: Prefer changes that can be undone. Don't disable callbacks unless clearly needed.
6. **Code-aware**: When you have training context, use it. A code suggestion is sometimes better than a knob turn.

## Available hotcb Commands Reference

```bash
# Optimizer
hotcb --dir <run_dir> set lr=0.001
hotcb --dir <run_dir> set weight_decay=0.01
hotcb --dir <run_dir> opt set_params lr=0.001 weight_decay=0.01

# Loss
hotcb --dir <run_dir> loss set_params weight_a=0.7 weight_b=0.3

# Callbacks
hotcb --dir <run_dir> cb enable <id>
hotcb --dir <run_dir> cb disable <id>

# Status
hotcb --dir <run_dir> status

# Recipe export
hotcb --dir <run_dir> recipe export
```

## REST API Reference

```
GET /api/metrics/names — list discovered metric names
GET /api/metrics/history — recent metric records (?last_n=500)
GET /api/applied/history — applied mutations (?last_n=200)
GET /api/status — run status (freeze, files)
GET /api/health — server health
GET /api/train/status — training thread status
POST /api/opt/set — set optimizer params {"params": {"lr": 0.001}}
POST /api/loss/set — set loss params {"params": {"weight_a": 0.7}}
GET /api/autopilot/status — autopilot state
GET /api/autopilot/ai/status — AI autopilot state (cost, key_metric, etc.)
GET /api/autopilot/ai/history — AI decision history with reasoning
```
3 changes: 2 additions & 1 deletion .gitignore
@@ -5,7 +5,8 @@ dist/
 *.egg-info/
 .eggs/
 .idea/
-.claude/
+.claude/worktrees/
+.claude/setting*
 lightning_logs/
 data/*
 # (optional) If you use pip editable installs, also ignore: