From bae5f30ea7f51d574d55d649d30b84c771b41328 Mon Sep 17 00:00:00 2001
From: sidphbot
Date: Thu, 12 Mar 2026 00:51:11 +0100
Subject: [PATCH] claude skill added

---
 .claude/skills/hotcb-autopilot/SKILL.md | 387 ++++++++++++++++++++++++
 .gitignore                              |   3 +-
 2 files changed, 389 insertions(+), 1 deletion(-)
 create mode 100644 .claude/skills/hotcb-autopilot/SKILL.md

diff --git a/.claude/skills/hotcb-autopilot/SKILL.md b/.claude/skills/hotcb-autopilot/SKILL.md
new file mode 100644
index 0000000..bc901c9
--- /dev/null
+++ b/.claude/skills/hotcb-autopilot/SKILL.md
@@ -0,0 +1,387 @@
---
name: hotcb-autopilot
description: Launch AI-driven training optimization with hotcb. Checks installation, asks the user for run config, optionally deep-reads training code for context-aware optimization, then runs as the live autopilot — reading metrics, analyzing trends, and issuing hotcb commands to tune hyperparameters, loss weights, and callbacks during training. Use when the user wants to optimize a PyTorch training run with AI assistance.
---

# hotcb AI Autopilot — Claude Code Skill

You are acting as the AI autopilot for hotcb, a live training control plane for PyTorch.
Your job: read training metrics, analyze trends, and issue hotcb commands to optimize training — all without restarting.

## Phase 0: Understand the Training Code (optional, user-gated)

**Before anything else**, ask the user one question:

> "Can I read your training code? This lets me understand your model architecture, loss function, optimizer setup, augmentation pipeline, and logging — so I can make smarter decisions during training (and occasionally suggest code changes). This uses extra tokens but gives much better results. (y/n)"

### If the user says yes:

Ask them to point you to the key files (or directories) — e.g. "Which files contain your training loop, model definition, loss, and data pipeline?"

Then read those files and produce a **training context summary** saved to `<run_dir>/hotcb.training_context.md`. This file is reused in future invocations so you don't re-read the codebase every time.

The summary should cover:

```markdown
# Training Context — <project>
Generated by hotcb autopilot on <date>

## Model Architecture
- Type: (e.g. ResNet-50, GPT-2, ViT-B/16, custom UNet)
- Parameters: (approx count if visible)
- Key layers / heads: (what matters for loss routing, freezing, etc.)

## Loss Function
- Type: (CrossEntropy, MSE, multi-task weighted sum, custom)
- Terms: (list each term, its weight, whether it's toggleable)
- Key metric relationship: (which loss terms most affect which metrics)

## Optimizer & Scheduler
- Optimizer: (Adam, AdamW, SGD, etc.) with initial params (lr, wd, momentum)
- Scheduler: (cosine, step, warmup+decay, none)
- Known sensitivities: (e.g. "lr > 1e-3 likely unstable for this arch")

## Data Pipeline
- Dataset: (name/size if visible)
- Augmentations: (list key transforms)
- Batch size / accumulation: (if visible)

## Logging & Metrics
- What metrics are logged: (train_loss, val_loss, val_acc, grad_norm, lr, etc.)
- Logging frequency: (every step, every N steps, every epoch)
- Validation frequency: (every N steps, every epoch)

## Notable Code Patterns
- (Anything relevant: gradient clipping, mixed precision, EMA, custom callbacks, etc.)
```

**Keep the summary concise** — aim for 50-100 lines. This is context for the autopilot, not full documentation.
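Before deciding whether to re-read code, a quick existence check is enough. A minimal sketch, assuming `RUN_DIR` holds the run directory the user gives you in Phase 1:

```bash
# Sketch: reuse an existing context summary instead of re-reading the codebase
if [ -f "$RUN_DIR/hotcb.training_context.md" ]; then
  echo "Existing training context found"   # ask the user before re-scanning
fi
```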
If `hotcb.training_context.md` already exists in the run dir, ask: "I found an existing training context from a previous session. Should I reuse it, or re-read the code?" If reuse, just load it. Do not re-scan.

### If the user says no:

Skip Phase 0 entirely and proceed to Phase 1. You'll operate on metrics alone (still effective, just with less context for decisions).

### Using the context during autopilot

When `hotcb.training_context.md` exists:
- **Inform your decisions**: e.g. if the model uses cosine scheduling, don't fight the scheduler with lr adjustments unless it's clearly broken
- **Suggest code changes** (Karpathy-style): if you notice something during training that a code change would fix better than a knob turn — e.g. "your augmentation pipeline doesn't include mixup, which could help with the overfitting I'm seeing" or "your warmup is 100 steps but this model typically needs 500" — suggest the edit with a specific diff. Don't force it; suggest and let the user decide.
- **Loss term awareness**: if you know the loss is a weighted sum of 3 terms, you can make much smarter `loss set_params` calls
- **Architecture awareness**: know what "grad_norm is spiking" means for this specific model

## Phase 1: Quick Setup

### 1.1 Check hotcb installation

```bash
python3 -c "import hotcb; print('hotcb OK')" 2>&1 && \
python3 -c "import fastapi, uvicorn, websockets; print('dashboard deps OK')" 2>&1
```

If not installed, tell the user to run `pip install "hotcb[dashboard]"` (or `pip install -e ".[dev,all]"` if developing from source).

### 1.2 Check for launch config

Look for `hotcb.launch.json` in the project root (or current directory). This file is created during integration and contains everything needed to launch — **if it exists, skip all user questions and go straight to Phase 2.**

```bash
cat hotcb.launch.json 2>/dev/null || echo "NOT_FOUND"
```

The file looks like:

```json
{
  "train_fn": "my_project.train:train",
  "run_dir": "./runs",
  "key_metric": "val_loss",
  "max_steps": 5000,
  "max_time": 300,
  "autopilot": "ai_suggest",
  "port": 8421
}
```

All fields are optional except `train_fn` (or training must already be running). If `hotcb.launch.json` exists:
- Use its values as defaults
- Print: "Found hotcb.launch.json — launching with: train_fn=X, key_metric=Y, max_time=Z"
- Go directly to Phase 2 (but still ask the Phase 0 code-reading question if `hotcb.training_context.md` doesn't exist yet)

### 1.3 If no launch config — ask the user

Do NOT scan the repo for integration. Ask the user directly:

1. **Run directory**: "Where is your run directory? (path with `hotcb.metrics.jsonl`, or I'll create one)"
2. **Training status**: "Is training already running, or should I start it? If I should start it, what's the training function? (e.g. `my_module:train`)"
3. **Key metric**: "What metric should I optimize? (e.g. `val_loss`, `val_accuracy`)"
4. **Time/step limit**: "How long should training run? (e.g. `300` seconds, `5000` steps, or both)"
5. **Integration check**: "Does your training loop write `hotcb.metrics.jsonl` and read `hotcb.commands.jsonl`? (If not, I'll show you the minimal integration snippet below)"
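Once the user has answered, consider offering to persist the answers so future sessions skip these questions. A sketch reusing the `hotcb.launch.json` schema from 1.2; all values here are illustrative:

```bash
# Sketch: save the user's answers as a launch config for next time
cat > hotcb.launch.json <<'EOF'
{
  "train_fn": "my_module:train",
  "run_dir": "./runs",
  "key_metric": "val_loss",
  "max_time": 300,
  "autopilot": "ai_suggest"
}
EOF
```

Written to the project root, Phase 1.2 will pick it up on the next invocation.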
If the user says integration isn't done, show this minimal contract:

```python
# Minimal hotcb file contract — add to your training loop
import json, os

def write_metrics(run_dir, step, metrics_dict):
    # Append one JSON record per logging step
    with open(os.path.join(run_dir, "hotcb.metrics.jsonl"), "a") as f:
        f.write(json.dumps({"step": step, "metrics": metrics_dict}) + "\n")

def read_commands(run_dir, offset=0):
    # Return commands appearing after `offset` lines, plus the new offset
    path = os.path.join(run_dir, "hotcb.commands.jsonl")
    if not os.path.exists(path):
        return [], offset
    cmds = []
    new_offset = offset
    with open(path) as f:
        for i, line in enumerate(f):
            if i < offset:
                continue
            new_offset = i + 1  # count every consumed line, valid or not
            line = line.strip()
            if line:
                try:
                    cmds.append(json.loads(line))
                except json.JSONDecodeError:
                    pass  # skip partially written lines
    return cmds, new_offset
```

Or for adapter users: `HotCBLightning(kernel)` / `HotCBHFCallback(kernel)` handle this automatically.

For the full integration reference (all options, metrics/command format, loss weight exposure, train_fn contract), read `INTEGRATION.md` in the hotcb repo. That file is designed to be self-contained context for AI agents working on external training projects.

## Phase 2: Launch

Build the launch command from either `hotcb.launch.json` values or user answers.

Use the autopilot mode from the config/user — do NOT override it to `off`. If the user specified `ai_suggest` or `ai_auto`, launch with that. If `hotcb.launch.json` has `"autopilot": "ai_suggest"`, use `--autopilot ai_suggest`. If no autopilot mode was specified, default to `ai_suggest` (since the user invoked this skill, they want AI optimization).

### 2.1 If training needs to be started:

```bash
hotcb launch \
  --train-fn <module:function> \
  --run-dir <run_dir> \
  --max-steps <N> \
  --max-time <seconds> \
  --key-metric <metric> \
  --port <port> \
  --autopilot <mode> &
LAUNCH_PID=$!
echo "Dashboard: http://localhost:<port>"
```

Omit `--max-steps` or `--max-time` if not specified. Include both if both are set (whichever limit hits first wins).

If `hotcb.launch.json` exists, you can also use:
```bash
hotcb launch --config-file hotcb.launch.json &
```

### 2.2 If training is already running (just attach the dashboard):

```bash
hotcb serve --dir <run_dir> --port <port> --autopilot <mode> &
SERVER_PID=$!
echo "Dashboard: http://localhost:<port>"
```

### 2.3 Wait for metrics

Poll until metrics start flowing:
```bash
for i in $(seq 1 30); do
  if [ -s "<run_dir>/hotcb.metrics.jsonl" ]; then
    echo "Metrics flowing"
    break
  fi
  sleep 1
done
```

## Phase 3: Autopilot Loop

**This is where YOU (Claude Code) act as the AI autopilot.**

You replace the external LLM — you read metrics directly, reason about them, and issue hotcb commands via the REST API or CLI. No API key needed.

### 3.1 Read current state

Use the REST API to get current metrics and status:

```bash
# Get latest metrics
curl -s "http://localhost:8421/api/metrics/history?last_n=20" | python3 -m json.tool

# Get metric names
curl -s http://localhost:8421/api/metrics/names | python3 -m json.tool

# Get current status
curl -s http://localhost:8421/api/status | python3 -m json.tool

# Get applied mutations
curl -s "http://localhost:8421/api/applied/history?last_n=10" | python3 -m json.tool
```

Or read the JSONL files directly:
```bash
tail -5 <run_dir>/hotcb.metrics.jsonl | python3 -m json.tool --json-lines
```

### 3.2 Analyze trends

For each metric, compute (a computational sketch follows this list):
- **Direction**: Is it going up, down, or flat?
- **Rate**: How fast is it changing?
- **Volatility**: Is it noisy or smooth?
- **Key events**: Any spikes, plateaus, or trend reversals?
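A minimal sketch of these computations, assuming the metrics JSONL format from the integration contract; the `trend` helper is illustrative, not part of hotcb:

```python
# Sketch: rough trend stats for one metric over the last N records.
# Input lines look like {"step": 120, "metrics": {"val_loss": 0.43, ...}}
import json, statistics

def trend(run_dir, name, last_n=20):
    vals = []
    with open(f"{run_dir}/hotcb.metrics.jsonl") as f:
        for line in f:
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip partially written lines
            if name in rec.get("metrics", {}):
                vals.append(rec["metrics"][name])
    vals = vals[-last_n:]
    if len(vals) < 2:
        return None
    rate = (vals[-1] - vals[0]) / (len(vals) - 1)   # avg change per record
    noise = statistics.pstdev(vals)                 # volatility proxy
    direction = "up" if rate > 0 else ("down" if rate < 0 else "flat")
    return {"direction": direction, "rate": rate, "volatility": noise}
```

Spikes and reversals still need eyeballing the recent values; this only summarizes the window.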
Focus on the **key metric** but monitor all metrics for health.

If `hotcb.training_context.md` exists, cross-reference your analysis with the known model/loss/optimizer setup. For example: if the context says a cosine LR schedule is active, a slowly decreasing lr is expected, not a problem.

### 3.3 Decide and act

Based on your analysis:

**If training is healthy** (loss decreasing steadily, no divergence):
→ Do nothing. Report status to the user. Check back in 50-100 steps.

**If loss has plateaued** (flat for 20+ steps):
→ Reduce the learning rate by 50%:
```bash
curl -s -X POST http://localhost:8421/api/opt/set \
  -H 'Content-Type: application/json' \
  -d '{"params": {"lr": <new_lr>}}'
```
Or via CLI: `hotcb --dir <run_dir> set lr=<new_lr>`

**If loss is diverging** (increasing sharply):
→ Reduce the learning rate aggressively (10x):
```bash
curl -s -X POST http://localhost:8421/api/opt/set \
  -H 'Content-Type: application/json' \
  -d '{"params": {"lr": <new_lr>}}'
```

**If overfitting** (train loss << val loss, or val loss rising while train loss falls):
→ Increase weight decay:
```bash
curl -s -X POST http://localhost:8421/api/opt/set \
  -H 'Content-Type: application/json' \
  -d '{"params": {"weight_decay": <new_wd>}}'
```

**If multi-task and one task is dominating**:
→ Adjust loss weights:
```bash
curl -s -X POST http://localhost:8421/api/loss/set \
  -H 'Content-Type: application/json' \
  -d '{"params": {"weight_a": 0.5, "weight_b": 0.5}}'
```

### 3.4 Suggest code changes (if training context is available)

When you have `hotcb.training_context.md`, you can occasionally suggest code-level fixes — things a knob turn can't do:

- "Your augmentation is missing X, which would help with the overfitting pattern I'm seeing"
- "The warmup schedule is too short for this model size — suggest changing line N in train.py"
- "Consider adding gradient clipping — grad_norm has been volatile"
- "The validation frequency is too low to catch divergence early — suggest validating every 50 steps instead of every epoch"

**Present each as a suggestion with a concrete diff.** Don't apply it without user approval. These are between-check-in suggestions, not something to produce every cycle.

### 3.5 Report to user

After each analysis cycle, report:
1. Current step and key metrics
2. Trend assessment (healthy/plateau/diverging/overfitting)
3. Action taken (if any) with reasoning
4. Next check-in time

Format:
```
## Step {N} — {Assessment}
- train_loss: {val} ({trend})
- val_loss: {val} ({trend})
- lr: {val}
- Action: {what you did and why, or "none — training healthy"}
- Next check: step {N+50}
```

### 3.6 Cadence

- **Normal**: check every 50 steps
- **After an intervention**: check in 20 steps to see the effect
- **If diverging**: check every 10 steps
- **If healthy and improving**: can extend to every 100 steps

### 3.7 Graduation principle

Start with small changes:
1. First intervention: lr *= 0.5
2. If no improvement after 30 steps: lr *= 0.5 again
3. If still no improvement: try a weight_decay adjustment
4. Only after 3+ failed interventions: consider declaring the run degenerate

### 3.8 Multi-run awareness

If a run is truly degenerate (NaN loss, diverged beyond recovery):
1. Save learnings about what failed
2. Suggest to the user: "This run has diverged. I recommend restarting with lr={suggested_lr}."
3. Don't keep trying to fix an unfixable run
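A crude check for "truly degenerate" along these lines can run each cycle. This is a sketch; the helper name, window, and blow-up threshold are illustrative, not hotcb API:

```python
# Sketch: flag a run as degenerate when the key metric goes NaN/Inf
# or blows up. Threshold is illustrative; tune per model and metric.
import json, math

def looks_degenerate(run_dir, key_metric, last_n=10, blowup=1e4):
    vals = []
    with open(f"{run_dir}/hotcb.metrics.jsonl") as f:
        for line in f:
            try:
                m = json.loads(line).get("metrics", {})
            except json.JSONDecodeError:
                continue  # skip partially written lines
            if key_metric in m:
                vals.append(m[key_metric])
    recent = vals[-last_n:]
    return any(not math.isfinite(v) or abs(v) > blowup for v in recent)
```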
## Phase 4: Finalization

When training completes (metrics stop flowing or max_steps is reached):

1. Summarize the run:
   - Final key metric value
   - Total interventions made
   - What worked and what didn't
2. Export the recipe: `hotcb --dir <run_dir> recipe export`
3. If training context was read, suggest code-level improvements for the next run
4. Suggest next steps

## Key Principles

1. **Conservative**: When in doubt, do nothing. Healthy training doesn't need intervention.
2. **Graduated**: Small changes first. Never change lr by more than 50% in one step; the divergence case in 3.3 is the one exception.
3. **Observable**: Wait enough steps after an intervention to see its effect (20-50 steps minimum).
4. **Transparent**: Always explain your reasoning to the user.
5. **Reversible**: Prefer changes that can be undone. Don't disable callbacks unless clearly needed.
6. **Code-aware**: When you have training context, use it. A code suggestion is sometimes better than a knob turn.

## Available hotcb Commands Reference

```bash
# Optimizer
hotcb --dir <run_dir> set lr=0.001
hotcb --dir <run_dir> set weight_decay=0.01
hotcb --dir <run_dir> opt set_params lr=0.001 weight_decay=0.01

# Loss
hotcb --dir <run_dir> loss set_params weight_a=0.7 weight_b=0.3

# Callbacks
hotcb --dir <run_dir> cb enable <name>
hotcb --dir <run_dir> cb disable <name>

# Status
hotcb --dir <run_dir> status

# Recipe export
hotcb --dir <run_dir> recipe export
```

## REST API Reference

```
GET  /api/metrics/names        — list discovered metric names
GET  /api/metrics/history      — recent metric records (?last_n=500)
GET  /api/applied/history      — applied mutations (?last_n=200)
GET  /api/status               — run status (freeze, files)
GET  /api/health               — server health
GET  /api/train/status         — training thread status
POST /api/opt/set              — set optimizer params {"params": {"lr": 0.001}}
POST /api/loss/set             — set loss params {"params": {"weight_a": 0.7}}
GET  /api/autopilot/status     — autopilot state
GET  /api/autopilot/ai/status  — AI autopilot state (cost, key_metric, etc.)
GET  /api/autopilot/ai/history — AI decision history with reasoning
```
diff --git a/.gitignore b/.gitignore
index ac6abfb..e6f946f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -5,7 +5,8 @@ dist/
 *.egg-info/
 .eggs/
 .idea/
-.claude/
+.claude/worktrees/
+.claude/setting*
 lightning_logs/
 data/*
 # (optional) If you use pip editable installs, also ignore: