feat: Phase 5 MuJoCo Playground PPO benchmarks (54 envs) #535

Merged
kengz merged 2 commits into master from feat/phase5-benchmarks on Mar 20, 2026

kengz commented Mar 20, 2026

Summary

54 MuJoCo Playground environments benchmarked with PPO across DM Control Suite (25), Locomotion Robots (19), and Manipulation (10). All results audited and graduated to public HuggingFace.

Changes

  • slm_lab/env/playground.py: Suppress MuJoCo C-level stderr warnings; fix dict-obs handling for asymmetric-obs envs
  • slm_lab/spec/benchmark_arc/ppo/ppo_playground.yaml: New spec variants (loco_precise, loco_go1, manip_aloha_peg, manip_dexterous)
  • .dstack/run-gpu-train.yml: Add PYTHONUNBUFFERED=1
  • docs/BENCHMARKS.md: Phase 5 results — 38/54 pass targets
  • docs/plots/: 54 training curve plots
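
Warnings printed by C code go straight to file descriptor 2, so replacing `sys.stderr` in Python does not silence them. One common way to suppress them is to redirect the descriptor itself for the duration of a call. The sketch below illustrates that general technique; `suppress_c_stderr` is a hypothetical name and this is not necessarily how `slm_lab/env/playground.py` does it.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def suppress_c_stderr():
    """Temporarily point file descriptor 2 at os.devnull.

    Hypothetical helper for illustration only; the name and approach are
    assumptions, not the code merged in this PR.
    """
    sys.stderr.flush()
    saved_fd = os.dup(2)  # keep a handle on the real stderr
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 2)  # C-level writes to fd 2 are now discarded
        yield
    finally:
        sys.stderr.flush()
        os.dup2(saved_fd, 2)  # restore the original stderr
        os.close(devnull_fd)
        os.close(saved_fd)
```

Wrapping only the noisy call (for example, environment construction) in `with suppress_c_stderr():` keeps Python tracebacks raised elsewhere visible.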

Results

  • 38/54 pass target scores
  • loco_precise (clip=0.2, entropy=0.005) breakthrough for locomotion
  • Obs fix unblocked Go1Getup (0→18) and Go1Handstand (6→18)
  • All data on public SLM-Lab/benchmark with matching scores, links, plots
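
For readers unfamiliar with the two `loco_precise` knobs, the sketch below shows where a clip range and an entropy coefficient enter the standard PPO policy loss. It assumes `clip=0.2` is the PPO clipping epsilon and `entropy=0.005` is the entropy-bonus coefficient; the function name and signature are hypothetical, and this is the textbook objective rather than SLM-Lab's implementation.

```python
import math


def ppo_policy_loss(logp_new, logp_old, advantages, entropies,
                    clip_eps=0.2, entropy_coef=0.005):
    """Clipped-surrogate PPO policy loss with an entropy bonus.

    Illustrative only: defaults mirror the loco_precise values quoted above,
    on the assumption that they map to these two hyperparameters.
    """
    surrogate = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        surrogate += min(ratio * adv, clipped * adv)
    n = len(advantages)
    mean_entropy = sum(entropies) / n
    # Minimizing this maximizes the surrogate and the policy entropy.
    return -(surrogate / n) - entropy_coef * mean_entropy
```

A smaller `clip_eps` limits how far one update can move the policy, and a smaller `entropy_coef` reduces the pressure to keep actions random.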

Test plan

  • uv run python3 -c "from slm_lab.env.playground import PlaygroundVecEnv; print('OK')"
  • Spot-check HF links resolve
  • Phase 1-4 unchanged

🤖 Generated with Claude Code

54 MuJoCo Playground environments benchmarked with PPO:
- Phase 5.1: DM Control Suite (25 envs)
- Phase 5.2: Locomotion Robots (19 envs)
- Phase 5.3: Manipulation (10 envs)

Code changes:
- playground.py: suppress MuJoCo stderr warnings, fix dict-obs to use
  only "state" key for asymmetric-obs envs
- ppo_playground.yaml: add loco_precise, loco_go1, manip_aloha_peg,
  manip_dexterous spec variants
- dstack config: add PYTHONUNBUFFERED=1
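
The dict-obs fix described above can be sketched as follows: when an environment returns a dictionary of observations, pass only the "state" entry to the policy. The helper name `select_policy_obs` and the extra key shown in the usage lines are assumptions for illustration, not the code in playground.py.

```python
from collections.abc import Mapping


def select_policy_obs(obs, key="state"):
    """Return the observation the policy should consume.

    Hypothetical helper: dict observations are reduced to obs[key],
    anything else is passed through unchanged.
    """
    if isinstance(obs, Mapping):
        return obs[key]
    return obs


# Illustrative dict observation; the second key name is an assumption.
dict_obs = {"state": [0.1, 0.2], "privileged_state": [0.1, 0.2, 0.3]}
policy_input = select_policy_obs(dict_obs)   # only the "state" entry
flat_input = select_policy_obs([0.5, 0.6])   # non-dict obs is untouched
```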

Results: 38/54 envs pass targets. All data on public SLM-Lab/benchmark.
Every row has matching score + HF link + plot.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

kengz force-pushed the feat/phase5-benchmarks branch from 33df139 to 3259e30 on March 20, 2026 18:58

.githooks/commit-msg validates conventional commits and auto-bumps
pyproject.toml version. Idempotent: always bumps from master base,
so repeated commits on feature branches converge to same version.

Rules: feat → minor, breaking (!) → major, everything else → patch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
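
The bump rules in this commit message can be sketched in Python as below. This is a minimal illustration of the stated rules, not the contents of .githooks/commit-msg: the function names and regular expression are hypothetical, and a `BREAKING CHANGE:` footer is not handled. Computing the result from the master base version, rather than from the branch's current version, is what makes repeated commits converge.

```python
import re

# Conventional-commit header: type, optional (scope), optional "!", then ": ".
HEADER_RE = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]+\))?(?P<breaking>!)?: .+")


def bump_kind(commit_header):
    """Map a commit header to 'major', 'minor', or 'patch'."""
    match = HEADER_RE.match(commit_header)
    if match is None:
        raise ValueError(f"not a conventional commit: {commit_header!r}")
    if match.group("breaking"):
        return "major"
    if match.group("type") == "feat":
        return "minor"
    return "patch"


def bump_version(base_version, commit_header):
    """Bump from the master base version, so reruns give the same result."""
    major, minor, patch = (int(part) for part in base_version.split("."))
    kind = bump_kind(commit_header)
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

With a hypothetical base of `1.4.2`, `bump_version("1.4.2", "feat: add x")` yields `1.5.0` however many times it is run, because the input is always the base version and never the previously bumped value.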

kengz force-pushed the feat/phase5-benchmarks branch from 3259e30 to 4d17e07 on March 20, 2026 19:03
kengz merged commit 82b1755 into master on Mar 20, 2026
3 checks passed
kengz deleted the feat/phase5-benchmarks branch on March 20, 2026 19:07

@github-actions

🎉 This PR is included in version 5.3.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
