feat: Phase 5 MuJoCo Playground PPO benchmarks (54 envs) by kengz · Pull Request #535 · kengz/SLM-Lab

kengz · 2026-03-20T18:46:47Z

Summary

54 MuJoCo Playground environments benchmarked with PPO across DM Control Suite (25), Locomotion Robots (19), and Manipulation (10). All results audited and graduated to public HuggingFace.

Changes

slm_lab/env/playground.py: Suppress MuJoCo C-level stderr warnings; fix dict-obs handling for asymmetric-obs envs
slm_lab/spec/benchmark_arc/ppo/ppo_playground.yaml: New spec variants (loco_precise, loco_go1, manip_aloha_peg, manip_dexterous)
.dstack/run-gpu-train.yml: Add PYTHONUNBUFFERED=1
docs/BENCHMARKS.md: Phase 5 results — 38/54 pass targets
docs/plots/: 54 training curve plots
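
For readers unfamiliar with why C-level stderr needs special handling: warnings printed by native code bypass Python's `sys.stderr` object, so they have to be silenced at the file-descriptor level. The following is a minimal sketch of that general technique, not the code in `slm_lab/env/playground.py`; the helper name `suppress_c_stderr` is hypothetical.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def suppress_c_stderr():
    """Temporarily point file descriptor 2 at os.devnull.

    Hypothetical helper for illustration. Redirecting the descriptor
    (rather than reassigning sys.stderr) also silences output written
    directly by C libraries.
    """
    sys.stderr.flush()                      # flush pending Python-level output first
    saved_fd = os.dup(2)                    # keep a copy of the real stderr
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 2)              # fd 2 now discards everything
        yield
    finally:
        os.dup2(saved_fd, 2)                # restore the original stderr
        os.close(devnull_fd)
        os.close(saved_fd)
```

A caller would wrap only the noisy section, for example `with suppress_c_stderr(): ...` around environment construction, so that genuine Python tracebacks outside the block still reach the log.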

Results

38/54 pass target scores
loco_precise (clip=0.2, entropy=0.005) breakthrough for locomotion
Obs fix unblocked Go1Getup (0→18) and Go1Handstand (6→18)
All data on public SLM-Lab/benchmark with matching scores, links, plots
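
The obs fix can be pictured as follows. According to the commit message, asymmetric-obs envs return a dict observation and the fix uses only its "state" key. This is a sketch under that assumption, not the actual SLM-Lab code; the function name `select_policy_obs` and the second dict key in the usage note are hypothetical.

```python
from collections.abc import Mapping
from typing import Any


def select_policy_obs(obs: Any, key: str = "state") -> Any:
    """Return the policy observation from a possibly dict-valued obs.

    Hypothetical helper: dict observations are reduced to the entry
    under `key`; anything else (e.g. a flat array) passes through.
    """
    if isinstance(obs, Mapping):
        return obs[key]
    return obs
```

For example, `select_policy_obs({"state": [0.1, 0.2], "extra": [0.1, 0.2, 0.3]})` yields `[0.1, 0.2]`, while a flat observation is returned unchanged.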

Test plan

uv run python3 -c "from slm_lab.env.playground import PlaygroundVecEnv; print('OK')"
Spot-check HF links resolve
Phase 1-4 unchanged

🤖 Generated with Claude Code

54 MuJoCo Playground environments benchmarked with PPO:
- Phase 5.1: DM Control Suite (25 envs)
- Phase 5.2: Locomotion Robots (19 envs)
- Phase 5.3: Manipulation (10 envs)

Code changes:
- playground.py: suppress MuJoCo stderr warnings, fix dict-obs to use only "state" key for asymmetric-obs envs
- ppo_playground.yaml: add loco_precise, loco_go1, manip_aloha_peg, manip_dexterous spec variants
- dstack config: add PYTHONUNBUFFERED=1

Results: 38/54 envs pass targets. All data on public SLM-Lab/benchmark. Every row has matching score + HF link + plot.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

.githooks/commit-msg validates conventional commits and auto-bumps pyproject.toml version. Idempotent: always bumps from master base, so repeated commits on feature branches converge to same version. Rules: feat → minor, breaking (!) → major, everything else → patch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
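
The bump rules stated in the commit message above can be sketched as below. This is an illustration of the stated rules only, not the contents of `.githooks/commit-msg`; the function names, the regular expression, and the example versions are assumptions. Idempotence comes from always computing the new version from the base (master) version rather than from the branch's current version.

```python
import re

# Assumed shape of a conventional-commit subject: type, optional (scope),
# optional "!" for breaking changes, then ": description".
_SUBJECT = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<bang>!)?: .+")


def bump_kind(subject: str) -> str:
    """Map a commit subject to 'major', 'minor', or 'patch'."""
    match = _SUBJECT.match(subject)
    if match is None:
        raise ValueError(f"not a conventional commit: {subject!r}")
    if match.group("bang"):
        return "major"          # breaking (!) -> major
    if match.group("type") == "feat":
        return "minor"          # feat -> minor
    return "patch"              # everything else -> patch


def bump_from_base(base_version: str, subject: str) -> str:
    """Compute the next version from the base version, never the current one."""
    major, minor, patch = (int(part) for part in base_version.split("."))
    kind = bump_kind(subject)
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

Because the base version is an input, calling `bump_from_base("1.2.3", "feat: add benchmarks")` any number of times gives the same `"1.3.0"`, which matches the convergence behaviour described for repeated commits on a feature branch.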

github-actions · 2026-03-20T19:07:46Z

🎉 This PR is included in version 5.3.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

kengz force-pushed the feat/phase5-benchmarks branch from 33df139 to 3259e30 Compare March 20, 2026 18:58

kengz force-pushed the feat/phase5-benchmarks branch from 3259e30 to 4d17e07 Compare March 20, 2026 19:03

kengz merged commit 82b1755 into master Mar 20, 2026
3 checks passed

kengz deleted the feat/phase5-benchmarks branch March 20, 2026 19:07

github-actions bot added the released label Mar 20, 2026

kengz merged 2 commits into master from feat/phase5-benchmarks
