feat: Phase 5 MuJoCo Playground PPO benchmarks (54 envs) #535
Merged
54 MuJoCo Playground environments benchmarked with PPO:
- Phase 5.1: DM Control Suite (25 envs)
- Phase 5.2: Locomotion Robots (19 envs)
- Phase 5.3: Manipulation (10 envs)

Code changes:
- playground.py: suppress MuJoCo stderr warnings; fix dict-obs to use only the "state" key for asymmetric-obs envs
- ppo_playground.yaml: add loco_precise, loco_go1, manip_aloha_peg, manip_dexterous spec variants
- dstack config: add PYTHONUNBUFFERED=1

Results: 38/54 envs pass targets. All data is on the public SLM-Lab/benchmark repo. Every row has a matching score, HF link, and plot.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
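The PR does not reproduce the playground.py diff, so the two changes are sketched here in Python under stated assumptions. `suppress_c_stderr` and `select_policy_obs` are hypothetical names. Redirecting file descriptor 2 is one common way to silence C-level writes, not necessarily the method used in the PR. The `"privileged_state"` key is assumed from MuJoCo Playground's asymmetric actor-critic envs.

```python
import contextlib
import os
import sys

import numpy as np


@contextlib.contextmanager
def suppress_c_stderr():
    """Silence writes to file descriptor 2, including ones made from C code.

    contextlib.redirect_stderr only swaps the Python-level sys.stderr object,
    so warnings a C library prints straight to fd 2 would still appear.
    Pointing fd 2 itself at os.devnull catches both.
    """
    sys.stderr.flush()  # push out buffered Python output before the swap
    saved_fd = os.dup(2)  # keep a handle on the real stderr
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 2)  # fd 2 now refers to /dev/null
        yield
    finally:
        sys.stderr.flush()  # anything buffered while suppressed goes to /dev/null
        os.dup2(saved_fd, 2)  # restore the original stderr
        os.close(devnull_fd)
        os.close(saved_fd)


def select_policy_obs(obs):
    """Return the array the policy should see (hypothetical helper).

    Assumption: asymmetric-obs envs return a dict such as
    {"state": actor_obs, "privileged_state": critic_obs}; only "state" is kept.
    Non-dict observations pass through unchanged.
    """
    if isinstance(obs, dict):
        if "state" not in obs:
            raise KeyError(f"dict observation has no 'state' key: {sorted(obs)}")
        return np.asarray(obs["state"])
    return np.asarray(obs)
```

In use, env creation or stepping would be wrapped in `with suppress_c_stderr(): ...`. Keeping only `"state"` presumably gives the actor the same fixed-size observation it gets in symmetric envs, rather than also feeding it critic-only privileged inputs.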
33df139 to 3259e30
.githooks/commit-msg validates conventional commits and auto-bumps the pyproject.toml version. It is idempotent: it always bumps from the master base, so repeated commits on a feature branch converge to the same version.

Rules:
- feat → minor
- breaking (!) → major
- everything else → patch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
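The hook's source is not shown in the PR. Below is a minimal Python sketch of the bump rule it describes. It assumes plain `MAJOR.MINOR.PATCH` versions and a commitlint-style list of accepted types; both are assumptions, and `next_version` is a hypothetical name. Reading the master base version out of pyproject.toml is left out. Only the `!` marker is treated as breaking, matching the rules above. The Conventional Commits spec also allows a `BREAKING CHANGE:` footer, which this sketch does not check.

```python
import re

# Assumed set of accepted types (the common Angular/commitlint list); the real hook may differ.
TYPES = ("feat", "fix", "docs", "style", "refactor", "perf", "test", "build", "ci", "chore", "revert")
# type, optional (scope), optional "!" breaking marker, then ": " and a description.
HEADER_RE = re.compile(r"^(?P<type>[a-z]+)(?:\([^()\s]+\))?(?P<breaking>!)?: \S.*$")


def next_version(base_version: str, header: str) -> str:
    """Compute the target version from the master base version and a commit header.

    The result depends only on these two inputs, never on the branch's current
    version, so re-running it on later commits yields the same target.
    """
    m = HEADER_RE.match(header)
    if m is None or m.group("type") not in TYPES:
        raise ValueError(f"not a conventional commit header: {header!r}")
    major, minor, patch = (int(part) for part in base_version.split("."))
    if m.group("breaking"):
        return f"{major + 1}.0.0"  # breaking (!) -> major
    if m.group("type") == "feat":
        return f"{major}.{minor + 1}.0"  # feat -> minor
    return f"{major}.{minor}.{patch + 1}"  # everything else -> patch
```

Because the bump is always computed from the master base, a second `feat:` commit on the same branch recomputes the same minor bump instead of stacking a second one.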
3259e30 to 4d17e07
🎉 This PR is included in version 5.3.0 🎉
The release is available on: GitHub release
Your semantic-release bot 📦🚀
Summary
54 MuJoCo Playground environments were benchmarked with PPO across DM Control Suite (25), Locomotion Robots (19), and Manipulation (10). All results were audited and graduated to the public HuggingFace repo SLM-Lab/benchmark.
Changes
- slm_lab/env/playground.py: Suppress MuJoCo C-level stderr warnings; fix dict-obs handling for asymmetric-obs envs
- slm_lab/spec/benchmark_arc/ppo/ppo_playground.yaml: New spec variants (loco_precise, loco_go1, manip_aloha_peg, manip_dexterous)
- dstack/run-gpu-train.yml: Add PYTHONUNBUFFERED=1
- docs/BENCHMARKS.md: Phase 5 results (38/54 pass targets)
- docs/plots/: 54 training curve plots

Results
- loco_precise (clip=0.2, entropy=0.005): breakthrough for locomotion
- All data on the public SLM-Lab/benchmark repo, with matching scores, links, and plots

Test plan
- uv run python3 -c "from slm_lab.env.playground import PlaygroundVecEnv; print('OK')"

🤖 Generated with Claude Code
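The loco_precise variant under Results is described only as clip=0.2, entropy=0.005. Assuming these are the standard PPO clip range ε and entropy-bonus coefficient (the PR does not spell out the spec keys), this generic NumPy sketch shows where each value enters the policy loss. It is not SLM-Lab's implementation, and `ppo_policy_loss` is a hypothetical name.

```python
import numpy as np


def ppo_policy_loss(logp_new, logp_old, advantages, entropy, clip_eps=0.2, entropy_coef=0.005):
    """Clipped-surrogate PPO policy loss with an entropy bonus, to be minimized."""
    ratio = np.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic bound: keep the smaller objective per sample, then average.
    surrogate = np.mean(np.minimum(unclipped, clipped))
    # Negate to get a loss; subtracting the entropy term rewards a more stochastic policy.
    return -surrogate - entropy_coef * np.mean(entropy)
```

A smaller ε caps how far each update can move the probability ratio. With a positive advantage, a ratio of 1.5 only earns credit up to 1.2. A small entropy coefficient such as 0.005 adds only a mild push toward exploration.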