🤖 BeTTER: Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models

Haiweng Xu, Sipeng Zheng, Hao Luo, Wanpeng Zhang, Ziheng Xi, Zongqing Lu
Peking University, Tsinghua University, BeingBeyond

This is the official repository for the paper "Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models", which introduces the BeTTER benchmark.

📖 About

Recent Vision-Language-Action (VLA) models report impressive success rates on standard robotic benchmarks, projecting an illusion of robust semantic grounding and sequential planning. BeTTER is a diagnostic benchmark designed to break this illusion. By applying targeted causal interventions while enforcing kinematic isolation, BeTTER explicitly decouples high-level reasoning failures from low-level execution limits, unmasking severe cognitive deficits such as behavioral inertia and semantic feature collapse in state-of-the-art VLAs.

🚀 Release Roadmap

We are actively working to clean up and open-source the codebase. To ensure high quality, we will release the components progressively. Watch 👀 and Star ⭐ this repository to stay updated!

- Paper Release: arXiv preprint available.
- Phase 1: Asset Curation & Task Generation Pipeline
  - VLM-guided task instantiation templates.
  - Open-vocabulary 3D asset retrieval and integration (via Objaverse).
- Phase 2: The BeTTER Benchmark Suite & Evaluation
  - The complete suite of 10 base manipulation tasks and 60 diagnostic variations.
  - Standardized evaluation scripts and testing environments.
- Phase 3: Data Augmentation & Privileged Logging
  - Teleoperation trajectory amplification pipeline (incorporating MimicGen).
  - Deterministic privileged state logging and VQA generation scripts.

🛠️ Installation & Usage

(Code and instructions are coming soon. Please stay tuned!)

📝 Citation

If you find our benchmark, analysis, or data pipelines useful in your research, please consider citing our work:

@article{xu2026unmasking,
  title={Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models},
  author={Xu, Haiweng and Zheng, Sipeng and Luo, Hao and Zhang, Wanpeng and Xi, Ziheng and Lu, Zongqing},
  journal={arXiv preprint arXiv:2604.18000},
  year={2026}
}

🙏 Acknowledgements

We would like to thank the open-source community, particularly the developers of Objaverse and MimicGen, whose foundational tools greatly facilitated the development of this benchmark.
