Being-H is BeingBeyond's family of human-centric embodied foundation models. Within this repository, Being-H0.7 is our flagship world-action model (WAM) and Being-H0.5 is our flagship vision-language-action (VLA) model.
| Project | Positioning | Summary | Links |
|---|---|---|---|
| Being-H0.7 | Flagship WAM | A latent world-action model from egocentric videos with future-aware latent reasoning. | Blog / Paper |
| Being-H0.5 | Flagship VLA | A human-centric VLA model for cross-embodiment generalization with a unified action space. | Blog / Paper / Models |
| Being-H0 | Previous VLA | The first Being-H release for human-video VLA pretraining. | Blog / Paper / Models |
- [2026-05-01]: Being-H0 has been accepted to ICML 2026! Come meet the BeingBeyond team at the venue. 🔥🔥
- [2026-04-14]: We publish Being-H0.7, our flagship WAM model. See the blog and paper. Code and checkpoints are coming soon!
- [2026-03-20]: We release the UniHand_Preview dataset, a subset of the Being-H0.5 pre-training mixture.
- [2026-01-24]: We update the H0.5 training, inference, and data preparation docs, and open-source post-training data for PND Adam-U through our Hugging Face dataset collection.
- [2026-01-20]: We publish Being-H0.5, our flagship VLA model for cross-embodiment generalization.
- [2025-08-02]: We release the Being-H0 codebase and pretrained models through the BeingBeyond Hugging Face collections.
- [2025-07-21]: We publish Being-H0, our first human-video VLA release. Read the paper.
We are seeing a growing set of excellent projects built on top of the Being-H family:
- Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models. arXiv 26'04 | website | GitHub
- Conservative Offline Robot Policy Learning via Posterior-Transition Reweighting. arXiv 26'03 | website | GitHub
- DexHiL: A Human-in-the-Loop Framework for Vision-Language-Action Model Post-Training in Dexterous Manipulation. arXiv 26'03 | website
- Joint-Aligned Latent Action: Towards Scalable VLA Pretraining in the Wild. arXiv 26'02 | website | GitHub
- Rethinking Visual-Language-Action Model Scaling: Alignment, Mixture, and Regularization. arXiv 26'02 | website | GitHub
- Spatial-Aware VLA Pretraining through Visual-Physical Alignment from Human Videos. arXiv 25'12 | website | GitHub
Feel free to open a pull request if you want to share work built on Being-H.
If you find the Being-H family useful, please consider citing the relevant release:
Being-H0.7
@article{beingbeyond2026beingh07,
title={Being-H0.7: A Latent World-Action Model from Egocentric Videos},
author={Luo, Hao and Zhang, Wanpeng and Feng, Yicheng and Zheng, Sipeng and Xu, Haiweng and Xu, Chaoyi and Xi, Ziheng and Fu, Yuhui and Lu, Zongqing},
journal={arXiv preprint arXiv:2605.00078},
year={2026}
}

Being-H0.5
@article{beingbeyond2026beingh05,
title={Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization},
author={Luo, Hao and Wang, Ye and Zhang, Wanpeng and Zheng, Sipeng and Xi, Ziheng and Xu, Chaoyi and Xu, Haiweng and Yuan, Haoqi and Zhang, Chi and Wang, Yiqing and others},
journal={arXiv preprint arXiv:2601.12993},
year={2026}
}

Being-H0
@inproceedings{beingbeyond2025beingh0,
title={Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos},
author={Luo, Hao and Feng, Yicheng and Zhang, Wanpeng and Zheng, Sipeng and Wang, Ye and Yuan, Haoqi and Liu, Jiazheng and Xu, Chaoyi and Jin, Qin and Lu, Zongqing},
booktitle={International Conference on Machine Learning},
year={2026},
organization={PMLR}
}

This repository is released under Apache-2.0. See LICENSE.