BeingBeyond/Being-H

Being-H

Being-H is BeingBeyond's family of human-centric embodied foundation models. Within this repository, Being-H0.7 is our flagship world-action model (WAM) and Being-H0.5 is our flagship vision-language-action (VLA) model.

Model Family

| Project | Positioning | Summary | Links |
| --- | --- | --- | --- |
| Being-H0.7 | Flagship WAM | A latent world-action model from egocentric videos with future-aware latent reasoning. | Blog / Paper |
| Being-H0.5 | Flagship VLA | A human-centric VLA model for cross-embodiment generalization with a unified action space. | Blog / Paper / Models |
| Being-H0 | Previous VLA | The first Being-H release for human-video VLA pretraining. | Blog / Paper / Models |

News

  • [2026-05-01]: Being-H0 has been accepted to ICML 2026! Come meet the BeingBeyond team at the venue. 🔥🔥
  • [2026-04-14]: We publish Being-H0.7, our flagship WAM model. See the blog and paper. Code and checkpoints are coming soon!
  • [2026-03-20]: We release the UniHand_Preview dataset, a subset of the Being-H0.5 pre-training mixture.
  • [2026-01-24]: We update the H0.5 training, inference, and data preparation docs, and open-source post-training data for PND Adam-U through our Hugging Face dataset collection.
  • [2026-01-20]: We publish Being-H0.5, our flagship VLA model for cross-embodiment generalization.
  • [2025-08-02]: We release the Being-H0 codebase and pretrained models through the BeingBeyond Hugging Face collections.
  • [2025-07-21]: We publish Being-H0, our first human-video VLA release. Read the paper.

Projects Based on Being-H

We are seeing a growing set of excellent projects built on top of the Being-H family:

  • Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models. arXiv 26'04 | website | GitHub
  • Conservative Offline Robot Policy Learning via Posterior-Transition Reweighting. arXiv 26'03 | website | GitHub
  • DexHiL: A Human-in-the-Loop Framework for Vision-Language-Action Model Post-Training in Dexterous Manipulation. arXiv 26'03 | website
  • Joint-Aligned Latent Action: Towards Scalable VLA Pretraining in the Wild. arXiv 26'02 | website | GitHub
  • Rethinking Visual-Language-Action Model Scaling: Alignment, Mixture, and Regularization. arXiv 26'02 | website | GitHub
  • Spatial-Aware VLA Pretraining through Visual-Physical Alignment from Human Videos. arXiv 25'12 | website | GitHub

Feel free to open a pull request if you want to share work built on Being-H.

Citation

If you find the Being-H family useful, please consider citing the relevant release:

Being-H0.7

@article{beingbeyond2026beingh07,
  title={Being-H0.7: A Latent World-Action Model from Egocentric Videos},
  author={Luo, Hao and Zhang, Wanpeng and Feng, Yicheng and Zheng, Sipeng and Xu, Haiweng and Xu, Chaoyi and Xi, Ziheng and Fu, Yuhui and Lu, Zongqing},
  journal={arXiv preprint arXiv:2605.00078},
  year={2026}
}

Being-H0.5

@article{beingbeyond2026beingh05,
  title={Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization},
  author={Luo, Hao and Wang, Ye and Zhang, Wanpeng and Zheng, Sipeng and Xi, Ziheng and Xu, Chaoyi and Xu, Haiweng and Yuan, Haoqi and Zhang, Chi and Wang, Yiqing and others},
  journal={arXiv preprint arXiv:2601.12993},
  year={2026}
}

Being-H0

@inproceedings{beingbeyond2025beingh0,
  title={Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos},
  author={Luo, Hao and Feng, Yicheng and Zhang, Wanpeng and Zheng, Sipeng and Wang, Ye and Yuan, Haoqi and Liu, Jiazheng and Xu, Chaoyi and Jin, Qin and Lu, Zongqing},
  booktitle={International Conference on Machine Learning},
  year={2026},
  organization={PMLR}
}

License

This repository is released under Apache-2.0. See LICENSE.
