
🎮 HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency

English | 简体中文


"Hold Infinity in the Palm of Your Hand, and Eternity in an Hour"

🔥 News

  • December 17, 2025: 👋 We present the technical report and research paper of HY-World 1.5 (WorldPlay). Please check out the details and join the discussion!
  • December 17, 2025: 🤗 We release HY-World 1.5 (WorldPlay), the first open-source, real-time interactive world model with long-term geometric consistency!

Join our WeChat and Discord groups to discuss the project and get help from us.

WeChat Group | Xiaohongshu | X | Discord


📖 Introduction

While HY-World 1.0 can generate immersive 3D worlds, it relies on a lengthy offline generation process and lacks real-time interaction. HY-World 1.5 bridges this gap with WorldPlay, a streaming video diffusion model that enables real-time, interactive world modeling with long-term geometric consistency, resolving the trade-off between speed and memory that limits current methods. Our model draws power from four key designs:

  1. Dual Action Representation enables robust action control in response to the user's keyboard and mouse inputs.
  2. Reconstituted Context Memory enforces long-term consistency by dynamically rebuilding context from past frames and using temporal reframing to keep geometrically important but long-past frames accessible, effectively alleviating memory attenuation.
  3. WorldCompass is a novel Reinforcement Learning (RL) post-training framework designed to directly improve the action following and visual quality of the long-horizon, autoregressive video model.
  4. Context Forcing is a novel distillation method designed for memory-aware models: aligning memory context between the teacher and student preserves the student's capacity to use long-range information, enabling real-time speeds while preventing error drift.

Taken together, HY-World 1.5 generates long-horizon streaming video at 24 FPS with superior consistency, comparing favorably with existing techniques. Our model generalizes well across diverse scenes, supporting first-person and third-person perspectives in both real-world and stylized environments, and enables versatile applications such as 3D reconstruction, promptable events, and infinite world extension.

✨ Highlights

  • Systematic Overview

    HY-World 1.5 open-sources a systematic, comprehensive training framework for real-time world models, covering the entire pipeline: data, training, and inference deployment. The technical report discloses detailed training specifics for pre-training, middle-training, reinforcement-learning post-training, and memory-aware model distillation. It also introduces a series of engineering techniques that reduce network transmission and model inference latency, giving users a real-time streaming inference experience.

  • Inference Pipeline

    Given a single image or a text prompt describing a world, our model performs next-chunk (16 video frames) prediction to generate future video conditioned on user actions. For each chunk, we dynamically reconstitute context memory from past chunks to enforce long-term temporal and geometric consistency.
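The chunked generation loop described above can be sketched in a few lines. This is a schematic illustration under assumed names (`select_context`, `generate`, the anchor/recent split); the actual memory reconstitution policy is described in the technical report, not here.

```python
CHUNK = 16  # frames per predicted chunk, as described above

def select_context(history, k_recent=2, k_anchor=1):
    """Hypothetical memory reconstitution: keep the most recent chunks plus a
    few long-past 'anchor' chunks so old geometry stays accessible."""
    recent = history[-k_recent:]
    anchors = history[:k_anchor] if len(history) > k_recent else []
    return anchors + recent

def generate(model, first_frame, actions_per_chunk):
    """Autoregressive next-chunk prediction conditioned on user actions."""
    history = []
    video = [first_frame]
    for actions in actions_per_chunk:
        context = select_context(history)   # context is rebuilt every step
        chunk = model(context, actions)     # predict the next 16 frames
        video.extend(chunk)
        history.append(chunk)
    return video
```

The key point is that the context passed to the model is reconstituted at every step rather than being a fixed sliding window, which is what keeps long-past geometry accessible.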

📜 System Requirements

  • GPU: NVIDIA GPU with CUDA support

  • Minimum GPU Memory: 14 GB (with model offloading enabled)

    Note: The memory requirements above are measured with model offloading enabled. If your GPU has sufficient memory, you may disable offloading for improved inference speed.
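The offloading decision above can be expressed as a small helper. This is an illustrative sketch, not part of the repo; the 14 GB floor comes from the requirement above, while the full-model figure is an assumption based on the ~32 GB checkpoint sizes listed later.

```python
MIN_OFFLOAD_GB = 14  # minimum GPU memory with offloading, per the requirements above

def should_offload(free_gb: float, full_model_gb: float = 32.0) -> bool:
    """Enable model offloading when the GPU cannot hold the full model.

    full_model_gb is an assumed figure based on the ~32 GB checkpoints;
    disabling offloading on larger GPUs improves inference speed.
    """
    if free_gb < MIN_OFFLOAD_GB:
        raise RuntimeError(f"Need at least {MIN_OFFLOAD_GB} GB of GPU memory")
    return free_gb < full_model_gb
```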

๐Ÿ› ๏ธ Dependencies and Installation

1. Create Environment

conda create --name worldplay python=3.10 -y
conda activate worldplay
pip install -r requirements.txt

2. Install Flash Attention (Optional but Recommended)

Install Flash Attention for faster inference and reduced GPU memory consumption:

pip install flash-attn --no-build-isolation

Detailed instructions: Flash Attention

3. Download All Required Models

We provide a download script that automatically downloads all required models:

python download_models.py --hf_token <your_huggingface_token>

Important: The vision encoder requires access to a gated model. Before running:

  1. Request access at: https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev
  2. Wait for approval (usually instant)
  3. Create/get your access token at: https://huggingface.co/settings/tokens (select "Read" permission)

If you don't have FLUX access yet, you can skip the vision encoder:

python download_models.py --skip_vision_encoder

The script downloads:

  • HY-WorldPlay action models (~32GB each)
  • HunyuanVideo-1.5 base model (vae, scheduler, 480p transformer)
  • Qwen2.5-VL-7B-Instruct text encoder (~15GB)
  • ByT5 encoders (byt5-small + Glyph-SDXL-v2)
  • SigLIP vision encoder (from FLUX.1-Redux-dev)

After download completes, the script will print the model paths to add to run.sh.

🎮 Quick Start

We provide a demo of the HY-World 1.5 model for a quick start.

demo.mp4

Try our online demo without installation: https://3d.hunyuan.tencent.com/sceneTo3D

🧱 Model Checkpoints

| Model | Download |
| --- | --- |
| HY-World1.5-Bidirectional-480P-I2V | Link |
| HY-World1.5-Autoregressive-480P-I2V | Link |
| HY-World1.5-Autoregressive-480P-I2V-distill | Link |

🔑 Inference

Configure Model Paths

After running download_models.py, update run.sh with the printed model paths:

# These paths are printed by download_models.py after download completes
MODEL_PATH=<path_printed_by_download_script>
AR_ACTION_MODEL_PATH=<path_printed_by_download_script>/ar_model
BI_ACTION_MODEL_PATH=<path_printed_by_download_script>/bidirectional_model
AR_DISTILL_ACTION_MODEL_PATH=<path_printed_by_download_script>/ar_distilled_action_model

Configuration Options

In run.sh, you can configure:

| Parameter | Description |
| --- | --- |
| PROMPT | Text description of the scene |
| IMAGE_PATH | Input image path (required for I2V) |
| NUM_FRAMES | Number of frames to generate (default: 125) |
| N_INFERENCE_GPU | Number of GPUs for parallel inference |
| POSE_JSON_PATH | Camera trajectory file |

Model Selection

Uncomment one of the three inference commands in run.sh:

  1. Bidirectional Model:

    --action_ckpt $BI_ACTION_MODEL_PATH --model_type 'bi'
  2. Autoregressive Model:

    --action_ckpt $AR_ACTION_MODEL_PATH --model_type 'ar'
  3. Distilled Model:

    --action_ckpt $AR_DISTILL_ACTION_MODEL_PATH --few_step true --num_inference_steps 4 --model_type 'ar'

Custom Camera Trajectories

Use generate_custom_trajectory.py to create custom camera paths:

python generate_custom_trajectory.py
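As an illustration of what a camera trajectory file might contain, the sketch below writes a simple circular (orbit) path with one pose per frame. The JSON schema here (`x`, `y`, `z`, `yaw` per frame) is a hypothetical example; check generate_custom_trajectory.py for the format the pipeline actually expects.

```python
import json
import math

def orbit_trajectory(n_frames=125, radius=2.0):
    """Hypothetical circular camera path: one {x, y, z, yaw} pose per frame.
    The real schema expected via POSE_JSON_PATH may differ."""
    poses = []
    for i in range(n_frames):
        theta = 2 * math.pi * i / n_frames
        poses.append({
            "x": radius * math.cos(theta),
            "y": 0.0,
            "z": radius * math.sin(theta),
            "yaw": math.degrees(theta),  # face along the orbit
        })
    return poses

with open("custom_pose.json", "w") as f:
    json.dump(orbit_trajectory(), f)
```

The default of 125 poses matches the NUM_FRAMES default listed in the configuration options above.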

Prompt Rewriting (Optional)

For better prompts, you can enable prompt rewriting with a vLLM server:

export T2V_REWRITE_BASE_URL="<your_vllm_server_base_url>"
export T2V_REWRITE_MODEL_NAME="<your_model_name>"
REWRITE=true  # in run.sh
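A vLLM server exposes an OpenAI-compatible endpoint, so the rewrite request is an ordinary chat-completions payload. The sketch below only builds the payload (no network call); the system instruction is illustrative and not the one run.sh actually uses.

```python
import os

def build_rewrite_request(prompt: str) -> dict:
    """Construct an OpenAI-compatible chat payload for the vLLM rewrite server.

    The system prompt here is a placeholder, not the production instruction.
    POST the result to f"{T2V_REWRITE_BASE_URL}/chat/completions".
    """
    return {
        "model": os.environ.get("T2V_REWRITE_MODEL_NAME", "<your_model_name>"),
        "messages": [
            {"role": "system",
             "content": "Rewrite the prompt into a detailed scene description "
                        "for video generation."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.7,
    }
```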

Run Inference

After editing run.sh to configure your settings, run:

bash run.sh

📊 Evaluation

HY-World 1.5 surpasses existing methods across various quantitative metrics, including reconstruction metrics for different video lengths and human evaluations.

The left five metric columns are short-term; the right five are long-term.

| Model | Real-time | PSNR ⬆ | SSIM ⬆ | LPIPS ⬇ | $R_{dist}$ ⬇ | $T_{dist}$ ⬇ | PSNR ⬆ | SSIM ⬆ | LPIPS ⬇ | $R_{dist}$ ⬇ | $T_{dist}$ ⬇ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CameraCtrl | ❌ | 17.93 | 0.569 | 0.298 | 0.037 | 0.341 | 10.09 | 0.241 | 0.549 | 0.733 | 1.117 |
| SEVA | ❌ | 19.84 | 0.598 | 0.313 | 0.047 | 0.223 | 10.51 | 0.301 | 0.517 | 0.721 | 1.893 |
| ViewCrafter | ❌ | 19.91 | 0.617 | 0.327 | 0.029 | 0.543 | 9.32 | 0.271 | 0.661 | 1.573 | 3.051 |
| Gen3C | ❌ | 21.68 | 0.635 | 0.278 | 0.024 | 0.477 | 15.37 | 0.431 | 0.483 | 0.357 | 0.979 |
| VMem | ❌ | 19.97 | 0.587 | 0.316 | 0.048 | 0.219 | 12.77 | 0.335 | 0.542 | 0.748 | 1.547 |
| Matrix-Game-2.0 | ✅ | 17.26 | 0.505 | 0.383 | 0.287 | 0.843 | 9.57 | 0.205 | 0.631 | 2.125 | 2.742 |
| GameCraft | ❌ | 21.05 | 0.639 | 0.341 | 0.151 | 0.617 | 10.09 | 0.287 | 0.614 | 2.497 | 3.291 |
| Ours (w/o Context Forcing) | ❌ | 21.27 | 0.669 | 0.261 | 0.033 | 0.157 | 16.27 | 0.425 | 0.495 | 0.611 | 0.991 |
| Ours (full) | ✅ | 21.92 | 0.702 | 0.247 | 0.031 | 0.121 | 18.94 | 0.585 | 0.371 | 0.332 | 0.797 |
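For reference, PSNR (the first reconstruction metric in the table) has a standard definition: 20·log10(MAX) − 10·log10(MSE), in dB. The snippet below is that textbook formula over flat pixel lists, not the paper's evaluation code.

```python
import math

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB (standard definition)."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 20.0 * math.log10(max_val) - 10.0 * math.log10(mse)

# A uniform error of 10 gray levels on an 8-bit image gives ~28.13 dB
print(round(psnr([0] * 16, [10] * 16), 2))  # 28.13
```

Higher PSNR and SSIM (⬆) and lower LPIPS and pose distances (⬇) indicate better reconstruction and camera control.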

🎬 More Examples

teaser_geo.mp4
event_output_video.mp4
output_video_5.mp4

๐Ÿ“ TODO

  • Acceleration & Quantization
  • Open-source training code

📚 Citation

@article{hyworld2025,
  title={HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency},
  author={Team HunyuanWorld},
  journal={arXiv preprint},
  year={2025}
}

@article{worldplay2025,
  title={WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Model},
  author={Wenqiang Sun and Haiyu Zhang and Haoyuan Wang and Junta Wu and Zehan Wang and Zhenwei Wang and Yunhong Wang and Jun Zhang and Tengfei Wang and Chunchao Guo},
  journal={arXiv preprint},
  year={2025}
}

@article{wang2025compass,
  title={WorldCompass: Reinforcement Learning for Long-Horizon World Models},
  author={Zehan Wang and Tengfei Wang and Haiyu Zhang and Wenqiang Sun and Junta Wu and Haoyuan Wang and Zhenwei Wang and Hengshuang Zhao and Chunchao Guo and Zhou Zhao},
  journal={arXiv preprint},
  year={2025}
}

Contact

Please email tengfeiwang12@gmail.com with any questions.

๐Ÿ™ Acknowledgements

We would like to thank HunyuanWorld, HunyuanWorld-Mirror, HunyuanVideo, and FastVideo for their great work.
