Scope plugin providing pipelines for Overworld world models.
- Waypoint 1.5 — Generate worlds at 720p / up to 60 fps using the Waypoint-1.5-1B model (Apache-2.0).
- Waypoint 1.5 (360p) — Lighter-weight variant for laptop-class NVIDIA GPUs via Waypoint-1.5-1B-360P.
Platform note — NVIDIA only. This plugin runs on CUDA-capable NVIDIA GPUs (Linux and Windows). The underlying
world_engine inference library has no Metal / MPS support today, so macOS / Apple Silicon is not supported via this Scope plugin. Mac users who want to try Waypoint-1.5 should use Overworld's native Biome desktop app, which has its own Mac build independent of this plugin.
| Variant | Target hardware | Approx. FPS |
|---|---|---|
| Waypoint 1.5 (720p) | RTX 5090 | 56 fps unquantized, 72 fps with fp8w8a8 |
| Waypoint 1.5 (720p) | RTX 3090 | ~30 fps with intw8a8 |
| Waypoint 1.5 (360p) | Laptop-class NVIDIA GPUs (RTX 30xx mobile and up) | Real-time up to 60 fps |
Quantization options exposed in the UI (load-time setting):
- intw8a8: INT8 weights/activations, requires NVIDIA Ampere+ (30xx)
- fp8w8a8: FP8, requires Ada Lovelace / Hopper+
- nvfp4: NVFP4, requires Blackwell and the flashinfer kernel library. flashinfer ships Linux-only wheels, so this tier is only available on Linux today; Windows users should pick intw8a8 or fp8w8a8.
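The hardware requirements above can be expressed as a small selection helper. This is an illustrative sketch only and is not part of the plugin: the function name `supported_quantizations` is hypothetical, and the mapping from architecture names to CUDA compute capabilities (Ampere 8.0 and up, Ada Lovelace 8.9, Hopper 9.0, Blackwell 10.0 and up) is an assumption added here rather than something the plugin documents.

```python
def supported_quantizations(major, minor, os_name):
    """Return the quantization tiers usable for a given CUDA compute capability.

    Hypothetical helper. The thresholds below are assumptions mapping the
    architecture names in the list above to compute capabilities:
      Ampere       -> 8.0+
      Ada / Hopper -> 8.9+
      Blackwell    -> 10.0+
    """
    cc = (major, minor)
    tiers = []
    if cc >= (8, 0):
        tiers.append("intw8a8")
    if cc >= (8, 9):
        tiers.append("fp8w8a8")
    # nvfp4 also needs flashinfer, which ships Linux-only wheels today.
    if cc >= (10, 0) and os_name == "Linux":
        tiers.append("nvfp4")
    return tiers
```

If PyTorch is installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` pair and `platform.system()` returns the OS name, so the two can be passed straight into this helper.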
Model weights may require HuggingFace authentication. See the HuggingFace guide for setup instructions.
Follow the Scope plugins guide to install this plugin using the URL:
https://github.com/daydreamlive/scope-overworld.git
Follow the Scope plugins guide to upgrade this plugin to the latest version.
Both waypoint and waypoint_360p pipelines use world_engine for inference. The model is an autoregressive Diffusion Transformer with a bundled Tiny Hunyuan Autoencoder (taehv1_5) providing 4× temporal and 8× spatial compression. On first load, a JIT warmup pass runs to trigger compilation.
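To make the compression factors concrete, the sketch below computes the latent grid size for one chunk of frames. It is a back-of-the-envelope illustration, not world_engine code: `latent_shape` is a hypothetical name, the 16:9 pixel sizes (1280×720 and 640×360) are assumed rather than taken from the model card, and the latent channel count is left out because it is not stated here.

```python
TEMPORAL_FACTOR = 4  # 4x temporal compression (from the description above)
SPATIAL_FACTOR = 8   # 8x spatial compression (from the description above)


def latent_shape(frames, height, width):
    """Return (latent_frames, latent_height, latent_width) for a pixel-space clip.

    Hypothetical helper: assumes each dimension divides evenly by its
    compression factor and ignores the channel dimension.
    """
    if frames % TEMPORAL_FACTOR or height % SPATIAL_FACTOR or width % SPATIAL_FACTOR:
        raise ValueError("dimensions must be multiples of the compression factors")
    return (
        frames // TEMPORAL_FACTOR,
        height // SPATIAL_FACTOR,
        width // SPATIAL_FACTOR,
    )


# Assumed 16:9 resolutions for the two variants.
shape_720p = latent_shape(4, 720, 1280)
shape_360p = latent_shape(4, 360, 640)
```

Under these assumptions a 4-frame chunk maps to a single latent frame, which is consistent with the model emitting frames in groups of four.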
Waypoint-1.5 is controller-driven (keyboard + mouse) with optional starter-image conditioning; it has no text-prompt input. Each inference step: controller input is processed → world_engine generates the next 4-frame chunk at the target resolution → Scope's pipeline processor splits the chunk into per-frame packets for the output stream.
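The last stage of that step, splitting a 4-frame chunk into per-frame packets, can be sketched as follows. This is not Scope's actual pipeline processor: `FramePacket`, `split_chunk`, and the timestamp scheme are hypothetical names and choices used only to illustrate the idea, and frames are treated as opaque objects.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Sequence

FRAMES_PER_CHUNK = 4  # each inference step yields a 4-frame chunk


@dataclass
class FramePacket:
    """Hypothetical per-frame packet for the output stream."""
    index: int          # global frame index across all chunks
    timestamp_s: float  # presentation time derived from the target fps
    frame: Any          # frame payload (e.g. an image array), treated as opaque


def split_chunk(chunk: Sequence[Any], chunk_index: int, fps: float = 60.0) -> Iterator[FramePacket]:
    """Yield one packet per frame of a generated chunk, in order."""
    if len(chunk) != FRAMES_PER_CHUNK:
        raise ValueError(f"expected {FRAMES_PER_CHUNK} frames, got {len(chunk)}")
    for offset, frame in enumerate(chunk):
        index = chunk_index * FRAMES_PER_CHUNK + offset
        yield FramePacket(index=index, timestamp_s=index / fps, frame=frame)
```

For example, `list(split_chunk(frames, chunk_index=2))` would produce packets with global indices 8 through 11, so a consumer can pace the stream frame by frame even though generation happens four frames at a time.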
Context window: 512 frames (about 8.5 seconds at 60 fps).