VisionClaw brings an interactive 3D AI character to your Apple Vision Pro. A lively animated character sits on your desk, listens to your voice, talks back with real speech, and connects to your Mac's AI brain – all in mixed reality.
"Like having a tiny AI assistant living on your desk, with personality."
Click the preview to watch the full demo video
- 15+ hand-crafted animations – idle, listening, thinking, working, celebrating, sleeping, and more
- Reactive state machine – the character visually responds to every interaction stage
- Gesture control – drag to reposition, pinch to scale, two-hand rotate to turn
- Always alive – idle variations, easter-egg dances, drowsy yawns, and sleep cycles
- Tap to talk – tap the character to start listening, tap again to send
- Real-time transcription – see your words appear in a floating speech bubble as you speak
- Chinese speech recognition – powered by Apple's on-device `SFSpeechRecognizer`
- Text-to-speech responses – the character speaks back with natural Chinese TTS
- OpenClaw integration – connects to your Mac Mini running the OpenClaw AI agent via WebSocket
- Auto-discovery – finds your Mac on the local network via Bonjour
- Live status feedback – see thinking, working, and processing states in real time
- Progressive timeout – clear feedback at 10s, 30s, and 60s if the AI takes long
- Typewriter effect – responses appear character by character with adaptive speed
- Chinese-optimized – slower display for Chinese characters, faster for English, pauses on punctuation
- State icons – 🎤 listening, ✨ sending, 💭 thinking, ⚙️ working, ✅ success, ⚠️ error
- Auto-dismiss – generous reading time calculated from Chinese reading speed (~3 chars/sec)
- Mixed reality – the character exists in your real environment with shadows
- Free positioning – drag the character anywhere in 3D space (horizontal + vertical)
- Pinch to resize – scale from tiny (1 cm) to large (60 cm)
- Billboard bubble – the speech bubble always faces you automatically
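The adaptive typewriter and auto-dismiss timing above can be sketched as plain timing logic. This is a simplified illustration; the constants and the `BubbleTiming` type are assumptions, not the app's actual values.

```swift
import Foundation

/// Sketch of the bubble's adaptive typewriter timing (illustrative constants).
enum BubbleTiming {
    /// Per-character delay: CJK characters display slower than ASCII,
    /// and punctuation adds a short pause.
    static func delay(after character: Character) -> TimeInterval {
        if "，。！？,.!?".contains(character) { return 0.30 }  // pause on punctuation
        if character.isASCII { return 0.02 }                    // fast for English
        return 0.06                                             // slower for Chinese
    }

    /// Auto-dismiss: generous reading time at ~3 Chinese chars/sec,
    /// clamped to a sensible minimum.
    static func dismissDelay(for text: String) -> TimeInterval {
        max(3.0, Double(text.count) / 3.0)
    }
}
```

A view driving the bubble would sleep for `delay(after:)` between characters, then schedule dismissal after `dismissDelay(for:)`.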
```
 Apple Vision Pro                    Mac Mini
┌─────────────────────┐            ┌──────────────────┐
│  VisionClaw App     │─WebSocket─▶│ OpenClaw Bridge  │
│                     │            │ (Python)         │
│ ┌─────────────────┐ │            │ ┌──────────────┐ │
│ │ ShrimpEntity    │ │  Bonjour   │ │  OpenClaw    │ │
│ │ (3D Character)  │ │  Discovery │ │  AI Agent    │ │
│ ├─────────────────┤ │            │ └──────────────┘ │
│ │ AnimController  │ │            └──────────────────┘
│ │ (15+ anims)     │ │
│ ├─────────────────┤ │
│ │ SpeechManager   │ │
│ │ (STT + TTS)     │ │
│ ├─────────────────┤ │
│ │ Bubble3D        │ │
│ │ (SwiftUI in     │ │
│ │  RealityKit)    │ │
│ └─────────────────┘ │
└─────────────────────┘
```
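On the Vision Pro side, the WebSocket link can be opened with Foundation's `URLSessionWebSocketTask`. The sketch below is a minimal, hypothetical client; the app's actual `NetworkManager` and message protocol may differ.

```swift
import Foundation

/// Minimal WebSocket client sketch (URL and payloads are placeholders).
final class BridgeConnection {
    private var task: URLSessionWebSocketTask?

    func connect(to url: URL) {
        task = URLSession.shared.webSocketTask(with: url)
        task?.resume()
        receiveLoop()
    }

    func send(_ text: String) {
        task?.send(.string(text)) { error in
            if let error { print("send failed: \(error)") }
        }
    }

    private func receiveLoop() {
        task?.receive { [weak self] result in
            switch result {
            case .success(.string(let message)):
                print("bridge says: \(message)")
                self?.receiveLoop()   // keep listening
            case .success:
                self?.receiveLoop()   // ignore binary frames
            case .failure(let error):
                print("socket closed: \(error)")
            }
        }
    }
}
```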
| Component | File | Purpose |
|---|---|---|
| `ShrimpEntity` | `ShrimpEntity.swift` | 3D model loading, dual-entity hierarchy (root wrapper + animated model) |
| `ShrimpAnimationController` | `ShrimpAnimationController.swift` | 15+ animation clips, state-driven transitions, idle variations |
| `ShrimpAnimationSystem` | `ShrimpAnimationSystem.swift` | RealityKit ECS system for per-frame updates + bubble positioning |
| `ShrimpBubble3D` | `ShrimpBubble3D.swift` | `ViewAttachmentComponent`-based 3D speech bubble with typewriter effect |
| `SpeechManager` | `SpeechManager.swift` | `SFSpeechRecognizer` (async audio setup) + `AVSpeechSynthesizer` |
| `SessionManager` | `SessionManager.swift` | Central state machine orchestrating all interactions |
| `NetworkManager` | `NetworkManager.swift` | Bonjour discovery + WebSocket to Mac Mini |
| `OpenClawBridge` | `bridge.py` | Python WebSocket bridge between Vision Pro and OpenClaw |
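The Bonjour auto-discovery that `NetworkManager` performs can be sketched with the Network framework's `NWBrowser`. The service type `_openclaw._tcp` below is an assumption for illustration; the bridge may advertise under a different name.

```swift
import Network

// Sketch of Bonjour discovery (service type is hypothetical).
let browser = NWBrowser(
    for: .bonjour(type: "_openclaw._tcp", domain: nil),
    using: .tcp
)
browser.browseResultsChangedHandler = { results, _ in
    for result in results {
        if case .service(let name, _, _, _) = result.endpoint {
            print("Found bridge: \(name)")  // then open the WebSocket to it
        }
    }
}
browser.start(queue: .main)
```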
- Apple Vision Pro (or visionOS Simulator)
- Xcode 26+ with visionOS 26 SDK
- Mac Mini (or any Mac) running the OpenClaw bridge (for AI features)
- Microphone permission granted to the app
```bash
git clone https://github.com/lhfer/visionclaw.git
cd visionclaw
open ShrimpXR.xcodeproj
```
- Select the Apple Vision Pro target (device or simulator)
- Build & Run (⌘R)
- The control panel window appears
- Start the OpenClaw bridge on your Mac:
```bash
cd OpenClawBridge
pip install -r requirements.txt
python bridge.py
```
- In VisionClaw, tap "搜索 Mac Mini" ("Search for Mac Mini") to auto-discover the bridge via Bonjour
- Status turns green when connected
- Tap "放出虾虾" ("Release the shrimp") to spawn the character
- The character appears on your desk with a greeting animation
- Tap the character to start voice input
- Speak in Chinese – see real-time transcription in the bubble
- Tap again to send your message to the AI
- Watch the character react – casting spell → thinking → celebrating!
| Gesture | Action |
|---|---|
| Tap | Start/stop voice recording |
| Long press | Force character upright |
| Drag | Move character in 3D space |
| Pinch | Scale character size |
| Two-hand rotate | Rotate character facing |
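In visionOS, gestures like these are typically attached to a `RealityView` and targeted at the character entity. A minimal sketch of the tap, drag, and pinch handlers follows; it is simplified relative to `ShrimpImmersiveView.swift`, and the view name is hypothetical.

```swift
import SwiftUI
import RealityKit

struct ShrimpViewSketch: View {
    var body: some View {
        RealityView { content in
            // Character entity is loaded and added elsewhere (see ShrimpEntity.swift).
        }
        // Tap: start/stop voice recording
        .gesture(TapGesture().targetedToAnyEntity().onEnded { _ in
            print("toggle listening")
        })
        // Drag: move the character anywhere in 3D space
        .gesture(DragGesture().targetedToAnyEntity().onChanged { value in
            value.entity.position = value.convert(
                value.location3D, from: .local, to: value.entity.parent!
            )
        })
        // Pinch (magnify): scale the character
        .gesture(MagnifyGesture().targetedToAnyEntity().onChanged { value in
            value.entity.scale = .one * Float(value.magnification)
        })
    }
}
```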
The character has a rich animation state machine:
| State | Animation | Trigger |
|---|---|---|
| `idle` | Breathing, walking, random poses | Default state |
| `listening` | Focused attention | User taps to speak |
| `sendingCommand` | Casting spell ✨ | Voice input sent |
| `thinking` | Walking/pacing | AI is processing |
| `working` | Active work gestures | AI is executing |
| `success` | Victory dance 🎉 | AI response received |
| `error` | Defeat pose | Something went wrong |
| `sleeping` | Napping 💤 | 2 min inactivity |
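The table above can be sketched as a Swift enum with explicit transitions. State names mirror the table; the transition sets are illustrative guesses, and `ShrimpState.swift` may define more.

```swift
/// Sketch of the character's interaction states (see ShrimpState.swift).
enum ShrimpState: String {
    case idle, listening, sendingCommand, thinking, working, success, error, sleeping

    /// Which states each state may legally move to (illustrative).
    var allowedTransitions: Set<ShrimpState> {
        switch self {
        case .idle:            return [.listening, .sleeping]
        case .listening:       return [.sendingCommand, .idle]
        case .sendingCommand:  return [.thinking, .error]
        case .thinking:        return [.working, .success, .error]
        case .working:         return [.success, .error]
        case .success, .error: return [.idle]
        case .sleeping:        return [.idle, .listening]
        }
    }
}
```

A `SessionManager`-style orchestrator would reject any state change not in `allowedTransitions` before driving the matching animation.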
```
ShrimpXR/
├── Sources/
│   ├── App/
│   │   ├── ShrimpXRApp.swift                # App entry, ECS registration
│   │   ├── ControlPanelView.swift           # Settings & debug UI
│   │   └── SessionManager.swift             # Central state orchestration
│   ├── Shrimp/
│   │   ├── ShrimpEntity.swift               # 3D model loading & placement
│   │   ├── ShrimpAnimationController.swift  # Animation state machine
│   │   ├── ShrimpAnimationSystem.swift      # ECS per-frame system
│   │   ├── ShrimpBubble3D.swift             # 3D speech bubble
│   │   ├── ShrimpImmersiveView.swift        # Main XR view + gestures
│   │   └── ShrimpState.swift                # State definitions
│   ├── Speech/
│   │   └── SpeechManager.swift              # STT + TTS
│   └── Network/
│       └── NetworkManager.swift             # Bonjour + WebSocket
├── Resources/
│   ├── shrimpboy.usdz                       # Main character model
│   └── animations/                          # 15+ USDZ animation files
└── OpenClawBridge/
    ├── bridge.py                            # WebSocket bridge server
    └── requirements.txt
```
- Dual-entity hierarchy: wrapper entity (gestures/rotation) → model entity (animations). Prevents animation root motion from conflicting with user gestures.
- Async audio setup: `AVAudioEngine` initialization runs off the MainActor via a `nonisolated static func` to prevent UI freezes on Vision Pro.
- ViewAttachmentComponent: native visionOS 26 API for rendering SwiftUI directly in 3D space as speech bubbles.
- BubblePositionComponent: custom RealityKit ECS component that dynamically tracks the character's head joint and counter-scales the bubble to keep text readable regardless of character scale.
- Swift 6 strict concurrency: full compliance with Swift's latest concurrency model.
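The off-main-actor audio setup pattern might look roughly like this. It is a sketch: the class name is hypothetical, and the real `SpeechManager` may structure its `AVAudioEngine` setup differently.

```swift
import AVFoundation

@MainActor
final class SpeechSetupSketch {
    /// Audio-session configuration is synchronous and can be slow, so it
    /// lives in a nonisolated static func that can run off the MainActor.
    nonisolated static func configureAudioSession() throws {
        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.playAndRecord, mode: .measurement,
                                options: .duckOthers)
        try session.setActive(true, options: .notifyOthersOnDeactivation)
    }

    func prepare() async throws {
        // Hop off the main actor so the UI never stalls on audio setup.
        try await Task.detached { try Self.configureAudioSession() }.value
    }
}
```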
MIT License – see LICENSE for details.
Contributions are welcome! Feel free to open issues or submit pull requests.
Built with ❤️ for Apple Vision Pro
VisionClaw – AI meets spatial computing
