🦐 VisionClaw: Your AI Companion on Apple Vision Pro

English | 中文

VisionClaw brings an interactive 3D AI character to your Apple Vision Pro. A lively animated character sits on your desk, listens to your voice, talks back with real speech, and connects to your Mac's AI brain, all in mixed reality.

"Like having a tiny AI assistant living on your desk, with personality."

🎥 Demo

Demo Video

Click the preview to watch the full demo video


✨ Features

🎭 Living 3D Character

  • 15+ hand-crafted animations: idle, listening, thinking, working, celebrating, sleeping, and more
  • Reactive state machine: the character visually responds to every interaction stage
  • Gesture control: drag to reposition, pinch to scale, two-hand rotate to turn
  • Always alive: idle variations, easter egg dances, drowsy yawns, and sleep cycles

🎤 Voice Interaction

  • Tap to talk: tap the character to start listening, tap again to send
  • Real-time transcription: see your words appear in a floating speech bubble as you speak
  • Chinese speech recognition: powered by Apple's on-device SFSpeechRecognizer
  • Text-to-speech responses: the character speaks back with natural Chinese TTS

💬 AI-Powered Conversations

  • OpenClaw integration: connects to your Mac Mini running the OpenClaw AI agent via WebSocket
  • Auto-discovery: finds your Mac on the local network via Bonjour
  • Live status feedback: see thinking, working, and processing states in real time
  • Progressive timeout: clear feedback at 10 s, 30 s, and 60 s if the AI is slow to respond
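
The progressive timeout can be pictured as a mapping from elapsed waiting time to a feedback stage. The sketch below is illustrative only: it is written in Python (the bridge's language) rather than the app's Swift, and the function name and messages are hypothetical. Only the 10 s, 30 s, and 60 s thresholds come from the feature list above.

```python
from typing import Optional

# Thresholds (seconds) come from the feature list; the messages are
# hypothetical placeholders, not the app's actual wording.
TIMEOUT_STAGES = [
    (60.0, "No response after 60 s"),
    (30.0, "Still working, this is taking a while"),
    (10.0, "Thinking a bit longer than usual"),
]


def timeout_feedback(elapsed_seconds: float) -> Optional[str]:
    """Return the message for the latest threshold passed, or None."""
    # Stages are ordered from longest to shortest, so the first match
    # is the most advanced stage the wait has reached.
    for threshold, message in TIMEOUT_STAGES:
        if elapsed_seconds >= threshold:
            return message
    return None
```

Calling `timeout_feedback` with the seconds elapsed since the request was sent yields `None` before the first threshold and an escalating message afterwards.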

🫧 Smart Speech Bubble

  • Typewriter effect: responses appear character by character with adaptive speed
  • Chinese-optimized: slower display for Chinese characters, faster for English, pauses on punctuation
  • State icons: 🎤 listening, ✨ sending, 💭 thinking, ⚙️ working, ✓ success, ⚠️ error
  • Auto-dismiss: generous reading time calculated for Chinese reading speed (~3 chars/sec)
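
One way to think about the adaptive typewriter timing and the auto-dismiss delay is sketched below. This is a Python illustration of the idea, not the app's Swift code: every delay value, the minimum dismiss time, and all function names are assumptions. Only the ordering (Chinese slower than English, a pause on punctuation) and the ~3 chars/sec reading speed come from the list above.

```python
# Delay values are assumptions chosen for illustration.
CJK_DELAY = 0.12          # seconds per Chinese character (slower)
LATIN_DELAY = 0.03        # seconds per other character (faster)
PUNCTUATION_PAUSE = 0.30  # extra-long pause after punctuation
PUNCTUATION = set(",.!?;:,。!?;:、")

READING_SPEED_CHARS_PER_SEC = 3.0  # from the feature list (~3 chars/sec)
MIN_DISMISS_SECONDS = 2.0          # assumption: never dismiss instantly


def is_cjk(ch: str) -> bool:
    """True for characters in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def char_delay(ch: str) -> float:
    """Seconds to wait after showing `ch` before showing the next one."""
    if ch in PUNCTUATION:
        return PUNCTUATION_PAUSE
    if is_cjk(ch):
        return CJK_DELAY
    return LATIN_DELAY


def typewriter_schedule(text: str) -> list:
    """Return (character, appearance time in seconds) pairs."""
    elapsed = 0.0
    schedule = []
    for ch in text:
        schedule.append((ch, elapsed))
        elapsed += char_delay(ch)
    return schedule


def dismiss_after(text: str) -> float:
    """Seconds to keep the bubble visible once the text is fully shown."""
    return max(MIN_DISMISS_SECONDS, len(text) / READING_SPEED_CHARS_PER_SEC)
```

With these assumed values, a 30-character reply would stay on screen for 10 seconds after it finishes typing.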

🏠 Spatial Awareness

  • Mixed reality: character exists in your real environment with shadows
  • Free positioning: drag the character anywhere in 3D space (horizontal + vertical)
  • Pinch to resize: scale from tiny (1 cm) to large (60 cm)
  • Billboard bubble: speech bubble always faces you automatically
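
The resize limits and the billboard behaviour both reduce to a little arithmetic. The Python sketch below illustrates it under stated assumptions: the function names are hypothetical, positions are `(x, y, z)` tuples in metres, and the bubble's front is assumed to face its local +z axis. The app itself does this in Swift with RealityKit.

```python
import math

MIN_HEIGHT_M = 0.01  # 1 cm, from the feature list
MAX_HEIGHT_M = 0.60  # 60 cm, from the feature list


def clamp_height(current_height_m: float, pinch_magnification: float) -> float:
    """Apply a pinch magnification, keeping the result within 1 cm to 60 cm."""
    proposed = current_height_m * pinch_magnification
    return min(MAX_HEIGHT_M, max(MIN_HEIGHT_M, proposed))


def billboard_yaw(bubble_pos, viewer_pos) -> float:
    """Yaw (radians, about the vertical y axis) that turns the bubble's
    +z axis toward the viewer. Height differences are ignored."""
    dx = viewer_pos[0] - bubble_pos[0]
    dz = viewer_pos[2] - bubble_pos[2]
    # Rotating +z by yaw t about y gives (sin t, 0, cos t), hence atan2(dx, dz).
    return math.atan2(dx, dz)
```

Rotating only about the vertical axis keeps the bubble upright while it turns to face the viewer.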

πŸ— Architecture

Apple Vision Pro                          Mac Mini
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  VisionClaw App     β”‚  ◄──WebSocket──► β”‚  OpenClaw Bridge β”‚
β”‚                     β”‚                  β”‚  (Python)        β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚                  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ ShrimpEntity  β”‚  β”‚   Bonjour        β”‚  β”‚  OpenClaw   β”‚  β”‚
β”‚  β”‚ (3D Character)β”‚  β”‚   Discovery      β”‚  β”‚  AI Agent   β”‚  β”‚
β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€  β”‚                  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚  β”‚ AnimControllerβ”‚  β”‚                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚  β”‚ (15+ anims)   β”‚  β”‚
β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€  β”‚
β”‚  β”‚ SpeechManager β”‚  β”‚
β”‚  β”‚ (STT + TTS)   β”‚  β”‚
β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€  β”‚
β”‚  β”‚ Bubble3D      β”‚  β”‚
β”‚  β”‚ (SwiftUI in   β”‚  β”‚
β”‚  β”‚  RealityKit)  β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Components

| Component | File | Purpose |
| --- | --- | --- |
| ShrimpEntity | ShrimpEntity.swift | 3D model loading, dual-entity hierarchy (root wrapper + animated model) |
| ShrimpAnimationController | ShrimpAnimationController.swift | 15+ animation clips, state-driven transitions, idle variations |
| ShrimpAnimationSystem | ShrimpAnimationSystem.swift | RealityKit ECS system for per-frame updates + bubble positioning |
| ShrimpBubble3D | ShrimpBubble3D.swift | ViewAttachmentComponent-based 3D speech bubble with typewriter effect |
| SpeechManager | SpeechManager.swift | SFSpeechRecognizer (async audio setup) + AVSpeechSynthesizer |
| SessionManager | SessionManager.swift | Central state machine orchestrating all interactions |
| NetworkManager | NetworkManager.swift | Bonjour discovery + WebSocket to Mac Mini |
| OpenClawBridge | bridge.py | Python WebSocket bridge between Vision Pro and OpenClaw |

🚀 Getting Started

Prerequisites

  • Apple Vision Pro (or visionOS Simulator)
  • Xcode 26+ with visionOS 26 SDK
  • Mac Mini (or any Mac) running the OpenClaw bridge (for AI features)
  • Microphone permission granted to the app

1. Clone & Open

git clone https://github.com/lhfer/visionclaw.git
cd visionclaw
open ShrimpXR.xcodeproj

2. Build & Run

  1. Select Apple Vision Pro target (device or simulator)
  2. Build & Run (⌘R)
  3. The control panel window appears

3. Connect to AI

  1. Start the OpenClaw bridge on your Mac:
    cd OpenClawBridge
    pip install -r requirements.txt
    python bridge.py
  2. In VisionClaw, tap "搜索 Mac Mini" ("Search for Mac Mini") to auto-discover via Bonjour
  3. Status turns green when connected

4. Meet Your Character

  1. Tap "放出虾虾" ("Release the Shrimp") to spawn the character
  2. The character appears on your desk with a greeting animation
  3. Tap the character to start voice input
  4. Speak in Chinese and see the real-time transcription in the bubble
  5. Tap again to send your message to the AI
  6. Watch the character react: casting spell → thinking → celebrating!

5. Gesture Controls

| Gesture | Action |
| --- | --- |
| Tap | Start/stop voice recording |
| Long press | Force character upright |
| Drag | Move character in 3D space |
| Pinch | Scale character size |
| Two-hand rotate | Rotate character facing |

🎬 Animation States

The character has a rich animation state machine:

| State | Animation | Trigger |
| --- | --- | --- |
| idle | Breathing, walking, random poses | Default state |
| listening | Focused attention | User taps to speak |
| sendingCommand | Casting spell ✨ | Voice input sent |
| thinking | Walking/pacing | AI is processing |
| working | Active work gestures | AI is executing |
| success | Victory dance 🎉 | AI response received |
| error | Defeat pose | Something went wrong |
| sleeping | Napping 💤 | 2 min inactivity |
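
The table above can be read as a trigger-to-state lookup plus an inactivity rule. The following Python sketch shows that reading; it is not the contents of ShrimpState.swift or SessionManager.swift. The state names come from the table, while the trigger identifiers and the `next_state` function are hypothetical, and the rule that only an idle character falls asleep is an assumption.

```python
from enum import Enum
from typing import Optional


class ShrimpState(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    SENDING_COMMAND = "sendingCommand"
    THINKING = "thinking"
    WORKING = "working"
    SUCCESS = "success"
    ERROR = "error"
    SLEEPING = "sleeping"


# Trigger names are hypothetical; the target states follow the table.
TRIGGER_TO_STATE = {
    "user_tapped_to_speak": ShrimpState.LISTENING,
    "voice_input_sent": ShrimpState.SENDING_COMMAND,
    "ai_processing": ShrimpState.THINKING,
    "ai_executing": ShrimpState.WORKING,
    "ai_response_received": ShrimpState.SUCCESS,
    "something_went_wrong": ShrimpState.ERROR,
}

SLEEP_AFTER_SECONDS = 120.0  # "2 min inactivity" from the table


def next_state(current: ShrimpState,
               trigger: Optional[str] = None,
               idle_seconds: float = 0.0) -> ShrimpState:
    """Pick the next state from a trigger, or fall asleep after inactivity."""
    if trigger in TRIGGER_TO_STATE:
        return TRIGGER_TO_STATE[trigger]
    # Assumption: only an idle character drifts off to sleep.
    if current is ShrimpState.IDLE and idle_seconds >= SLEEP_AFTER_SECONDS:
        return ShrimpState.SLEEPING
    return current
```

Keeping the mapping in one table-like dictionary makes it easy to see which event drives which animation.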

πŸ“ Project Structure

ShrimpXR/
β”œβ”€β”€ Sources/
β”‚   β”œβ”€β”€ App/
β”‚   β”‚   β”œβ”€β”€ ShrimpXRApp.swift          # App entry, ECS registration
β”‚   β”‚   β”œβ”€β”€ ControlPanelView.swift     # Settings & debug UI
β”‚   β”‚   └── SessionManager.swift       # Central state orchestration
β”‚   β”œβ”€β”€ Shrimp/
β”‚   β”‚   β”œβ”€β”€ ShrimpEntity.swift         # 3D model loading & placement
β”‚   β”‚   β”œβ”€β”€ ShrimpAnimationController.swift  # Animation state machine
β”‚   β”‚   β”œβ”€β”€ ShrimpAnimationSystem.swift      # ECS per-frame system
β”‚   β”‚   β”œβ”€β”€ ShrimpBubble3D.swift       # 3D speech bubble
β”‚   β”‚   β”œβ”€β”€ ShrimpImmersiveView.swift  # Main XR view + gestures
β”‚   β”‚   └── ShrimpState.swift          # State definitions
β”‚   β”œβ”€β”€ Speech/
β”‚   β”‚   └── SpeechManager.swift        # STT + TTS
β”‚   └── Network/
β”‚       └── NetworkManager.swift       # Bonjour + WebSocket
β”œβ”€β”€ Resources/
β”‚   β”œβ”€β”€ shrimpboy.usdz                 # Main character model
β”‚   └── animations/                    # 15+ USDZ animation files
└── OpenClawBridge/
    β”œβ”€β”€ bridge.py                      # WebSocket bridge server
    └── requirements.txt

🛠 Technical Highlights

  • Dual-entity hierarchy: Wrapper entity (gestures/rotation) → Model entity (animations). Prevents animation root motion from conflicting with user gestures.
  • Async audio setup: AVAudioEngine initialization runs off MainActor via nonisolated static func to prevent UI freezing on Vision Pro.
  • ViewAttachmentComponent: Native visionOS 26 API for rendering SwiftUI directly in 3D space as speech bubbles.
  • BubblePositionComponent: Custom RealityKit ECS component that dynamically tracks the character's head joint and counter-scales to maintain readable text size regardless of character scale.
  • Swift 6 strict concurrency: Full compliance with Swift's latest concurrency model.
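
The counter-scaling idea behind BubblePositionComponent can be shown with one line of arithmetic. The Python sketch below assumes the bubble is a child of the scaled character entity, so its apparent size is the product of both scales; the function name and the `target_world_scale` parameter are hypothetical, and the real component is written in Swift.

```python
def counter_scale(character_scale: float, target_world_scale: float = 1.0) -> float:
    """Local scale for the bubble so that
    character_scale * local scale == target_world_scale."""
    if character_scale <= 0:
        raise ValueError("character_scale must be positive")
    return target_world_scale / character_scale
```

For example, when the character is pinched to twice its size, the bubble's local scale becomes 0.5, so the text appears the same size as before.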

📄 License

MIT License. See LICENSE for details.


🤝 Contributing

Contributions are welcome! Feel free to open issues or submit pull requests.


Built with ❤️ for Apple Vision Pro
VisionClaw: AI meets spatial computing

