A Video Agent.
You describe what you want. It observes your project, calls the right tools — FFmpeg, vision models, transcribers, generators — and produces the output.
Pre-alpha. Built in the open.
- Understand video — analyze clips, extract content (Gemini 2.5 Pro)
- Transcribe — speech-to-text from audio / video (Gemini)
- Generate — images, video clips, voiceover (MiniMax)
- Search & fetch — web search and URL fetch as agent tools
- Shell — agent drives FFmpeg / curl / arbitrary pipelines
- Screen capture — macOS native screenshot and recording
- Frame-precise preview (WebCodecs in renderer)
- VS Code-style UI: file tree + video preview + agent chat
- Cut / overlay / retime / caption / render as agent tools
- Remotion project integration
- Project-aware agent memory
- Electron + React — desktop shell
- Node + pi-mono — agent loop
- Swift native — macOS screen capture
- FFmpeg — video processing
- Models — Claude (orchestration), Gemini 2.5 Pro (video / transcribe), MiniMax (image / video / TTS)
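
As a rough illustration of what driving FFmpeg from an agent tool could look like, here is a minimal sketch that builds the argument list for a simple cut. `CutRequest` and `buildCutArgs` are hypothetical names invented for this example; the project's actual tool interface is not shown in this README. Only the FFmpeg flags (`-ss`, `-i`, `-t`, `-c copy`, `-y`, `-hide_banner`) are real.

```typescript
// Hypothetical request shape for illustration; not the project's real API.
interface CutRequest {
  input: string;           // path to the source clip
  output: string;          // path for the trimmed clip
  startSeconds: number;    // where the cut begins
  durationSeconds: number; // how long the cut lasts
}

// Build an FFmpeg argument list that trims a clip without re-encoding.
function buildCutArgs(req: CutRequest): string[] {
  if (req.startSeconds < 0 || req.durationSeconds <= 0) {
    throw new RangeError("startSeconds must be >= 0 and durationSeconds must be > 0");
  }
  return [
    "-hide_banner",                    // suppress the version banner
    "-y",                              // overwrite the output file if it exists
    "-ss", String(req.startSeconds),   // seek on the input, before decoding
    "-i", req.input,
    "-t", String(req.durationSeconds), // limit the output duration
    "-c", "copy",                      // stream copy: fast, but cuts snap to keyframes
    req.output,
  ];
}
```

The array can be handed to Node's `child_process.execFile("ffmpeg", args)`, which avoids shell quoting problems with file names. Stream copy is chosen here for speed; a frame-accurate cut would need re-encoding instead of `-c copy`.
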
- macOS 13+
- Node.js 20+, pnpm
- API keys: ANTHROPIC_API_KEY (required); GOOGLE_API_KEY and MINIMAX_API_KEY (for transcription and generation)
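
Before launching, it can help to confirm which keys are set. The sketch below is an assumption about how such a check might be written, not code from the project: `checkApiKeys` and `KeyReport` are hypothetical names, and only the three variable names come from the requirements above.

```typescript
// Hypothetical helper for illustration; only the key names come from this README.
type KeyReport = { missingRequired: string[]; missingOptional: string[] };

const REQUIRED_KEYS = ["ANTHROPIC_API_KEY"];
const OPTIONAL_KEYS = ["GOOGLE_API_KEY", "MINIMAX_API_KEY"]; // transcription and generation

// Report which keys are unset or blank in the given environment map.
function checkApiKeys(env: Record<string, string | undefined>): KeyReport {
  const isMissing = (name: string): boolean => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  };
  return {
    missingRequired: REQUIRED_KEYS.filter(isMissing),
    missingOptional: OPTIONAL_KEYS.filter(isMissing),
  };
}
```

In a Node script this would be called as `checkApiKeys(process.env)`. A non-empty `missingRequired` means the agent cannot run, while entries in `missingOptional` only indicate that transcription or generation would be unavailable.
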
pnpm install
pnpm -r build
pnpm dev:desktop

License: MIT