fifteen42/vigent

Vigent

A Video Agent.

You describe what you want. It observes your project, calls the right tools — FFmpeg, vision models, transcribers, generators — and produces the output.

Pre-alpha. Built in the open.

What it can do today

  • Understand video — analyze clips, extract content (Gemini 2.5 Pro)
  • Transcribe — speech-to-text from audio / video (Gemini)
  • Generate — images, video clips, voiceover (MiniMax)
  • Search & fetch — web search and URL fetch as agent tools
  • Shell — agent drives FFmpeg / curl / arbitrary pipelines
  • Screen capture — macOS native screenshot and recording

What's next

  • Frame-precise preview (WebCodecs in renderer)
  • VS Code-style UI: file tree + video preview + agent chat
  • Cut / overlay / retime / caption / render as agent tools
  • Remotion project integration
  • Project-aware agent memory

Stack

  • Electron + React — desktop shell
  • Node + pi-mono — agent loop
  • Swift native — macOS screen capture
  • FFmpeg — video processing
  • Models — Claude (orchestration), Gemini 2.5 Pro (video / transcribe), MiniMax (image / video / TTS)

Requirements

  • macOS 13+
  • Node.js 20+, pnpm
  • API keys: ANTHROPIC_API_KEY (required); GOOGLE_API_KEY for Gemini video understanding and transcription; MINIMAX_API_KEY for MiniMax image, video, and voiceover generation
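A quick way to confirm the keys are visible to your shell before launching is sketched below. The `check_key` and `preflight` helpers are hypothetical and not part of the repository. Treating the Google and MiniMax keys as optional is an inference from the list above, which marks only ANTHROPIC_API_KEY as required.

```shell
# Hypothetical preflight helper; check_key and preflight are illustrative names.

# Report whether the named variable is set, never printing its value.
# $1 = variable name, $2 = "required" or "optional".
check_key() {
  eval "ck_value=\${$1:-}"
  if [ -n "$ck_value" ]; then
    echo "ok: $1 is set"
  elif [ "$2" = "required" ]; then
    echo "missing: $1 (required)"
    return 1
  else
    echo "missing: $1 (optional)"
  fi
}

# Check all three keys; fail only when the required one is missing.
preflight() {
  pf_status=0
  check_key ANTHROPIC_API_KEY required || pf_status=1
  check_key GOOGLE_API_KEY optional     # assumed optional
  check_key MINIMAX_API_KEY optional    # assumed optional
  return "$pf_status"
}
```

After sourcing these functions, `preflight && pnpm dev:desktop` would start the app only when the required key is present.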

Getting started

pnpm install
pnpm -r build
pnpm dev:desktop

License

MIT
