AI Window is a minimalist, transparent window assistant that combines real-time voice interaction with immersive background scenes. Powered by Gemini Live and YouTube, it allows you to transform your workspace with just a voice command.
- Real-time Voice Interaction: Bidirectional streaming using
gemini-2.5-flash-native-audio-preview. - Smart Scene Switching: Automatically searches and plays YouTube 4K window views (e.g., "Rainy London", "Swiss Alps") based on conversation keywords.
- Intelligent Audio Management:
- Auto-Pause: Background music/video automatically pauses when you start talking to the AI.
- Auto-Resume: Background audio resumes seamlessly once the conversation ends.
- Jitter Buffer: Advanced latency management to prevent audio stuttering during network bursts.
- Minimalist UI: A sleek, transparent, and "always-on-top" PyQt6 interface.
- Auto Mic Closure: The assistant automatically closes the microphone 6 seconds after a search command is detected, allowing it to finish its verbal confirmation.
- Python 3.10+
- FFmpeg: Required for audio processing.
- MPV: Required for background video playback.
- yt-dlp: Required for searching YouTube content.
- Dependencies:
pip install PyQt6 google-genai nest_asyncio
- Get a Gemini API Key: Visit the Google AI Studio to get your key.
- Set Environment Variable:
export GEMINI_API_KEY='your_api_key_here'
- Launch the Application:
Everything is automated via the start script. Simply run:
This script will automatically clear previous sockets, start MPV in the background with a default rainy scene, launch the AI UI, and clean up processes upon exit.
chmod +x start_window.sh ./start_window.sh
The aiwin.desktop file allows you to launch the AI Window directly from your Ubuntu desktop with a single click.
- Configure Paths: Open
aiwin.desktopand update theExecandIconpaths to match your local project directory (e.g., replace/home/weafon/aiwindow/with your actual path). - Deployment: Copy the file to your desktop:
cp aiwin.desktop ~/Desktop/ - Permissions: Right-click the file on your desktop and select "Allow Launching".
- Voice Command: Click the 🎤 button to start a Live session.
- Switch Scenes: Tell the AI something like:
- "I want to see the rainy streets of London."
- "Show me a snowy mountain view."
- "帮我换成日本街道的风景" (Support for Traditional Chinese).
- Text Entry: You can also type commands into the input field at the bottom.
- Exit: Click the '✕' or press
Esc.
The send2mpv extension allows you to send any YouTube video you are currently watching in Chrome directly to the AI Window for remote playback.
- Installation:
- Open Chrome and navigate to
chrome://extensions/. - Enable "Developer mode" in the top right corner.
- Click "Load unpacked" and select the
send2mpvdirectory from this project.
- Open Chrome and navigate to
- Configuration:
- Open
send2mpv/background.js. - Update
targetIpwith the IP address of the machine running the AI Window. - (Optional) Update
targetPortif you changed the default port (9998).
- Open
- Usage:
- Click the extension icon while on a YouTube video page to send it to the AI Window. The local video will automatically pause.
- Audio Configuration:
- Recording: 16kHz, 16-bit PCM.
- Playback: 24kHz, 16-bit PCM (standard for Gemini Live output).
- Jitter Buffer: Built with a 5-second burst tolerance and 20ms check intervals to ensure smooth playback regardless of network conditions.
- Device Selection: Automatically prioritizes external microphones (USB Audio, ConferenceCam) for better voice quality.
This project is for demonstration and personal use. Powered by Google Gemini.