Major Update v2.4.0: Now supports 30 voices, switching between TTS and Native Audio modes, and model selection!
This extension enables you to use Google's Gemini Multimodal Live API (and TTS API) to read selected text on any webpage. It supports real-time streaming, image understanding (in Native Audio mode), and a wide range of natural voices.
- 🎙️ Two API Modes:
  - Native Audio (Live API): Real-time; supports system prompts (e.g., translation or style changes) and understands images. More expensive, but more powerful.
  - Text-to-Speech (TTS): Cheaper, plain text reading. Ideal for long articles.
- 🤖 Model Selection: Choose between `gemini-2.5-flash`, `gemini-2.5-pro`, and other preview models.
- 🗣️ 30 Voices: Full support for all Gemini voices, including Kore, Fenrir, Aoede, Charon, and more.
- ⚡ Fixed: Restored the extension after Google deprecated the old ephemeral models it relied on.
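In TTS mode, reading a selection comes down to one REST call per passage. The sketch below builds that request, assuming Google's public `generateContent` endpoint with the `x-goog-api-key` header and the `speechConfig` / `prebuiltVoiceConfig` fields described in the Gemini TTS documentation. The helper name `buildTtsRequest` is hypothetical, and the model name used in the test is only an example; pass whichever TTS model the Options page selects.

```typescript
// Hypothetical helper: builds URL, headers and JSON body for a single
// Gemini TTS request. Field names follow Google's public generateContent
// REST schema; the model name is supplied by the caller (assumption).
interface TtsRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildTtsRequest(
  apiKey: string,
  model: string,
  voiceName: string,
  text: string,
): TtsRequest {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    `${encodeURIComponent(model)}:generateContent`;
  const payload = {
    contents: [{ parts: [{ text }] }],
    generationConfig: {
      // Ask for spoken audio instead of a text reply.
      responseModalities: ["AUDIO"],
      speechConfig: {
        // One of the prebuilt voices, e.g. "Kore" or "Charon".
        voiceConfig: { prebuiltVoiceConfig: { voiceName } },
      },
    },
  };
  return {
    url,
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify(payload),
  };
}
```

Sending it is then `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`. Per Google's docs, the reply carries base64-encoded 16-bit PCM audio (24 kHz, mono) in `candidates[0].content.parts[0].inlineData.data`, which still has to be decoded before playback.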
- Clone or download this repository.
- Open Chrome and navigate to `chrome://extensions`.
- Enable Developer Mode (top right).
- Click Load unpacked and select the extension folder.
- Get your API Key from Google AI Studio.
- Click the extension icon and select Options.
- Enter your API Key.
- Select API Type (Native Audio or TTS).
- Choose your favorite Voice and Model.
- (Optional) Set a System Prompt (e.g., "Translate to Spanish and read").
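In Native Audio mode, the settings above map fairly directly onto the first message of a Live API session. A minimal sketch, assuming the documented v1beta `BidiGenerateContent` WebSocket endpoint and its `setup` message format: the voice goes into `speechConfig`, and a non-empty system prompt becomes `systemInstruction`. The `ExtensionOptions` field names and both helper functions are hypothetical and may not match the extension's own code.

```typescript
// Hypothetical shape of the settings saved on the Options page.
interface ExtensionOptions {
  apiKey: string;
  apiType: "native" | "tts";
  voice: string; // e.g. "Kore", "Fenrir", "Aoede", "Charon"
  model: string;
  systemPrompt: string; // e.g. "Translate to Spanish and read"
}

// First message of a Live API session (subset of the documented fields).
interface LiveSetupMessage {
  setup: {
    model: string;
    generationConfig: {
      responseModalities: string[];
      speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName: string } } };
    };
    systemInstruction?: { parts: { text: string }[] };
  };
}

// WebSocket endpoint for the Live API (v1beta), authenticated with the API key.
function liveSocketUrl(apiKey: string): string {
  return (
    "wss://generativelanguage.googleapis.com/ws/" +
    "google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent" +
    `?key=${encodeURIComponent(apiKey)}`
  );
}

function buildLiveSetup(opts: ExtensionOptions): LiveSetupMessage {
  const message: LiveSetupMessage = {
    setup: {
      // The Live API names models as resources: "models/<id>".
      model: opts.model.startsWith("models/") ? opts.model : `models/${opts.model}`,
      generationConfig: {
        responseModalities: ["AUDIO"],
        speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName: opts.voice } } },
      },
    },
  };
  // Only send a system instruction when the user actually set a prompt.
  const prompt = opts.systemPrompt.trim();
  if (prompt) {
    message.setup.systemInstruction = { parts: [{ text: prompt }] };
  }
  return message;
}
```

In a background script this would be used roughly as `new WebSocket(liveSocketUrl(opts.apiKey))`, sending `JSON.stringify(buildLiveSetup(opts))` from the socket's `open` handler before any text or images.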
- Select text on any webpage.
- Right-click and choose "Transcribe with Gemini".
- Alternatively, click the extension icon to take a screenshot and have Gemini describe or read it (Native Audio only).
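In Native Audio mode, both entry points end up as a single user turn on the Live socket: the right-click path supplies the selected text (Chrome's `chrome.contextMenus.onClicked` listener receives it as `info.selectionText`), and the screenshot path supplies the data URL returned by `chrome.tabs.captureVisibleTab`. The sketch assumes the Live API's `clientContent` message shape with inline image data; `dataUrlToInlineData` and `buildClientTurn` are hypothetical names.

```typescript
interface InlineData {
  mimeType: string;
  data: string; // base64 payload without the "data:...;base64," prefix
}

type Part = { text: string } | { inlineData: InlineData };

interface ClientContentMessage {
  clientContent: {
    turns: { role: "user"; parts: Part[] }[];
    turnComplete: boolean;
  };
}

// Splits a "data:<mime>;base64,<payload>" URL (the format returned by
// chrome.tabs.captureVisibleTab) into the inlineData shape.
function dataUrlToInlineData(dataUrl: string): InlineData {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (!match) {
    throw new Error("Expected a base64 data URL");
  }
  return { mimeType: match[1], data: match[2] };
}

// One complete user turn: the instruction or selected text, plus an
// optional screenshot. turnComplete tells the model to start answering.
function buildClientTurn(text: string, screenshotDataUrl?: string): ClientContentMessage {
  const parts: Part[] = [{ text }];
  if (screenshotDataUrl) {
    parts.push({ inlineData: dataUrlToInlineData(screenshotDataUrl) });
  }
  return { clientContent: { turns: [{ role: "user", parts }], turnComplete: true } };
}
```

The resulting object is sent with `ws.send(JSON.stringify(...))` after the setup message has been acknowledged.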
- Chrome Extension Manifest V3
- Gemini Live API (WebSocket)
- Gemini TTS API (REST)
- Native Audio / Web Audio API
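Both APIs return raw audio rather than a playable file: per Google's documentation, 16-bit little-endian PCM, mono, at 24 kHz. Before the Web Audio API can play it, the samples have to be converted to floats in [-1, 1). A minimal sketch of that conversion, assuming the base64 payload has already been decoded to bytes (e.g., with `atob` in the page); `pcm16ToFloat32` is a hypothetical name.

```typescript
// Converts raw 16-bit little-endian PCM bytes into the Float32 samples
// (range [-1, 1)) that a Web Audio AudioBuffer expects.
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  // Respect byteOffset/byteLength in case `bytes` is a view into a larger buffer.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const sampleCount = Math.floor(bytes.byteLength / 2); // a trailing odd byte is ignored
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // true = little-endian; dividing by 32768 maps -32768..32767 to -1..~0.99997
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}
```

Playback would then create a buffer with `ctx.createBuffer(1, samples.length, 24000)`, copy the samples in with `copyToChannel(samples, 0)`, and start an `AudioBufferSourceNode`. For streamed Live API chunks, scheduling each buffer to start where the previous one ends avoids audible gaps.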
Original extension by jansenmtan (Chrome Web Store, GitHub). Fixes and v2.0 updates by tomfalkenberg. Major v2.4 overhaul (UI, voices, TTS support) by StrangeTeaCreature.