Where voices come alive - AI-powered text-to-speech and instant voice cloning, built with Next.js 16, React 19, and cutting-edge TTS technology.
Sentience isn't just another text-to-speech app - it's a voice ecosystem that gives your content a soul. Whether you're a content creator, developer, or enterprise, Sentience transforms written words into living, breathing audio that resonates with your audience.
- 🎙️ **Zero-Shot Voice Cloning** - Upload or record just 10 seconds of any voice, and watch Sentience replicate it instantly. No training, no waiting, no technical expertise required.
- 🗣️ **20+ Premium Built-in Voices** - A diverse cast of AI voices across 12 categories and 5 locales, ready to bring any project to life.
- 🎚️ **Fine-tune Your Sound** - Adjust creativity, variety, expression, and flow parameters to make each generation uniquely yours.
- 👥 **Team Collaboration** - Multi-tenant architecture with Clerk Organizations ensures complete data isolation for teams of any size.
- 💳 **Smart Usage-Based Billing** - Pay only for what you use with Polar's metered pricing. Start at $0/month and scale with your success.
- 📊 **Generation History** - Every voice you create, every word you generate - preserved with full metadata for easy recall and re-use.
- 📱 **Responsive Everywhere** - From desktop studios to mobile workflows, Sentience adapts seamlessly to any screen.
- 🎨 **Waveform Visualization** - Beautiful WaveSurfer.js audio player with seek, play/pause, and download capabilities.
- Node.js 20.9 or later
- Prisma Postgres database
- Clerk account (with Organizations enabled)
- Cloudflare R2 bucket
- Modal account (for GPU-hosted TTS)
- Polar account (for billing)
```bash
git clone https://github.com/AlexGMAY/Sentience.git
cd Sentience
npm install
cp .env.example .env
```

Fill in the blank values in `.env`. Sensible defaults (Clerk routes, Polar meter names, `APP_URL`, etc.) are pre-filled.
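Before running anything else, it can help to fail fast when a required variable is still blank. A minimal sketch of such a startup check — the exact set of required keys is this repo's `lib/env` module's concern, and `DATABASE_URL` is assumed here as the conventional Prisma variable name:

```typescript
// Hypothetical startup check: report required env vars that are missing
// or left blank after copying .env.example. The key list is illustrative.
const REQUIRED_ENV_KEYS = [
  "DATABASE_URL",       // Prisma Postgres connection string (assumed name)
  "CHATTERBOX_API_URL", // Modal endpoint URL from `modal deploy`
  "POLAR_PRODUCT_ID",   // Polar product created in the billing setup
] as const;

export function missingEnvKeys(
  env: Record<string, string | undefined>
): string[] {
  return REQUIRED_ENV_KEYS.filter((key) => !env[key] || env[key]!.trim() === "");
}

// Example: only CHATTERBOX_API_URL is set, so the other two are reported.
const unset = missingEnvKeys({ CHATTERBOX_API_URL: "https://example.modal.run" });
```

Running this once at boot (e.g. from the env module) surfaces configuration mistakes before they show up as opaque runtime errors.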
In your Polar dashboard, create two meters under Meters:
- **Voice Creation** meter
  - Filter: `Name` equals `voice_creation`
  - Aggregation: **Count**
- **Text-to-Speech Characters** meter
  - Filter: `Name` equals `tts_generation`
  - Aggregation: **Sum** over `characters`
Then create a new product with Recurring subscription pricing. Under Price Type, add two metered prices:
- Click **Add metered price** and select the **Text-to-Speech Characters** meter
  - Set the **Amount per unit** (price per character, e.g. `$0.003`)
  - Optionally set a **Cap amount** (e.g. `$100`)
- Click **Add metered price** again and select the **Voice Creation** meter
  - Set the **Amount per unit** (price per voice generation, e.g. `$0.25`)
  - Optionally set a **Cap amount** (e.g. `$100`)
With only metered prices, the subscription starts at $0/month and scales with usage. If you want a baseline subscription fee (e.g. $20/month), add a third price to the same product — select a fixed price instead of a metered price.
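Under the example rates above, a back-of-envelope monthly cost is easy to compute. A sketch using only the example figures from this guide ($0.003 per character and $0.25 per voice, each capped at $100 — your actual prices may differ):

```typescript
// Rough monthly cost under the example metered prices from this guide:
// $0.003 per TTS character and $0.25 per voice creation, each with an
// optional $100 cap and no fixed base fee.
export function estimateMonthlyCost(
  ttsCharacters: number,
  voicesCreated: number
): number {
  const ttsCost = Math.min(ttsCharacters * 0.003, 100);   // capped at $100
  const voiceCost = Math.min(voicesCreated * 0.25, 100);  // capped at $100
  return ttsCost + voiceCost;
}

// e.g. 10,000 characters and 4 cloned voices → roughly $30 + $1 = $31
```

The caps mean a runaway month can never exceed $200 in metered charges under this example configuration.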
Ensure Allow multiple subscriptions is turned off under Settings > Billing (this is the Polar default).
Copy the product ID into POLAR_PRODUCT_ID. The meter filter names and aggregation property must match the POLAR_METER_* env variables.
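When the app records usage, the event `name` and metadata property must line up exactly with the meter filters above (`tts_generation`, `voice_creation`, `characters`), or the meters count nothing. A hypothetical helper showing the shape such an event might take — the exact ingestion call and field names depend on the Polar SDK version you use:

```typescript
// Build a usage event matching the "Text-to-Speech Characters" meter:
// filter on name == "tts_generation", aggregation Sum over "characters".
// The payload shape is illustrative; adapt it to your Polar SDK's ingest API.
export function buildTtsUsageEvent(customerId: string, text: string) {
  return {
    name: "tts_generation",          // must match the meter's Name filter
    externalCustomerId: customerId,
    metadata: {
      characters: text.length,       // the property the meter sums over
    },
  };
}

const event = buildTtsUsageEvent("cust_123", "Hello, world!");
```

A mismatch between these strings and the `POLAR_METER_*` env variables is the most common reason metered charges silently stay at zero.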
```bash
npx prisma migrate deploy
```

The included `chatterbox_tts.py` is adapted from Modal's official Chatterbox TTS example, modified to read voice reference audio directly from your R2 bucket instead of a Modal Volume.
Before deploying, update chatterbox_tts.py with your R2 credentials:
```python
R2_BUCKET_NAME = "<your-r2-bucket-name-here>"
R2_ACCOUNT_ID = "<your-r2-account-id-here>"
```

Then create the required secrets in your Modal dashboard:
| Secret Name | Keys | Description |
|---|---|---|
| `cloudflare-r2` | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` | R2 API credentials (used for the bucket mount) |
| `chatterbox-api-key` | `CHATTERBOX_API_KEY` | API key to protect the endpoint (use any strong random string) |
| `hf-token` | `HF_TOKEN` | Hugging Face token (for downloading the Chatterbox model weights) |
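For `CHATTERBOX_API_KEY`, any strong random string works. One way to generate one with Node's built-in `crypto` module:

```typescript
import { randomBytes } from "node:crypto";

// 32 random bytes → 64 hex characters. Paste the result into both the
// Modal secret and CHATTERBOX_API_KEY in your env file.
const apiKey = randomBytes(32).toString("hex");
console.log(apiKey);
```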
Deploy to Modal:

```bash
modal deploy chatterbox_tts.py
```

This deploys Chatterbox TTS to a serverless NVIDIA A10G GPU on Modal. The container mounts your R2 bucket read-only for direct access to voice reference audio. Use the resulting Modal URL as `CHATTERBOX_API_URL` in your `.env.local`.
Note: The first request after a period of inactivity may take longer due to cold starts as Modal provisions the GPU container.
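The app then calls this endpoint with the API key. A hedged sketch of what building such a request might look like — the `/tts` path, the payload field names, and the bearer-auth scheme are all assumptions here; the generated client from `npm run sync-api` defines the real contract:

```typescript
// Hypothetical request builder for the Modal-hosted TTS endpoint.
// Path, field names (text, voice_key), and auth scheme are assumptions;
// consult the generated OpenAPI client for the actual API shape.
export function buildTtsRequest(
  baseUrl: string,
  apiKey: string,
  text: string,
  voiceKey: string
) {
  return {
    url: `${baseUrl.replace(/\/$/, "")}/tts`, // strip trailing slash
    init: {
      method: "POST" as const,
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`, // CHATTERBOX_API_KEY
      },
      body: JSON.stringify({ text, voice_key: voiceKey }),
    },
  };
}

const req = buildTtsRequest(
  "https://example.modal.run/",
  "secret",
  "Hello",
  "voices/narrator.wav"
);
```

Passing `req.url` and `req.init` to `fetch` would issue the call; keeping the builder separate from the transport makes it easy to unit-test without hitting the GPU endpoint.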
Once deployed, generate the type-safe Chatterbox client from the OpenAPI spec:

```bash
npm run sync-api
```

Then seed the database:

```bash
npx prisma db seed
```

This seeds 20 built-in voices to the database and R2. The system voice WAV files are included in the repository and originate from Modal's voice sample pack.
```bash
npm run dev
```

Open http://localhost:3000.
Sentience is designed to be self-hosted. You'll need:
- A PostgreSQL database - Prisma Postgres (recommended), or any managed Postgres
- Cloudflare R2 - For audio storage (S3-compatible, generous free tier)
- Modal - For serverless GPU inference (pay-per-second billing)
- Clerk - For authentication and multi-tenancy
- Polar - For metered billing (use sandbox mode with card `4242 4242 4242 4242` for testing)
Deploy the Next.js app to any Node.js host (Vercel, Railway, Docker, etc.).
src/
├── app/ # Next.js App Router
│ ├── (dashboard)/ # Protected routes (home, TTS, voices)
│ ├── api/ # Audio proxy routes + tRPC handler
│ ├── sign-in/ # Clerk auth pages
│ └── sign-up/
├── components/ # Shared UI components (shadcn/ui + custom)
├── features/
│ ├── dashboard/ # Home page, quick actions
│ ├── text-to-speech/ # TTS form, audio player, settings, history
│ ├── voices/ # Voice library, creation, recording
│ └── billing/ # Usage display, checkout
├── hooks/ # App-wide hooks
├── lib/ # Core: db, r2, polar, env, chatterbox client
├── trpc/ # tRPC routers, client, server helpers
├── generated/ # Prisma client
└── types/ # Generated API types
| Command | Description |
|---|---|
| `npm run dev` | Start dev server |
| `npm run build` | Production build |
| `npm run start` | Start production server |
| `npm run lint` | Lint with ESLint |
| `npm run sync-api` | Regenerate Chatterbox API types from OpenAPI spec |
- Chatterbox TTS by Resemble AI - the open-source zero-shot voice cloning model powering speech generation
- Modal - serverless GPU deployment and voice sample pack
- shadcn/ui - beautiful, accessible components
- WaveSurfer.js - audio visualization
MIT © AlexGMAY
Built with ❤️ for voices that matter
© 2026 Sentience. All rights reserved.