Sentience

Where voices come alive: AI-powered text-to-speech and instant voice cloning, built with Next.js 16, React 19, and Chatterbox TTS.



Clerk  Polar  Modal  Sentry  Prisma  Cloudflare R2


✨ The Sentience Difference

Sentience isn't just another text-to-speech app - it's a voice ecosystem that gives your content a soul. Whether you're a content creator, developer, or enterprise, Sentience transforms written words into living, breathing audio that resonates with your audience.


🚀 Features

  • 🎙️ Zero-Shot Voice Cloning - Upload or record just 10 seconds of any voice, and watch Sentience replicate it instantly. No training, no waiting, no technical expertise required.

  • 🗣️ 20+ Premium Built-in Voices - A diverse cast of AI voices across 12 categories and 5 locales, ready to bring any project to life.

  • 🎚️ Fine-tune Your Sound - Adjust creativity, variety, expression, and flow parameters to make each generation uniquely yours.

  • 👥 Team Collaboration - Multi-tenant architecture with Clerk Organizations ensures complete data isolation for teams of any size.

  • 💳 Smart Usage-Based Billing - Pay only for what you use with Polar's metered pricing. Start at $0/month and scale with your success.

  • 📊 Generation History - Every voice you create, every word you generate - preserved with full metadata for easy recall and re-use.

  • 📱 Responsive Everywhere - From desktop studios to mobile workflows, Sentience adapts seamlessly to any screen.

  • 🎨 Waveform Visualization - Beautiful WaveSurfer.js audio player with seek, play/pause, and download capabilities.


🏁 Getting Started

Prerequisites

1. Clone and install

git clone https://github.com/AlexGMAY/Sentience.git
cd Sentience
npm install

2. Configure environment

cp .env.example .env

Fill in the blank values in .env. Sensible defaults (Clerk routes, Polar meter names, APP_URL, etc.) are pre-filled.
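A quick way to spot values that are still blank is a small check over the file contents. The sketch below is not part of the repository: the helper name `blank_env_keys` is hypothetical, the sample values are made up for illustration, and it only understands simple `KEY=value` lines.

```python
def blank_env_keys(env_text: str) -> list:
    """Return the names of variables whose value is empty in dotenv-style text."""
    blank = []
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip comments, empty lines, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Treat an empty string and empty quotes as "not filled in".
        if value.strip().strip("\"'") == "":
            blank.append(key.strip())
    return blank


# Sample content only; the real keys and defaults come from .env.example.
sample = (
    "POLAR_PRODUCT_ID=\n"
    'APP_URL="http://localhost:3000"\n'
    "# a comment line\n"
    'CHATTERBOX_API_URL=""\n'
)
print(blank_env_keys(sample))  # ['POLAR_PRODUCT_ID', 'CHATTERBOX_API_URL']
```

To check a real file, pass the contents of `.env` (for example via `pathlib.Path(".env").read_text()`) instead of the sample string.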

3. Set up Polar billing

In your Polar dashboard, create two meters under Meters:

  1. Voice Creation meter

    • Filter: Name equals voice_creation
    • Aggregation: Count
  2. Text-to-Speech Characters meter

    • Filter: Name equals tts_generation
    • Aggregation: Sum over characters
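To make the two aggregations concrete, the sketch below simulates how usage events would roll up under these meter settings. It is an illustration only: the event shape (a `name` plus a `metadata` dictionary holding `characters`) is an assumption, and the function `aggregate_meters` is hypothetical rather than part of Sentience or the Polar SDK.

```python
from collections import defaultdict


def aggregate_meters(events):
    """Roll up usage events the way the two meters above are configured."""
    totals = defaultdict(int)
    for event in events:
        if event["name"] == "voice_creation":
            # Count aggregation: every matching event adds one unit.
            totals["voice_creation"] += 1
        elif event["name"] == "tts_generation":
            # Sum aggregation over the "characters" property.
            totals["tts_generation"] += event["metadata"]["characters"]
    return dict(totals)


# Made-up events for illustration.
events = [
    {"name": "voice_creation", "metadata": {}},
    {"name": "tts_generation", "metadata": {"characters": 120}},
    {"name": "tts_generation", "metadata": {"characters": 80}},
]
print(aggregate_meters(events))  # {'voice_creation': 1, 'tts_generation': 200}
```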

Then create a new product with Recurring subscription pricing. Under Price Type, add two metered prices:

  1. Click Add metered price and select the Text-to-Speech Characters meter

    • Set the Amount per unit (price per character, e.g. $0.003)
    • Optionally set a Cap amount (e.g. $100)
  2. Click Add metered price again and select the Voice Creation meter

    • Set the Amount per unit (price per voice generation, e.g. $0.25)
    • Optionally set a Cap amount (e.g. $100)

With only metered prices, the subscription starts at $0/month and scales with usage. If you want a baseline subscription fee (e.g. $20/month), add a third price to the same product — select a fixed price instead of a metered price.
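As a rough worked example of how these prices combine, the sketch below estimates a monthly total from the example amounts above. It assumes each cap limits its own metered price independently and that the optional fixed price is simply added on top; the function names are hypothetical and this is not how Polar computes invoices internally.

```python
from decimal import Decimal
from typing import Optional


def metered_charge(units: int, unit_price: str, cap: Optional[str] = None) -> Decimal:
    """Charge for one metered price: units times price, limited by an optional cap."""
    amount = Decimal(unit_price) * units
    if cap is not None:
        amount = min(amount, Decimal(cap))
    return amount


def monthly_total(characters: int, voices: int, base_fee: str = "0") -> Decimal:
    """Estimate a monthly bill using the example prices from this section."""
    total = (
        Decimal(base_fee)
        + metered_charge(characters, "0.003", cap="100")  # per character
        + metered_charge(voices, "0.25", cap="100")       # per voice creation
    )
    return total.quantize(Decimal("0.01"))


print(monthly_total(10_000, 8))                 # 32.00
print(monthly_total(10_000, 8, base_fee="20"))  # 52.00
print(monthly_total(50_000, 0))                 # 100.00 (character charge hits the cap)
```

Decimal arithmetic is used here so that small per-unit prices do not pick up floating-point rounding noise.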

Ensure Allow multiple subscriptions is turned off under Settings > Billing (this is the Polar default).

Copy the product ID into POLAR_PRODUCT_ID. The meter filter names (voice_creation, tts_generation) and the aggregation property (characters) must match the values of the POLAR_METER_* env variables.
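A mismatch between the dashboard and the environment is easy to miss, so a small consistency check can help. The sketch below is hypothetical: it assumes every `POLAR_METER_*` variable should hold one of the three names configured above, and it deliberately does not assume the exact variable names, which are defined in `.env.example`.

```python
# Names configured in the Polar dashboard in this section.
EXPECTED_VALUES = {"voice_creation", "tts_generation", "characters"}


def mismatched_meter_vars(env: dict) -> dict:
    """Return POLAR_METER_* variables whose value is not one of the dashboard names."""
    return {
        key: value
        for key, value in env.items()
        if key.startswith("POLAR_METER_") and value not in EXPECTED_VALUES
    }


# Variable names below are placeholders for illustration, not the real ones.
example_env = {
    "POLAR_METER_EXAMPLE_ONE": "voice_creation",
    "POLAR_METER_EXAMPLE_TWO": "tts-generation",  # typo: hyphen instead of underscore
    "APP_URL": "http://localhost:3000",
}
print(mismatched_meter_vars(example_env))  # {'POLAR_METER_EXAMPLE_TWO': 'tts-generation'}
```

Against a real environment, the same function can be called with `dict(os.environ)` after loading `.env`.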

4. Set up the database

npx prisma migrate deploy

5. Deploy the TTS engine

The included chatterbox_tts.py is adapted from Modal's official Chatterbox TTS example, modified to read voice reference audio directly from your R2 bucket instead of a Modal Volume.

Before deploying, update chatterbox_tts.py with your R2 bucket name and account ID:

R2_BUCKET_NAME = "<your-r2-bucket-name-here>"
R2_ACCOUNT_ID = "<your-r2-account-id-here>"
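For context, Cloudflare R2 exposes an S3-compatible endpoint derived from the account ID, which is why the script needs both values. The sketch below shows that derivation with a placeholder guard; the helper `r2_endpoint_url` is hypothetical, and how chatterbox_tts.py actually uses the two constants is an assumption.

```python
R2_BUCKET_NAME = "<your-r2-bucket-name-here>"
R2_ACCOUNT_ID = "<your-r2-account-id-here>"


def r2_endpoint_url(account_id: str) -> str:
    """Build the S3-compatible endpoint URL for a Cloudflare R2 account."""
    # Fail early if the placeholder was never replaced.
    if not account_id or "<" in account_id:
        raise ValueError("R2_ACCOUNT_ID still contains the placeholder value")
    return f"https://{account_id}.r2.cloudflarestorage.com"


# Made-up account ID for illustration.
print(r2_endpoint_url("0123456789abcdef"))
# https://0123456789abcdef.r2.cloudflarestorage.com
```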

Then create the required secrets in your Modal dashboard:

| Secret Name | Keys | Description |
| --- | --- | --- |
| cloudflare-r2 | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY | R2 API credentials (used for bucket mount) |
| chatterbox-api-key | CHATTERBOX_API_KEY | API key to protect the endpoint (use any strong random string) |
| hf-token | HF_TOKEN | Hugging Face token (for downloading the Chatterbox model weights) |

Deploy to Modal:

modal deploy chatterbox_tts.py

This deploys Chatterbox TTS to a serverless NVIDIA A10G GPU on Modal. The container mounts your R2 bucket read-only for direct access to voice reference audio. Use the resulting Modal URL as CHATTERBOX_API_URL in your .env.

Note: The first request after a period of inactivity may take longer due to cold starts as Modal provisions the GPU container.
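If you call the endpoint from your own scripts, a retry with backoff is one common way to ride out a cold start. The sketch below is a generic pattern, not Sentience's implementation: `call_with_retry` and `flaky_request` are hypothetical, and the real request would be whatever HTTP call you make to CHATTERBOX_API_URL.

```python
import time


def call_with_retry(request, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call request() and retry with exponential backoff on timeouts or connection errors."""
    for attempt in range(attempts):
        try:
            return request()
        except (TimeoutError, ConnectionError):
            # Give up after the final attempt and surface the original error.
            if attempt == attempts - 1:
                raise
            # Wait 2s, 4s, 8s, ... while the GPU container starts.
            sleep(base_delay * 2 ** attempt)


# Stand-in for a real HTTP call: fails twice, then succeeds.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("container still starting")
    return "audio-bytes"


delays = []
print(call_with_retry(flaky_request, sleep=delays.append))  # audio-bytes
print(delays)  # [2.0, 4.0]
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test; in real use the default `time.sleep` applies.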

Once deployed, generate the type-safe Chatterbox client from the OpenAPI spec:

npm run sync-api

6. Seed voices

npx prisma db seed

This seeds 20 built-in voices into the database and R2. The system voice WAV files are included in the repository and originate from Modal's voice sample pack.

7. Run

npm run dev

Open http://localhost:3000.


🏭 Self-Hosting

Sentience is designed to be self-hosted. You'll need:

  1. A PostgreSQL database - Prisma Postgres (recommended), or any managed Postgres
  2. Cloudflare R2 - For audio storage (S3-compatible, generous free tier)
  3. Modal - For serverless GPU inference (pay-per-second billing)
  4. Clerk - For authentication and multi-tenancy
  5. Polar - For metered billing (use sandbox mode with card 4242 4242 4242 4242 for testing)

Deploy the Next.js app to any Node.js host (Vercel, Railway, Docker, etc.).


📁 Project Structure

src/
├── app/                        # Next.js App Router
│   ├── (dashboard)/            # Protected routes (home, TTS, voices)
│   ├── api/                    # Audio proxy routes + tRPC handler
│   ├── sign-in/                # Clerk auth pages
│   └── sign-up/
├── components/                 # Shared UI components (shadcn/ui + custom)
├── features/
│   ├── dashboard/              # Home page, quick actions
│   ├── text-to-speech/         # TTS form, audio player, settings, history
│   ├── voices/                 # Voice library, creation, recording
│   └── billing/                # Usage display, checkout
├── hooks/                      # App-wide hooks
├── lib/                        # Core: db, r2, polar, env, chatterbox client
├── trpc/                       # tRPC routers, client, server helpers
├── generated/                  # Prisma client
└── types/                      # Generated API types

📜 Scripts

| Command | Description |
| --- | --- |
| npm run dev | Start dev server |
| npm run build | Production build |
| npm run start | Start production server |
| npm run lint | Lint with ESLint |
| npm run sync-api | Regenerate Chatterbox API types from OpenAPI spec |

🙏 Acknowledgements


📄 License

MIT © AlexGMAY


Built with ❤️ for voices that matter

© 2026 Sentience. All rights reserved.
