YouTube-Transcript-Extractor/README.md at main · JoelMoyal/YouTube-Transcript-Extractor

Extract transcripts from YouTube, Vimeo, TikTok, and more — then unlock AI-powered study tools in one click.

About

ScribeSnap is a full-stack web application that pulls transcripts from videos across multiple platforms and enriches them with AI features: summaries, key insights, auto-generated flashcards, study guides, and interactive Q&A chat.

Whether you're a student reviewing a lecture, a researcher skimming a conference talk, or a professional turning webinars into notes, ScribeSnap converts a video from any supported platform into structured, searchable notes.

Features

Transcript Extraction

Multi-platform support — YouTube, Vimeo, TikTok, Twitter/X, Instagram, Facebook, Loom, Wistia, Dailymotion
Auto-subtitle fallback — tries native captions first, then falls back to Supadata for videos without them
Streaming delivery — transcripts stream to the UI via Server-Sent Events (SSE) so you see content as it arrives
Timestamp preservation — retains original timecodes for navigation
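
The caption fallback and SSE delivery described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `getTranscript`, `formatSseEvent`, `streamSegments`, the two fetcher callbacks, and the `segment`/`done` event names are all hypothetical.

```javascript
// Format one Server-Sent Events message. The SSE wire format is
// "event: <name>\ndata: <payload>\n\n"; JSON.stringify keeps the payload on one line.
function formatSseEvent(eventName, payload) {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Try native captions first, then the fallback provider.
// Both fetchers are injected (hypothetical signatures: url -> Promise<segment[]>).
async function getTranscript(videoUrl, fetchNativeCaptions, fetchFromFallback) {
  try {
    const segments = await fetchNativeCaptions(videoUrl);
    if (Array.isArray(segments) && segments.length > 0) {
      return { source: 'native', segments };
    }
  } catch (err) {
    // Native captions unavailable: fall through to the fallback provider.
  }
  const segments = await fetchFromFallback(videoUrl);
  return { source: 'fallback', segments };
}

// Push each segment to the client as it becomes available.
// `write` would be `res.write` in an Express handler.
function streamSegments(segments, write) {
  for (const segment of segments) {
    write(formatSseEvent('segment', segment));
  }
  write(formatSseEvent('done', { count: segments.length }));
}
```

In an Express handler, the response would also need the `Content-Type: text/event-stream` header before the first `write` call.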

AI Tools (powered by Groq + OpenRouter fallback)

Summary — concise TL;DR of any video
Key Insights — bullet-point takeaways extracted from the transcript
Flashcards — auto-generated Q&A cards with topic tags; full-screen flip UI with known/unknown tracking
Study Guide — structured document with overview, learning objectives, key concepts, sections, and review questions
Chapter Detection — splits long videos into labeled chapters automatically
AI Chat — ask follow-up questions about the transcript in a conversational interface
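
LLM output is not guaranteed to be clean JSON, so a flashcard endpoint typically has to validate what the model returns. The sketch below assumes cards shaped as `{question, answer, topic}`; the function name `parseFlashcards` and the `'General'` default topic are illustrative assumptions, not taken from the project.

```javascript
// Parse an LLM response that is expected to contain a JSON array of cards.
function parseFlashcards(rawText) {
  // Models sometimes wrap JSON in prose, so take the outermost [...] span.
  const start = rawText.indexOf('[');
  const end = rawText.lastIndexOf(']');
  if (start === -1 || end <= start) return [];

  let parsed;
  try {
    parsed = JSON.parse(rawText.slice(start, end + 1));
  } catch (err) {
    return []; // Malformed JSON: return no cards rather than throwing.
  }
  if (!Array.isArray(parsed)) return [];

  return parsed
    // Drop entries missing a question or an answer.
    .filter((c) => c && typeof c.question === 'string' && typeof c.answer === 'string')
    .map((c) => ({
      question: c.question.trim(),
      answer: c.answer.trim(),
      // Assumed default tag when the model omits a topic.
      topic: typeof c.topic === 'string' && c.topic.trim() ? c.topic.trim() : 'General',
    }));
}
```

Returning an empty array on malformed output lets the caller decide whether to retry the request or show an error.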

Account & Credits

Supabase Auth — email/password sign-up and login
Credit system — authenticated users get a credit allowance; anonymous users get a limited free tier (configurable)
Per-user rate limiting — separate RPM caps for authenticated vs. anonymous users
Token validation caching — 60-second TTL cache on Supabase JWT checks to reduce auth latency
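
The token validation cache can be pictured as a small map with expiry times. This is a sketch under assumptions: the 60-second TTL comes from the feature description, while `TtlCache`, `cachedValidator`, and the injected `validateToken` callback (standing in for a Supabase check) are hypothetical names.

```javascript
// Minimal TTL cache. The clock is injectable so expiry can be tested.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // Expired: evict and report a miss.
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Wrap a token validator so repeated tokens skip the remote check.
function cachedValidator(validateToken, cache) {
  return async function validate(token) {
    const hit = cache.get(token);
    if (hit !== undefined) return hit;
    const user = await validateToken(token);
    if (user) cache.set(token, user); // Cache only successful validations.
    return user;
  };
}

// Usage: const validate = cachedValidator(checkWithSupabase, new TtlCache(60_000));
```

The trade-off of any such cache is that a revoked token can stay valid for up to one TTL window.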

Export

Copy transcript to clipboard (plain text or formatted)
Download as .txt file
SRT subtitle format export
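
SRT export amounts to numbering each cue and formatting its timestamps as `HH:MM:SS,mmm`. The sketch below assumes segments shaped as `{ text, startMs, durationMs }`; that shape and the function names are hypothetical, and the project's own segment format may differ.

```javascript
// Convert milliseconds to an SRT timestamp (HH:MM:SS,mmm).
function toSrtTimestamp(totalMs) {
  const ms = Math.max(0, Math.round(totalMs));
  const hours = Math.floor(ms / 3600000);
  const minutes = Math.floor((ms % 3600000) / 60000);
  const seconds = Math.floor((ms % 60000) / 1000);
  const millis = ms % 1000;
  const pad = (n, width) => String(n).padStart(width, '0');
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(millis, 3)}`;
}

// Build an SRT document: a 1-based index, a time range, and the cue text,
// with a blank line between cues.
function toSrt(segments) {
  return segments
    .map((seg, i) => {
      const start = toSrtTimestamp(seg.startMs);
      const end = toSrtTimestamp(seg.startMs + seg.durationMs);
      return `${i + 1}\n${start} --> ${end}\n${seg.text.trim()}\n`;
    })
    .join('\n');
}

console.log(toSrtTimestamp(3723004)); // 01:02:03,004
```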

Tech Stack

Layer	Technology
Frontend	React 18 (CRA), inline styles design system, framer-motion
Backend	Node.js 20+, Express 4
AI — Primary	Groq (`llama-3.3-70b-versatile`)
AI — Fallback	OpenRouter
Transcript API	`youtube-transcript` npm package + Supadata fallback
Auth & Database	Supabase (Postgres + Auth)
Deployment	Railway (backend), Vercel-compatible (frontend)
Containerization	Docker (optional)

Architecture Overview

Browser (React SPA)
  │
  │  REST + SSE
  ▼
Express Server (server.js)
  ├── /api/transcript          ← SSE stream, credit gate
  ├── /api/summary             ← AI endpoint (auth-gated)
  ├── /api/insights            ← AI endpoint
  ├── /api/flashcards          ← AI endpoint → [{question, answer, topic}]
  ├── /api/study-guide         ← AI endpoint → structured JSON
  ├── /api/chapters            ← AI endpoint
  ├── /api/ask                 ← AI chat endpoint
  └── /api/video-meta          ← proxied metadata fetch
        │
        ├── Groq SDK  ──────── primary LLM
        ├── OpenRouter ──────── fallback LLM
        ├── youtube-transcript ─ native captions
        ├── Supadata ────────── caption fallback
        └── Supabase ────────── user auth + credits
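
On the browser side of the REST + SSE link, a stream read through `fetch` arrives in arbitrary chunks, so events have to be reassembled before use. The parser below is a simplified sketch: `createSseParser` is a hypothetical name, and it handles only `\n` line endings and the `event`/`data` fields, ignoring `id`, `retry`, and comment lines.

```javascript
// Incremental SSE parser: feed it text chunks, receive complete events.
function createSseParser(onEvent) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    let boundary;
    // A blank line ("\n\n") terminates each event.
    while ((boundary = buffer.indexOf('\n\n')) !== -1) {
      const rawEvent = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);

      let event = 'message'; // Default event name in the SSE format.
      const dataLines = [];
      for (const line of rawEvent.split('\n')) {
        if (line.startsWith('event:')) {
          event = line.slice(6).trim();
        } else if (line.startsWith('data:')) {
          // One optional space after the colon is part of the framing.
          dataLines.push(line.slice(5).replace(/^ /, ''));
        }
      }
      if (dataLines.length > 0) {
        onEvent({ event, data: dataLines.join('\n') });
      }
    }
  };
}
```

When the endpoint accepts a plain GET without custom headers, the browser's built-in `EventSource` handles this parsing and a hand-written parser is unnecessary.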

Getting Started

Prerequisites

Node.js ≥ 20
npm ≥ 9
A Supabase project (free tier works)
A Groq API key (free tier available)

1. Clone & install

git clone https://github.com/JoelMoyal/YouTube-Transcript-Extractor.git
cd YouTube-Transcript-Extractor

# Install server dependencies
npm install

# Install client dependencies
cd client && npm install && cd ..

2. Configure environment

Copy the example env file and fill in your keys:

cp .env.example .env   # or create .env manually

See the Environment Variables section below for the full list.

3. Run in development

npm run dev
# Starts the Express server on :4999 and the React dev server on :3000

Open http://localhost:3000 in your browser.

4. Build for production

npm run build          # builds React into client/build/
npm start              # serves everything from Express on PORT (default 4999)

Docker

docker build -t scribesnap .
docker run -p 4999:4999 --env-file .env scribesnap

Environment Variables

Create a .env file in the project root:

# ── AI ────────────────────────────────────────────────────────────────────────
GROQ_API_KEY=          # Groq API key (primary LLM)
OPENROUTER_API_KEY=    # OpenRouter API key (fallback LLM)

# ── Supabase ──────────────────────────────────────────────────────────────────
SUPABASE_URL=          # Your Supabase project URL
SUPABASE_SERVICE_KEY=  # Supabase service role key (server-side only)

# ── Transcript APIs ───────────────────────────────────────────────────────────
SUPADATA_API_KEY=      # Supadata API key (caption fallback)

# ── Rate Limits & Credits ─────────────────────────────────────────────────────
ANON_CREDITS_MAX=2            # Free transcript credits for anonymous users (per 7 days)
ANON_AI_MAX_PER_DAY=24        # Max AI calls/day for anonymous users
AI_ANON_RPM=6                 # AI requests-per-minute for anonymous users
AI_AUTH_RPM=20                # AI requests-per-minute for authenticated users
AI_REQUIRE_AUTH=0             # Set to 1 to require login for all AI features

# ── Server ────────────────────────────────────────────────────────────────────
PORT=4999
NODE_ENV=development
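
To show how the rate-limit variables might be consumed, here is a sketch of config loading plus a per-client requests-per-minute check. The variable names come from the list above; the helper names, the sliding-window approach, and the use of the sample values as defaults are assumptions rather than the server's confirmed behavior.

```javascript
// Read an integer setting from an env-like object, with a default.
function intFromEnv(env, name, fallback) {
  const parsed = Number.parseInt(env[name], 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

function loadLimits(env) {
  return {
    anonRpm: intFromEnv(env, 'AI_ANON_RPM', 6),
    authRpm: intFromEnv(env, 'AI_AUTH_RPM', 20),
    requireAuth: env.AI_REQUIRE_AUTH === '1',
  };
}

// Sliding-window limiter: allow a request only if the client made fewer
// than `cap` allowed requests in the last 60 seconds.
function createRpmLimiter(limits, now = () => Date.now()) {
  const hits = new Map(); // clientKey -> timestamps of allowed requests
  return function allow(clientKey, isAuthenticated) {
    const cap = isAuthenticated ? limits.authRpm : limits.anonRpm;
    const cutoff = now() - 60000;
    const recent = (hits.get(clientKey) || []).filter((t) => t > cutoff);
    const allowed = recent.length < cap;
    if (allowed) recent.push(now());
    hits.set(clientKey, recent);
    return allowed;
  };
}

// Usage: const allow = createRpmLimiter(loadLimits(process.env));
```

An in-memory map like this resets on restart and is not shared between instances, so a multi-instance deployment would need a shared store.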

Deployment

Railway (recommended)

1. Connect your GitHub repo to Railway
2. Set all environment variables in the Railway dashboard
3. Railway auto-detects package.json and runs npm start

The included railway.json is pre-configured.

Vercel (frontend only)

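Deploy client/ as a standalone Create React App (CRA) project and set REACT_APP_API_URL to your Railway backend URL. CRA inlines REACT_APP_* variables at build time, so rebuild the frontend after changing the value.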

Project Structure

├── server.js              # Express backend — all API routes
├── client/
│   ├── src/
│   │   ├── App.js         # Entire React SPA (~9,800 lines)
│   │   ├── supabase.js    # Supabase client init
│   │   └── components/    # Magic UI components (BorderBeam, ShimmerButton…)
│   └── public/            # Static assets, SEO landing pages
├── supabase/              # DB migrations & edge functions
├── scripts/               # Utility scripts (secrets scan, etc.)
├── Dockerfile
└── railway.json

License

This project is licensed under the MIT License — see the LICENSE file for details.

Made with ♥ by Joel Moyal
