ginzlabs/voice2notes

🎙️ Voice to Notes - AI-Powered Note Taking

An AI-native voice notes application powered by OpenAI's Realtime API, featuring agentic workflows, semantic search, and edge-first architecture on Cloudflare.

Next.js · Cloudflare Workers · OpenAI Realtime · Durable Objects


[Screenshot: Realtime Notes UI]

✨ What Makes This Special?

This is not just another notes app. It is a production-ready showcase of how current edge, realtime, and AI technologies work together:

  • 🎤 Voice-First AI Agent — Talk naturally to create, search, update, and organize your notes using OpenAI's Realtime API with WebRTC streaming
  • 🚀 Edge-Native Architecture — Runs entirely on Cloudflare's edge platform with sub-50ms global latency
  • 🔐 User-Scoped SQLite Databases — Each user gets their own isolated SQLite database via Durable Objects—true data sovereignty at scale
  • 🔍 Semantic Search — Vector embeddings with Cloudflare Vectorize and Workers AI (bge-small-en-v1.5) for intelligent note retrieval
  • 🔄 Agentic Workflows — Client-side tool calling with automatic cache synchronization via React Query
  • 📊 Real-Time Usage Telemetry — WebSocket-based sideband connection tracks token usage and costs live
  • 🔒 Production-Grade Auth — Better Auth integration with short-lived JWT rotation and secure session management
  • 📈 Analytics Database — D1-powered usage tracking with per-model cost estimation

🏗️ Architecture Highlights

Frontend (apps/web)

  • Next.js 15 with App Router and React 19
  • OpenAI Agents SDK for realtime voice interaction and tool execution
  • React Query v5 for optimistic updates and cache coherence
  • Tailwind CSS 4 with shadcn/ui primitives

Backend (apps/api)

  • Cloudflare Workers for edge computing
  • Durable Objects for per-user isolated SQLite instances (NotesDO, SessionsDO, UsageLogDO)
  • Vectorize for cosine similarity semantic search (384 dimensions)
  • D1 Analytics DB for aggregated usage and cost tracking
  • Workers AI for on-demand text embeddings
  • WebRTC Bridge to OpenAI Realtime API with session management

Data Flow

User Voice → WebRTC → OpenAI Realtime API → Agent Tools
  → Cloudflare Worker → Durable Object (SQLite) → Vectorize Index
                     ↓
              Usage WebSocket → UsageLogDO → D1 Analytics

🚀 Getting Started

For complete setup and deployment instructions:

📚 Key Features Explained

Per-User SQLite Isolation

Unlike traditional multi-tenant databases, each user's notes live in a dedicated SQLite instance inside a Durable Object. This provides:

  • Data sovereignty — Full isolation with no cross-user queries
  • Predictable performance — No noisy neighbor issues
  • Regulatory compliance — GDPR/CCPA friendly data boundaries
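
As a rough illustration of the isolation model, the sketch below routes each user ID to its own store. The `idFromName` and `get` calls mirror the shape of the Durable Object namespace API, but the interfaces are simplified local stand-ins (IDs are plain strings here), and `InMemoryNotesNamespace`, `NotesStub`, and `notesFor` are hypothetical names invented so the example runs outside the Workers runtime. It is not the repository's implementation.

```typescript
// Simplified stand-ins for the parts of the Durable Object namespace API
// this sketch uses. In a real Worker these types come from the runtime.
interface NotesStub {
  addNote(text: string): number;
  listNotes(): string[];
}

interface NotesNamespace {
  idFromName(name: string): string;
  get(id: string): NotesStub;
}

// Hypothetical in-memory namespace: one isolated store per derived ID.
class InMemoryNotesNamespace implements NotesNamespace {
  private stores = new Map<string, string[]>();

  idFromName(name: string): string {
    // Deterministic: the same name always maps to the same ID.
    return `do:${name}`;
  }

  get(id: string): NotesStub {
    let store = this.stores.get(id);
    if (!store) {
      store = [];
      this.stores.set(id, store);
    }
    const notes = store;
    return {
      addNote: (text) => notes.push(text),
      listNotes: () => [...notes],
    };
  }
}

// Route every request to the instance derived from the authenticated user ID.
function notesFor(namespace: NotesNamespace, userId: string): NotesStub {
  return namespace.get(namespace.idFromName(userId));
}

const notesNamespace = new InMemoryNotesNamespace();
notesFor(notesNamespace, "alice").addNote("Buy milk");
notesFor(notesNamespace, "bob").addNote("Call the dentist");
const aliceNotes = notesFor(notesNamespace, "alice").listNotes();
const bobNotes = notesFor(notesNamespace, "bob").listNotes();
```

Because the store is chosen from the user ID before any query runs, there is no code path in this sketch through which one user's request can read another user's notes.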

Agentic Tool Calling

The OpenAI Realtime agent can call the following tools:

  • list_notes — Retrieve all user notes
  • get_note_by_id — Fetch specific note content
  • create_note — Generate new notes from voice
  • update_note — Modify existing notes
  • delete_note — Remove notes
  • search_notes — Semantic search across all notes

All tools update React Query caches optimistically for instant UI feedback.
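
The optimistic pattern can be sketched without React Query: write to the cache first, and keep a way to undo the write if the server call later fails. In the sketch below a plain `Map` stands in for the query cache, and `setOptimistic`, `optimisticTools`, and the handler signatures are assumptions made for illustration rather than code from this repository.

```typescript
interface Note {
  id: string;
  title: string;
  content: string;
}

type Rollback = () => void;

// Plain Map standing in for the client-side query cache (hypothetical).
const noteCache = new Map<string, Note>();

// Write the new value immediately and return a function that restores
// whatever was cached before, for use if the server call fails.
function setOptimistic(id: string, next: Note | undefined): Rollback {
  const previous = noteCache.get(id);
  if (next) {
    noteCache.set(id, next);
  } else {
    noteCache.delete(id);
  }
  return () => {
    if (previous) {
      noteCache.set(id, previous);
    } else {
      noteCache.delete(id);
    }
  };
}

// Tool names match the list above; the handler signatures are assumptions.
const optimisticTools = {
  create_note: (note: Note): Rollback => setOptimistic(note.id, note),
  update_note: (note: Note): Rollback => setOptimistic(note.id, note),
  delete_note: (id: string): Rollback => setOptimistic(id, undefined),
};

const undoCreate = optimisticTools.create_note({
  id: "n1",
  title: "Standup",
  content: "Ship the release on Friday",
});
const undoUpdate = optimisticTools.update_note({
  id: "n1",
  title: "Standup",
  content: "Ship the release on Monday",
});
const afterUpdate = noteCache.get("n1")?.content;
undoUpdate(); // simulate a failed server call for the update
const afterRollback = noteCache.get("n1")?.content;
```

Returning the rollback from the write keeps the undo logic next to the change it reverses, so a failed request restores exactly the entry that was overwritten.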

Vector Search Pipeline

  1. User creates/updates a note
  2. Background task embeds content with Workers AI
  3. 384-dim vector stored in Vectorize with metadata
  4. A voice query such as "find my meeting notes" is embedded and matched against the stored vectors by cosine similarity
  5. Results ranked and returned with relevance scores
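
The ranking in steps 4 and 5 can be illustrated with a brute-force version of cosine similarity search. In the app this work is done by Vectorize; the sketch below only shows the underlying arithmetic, uses toy 3-dimensional vectors instead of 384-dimensional embeddings, and its type and function names (`StoredVector`, `rankBySimilarity`, and so on) are hypothetical.

```typescript
interface StoredVector {
  noteId: string;
  values: number[];
}

interface Match {
  noteId: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored vector against the query and keep the best topK.
function rankBySimilarity(query: number[], stored: StoredVector[], topK: number): Match[] {
  return stored
    .map((v) => ({ noteId: v.noteId, score: cosineSimilarity(query, v.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy vectors for illustration only.
const vectorIndex: StoredVector[] = [
  { noteId: "meeting", values: [1, 0, 0] },
  { noteId: "groceries", values: [0, 1, 0] },
  { noteId: "standup", values: [1, 1, 0] },
];
const topMatches = rankBySimilarity([1, 0, 0], vectorIndex, 2);
```

Here the query points in the same direction as the "meeting" vector, so that note ranks first, followed by "standup", which shares one of its two components with the query.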

Usage Tracking & Cost Management

The usage WebSocket streams the following in real time:

  • Text tokens (input/output)
  • Audio duration (seconds)
  • Model (e.g., gpt-4o-realtime-preview-2025-06-03)

Backend computes estimated costs using src/utils/pricing.js and surfaces them via:

GET /api/usage/summary?start=2025-01-01T00:00:00Z&end=2025-01-31T23:59:59Z
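
A cost estimate of this kind reduces to multiplying each streamed quantity by a per-model rate. The sketch below assumes one possible shape for a usage event and a rate table. The field names, the `estimateCostUsd` function, the model key, and above all the rates are placeholders invented for this example. They are not OpenAI's prices and not the contents of src/utils/pricing.js.

```typescript
interface UsageEvent {
  model: string;
  inputTextTokens: number;
  outputTextTokens: number;
  audioSeconds: number;
}

interface ModelRates {
  inputPerMillionTokens: number;
  outputPerMillionTokens: number;
  audioPerMinute: number;
}

// PLACEHOLDER rates in USD, made up for this example only.
const exampleRates: Record<string, ModelRates> = {
  "example-realtime-model": {
    inputPerMillionTokens: 5,
    outputPerMillionTokens: 20,
    audioPerMinute: 0.06,
  },
};

// Sum the text-token cost and the audio-duration cost for one usage event.
function estimateCostUsd(event: UsageEvent, rates: Record<string, ModelRates>): number {
  const r = rates[event.model];
  if (!r) {
    throw new Error(`No rates configured for model: ${event.model}`);
  }
  const textCost =
    (event.inputTextTokens / 1_000_000) * r.inputPerMillionTokens +
    (event.outputTextTokens / 1_000_000) * r.outputPerMillionTokens;
  const audioCost = (event.audioSeconds / 60) * r.audioPerMinute;
  return textCost + audioCost;
}

const exampleEstimate = estimateCostUsd(
  {
    model: "example-realtime-model",
    inputTextTokens: 200_000,
    outputTextTokens: 50_000,
    audioSeconds: 120,
  },
  exampleRates,
);
```

Keeping the rates in a table keyed by model name means a price change or a new model only touches data, and an unknown model fails loudly instead of being billed at zero.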

📖 Documentation

🛠️ Tech Stack Summary

| Layer | Technology |
| --- | --- |
| Frontend Framework | Next.js 15, React 19 |
| State Management | React Query v5, Zustand |
| Styling | Tailwind CSS 4, shadcn/ui |
| Backend Runtime | Cloudflare Workers |
| Authentication | Better Auth with JWT rotation |
| Database | SQLite (Durable Objects), D1 (Analytics) |
| Vector DB | Cloudflare Vectorize |
| AI Models | OpenAI Realtime API, Workers AI |
| Real-Time Comms | WebRTC, WebSockets |
| Deployment | Cloudflare Workers, Vercel |

🌟 Use Cases

This architecture is well suited to:

  • Voice-first applications requiring low latency AI responses
  • Privacy-conscious apps needing user data isolation
  • Global applications benefiting from edge deployment
  • Cost-optimized AI leveraging Cloudflare's Workers AI for embeddings
  • Real-time analytics with live usage tracking

🔗 Resources
