mcomi/ai-chat-streaming

AI Chat Streaming Lab

A minimal but production-minded ChatGPT-style app focused on learning streaming systems with Next.js App Router, React, Zustand, TailwindCSS, native fetch() streaming, AbortController, Vercel AI Gateway, and the official OpenAI SDK.

The app intentionally avoids event buses, RxJS, and framework-heavy abstractions. The goal is to make the stream lifecycle easy to read.

Features

  • User and assistant messages in one in-memory conversation
  • Token-by-token assistant rendering from a streamed HTTP response
  • Native frontend stream reading with response.body.getReader() and TextDecoder
  • Server-side Vercel AI Gateway proxy so the API key never reaches the browser
  • Model picker for trying OpenAI, Anthropic, and xAI models behind one Gateway key
  • Stop generation with AbortController
  • Stale request prevention with per-stream request ids
  • Retry last failed prompt
  • Auto-scroll, streaming cursor, loading, canceled, and error states
  • Markdown and GitHub-flavored Markdown rendering
  • Small stream metrics panel for debugging buffering and lifecycle behavior
  • Vercel-ready deployment config

Quick Start

npm install
cp .env.example .env.local
npm run dev

Add your key to .env.local:

AI_GATEWAY_API_KEY=vck_your-vercel-ai-gateway-key-here
AI_GATEWAY_MODEL=openai/gpt-5.4-mini

Open http://localhost:3000.

openai/gpt-5.4-mini is the default because Vercel AI Gateway lists it as a cost-efficient model for agentic workloads. You can switch models from the UI, or change the server fallback with AI_GATEWAY_MODEL.

Project Structure

app/
  api/chat/route.ts       Server streaming proxy
  page.tsx                App entry
components/               UI only
hooks/use-chat-stream.ts  React orchestration and cleanup
lib/                      AI Gateway client, model list, ids, constants
services/chat-stream.ts   Browser fetch streaming loop
store/chat-store.ts       Zustand state and stream guards
types/chat.ts             Shared chat types

Streaming Lifecycle

  1. The user submits a prompt from components/chat-composer.tsx.
  2. hooks/use-chat-stream.ts creates a unique request id, an assistant placeholder message, and an AbortController.
  3. Zustand stores the active request id, active assistant message id, streaming status, controller, errors, and metrics.
  4. services/chat-stream.ts calls /api/chat with fetch().
  5. The server route calls Vercel AI Gateway with stream: true.
  6. AI Gateway forwards incremental model events to the server.
  7. The server extracts response.output_text.delta events and enqueues UTF-8 bytes into a ReadableStream.
  8. The browser reads chunks with response.body.getReader().
  9. TextDecoder converts byte chunks into text while preserving partial UTF-8 characters across reads.
  10. Each decoded chunk is appended to the active assistant message.
  11. The stream completes, errors, or is canceled; Zustand clears the active request state in one place.
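
Steps 7 through 10 can be sketched end to end without a network call. This is a minimal illustration, not the repository's actual code: the StreamEvent type is a simplified assumption about the event shape, and eventsToByteStream and readTextStream are hypothetical names chosen for this example.

```typescript
// Simplified assumption: only the event type and its text delta matter here.
type StreamEvent = { type: string; delta?: string };

// Server side (step 7): keep text deltas and enqueue them as UTF-8 bytes.
function eventsToByteStream(
  events: AsyncIterable<StreamEvent>,
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        for await (const event of events) {
          if (event.type === "response.output_text.delta" && event.delta) {
            controller.enqueue(encoder.encode(event.delta));
          }
        }
        controller.close();
      } catch (error) {
        // Surface upstream failures to whoever is reading the stream.
        controller.error(error);
      }
    },
  });
}

// Browser side (steps 8 to 10): read byte chunks and hand decoded text to a callback.
async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void,
): Promise<void> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps an incomplete UTF-8 sequence buffered for the next read.
      const text = decoder.decode(value, { stream: true });
      if (text) onChunk(text);
    }
    // Flush anything the decoder is still holding.
    const tail = decoder.decode();
    if (tail) onChunk(tail);
  } finally {
    reader.releaseLock();
  }
}
```

In the real app the stream returned by the server travels through a Response body and fetch(); here the two halves are connected directly, so onChunk would stand in for the store method that appends to the active assistant message.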

Why This Is Streaming Over HTTP

A traditional request/response cycle waits until the full assistant answer exists before sending the response body. Here, the response body starts immediately and stays open while chunks are flushed. The browser renders each chunk as soon as it arrives, which creates the live typing effect.

The response is sent as text/plain over a single streamed HTTP response. It is SSE-style only in the sense that the server keeps one HTTP response open and progressively flushes data; it does not use text/event-stream framing, and the client intentionally reads it with native fetch() streaming instead of EventSource.

Why Fetch Streaming Instead Of EventSource

EventSource is convenient for server-sent events, but it is less flexible for this chat flow:

  • The browser EventSource API only issues GET requests, while chat submissions naturally use POST.
  • EventSource cannot send a request body, and the custom cancellation flow is cleaner with fetch().
  • AbortController plugs directly into fetch().
  • Reading a ReadableStream teaches the same primitives used by many modern streaming APIs.
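
As a rough sketch of the points above, the request options for a POST chat submission might be built like this. The function name buildChatRequestInit and the body fields prompt and model are assumptions made for illustration; the repository's real payload shape may differ.

```typescript
// Hypothetical request options for POST /api/chat.
type ChatRequestInit = {
  method: "POST";
  headers: Record<string, string>;
  body: string;
  signal: AbortSignal;
};

function buildChatRequestInit(
  prompt: string,
  model: string,
  signal: AbortSignal,
): ChatRequestInit {
  return {
    method: "POST", // EventSource has no way to do this
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, model }), // assumed field names
    signal, // lets AbortController cancel the in-flight request
  };
}
```

The returned object can be passed as the second argument to fetch("/api/chat", init), and the same AbortController that produced the signal can later stop the request.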

Cancellation

The frontend stores the current AbortController in Zustand. Pressing stop calls abort(), which cancels the browser request. That cancellation propagates to the Next.js route through request.signal. The route passes that signal to the OpenAI-compatible Gateway request and stops forwarding chunks when the client disconnects.

Cancellation is not just a UI state. It prevents wasted model work, closes network resources, and stops old stream loops from appending text after the user has moved on.
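
One way to keep an abort from being reported as a failure is to classify the outcome at the single place where the stream task is awaited. The sketch below is an assumption about how that could look, not the app's actual hook: runStream, StreamOutcome, and pendingUntilAborted are hypothetical names, and pendingUntilAborted only stands in for a fetch()-based reader loop.

```typescript
type StreamOutcome = "completed" | "canceled" | "error";

// Runs a streaming task and maps how it ended to a status the UI can show.
async function runStream(
  controller: AbortController,
  task: (signal: AbortSignal) => Promise<void>,
): Promise<StreamOutcome> {
  try {
    await task(controller.signal);
    return "completed";
  } catch {
    // fetch() and reader.read() reject once their signal is aborted.
    // That is an expected outcome of pressing stop, not an error state.
    if (controller.signal.aborted) return "canceled";
    return "error";
  }
}

// Stand-in for a reader loop: never finishes by itself, rejects on abort.
function pendingUntilAborted(signal: AbortSignal): Promise<void> {
  return new Promise<void>((_resolve, reject) => {
    const fail = () => reject(signal.reason);
    if (signal.aborted) fail();
    else signal.addEventListener("abort", fail, { once: true });
  });
}
```

Calling runStream(controller, pendingUntilAborted) and then controller.abort() resolves the outcome to "canceled", while a task that throws for any other reason resolves to "error".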

Stale Stream Prevention

Stale streams happen when an older async reader loop resolves after a newer request has started. Without a guard, the old loop can append chunks to the wrong assistant message.

This app prevents that with:

  • activeRequestId in Zustand
  • one generated request id per stream
  • store methods that ignore chunks unless the request id still matches
  • cleanup on component unmount
  • cancellation before starting a new request
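
The request id guard can be illustrated with a plain object instead of Zustand. This is a simplified sketch under that assumption: createGuardedStore and its method names are hypothetical and only mirror the idea of ignoring chunks whose request id no longer matches.

```typescript
type ChatState = {
  activeRequestId: string | null;
  messages: Record<string, string>; // assistant message id -> text so far
};

function createGuardedStore() {
  const state: ChatState = { activeRequestId: null, messages: {} };
  return {
    state,
    // Starting a stream makes its request id the only one allowed to write.
    startStream(requestId: string, messageId: string): void {
      state.activeRequestId = requestId;
      state.messages[messageId] = "";
    },
    // Returns false and changes nothing when the chunk comes from a stale stream.
    appendChunk(requestId: string, messageId: string, chunk: string): boolean {
      if (state.activeRequestId !== requestId) return false;
      state.messages[messageId] = (state.messages[messageId] ?? "") + chunk;
      return true;
    },
    // Only the active stream may clear the active request state.
    finishStream(requestId: string): void {
      if (state.activeRequestId === requestId) state.activeRequestId = null;
    },
  };
}
```

With this guard, a late chunk from an older reader loop is dropped even if that loop was never told to stop, which is why the id check complements cancellation rather than replacing it.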

Common Pitfalls

  • Forgetting to check response.body before calling getReader()
  • Decoding chunks without TextDecoder, which can corrupt split UTF-8 characters
  • Appending chunks after a newer stream starts
  • Treating aborts as user-visible failures
  • Letting proxies buffer streamed responses
  • Exposing AI_GATEWAY_API_KEY to the frontend
  • Updating React state for every stream concern instead of keeping lifecycle state centralized
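
The split UTF-8 pitfall is easy to reproduce. In this small sketch, decodeIndependently and decodeAsStream are hypothetical helper names, and the chunk boundary is placed by hand in the middle of a two-byte character to imitate an unlucky network read.

```typescript
// Wrong: a fresh decoder per chunk cannot join bytes that belong together.
function decodeIndependently(chunks: Uint8Array[]): string {
  return chunks.map((chunk) => new TextDecoder().decode(chunk)).join("");
}

// Right: one decoder with stream: true buffers an incomplete sequence
// until the following chunk completes it.
function decodeAsStream(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder();
  let text = "";
  for (const chunk of chunks) {
    text += decoder.decode(chunk, { stream: true });
  }
  return text + decoder.decode(); // final flush
}

// "é" encodes to two bytes (0xC3 0xA9); cut the input between them.
const cafeBytes = new TextEncoder().encode("café");
const splitChunks = [cafeBytes.slice(0, 4), cafeBytes.slice(4)];

console.log(decodeAsStream(splitChunks)); // café
console.log(decodeIndependently(splitChunks)); // "caf" followed by U+FFFD replacement characters
```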

Deployment To Vercel

  1. Push the repo to GitHub.
  2. Import it in Vercel.
  3. Add environment variables:
    • AI_GATEWAY_API_KEY
    • AI_GATEWAY_MODEL such as openai/gpt-5.4-mini
  4. Deploy.

vercel.json gives the chat route a 60-second max duration. The route also returns:

Cache-Control: no-cache, no-transform
X-Accel-Buffering: no

Those headers discourage proxy buffering so chunks reach the browser progressively.
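
A route handler could attach these headers when it wraps the byte stream in a Response. The sketch below assumes a helper named createStreamingResponse, which is hypothetical, and the charset parameter on Content-Type is also an assumption; the README only states that the response is text/plain.

```typescript
// Wraps a byte stream in a Response that asks intermediaries not to buffer it.
function createStreamingResponse(body: ReadableStream<Uint8Array>): Response {
  return new Response(body, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8", // charset is an assumption
      "Cache-Control": "no-cache, no-transform",
      "X-Accel-Buffering": "no",
    },
  });
}
```

X-Accel-Buffering is honored by nginx-style reverse proxies, and no-transform asks intermediaries not to modify the payload, for example by compressing it; neither header can force every proxy in the path to flush chunks immediately.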

Notes On AI Gateway And Vercel

The API route uses the official OpenAI SDK pointed at Vercel AI Gateway's OpenAI-compatible base URL. That gives you one Gateway key for multiple providers while keeping the stream mechanics visible for study. The selected model is sent from the client as a model id like openai/gpt-5.4-mini or anthropic/claude-sonnet-4.6; the secret Gateway key stays server-side only.

If you want to compare this manual implementation with Vercel AI SDK helpers later, the clean boundary is app/api/chat/route.ts: replace the route internals while leaving the frontend reader loop intact.
