AI Chat Streaming Lab

A minimal but production-minded ChatGPT-style app focused on learning streaming systems with Next.js App Router, React, Zustand, TailwindCSS, native fetch() streaming, AbortController, Vercel AI Gateway, and the official OpenAI SDK.

The app intentionally avoids event buses, RxJS, and framework-heavy abstractions. The goal is to make the stream lifecycle easy to read.

Features

User and assistant messages in one in-memory conversation
Token-by-token assistant rendering from a streamed HTTP response
Native frontend stream reading with response.body.getReader() and TextDecoder
Server-side Vercel AI Gateway proxy so the API key never reaches the browser
Model picker for trying OpenAI, Anthropic, and xAI models behind one Gateway key
Stop generation with AbortController
Stale request prevention with per-stream request ids
Retry last failed prompt
Auto-scroll, streaming cursor, loading, canceled, and error states
Markdown and GitHub-flavored Markdown rendering
Small stream metrics panel for debugging buffering and lifecycle behavior
Vercel-ready deployment config

Quick Start

npm install
cp .env.example .env.local
npm run dev

Add your key to .env.local:

AI_GATEWAY_API_KEY=vck_your-vercel-ai-gateway-key-here
AI_GATEWAY_MODEL=openai/gpt-5.4-mini

Open http://localhost:3000.

openai/gpt-5.4-mini is the default because Vercel AI Gateway lists it as a cost-efficient model for agentic workloads. You can switch models from the UI, or change the server fallback with AI_GATEWAY_MODEL.
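
The fallback order described above could be resolved on the server roughly as follows. This is a minimal sketch, not the repo's code: `resolveModel`, `DEFAULT_MODEL`, and the allow-list parameter are hypothetical names, and the idea of rejecting model ids that are not on a known list is an assumption.

```typescript
// Hypothetical default, mirroring the value shown for .env.local above.
const DEFAULT_MODEL = "openai/gpt-5.4-mini";

// Picks the model for one request: a client-selected id wins if it is on the
// allow-list, otherwise the AI_GATEWAY_MODEL fallback, otherwise the default.
function resolveModel(
  requested: string | undefined,
  env: Record<string, string | undefined>,
  allowed: readonly string[],
): string {
  const fromClient = requested?.trim();
  if (fromClient && allowed.includes(fromClient)) {
    return fromClient;
  }
  const fromEnv = env.AI_GATEWAY_MODEL?.trim();
  return fromEnv ? fromEnv : DEFAULT_MODEL;
}
```

In a route handler this might be called as resolveModel(body.model, process.env, MODELS), where MODELS stands in for whatever list lib/ exports.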

Project Structure

app/
  api/chat/route.ts       Server streaming proxy
  page.tsx                App entry
components/               UI only
hooks/use-chat-stream.ts  React orchestration and cleanup
lib/                      AI Gateway client, model list, ids, constants
services/chat-stream.ts   Browser fetch streaming loop
store/chat-store.ts       Zustand state and stream guards
types/chat.ts             Shared chat types

Streaming Lifecycle

The user submits a prompt from components/chat-composer.tsx.
hooks/use-chat-stream.ts creates a unique request id, an assistant placeholder message, and an AbortController.
Zustand stores the active request id, active assistant message id, streaming status, controller, errors, and metrics.
services/chat-stream.ts calls /api/chat with fetch().
The server route calls Vercel AI Gateway with stream: true.
AI Gateway forwards incremental model events to the server.
The server extracts response.output_text.delta events and enqueues UTF-8 bytes into a ReadableStream.
The browser reads chunks with response.body.getReader().
TextDecoder converts byte chunks into text while preserving partial UTF-8 characters across reads.
Each decoded chunk is appended to the active assistant message.
The stream completes, errors, or is canceled; Zustand clears the active request state in one place.
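
The browser half of this lifecycle (reading chunks, decoding them, and appending text) can be sketched as a single loop. This is an illustration under assumptions, not the contents of services/chat-stream.ts: `readTextStream` and its `onChunk` callback are hypothetical names.

```typescript
// Hypothetical helper: reads a streamed body and reports each decoded piece
// of text to onChunk as soon as it is available.
async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void,
): Promise<void> {
  const reader = body.getReader();
  const decoder = new TextDecoder("utf-8");
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps an incomplete multi-byte character buffered
      // inside the decoder until the rest of its bytes arrive.
      const text = decoder.decode(value, { stream: true });
      if (text) onChunk(text);
    }
    // Flush anything still buffered once the stream has ended.
    const tail = decoder.decode();
    if (tail) onChunk(tail);
  } finally {
    reader.releaseLock();
  }
}
```

In the app, onChunk would be the place where a chunk is handed to the store so it can be appended to the active assistant message.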

Why This Is Streaming Over HTTP

A traditional request/response exchange waits until the full assistant answer exists before sending the response body. Here, the response body starts immediately and stays open while chunks are flushed. The browser renders each chunk as soon as it arrives, which creates the live typing effect.

The response body is sent as text/plain over a streamed HTTP response. It is SSE-style in the sense that the server keeps one HTTP response open and progressively flushes data, but it does not use the text/event-stream format, and the client intentionally uses native fetch() streaming instead of EventSource.
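
On the server, keeping the body open while chunks are flushed comes down to a ReadableStream that enqueues bytes as text arrives. The sketch below assumes the model output is already available as an async sequence of strings; `deltasToByteStream` is a hypothetical name and not necessarily how app/api/chat/route.ts is written.

```typescript
// Hypothetical helper: turns an async sequence of text deltas into a byte
// stream that can be used as an HTTP response body.
function deltasToByteStream(
  deltas: AsyncIterable<string>,
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  const iterator = deltas[Symbol.asyncIterator]();
  return new ReadableStream<Uint8Array>({
    // pull() runs whenever the consumer is ready for more data, so each delta
    // is forwarded as soon as it exists instead of after the full answer.
    async pull(controller) {
      const next = await iterator.next();
      if (next.done) {
        controller.close();
        return;
      }
      controller.enqueue(encoder.encode(next.value));
    },
    // cancel() runs when the consumer goes away, for example on disconnect.
    async cancel() {
      await iterator.return?.();
    },
  });
}
```

Using pull() rather than pushing everything from start() lets the stream respect backpressure: nothing is read from the source until the consumer asks for more.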

Why Fetch Streaming Instead Of EventSource

EventSource is convenient for server-sent events, but it is less flexible for this chat flow:

It only issues GET requests, while chat submissions naturally use POST.
It cannot send a request body, and a custom cancellation flow is cleaner with fetch().
AbortController plugs directly into fetch().
Reading a ReadableStream teaches the same primitives used by many modern streaming APIs.
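
The points above can be sketched as one POST request that carries a JSON body and an abort signal. The payload shape (`model`, `messages`), the `postChat` name, and the injectable `fetchImpl` parameter are assumptions made for illustration; only the /api/chat path comes from this README.

```typescript
// Assumed request payload; the real route may expect a different shape.
interface ChatRequest {
  model: string;
  messages: Array<{ role: "user" | "assistant"; content: string }>;
}

type FetchLike = (input: string, init: RequestInit) => Promise<Response>;

// Sends the prompt with POST and returns the still-open response body.
// fetchImpl is injectable so the function can be exercised without a server.
async function postChat(
  payload: ChatRequest,
  signal: AbortSignal,
  fetchImpl: FetchLike = fetch,
): Promise<ReadableStream<Uint8Array>> {
  const response = await fetchImpl("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
    signal, // aborting the controller cancels this request
  });
  if (!response.ok) {
    throw new Error(`Chat request failed with status ${response.status}`);
  }
  // response.body can be null, so check before calling getReader().
  if (!response.body) {
    throw new Error("Response has no readable body");
  }
  return response.body;
}
```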

Cancellation

The frontend stores the current AbortController in Zustand. Pressing stop calls abort(), which cancels the browser request. That cancellation propagates to the Next.js route through request.signal. The route passes that signal to the OpenAI-compatible Gateway request and stops forwarding chunks when the client disconnects.

Cancellation is not just a UI state. It prevents wasted model work, closes network resources, and stops old stream loops from appending text after the user has moved on.
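
One way to keep a user-initiated stop from surfacing as a failure is to classify how a stream ended. This is a sketch under assumptions: `runCancelable`, `StreamOutcome`, and `isAbortError` are hypothetical names, and the app's hook may structure this differently.

```typescript
type StreamOutcome = "completed" | "canceled" | "error";

// fetch() rejects with an error named "AbortError" when its signal aborts.
function isAbortError(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { name?: unknown }).name === "AbortError"
  );
}

// Runs a task that honors the controller's signal and reports how it ended,
// so callers can show "canceled" instead of an error message.
async function runCancelable(
  task: (signal: AbortSignal) => Promise<void>,
  controller: AbortController,
): Promise<StreamOutcome> {
  try {
    await task(controller.signal);
    return "completed";
  } catch (error) {
    if (controller.signal.aborted || isAbortError(error)) {
      return "canceled";
    }
    return "error";
  }
}
```

A stop button would then only need to call controller.abort(); the caller of runCancelable decides what each outcome means for the UI.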

Stale Stream Prevention

Stale streams happen when an older async reader loop resolves after a newer request has started. Without a guard, the old loop can append chunks to the wrong assistant message.

This app prevents that with:

activeRequestId in Zustand
one generated request id per stream
store methods that ignore chunks unless the request id still matches
cleanup on component unmount
cancellation before starting a new request
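
The request id guard can be illustrated without Zustand. The sketch below uses a plain closure as a stand-in for the store; `createStreamGuardStore` and its method names are hypothetical, and only `activeRequestId` is a name taken from the list above.

```typescript
// Plain-closure stand-in for the Zustand store, showing only the guard logic.
function createStreamGuardStore() {
  let activeRequestId: string | null = null;
  let activeMessageId: string | null = null;
  const messages = new Map<string, string>(); // message id -> text so far

  return {
    // Starting a stream replaces the active ids, which makes any older
    // reader loop stale from this point on.
    startStream(requestId: string, messageId: string): void {
      activeRequestId = requestId;
      activeMessageId = messageId;
      messages.set(messageId, "");
    },
    // Chunks are ignored unless they come from the active request.
    appendChunk(requestId: string, chunk: string): boolean {
      if (requestId !== activeRequestId || activeMessageId === null) {
        return false;
      }
      const current = messages.get(activeMessageId) ?? "";
      messages.set(activeMessageId, current + chunk);
      return true;
    },
    // Only the active request is allowed to clear the active state.
    finishStream(requestId: string): void {
      if (requestId !== activeRequestId) return;
      activeRequestId = null;
      activeMessageId = null;
    },
    getText(messageId: string): string {
      return messages.get(messageId) ?? "";
    },
  };
}
```

Because every write goes through appendChunk, a late chunk from an older request is dropped in one place rather than in each reader loop.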

Common Pitfalls

Forgetting to check response.body before calling getReader()
Decoding each chunk in isolation instead of through one streaming TextDecoder, which can corrupt UTF-8 characters split across chunks
Appending chunks after a newer stream starts
Treating aborts as user-visible failures
Letting proxies buffer streamed responses
Exposing AI_GATEWAY_API_KEY to the frontend
Updating React state for every stream concern instead of keeping lifecycle state centralized
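
The UTF-8 pitfall is easy to reproduce in isolation. The sketch below splits the three bytes of a single character across two chunks and compares decoding each chunk on its own with decoding through one streaming TextDecoder; the function names are hypothetical.

```typescript
// Decodes every chunk with a fresh decoder, so a character whose bytes are
// split across chunks cannot be reassembled.
function decodeIndependently(chunks: Uint8Array[]): string {
  return chunks.map((chunk) => new TextDecoder().decode(chunk)).join("");
}

// Decodes all chunks through one decoder in streaming mode, which buffers
// incomplete byte sequences between calls.
function decodeStreaming(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder();
  const text = chunks
    .map((chunk) => decoder.decode(chunk, { stream: true }))
    .join("");
  return text + decoder.decode(); // flush the end of the stream
}

// "€" is three bytes in UTF-8; split it after the first byte.
const euroBytes = new TextEncoder().encode("€");
const euroChunks = [euroBytes.slice(0, 1), euroBytes.slice(1)];

const broken = decodeIndependently(euroChunks); // contains U+FFFD replacement characters
const intact = decodeStreaming(euroChunks); // the original character
```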

Deployment To Vercel

Push the repo to GitHub.
Import it in Vercel.
Add environment variables:
- AI_GATEWAY_API_KEY
- AI_GATEWAY_MODEL such as openai/gpt-5.4-mini
Deploy.

vercel.json gives the chat route a 60-second max duration. The route also returns:

Cache-Control: no-cache, no-transform
X-Accel-Buffering: no

Those headers discourage proxy buffering so chunks reach the browser progressively.
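
Attaching those headers to a streamed body could look like the sketch below. `toStreamingResponse` is a hypothetical helper name, and the charset parameter on Content-Type is an assumption; the two anti-buffering headers are the ones listed above.

```typescript
// Hypothetical helper: wraps a byte stream in a Response that asks
// intermediaries not to cache, transform, or buffer the body.
function toStreamingResponse(body: ReadableStream<Uint8Array>): Response {
  return new Response(body, {
    status: 200,
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "no-cache, no-transform",
      "X-Accel-Buffering": "no",
    },
  });
}
```

These headers are hints: an intermediary that ignores them can still buffer the response, so it is worth checking chunk timing in the deployed environment.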

Notes On AI Gateway And Vercel

The API route uses the official OpenAI SDK pointed at Vercel AI Gateway's OpenAI-compatible base URL. That gives you one Gateway key for multiple providers while keeping the stream mechanics visible for study. The selected model is sent from the client as a model id like openai/gpt-5.4-mini or anthropic/claude-sonnet-4.6; the secret Gateway key stays server-side only.
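
Independent of the SDK call itself, the delta extraction the route performs can be sketched as a filter over a stream of events. The `StreamEvent` shape below is an assumed minimal form (a `type` string plus an optional `delta` string), and `textDeltas` is a hypothetical name; the sketch uses a fake event source rather than the real SDK.

```typescript
// Assumed minimal event shape; real streams carry other event types and
// more fields than shown here.
interface StreamEvent {
  type: string;
  delta?: string;
}

// Yields only the text carried by response.output_text.delta events and
// skips every other event type.
async function* textDeltas(
  events: AsyncIterable<StreamEvent>,
): AsyncGenerator<string> {
  for await (const event of events) {
    if (
      event.type === "response.output_text.delta" &&
      typeof event.delta === "string"
    ) {
      yield event.delta;
    }
  }
}
```

Keeping this step as a plain async generator means the rest of the route only deals with strings, whichever provider produced the events.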

If you want to compare this manual implementation with Vercel AI SDK helpers later, the clean boundary is app/api/chat/route.ts: replace the route internals while leaving the frontend reader loop intact.
