A minimal but production-minded ChatGPT-style app focused on learning streaming systems with Next.js App Router, React, Zustand, TailwindCSS, native fetch() streaming, AbortController, Vercel AI Gateway, and the official OpenAI SDK.
The app intentionally avoids event buses, RxJS, and framework-heavy abstractions. The goal is to make the stream lifecycle easy to read.
- User and assistant messages in one in-memory conversation
- Token-by-token assistant rendering from a streamed HTTP response
- Native frontend stream reading with response.body.getReader() and TextDecoder
- Server-side Vercel AI Gateway proxy so the API key never reaches the browser
- Model picker for trying OpenAI, Anthropic, and xAI models behind one Gateway key
- Stop generation with AbortController
- Stale request prevention with per-stream request ids
- Retry last failed prompt
- Auto-scroll, streaming cursor, loading, canceled, and error states
- Markdown and GitHub-flavored Markdown rendering
- Small stream metrics panel for debugging buffering and lifecycle behavior
- Vercel-ready deployment config
npm install
cp .env.example .env.local
npm run dev

Add your key to .env.local:

AI_GATEWAY_API_KEY=vck_your-vercel-ai-gateway-key-here
AI_GATEWAY_MODEL=openai/gpt-5.4-mini

Open http://localhost:3000.
openai/gpt-5.4-mini is the default because Vercel AI Gateway lists it as a cost-efficient model for agentic workloads. You can switch models from the UI, or change the server fallback with AI_GATEWAY_MODEL.
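A minimal sketch of how the server might pick the model for each request, assuming the client sends a model id and the route checks it against the model list in lib/ before falling back to AI_GATEWAY_MODEL. The names ALLOWED_MODELS and resolveModel are hypothetical, and the allow-list contains only the two model ids this README mentions.

```typescript
// Hypothetical allow-list; the app keeps its real model list under lib/.
const ALLOWED_MODELS = ["openai/gpt-5.4-mini", "anthropic/claude-sonnet-4.6"] as const;

type ModelId = (typeof ALLOWED_MODELS)[number];

const DEFAULT_MODEL: ModelId = "openai/gpt-5.4-mini";

function isAllowedModel(value: string): value is ModelId {
  return (ALLOWED_MODELS as readonly string[]).includes(value);
}

/**
 * Pick the model for one request: the client's choice if it is on the
 * allow-list, else the AI_GATEWAY_MODEL fallback, else the built-in default.
 */
export function resolveModel(requested: unknown, envFallback: string | undefined): string {
  if (typeof requested === "string" && isAllowedModel(requested)) {
    return requested;
  }
  // Assumption: the operator-set env value is trusted even if it is not on the list.
  const fallback = envFallback?.trim();
  return fallback ? fallback : DEFAULT_MODEL;
}
```

In a route handler this would be called as resolveModel(body.model, process.env.AI_GATEWAY_MODEL), so an arbitrary string from the browser never reaches the Gateway unchecked.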
app/
  api/chat/route.ts          Server streaming proxy
  page.tsx                   App entry
components/                  UI only
hooks/use-chat-stream.ts     React orchestration and cleanup
lib/                         AI Gateway client, model list, ids, constants
services/chat-stream.ts      Browser fetch streaming loop
store/chat-store.ts          Zustand state and stream guards
types/chat.ts                Shared chat types
- The user submits a prompt from components/chat-composer.tsx.
- hooks/use-chat-stream.ts creates a unique request id, an assistant placeholder message, and an AbortController.
- Zustand stores the active request id, active assistant message id, streaming status, controller, errors, and metrics.
- services/chat-stream.ts calls /api/chat with fetch().
- The server route calls Vercel AI Gateway with stream: true.
- AI Gateway forwards incremental model events to the server.
- The server extracts response.output_text.delta events and enqueues UTF-8 bytes into a ReadableStream.
- The browser reads chunks with response.body.getReader().
- TextDecoder converts byte chunks into text while preserving partial UTF-8 characters across reads.
- Each decoded chunk is appended to the active assistant message.
- The stream completes, errors, or is canceled; Zustand clears the active request state in one place.
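The browser half of that flow fits in one function. This is a minimal sketch assuming a plain text/plain response body; readTextStream is a hypothetical name, not necessarily what services/chat-stream.ts exports, and the response.ok check is an added assumption. In the app, onChunk would be the store action that appends to the active assistant message.

```typescript
/**
 * Read a streamed text/plain fetch() response and pass each decoded piece of
 * text to onChunk. Resolves with the full text once the stream ends.
 */
export async function readTextStream(
  response: Response,
  onChunk: (text: string) => void,
): Promise<string> {
  if (!response.ok) {
    throw new Error(`Chat request failed with HTTP ${response.status}`);
  }
  // body can be null (e.g. an empty response), so check before getReader().
  if (!response.body) {
    throw new Error("Response has no readable body");
  }

  const reader = response.body.getReader();
  // One decoder for the whole stream; { stream: true } keeps an incomplete
  // multi-byte UTF-8 sequence buffered until the next chunk completes it.
  const decoder = new TextDecoder();
  let fullText = "";

  try {
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      const text = decoder.decode(value, { stream: true });
      if (text) {
        fullText += text;
        onChunk(text);
      }
    }
    // Flush anything the decoder is still holding.
    const tail = decoder.decode();
    if (tail) {
      fullText += tail;
      onChunk(tail);
    }
    return fullText;
  } finally {
    reader.releaseLock();
  }
}
```

If the request is aborted mid-stream, reader.read() rejects and the error propagates to the caller, which is where cancellation is told apart from real failures.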
Traditional request/response waits until the full assistant answer exists before sending the response body. Here, the response body starts immediately and remains open while chunks are flushed. The browser can render each chunk as soon as it arrives, which is what creates the live typing effect.
The response is plain text (text/plain) streamed over a single HTTP response. It is SSE-style in the sense that the server keeps one HTTP response open and progressively flushes data, but it does not use the SSE event framing, and the client intentionally uses native fetch() streaming instead of EventSource.
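A minimal sketch of the server half, assuming the route already has the model output as an async iterable of text deltas (how those deltas are produced is left out). The helper name textStreamResponse is hypothetical; the headers are the ones this README says the route returns.

```typescript
/**
 * Wrap an async iterable of text deltas in a streamed text/plain Response.
 * Each delta is UTF-8 encoded and enqueued as soon as it is produced, so the
 * HTTP body stays open and is flushed incrementally instead of all at once.
 */
export function textStreamResponse(deltas: AsyncIterable<string>): Response {
  const encoder = new TextEncoder();
  const iterator = deltas[Symbol.asyncIterator]();

  const body = new ReadableStream<Uint8Array>({
    // pull() runs when the consumer is ready for more, which gives backpressure.
    async pull(controller) {
      try {
        // Skip empty deltas so each pull() either enqueues bytes or closes;
        // returning with nothing enqueued could leave a pending read stalled.
        while (true) {
          const { value, done } = await iterator.next();
          if (done) {
            controller.close();
            return;
          }
          if (value) {
            controller.enqueue(encoder.encode(value));
            return;
          }
        }
      } catch (error) {
        controller.error(error);
      }
    },
    // Runs if the browser disconnects; lets the source stop producing.
    async cancel() {
      await iterator.return?.();
    },
  });

  return new Response(body, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "no-cache, no-transform",
      "X-Accel-Buffering": "no",
    },
  });
}
```

A pull-based source was chosen over a start() loop so the server only asks the model stream for the next delta when the HTTP body has room for it.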
EventSource is convenient for server-sent events, but it is less flexible for this chat flow:
- It only issues GET requests, while chat submissions naturally use POST.
- Request bodies and custom cancellation flow are cleaner with fetch().
- AbortController plugs directly into fetch().
- Reading a ReadableStream teaches the same primitives used by many modern streaming APIs.
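To make those fetch() advantages concrete, here is a minimal sketch of starting the request. The body shape (model plus messages) and the startChatRequest name are assumptions, not the app's confirmed API; the fetch implementation is injectable only so the sketch can be exercised without a running server.

```typescript
// Assumed request shape; the real /api/chat body may differ.
export interface ChatRequestBody {
  model: string;
  messages: { role: "user" | "assistant"; content: string }[];
}

/**
 * Start a streamed chat request. Unlike EventSource, fetch() can POST a JSON
 * body and accepts an AbortSignal, so one controller cancels the whole request.
 */
export function startChatRequest(
  body: ChatRequestBody,
  fetchImpl: typeof fetch = fetch,
): { response: Promise<Response>; controller: AbortController } {
  const controller = new AbortController();
  const response = fetchImpl("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
    signal: controller.signal,
  });
  return { response, controller };
}
```

The returned controller is what the app would keep in Zustand so the stop button can call controller.abort().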
The frontend stores the current AbortController in Zustand. Pressing stop calls abort(), which cancels the browser request. That cancellation propagates to the Next.js route through request.signal. The route passes that signal to the OpenAI-compatible Gateway request and stops forwarding chunks when the client disconnects.
Cancellation is not just a UI state. It prevents wasted model work, closes network resources, and stops old stream loops from appending text after the user has moved on.
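On the server, stopping the forwarding loop when the client disconnects can be sketched as a generator that checks request.signal between items. untilAborted is a hypothetical helper; it only notices the abort when the next item arrives, which is why the route also passes the same signal to the upstream Gateway request.

```typescript
/**
 * Forward items from source until signal aborts. In a Next.js route handler
 * the signal would be request.signal, which aborts when the browser disconnects.
 */
export async function* untilAborted<T>(
  source: AsyncIterable<T>,
  signal: AbortSignal,
): AsyncGenerator<T> {
  if (signal.aborted) return;
  for await (const item of source) {
    // Checked between items only. Breaking out of for-await calls the
    // source's return(), so an upstream generator can run its cleanup.
    if (signal.aborted) break;
    yield item;
  }
}
```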
Stale streams happen when an older async reader loop resolves after a newer request has started. Without a guard, the old loop can append chunks to the wrong assistant message.
This app prevents that with:
- activeRequestId in Zustand
- one generated request id per stream
- store methods that ignore chunks unless the request id still matches
- cleanup on component unmount
- cancellation before starting a new request
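A framework-free sketch of those guards, assuming a store shaped roughly like the one this README describes. createChatStore, startStream, appendChunk, finishStream, and cancelActive are hypothetical names; in the app they would be Zustand actions, and the local set() here only mimics Zustand's shallow-merge update.

```typescript
// Framework-free sketch of the stream guards; in the app these would be Zustand actions.
interface Message {
  id: string;
  role: "user" | "assistant";
  content: string;
}

interface StreamState {
  messages: Message[];
  activeRequestId: string | null;
  activeAssistantId: string | null;
  controller: AbortController | null;
}

export function createChatStore() {
  let state: StreamState = {
    messages: [],
    activeRequestId: null,
    activeAssistantId: null,
    controller: null,
  };
  // Shallow-merge update, like Zustand's set().
  const set = (partial: Partial<StreamState>) => {
    state = { ...state, ...partial };
  };
  const clearActive = () =>
    set({ activeRequestId: null, activeAssistantId: null, controller: null });

  return {
    getState: () => state,

    /** Cancel any running stream, then register the new one and its placeholder. */
    startStream(requestId: string, assistantId: string, controller: AbortController) {
      state.controller?.abort();
      set({
        messages: [...state.messages, { id: assistantId, role: "assistant", content: "" }],
        activeRequestId: requestId,
        activeAssistantId: assistantId,
        controller,
      });
    },

    /** Append only if the chunk belongs to the stream that is still active. */
    appendChunk(requestId: string, text: string): boolean {
      if (requestId !== state.activeRequestId) return false;
      const targetId = state.activeAssistantId;
      set({
        messages: state.messages.map((m) =>
          m.id === targetId ? { ...m, content: m.content + text } : m,
        ),
      });
      return true;
    },

    /** Clear lifecycle state in one place, but only for the stream that owns it. */
    finishStream(requestId: string) {
      if (requestId === state.activeRequestId) clearActive();
    },

    /** Stop button and component unmount both end up here. */
    cancelActive() {
      state.controller?.abort();
      clearActive();
    },
  };
}
```

Because every mutation checks the request id, a slow reader loop from an earlier prompt can keep running briefly without ever touching the new assistant message.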
Common streaming mistakes to avoid:
- Forgetting to check response.body before calling getReader()
- Decoding each chunk in isolation instead of with one TextDecoder and { stream: true }, which can corrupt multi-byte UTF-8 characters split across chunks
- Appending chunks after a newer stream starts
- Treating aborts as user-visible failures
- Letting proxies buffer streamed responses
- Exposing AI_GATEWAY_API_KEY to the frontend
- Updating React state for every stream concern instead of keeping lifecycle state centralized
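For the pitfall of treating aborts as user-visible failures, a small classifier keeps stop-button, unmount, and stale-request aborts out of the error UI. classifyStreamFailure is a hypothetical helper and assumes the caller still holds the AbortSignal it passed to fetch().

```typescript
/**
 * Decide how a failed stream should be shown. Aborts are expected outcomes
 * (stop button, unmount, newer request) and should not surface as errors.
 */
export function classifyStreamFailure(
  error: unknown,
  signal?: AbortSignal,
): "canceled" | "error" {
  // abort(reason) with a custom reason rejects fetch with that reason,
  // so the signal is the most reliable indicator.
  if (signal?.aborted) return "canceled";
  // A plain abort() surfaces as an error named "AbortError".
  if (
    typeof error === "object" &&
    error !== null &&
    (error as { name?: unknown }).name === "AbortError"
  ) {
    return "canceled";
  }
  return "error";
}
```

A "canceled" result would map to the canceled state listed among the app's features, while "error" would enable the retry path.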
To deploy on Vercel:
- Push the repo to GitHub.
- Import it in Vercel.
- Add environment variables:
  - AI_GATEWAY_API_KEY
  - AI_GATEWAY_MODEL, such as openai/gpt-5.4-mini
- Deploy.
vercel.json gives the chat route a 60-second max duration. The route also returns these headers:

Cache-Control: no-cache, no-transform
X-Accel-Buffering: no

Those headers discourage proxy buffering so chunks reach the browser progressively.
The API route uses the official OpenAI SDK pointed at Vercel AI Gateway's OpenAI-compatible base URL. That gives you one Gateway key for multiple providers while keeping the stream mechanics visible for study. The selected model is sent from the client as a model id like openai/gpt-5.4-mini or anthropic/claude-sonnet-4.6; the secret Gateway key stays server-side only.
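A minimal sketch of the event-filtering step inside the route, assuming it consumes the Responses API streaming events named in the data-flow steps (response.output_text.delta). outputTextDeltas and StreamEvent are hypothetical names. The OpenAI SDK wiring is left as a comment because it needs a real key, and the Gateway base URL shown there is an assumption to verify against Vercel's documentation.

```typescript
// Minimal structural type for streamed Responses API events; the OpenAI SDK's
// own event types are richer, this only names the fields used here.
interface StreamEvent {
  type: string;
  delta?: unknown;
}

/**
 * Turn a stream of Responses API events into plain text deltas by keeping
 * only response.output_text.delta events and ignoring lifecycle events.
 */
export async function* outputTextDeltas(
  events: AsyncIterable<StreamEvent>,
): AsyncGenerator<string> {
  for await (const event of events) {
    if (event.type === "response.output_text.delta" && typeof event.delta === "string") {
      yield event.delta;
    }
  }
}

// Assumed wiring inside app/api/chat/route.ts (not executed here):
//
//   import OpenAI from "openai";
//   const client = new OpenAI({
//     apiKey: process.env.AI_GATEWAY_API_KEY,     // server-side only
//     baseURL: "https://ai-gateway.vercel.sh/v1", // assumed Gateway base URL
//   });
//   const events = await client.responses.create(
//     { model, input: prompt, stream: true },
//     { signal: request.signal },                  // client disconnect cancels upstream
//   );
//   for await (const text of outputTextDeltas(events)) { /* encode and enqueue */ }
```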
If you want to compare this manual implementation with Vercel AI SDK helpers later, the clean boundary is app/api/chat/route.ts: replace the route internals while leaving the frontend reader loop intact.
Relevant official references: