Context-Aware Semantic Router

Classifies user queries by complexity and routes them to the right LLM automatically. Fast model for simple questions, heavy model for complex reasoning. Exposes per-request telemetry: model used, latency, and estimated cost.

Why This Exists

The default approach to LLM routing is either "send everything to GPT-4" (expensive) or "send everything to a small model" (bad quality). Neither works at scale. This router sits in front of your LLM calls and makes the routing decision for you — a fast classifier determines query complexity, then routes to the appropriate model. You get the quality of a heavy model where it matters and the speed/cost of a light model everywhere else.

I built this while working on AI-powered workflows where inference cost and latency were real constraints in production. The telemetry panel was critical — you can't optimize what you can't measure.

Key Features

  • Automatic query classification — fast model (gpt-4o-mini) classifies complexity before routing
  • Smart routing — simple queries go to gpt-4o-mini, complex queries go to gpt-4o
  • Streaming responses — real-time streamed output from whichever model handles the query
  • Per-request telemetry — see which model was used, response latency, and estimated API cost
  • Interactive chat UI — full message history in a clean interface
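The classify-then-route step above can be sketched as follows. This is a minimal illustration, not the repo's actual code: the `pickModel` name is hypothetical, and the model defaults mirror the ones listed in the features (overridable via `FAST_MODEL` / `HEAVY_MODEL`, per the Deploy section).

```typescript
// Label produced by the fast classifier before routing.
type Complexity = "simple" | "complex";

// Defaults match the README; in the app these would come from
// the FAST_MODEL / HEAVY_MODEL environment variables.
const FAST_MODEL = "gpt-4o-mini";
const HEAVY_MODEL = "gpt-4o";

// Route the query to the model matching its classified complexity:
// simple queries stay on the cheap fast model, complex ones escalate.
function pickModel(label: Complexity): string {
  return label === "complex" ? HEAVY_MODEL : FAST_MODEL;
}
```

In practice the classifier itself is a non-streaming call to the fast model, so the routing overhead stays small relative to the main completion.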

Getting Started

git clone https://github.com/gautamgb/Context-Aware-Semantic-Router.git
cd Context-Aware-Semantic-Router
npm install
cp .env.example .env.local  # Add your OPENAI_API_KEY
npm run dev

Open http://localhost:3000 and start chatting. The telemetry panel shows routing decisions in real time.
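The estimated-cost figure in the telemetry panel can be computed from token counts and per-model prices. A sketch of that calculation follows; the function name and the price table are assumptions (OpenAI's prices change, so check the current pricing page rather than trusting these literals).

```typescript
// Illustrative per-1M-token prices in USD (assumed values, not
// authoritative; verify against OpenAI's current pricing).
const PRICE_PER_1M_TOKENS: Record<string, { input: number; output: number }> = {
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
  "gpt-4o": { input: 2.5, output: 10 },
};

// Estimate the cost of one request from its token usage.
// Unknown models return 0 rather than guessing a price.
function estimateCostUSD(
  model: string,
  inputTokens: number,
  outputTokens: number
): number {
  const price = PRICE_PER_1M_TOKENS[model];
  if (!price) return 0;
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}
```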

Deploy to Vercel

  1. Push to GitHub and import in Vercel
  2. Set OPENAI_API_KEY in environment variables
  3. Optionally customize FAST_MODEL and HEAVY_MODEL
  4. Deploy
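For local development, the equivalent `.env.local` might look like this (a sketch assuming the variable names from the steps above; only the API key is required):

```shell
# Required
OPENAI_API_KEY=sk-...

# Optional model overrides (defaults shown)
FAST_MODEL=gpt-4o-mini
HEAVY_MODEL=gpt-4o
```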

Tech Stack

  • Framework: Next.js 16 (App Router)
  • AI: OpenAI API (non-streaming classification, streaming responses)
  • Styling: Tailwind CSS
  • Language: TypeScript

Live Demo

seekgb.com

License

MIT
