An intelligent question-answering system that routes questions to different reasoning strategies based on their characteristics (breadth and depth).
# Install dependencies
uv sync
# Run demo (no API keys needed - uses mock LM)
uv run python -m thinking.scripts.demo_proof_of_concept
# Run with real LM via OpenRouter
export OPENROUTER_API_KEY='sk-or-v1-...'
uv run python -m thinking.scripts.lm_proof_of_conceptThis project focuses on routing intelligence—analyzing a question and selecting the optimal reasoning strategy for answering it. This is distinct from model routing, which routes questions to different model tiers based on cost.
| Aspect | Routing Intelligence | Model Routing |
|---|---|---|
| What is routed | Questions → Reasoning forms | Questions → Model tiers |
| Basis for routing | Question complexity (breadth/depth) | Question complexity + cost constraints |
| Goal | Match reasoning strategy to question type | Minimize cost while maintaining quality |
The system implements 6 reasoning modes:
| Mode | Description | Best For |
|---|---|---|
| DIRECT | Single prediction | Simple factual questions |
| COT (Chain of Thought) | Step-by-step reasoning | Questions requiring logical steps |
| TOT (Tree of Thoughts) | Multiple interpretation paths | Ambiguous questions with multiple valid interpretations |
| GoT (Graph of Thoughts) | Interconnected concept analysis | Complex multi-domain queries |
| AoT (Atom of Thoughts) | First-principles decomposition | Technical troubleshooting, deep understanding |
| COMBINED | Multi-strategy synthesis | Complex problems requiring multiple approaches |
Question → Breadth/Depth Analysis → Classifier → Reasoning Mode Selection → Answer
Does this question require information from multiple domains?
- Low: "What's my total revenue this month?"
- High: "How should I restructure my product categories for better SEO and easier inventory management?"
Does this question require understanding underlying mechanisms?
- Low: "Show me my top 10 products"
- High: "Why is my inventory sync failing for this specific SKU?"
- How well can AI classify questions by their reasoning needs?
- Can thresholds improve routing robustness?
- Does GEPA optimization improve classifier accuracy?
src/thinking/
├── core/ # Routing + reasoning modes
├── scripts/ # Runnable demos (python -m thinking.scripts...)
├── experiments/ # Evaluation harness + reports
├── optimizations/ # Training/evaluation utilities (GEPA/teleprompt)
└── docs/ # Detailed documentation
For the Shopify Sidekick use case, see:
- research_summary/Shopify/PROPOSAL.md - Enhancement proposal
- research_summary/Shopify/SHOPIFY.md - Shopify-specific details
- research_summary/Shopify/SHOPIFY_INTEGRATION.md - Integration strategy
This project draws on the public case study available at claude.com/customers/shopify.