Skip to content

phalla-doll/schema-to-sql

Repository files navigation

Schema-to-SQL AI

Turn database schema dumps into natural-language SQL queries using AI.

Upload your database schema catalog, ask questions in plain English, and get safe, schema-aware SQL you can run immediately.


πŸš€ What This Project Does

This web app allows users to:

  1. Upload a database schema catalog dump (SQL Server / Azure SQL supported first)
  2. Visualize the schema in a clean, navigable tree
  3. Ask questions like:

    "Get all flight numbers from the last 2 weeks"

  4. Receive:
    • Valid SQL queries
    • Generated only from the uploaded schema
    • With zero database access required

The AI never touches your database β€” it only reasons over your schema.


✨ Key Features

  • πŸ“‚ Schema upload (catalog dump)
  • 🌳 Interactive schema explorer
  • 🧠 Natural language β†’ SQL generation
  • πŸ”’ Schema-safe AI (no hallucinated tables/columns)
  • 🧾 SQL Server / Azure SQL dialect support
  • 🧩 Foreign-key–aware joins
  • πŸ§ͺ Query validation before output

πŸ—οΈ Architecture Overview

User Input (English)
        ↓
Intent Extraction
        ↓
Relevant Table Detection
        ↓
Schema Subset Selection
        ↓
AI SQL Generation
        ↓
Static Validation
        ↓
Final SQL Output

🧠 How the AI Works (Important)

This project does NOT fine-tune a model.

Instead, it uses a RAG-style prompting strategy:

  • The uploaded schema is parsed into structured JSON
  • Only relevant tables and columns are injected into the prompt
  • The AI is strictly instructed to:
    • Use only provided schema
    • Ask for clarification if ambiguous
    • Output SQL only (no explanations unless requested)

This makes the system:

  • Safer
  • Cheaper
  • Easier to maintain

🧩 Supported Use Cases

  • Legacy databases with hundreds of tables
  • Non-SQL users (PMs, analysts, ops)
  • Onboarding new engineers
  • Exploring unfamiliar schemas
  • Generating safe ad-hoc queries

πŸ› οΈ Tech Stack

Frontend

  • Next.js (App Router)
  • TypeScript
  • Tailwind CSS
  • Schema Tree Viewer (custom or Radix)
  • Monaco Editor (SQL preview)

Backend

  • Next.js API Routes / Server Actions
  • Schema parser (custom)
  • JSON-based schema store

AI

  • OpenAI / Anthropic / compatible LLM
  • Prompt-based RAG (no fine-tuning)

🎨 UI/Layout Design

The interface follows a ChatGPT-inspired layout with a clean, dual-panel design:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                                                                         β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚      SCHEMA VIEWER              β”‚  β”‚      CHAT INTERFACE          β”‚ β”‚
β”‚  β”‚                                 β”‚  β”‚                               β”‚ β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚
β”‚  β”‚  β”‚ πŸ” Search tables...       β”‚ β”‚  β”‚  β”‚ User:                   β”‚ β”‚ β”‚
β”‚  β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚  β”‚  β”‚ "Get all flight numbers β”‚ β”‚ β”‚
β”‚  β”‚  β”‚ β–Ό t_batch                 β”‚ β”‚  β”‚  β”‚  from the last 2 weeks" β”‚ β”‚ β”‚
β”‚  β”‚  β”‚   β”œβ”€ flightNumber         β”‚ β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚
β”‚  β”‚  β”‚   β”œβ”€ flightDateTime       β”‚ β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚
β”‚  β”‚  β”‚   └─ flightStatus         β”‚ β”‚  β”‚  β”‚ AI:                     β”‚ β”‚ β”‚
β”‚  β”‚  β”‚ β–Ό t_customer              β”‚ β”‚  β”‚  β”‚ SELECT DISTINCT          β”‚ β”‚ β”‚
β”‚  β”‚  β”‚   β”œβ”€ customerId           β”‚ β”‚  β”‚  β”‚   flightNumber          β”‚ β”‚ β”‚
β”‚  β”‚  β”‚   β”œβ”€ customerName         β”‚ β”‚  β”‚  β”‚ FROM t_batch            β”‚ β”‚ β”‚
β”‚  β”‚  β”‚   └─ customerEmail        β”‚ β”‚  β”‚  β”‚ WHERE flightDateTime    β”‚ β”‚ β”‚
β”‚  β”‚  β”‚ β–Ά t_booking               β”‚ β”‚  β”‚  β”‚   >= DATEADD(day, -14,  β”‚ β”‚ β”‚
β”‚  β”‚  β”‚ β–Ά t_aircraft              β”‚ β”‚  β”‚  β”‚       GETDATE());      β”‚ β”‚ β”‚
β”‚  β”‚  β”‚ ...                       β”‚ β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚
β”‚  β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚  β”‚                               β”‚ β”‚
β”‚  β”‚  β”‚ [β–Ό Expand All] [β–² Collapse]β”‚ β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚
β”‚  β”‚  β”‚     [Copy Selected]        β”‚ β”‚  β”‚  β”‚ User:                   β”‚ β”‚ β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚  β”‚  β”‚ [Type your query...]     β”‚ β”‚ β”‚
β”‚  β”‚                                 β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚
β”‚  β”‚ [Schema Stats: 5 tables]       β”‚  β”‚                               β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚                                                                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Panel Descriptions

Schema Viewer Panel (Left)

  • Hierarchical tree view of database tables and columns
  • Search/filter functionality for quick navigation
  • Click to expand/collapse individual tables
  • Bulk controls: "Expand All" / "Collapse All"
  • Copy selected table/column names to clipboard
  • Schema statistics display (table count, etc.)

Chat Interface Panel (Right)

  • ChatGPT-style conversation interface
  • User messages aligned left, AI responses aligned right
  • AI responses include:
    • Generated SQL queries with syntax highlighting
    • Brief explanations (when requested)
  • Input box at bottom for natural language queries
  • Conversation history with scrollable view
  • Copy SQL button on generated queries

Expand/Collapse Behavior

  • Individual Tables: Click triangle icon (β–Ά/β–Ό) to toggle
  • Bulk Actions:
    • "Expand All" - Opens all tables showing all columns
    • "Collapse All" - Closes all tables to show only names
  • Persistent State: Panel state saved between sessions
  • Responsive: On smaller screens, panels stack vertically
  • Split Control: Drag handle between panels to resize (optional)

Layout Features

  • Clean, minimal design focused on usability
  • Dark/light mode support
  • Keyboard shortcuts (e.g., Cmd+K for search)
  • Mobile-responsive with collapsible schema panel
  • Syntax highlighting for SQL output
  • Copy-to-clipboard for easy query extraction

πŸ“ Project Structure (Proposed)

/app
  /upload
  /schema
  /query
  /history

/lib
  ai/
    promptBuilder.ts
    sqlValidator.ts
  schema/
    parser.ts
    normalizer.ts
    matcher.ts

/api
  upload-schema
  generate-sql

/types
  schema.ts
  query.ts

πŸ“„ Example Workflow

User Input

I want all flight numbers between the last 2 weeks

Detected Schema

  • Table: t_batch
  • Columns:
    • flightNumber
    • flightDateTime

Generated SQL

SELECT DISTINCT flightNumber
FROM t_batch
WHERE flightDateTime >= DATEADD(day, -14, GETDATE())
  AND flightNumber IS NOT NULL;

πŸ§ͺ Query Safety Rules

The AI is constrained by:

  • ❌ No unknown tables
  • ❌ No guessed columns
  • ❌ No destructive queries (DROP, DELETE, UPDATE by default)
  • βœ… SELECT-only for MVP
  • βœ… SQL Server–compatible syntax

πŸ—ΊοΈ Roadmap

Phase 1 (MVP)

  • Schema upload
  • Schema viewer
  • Natural language β†’ SELECT SQL

Phase 2

  • JOINs & aggregates
  • Query explanation
  • Ambiguity resolution

Phase 3

  • Multi-database support
  • Query optimization hints
  • Saved prompts & history
  • Role-based access

πŸ” Security Considerations

  • No database credentials required
  • No query execution on server
  • Schema data isolated per user/project
  • Prompt injection mitigation via strict templates

🧠 Limitations (By Design)

  • Does not execute SQL
  • Requires reasonably clean schema dumps
  • Ambiguous language may require clarification

🀝 Contribution

PRs are welcome!

Ideas to contribute:

  • New schema parsers
  • Better intent detection
  • More SQL dialects
  • Visualization improvements

πŸ“œ License

MIT License


πŸ“Œ Vision

Databases shouldn’t be locked behind SQL fluency.

This project aims to make complex schemas understandable and queryable by anyone β€” safely, transparently, and intelligently.


If you plan to open-source this, this README is already structured for GitHub. If you want, I can:

  • Add badges
  • Make a SaaS version README
  • Write pitch copy for landing page
  • Generate API docs

Just say the word πŸš€

About

Turn database schema dumps into natural-language SQL queries using AI.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors