A Streamlit application using a Few-Shot LangChain SQL Agent (powered by Gemini-2.5-Flash) to query a MySQL database for T-shirt inventory data in natural language.
This project provides a robust, natural language interface for interacting with a hypothetical T-shirt inventory database (atliq_tshirts). It leverages the power of Large Language Models (LLMs) and LangChain's SQL Agent toolkit, specifically utilizing a Few-Shot Prompting technique to ensure high accuracy in converting user questions into correct SQL queries.
- User Input: The user enters a question into the Streamlit app (e.g., "What is the total cost of all small size t-shirts?").
- Few-Shot Selection: An
OpenAIEmbeddingsmodel and aChromavector store are used withSemanticSimilarityExampleSelectorto find the most relevant examples fromfewshots.py. - Prompt Construction: The selected examples are included in a
FewShotPromptTemplatealong with the database schema and the user's question, guiding the LLM (Gemini-2.5-Flash) to generate the correct SQL query. - SQL Execution: The LangChain SQL Agent executes the generated SQL query against the MySQL database (
atliq_tshirts). - Answer Generation: The LLM receives the SQL result and formats a final, human-readable answer, which is then displayed in the Streamlit application.
| Component | Technology / Library | Purpose |
|---|---|---|
| LLM | ChatGoogleGenerativeAI(model="gemini-2.5-flash") |
Core reasoning and SQL generation. |
| Agent Framework | LangChain SQL Agent Toolkit | Manages the NL-to-SQL execution process. |
| Database | MySQL (via SQLDatabase.from_uri) |
Stores the T-shirt inventory data. |
| Example Selector | SemanticSimilarityExampleSelector |
Retrieves the most contextually similar few-shot examples. |
| Embeddings | OpenAIEmbeddings |
Used for calculating semantic similarity between questions. |
| Frontend | Streamlit | Interactive web interface for the Q&A app. |
- Python 3.x
- A running MySQL server (configured with a database named
atliq_tshirtsand atshirtstable that matches the schema implied byfewshots.py). - Google API Key for the LLM (set as
GOOGLE_API_KEYin your environment or a.envfile). - OpenAI API Key (used for
OpenAIEmbeddingsas configured inlangchain_helper.py).
-
Clone the Repository:
git clone [your-repo-link] cd [your-repo-name] -
Set up Environment Variables: Create a file named
.envand/orsecret_key.py/openai_key.pyto store your API keys and database credentials (as referenced bylangchain_helper.py). -
Install Dependencies:
pip install -r requirements.txt # (Assuming you generate this file) # Required libraries include: streamlit, langchain-google-genai, langchain-community, langchain-openai, python-dotenv, pymysql, chromadb
-
Run the Streamlit App:
streamlit run main.py
The application will open in your web browser, ready to accept questions.
The project uses a small set of example question/SQL pairs to maximize the agent's performance.
| Input Example | Generated SQL Query | Answer |
|---|---|---|
| "How many t-shirts do we have left for nike in extra small size and white color?" | SELECT sum(stockquantity) FROM tshirts WHERE brand='Nike' AND color='White' AND size='XS' |
"We have 64 t-shirts left for Nike..." |
| "What is the total cost of all small size t-shirts?" | SELECT SUM(price * stockquantity) FROM tshirts WHERE size='S' |
"The total value of all the T-shirts with size S in the database is 16,379." |
| "If we have to sell all the Levis T-shirts today. How much revenue our store will generate without discount?" | SELECT SUM(price * stockquantity) FROM tshirts WHERE brand='Levi' |
"The total value of Levi brand t-shirts in stock is 24,421." |
You can extend this list in fewshots.py to cover more complex or unique query patterns for improved model accuracy.