- This project evaluates and compares Microsoft Copilot (Power BI Copilot) with a custom-built Agentic Retrieval-Augmented Generation (RAG) solution.
- The objective is to measure response accuracy, contextual understanding, and reasoning quality when querying enterprise data.
- Assess the accuracy of natural language responses across three approaches:
- Power BI Inbuilt Q&A
- Power BI Copilot (Fabric Enabled)
- Custom Agentic RAG Solution
- Establish a structured evaluation framework for enterprise data intelligence.
- Present end-to-end architecture, metrics, and findings for organizational demo and assessment.
- Azure SQL Database hosts fact and dimension tables.
- Synthetic data generated dynamically for the last two years.
- SQL Views expose analytical data models for Power BI and RAG components.
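The synthetic-data step above can be sketched as follows. This is a minimal illustration, not the project's actual script: the table shape (order date, product, region, quantity, unit price) and all identifiers are assumptions chosen for the example, and the rows would be bulk-loaded into the Azure SQL fact table in practice.

```python
# Hypothetical sketch: generate two years (730 days) of daily fact rows.
# Column names and value ranges are illustrative assumptions.
import random
from datetime import date, timedelta

def generate_sales_facts(end: date, days: int = 730, seed: int = 42) -> list[dict]:
    """Return synthetic fact rows covering the `days` days ending at `end`."""
    rng = random.Random(seed)  # seeded so reruns produce the same dataset
    products = ["P001", "P002", "P003"]
    regions = ["East", "West", "North", "South"]
    rows = []
    for offset in range(days):
        day = end - timedelta(days=offset)
        for _ in range(rng.randint(1, 5)):  # a few transactions per day
            rows.append({
                "order_date": day.isoformat(),
                "product_id": rng.choice(products),
                "region": rng.choice(regions),
                "quantity": rng.randint(1, 20),
                "unit_price": round(rng.uniform(5.0, 500.0), 2),
            })
    return rows

rows = generate_sales_facts(end=date(2025, 1, 1))
print(len(rows), rows[0]["order_date"])
```

Seeding the generator keeps the dataset reproducible, so all three systems are evaluated against identical data.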
- Create semantic models and dashboards from SQL Views.
- Evaluate Power BI Q&A and Copilot for query handling and result accuracy.
- Framework: LangChain / LangGraph
- Models: OpenAI GPT models (chat, embedding, and moderation)
- Vector Store: Azure AI Search (Standard SKU)
- Backend: FastAPI
- UI: Chainlit
- Deployment: Azure Container Apps
- CI/CD: GitHub Actions
- Agent Hosting: LangGraph Cloud
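The retrieval loop at the heart of the stack above can be sketched in a few lines. This is a runnable stand-in, not the production code: the real system scores documents with Azure AI Search embeddings and generates answers with a GPT model, while here both are replaced by assumptions (naive keyword overlap and a template answer) so only the control flow is shown.

```python
# Minimal RAG control-flow sketch. `retrieve` and `generate` are hypothetical
# stand-ins for Azure AI Search and the OpenAI chat model.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by keyword overlap with the query (embedding stand-in)."""
    terms = set(query.lower().split())
    scored = sorted(documents, key=lambda d: -len(terms & set(d.lower().split())))
    return scored[:k]

def generate(query: str, context: list[str]) -> str:
    """Template answer in place of an LLM call: reply grounded in retrieved context."""
    return f"Q: {query}\nGrounded on: " + " | ".join(context)

docs = [
    "Monthly sales totals by region come from the vw_sales view.",
    "Customer churn is tracked in the vw_churn view.",
    "Inventory levels refresh nightly from Azure SQL.",
]
query = "sales by region"
answer = generate(query, retrieve(query, docs))
print(answer)
```

In the deployed agent, the same retrieve-then-generate step runs as nodes in a LangGraph graph, with FastAPI and Chainlit in front of it.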
| Stage | Description |
|---|---|
| Stage 1 | Evaluate results using Power BI Q&A. |
| Stage 2 | Enable Microsoft Fabric and test with Power BI Copilot. |
| Stage 3 | Build and deploy Agentic RAG solution using Azure SQL, LangGraph, and OpenAI models. |
| Stage 4 | Compare all three systems using consistent test queries and metrics. |

| Metric | Description |
|---|---|
| Accuracy | Alignment between query intent and retrieved results. |
| Relevance | Degree to which responses match business context. |
| Latency | Response generation time per query. |
| Interpretability | Transparency of model reasoning or data source. |
| Consistency | Repeatability of results under identical inputs. |
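Two of the metrics above, Latency and Consistency, can be computed mechanically from logged runs. The sketch below assumes a hypothetical log format (system, query, response, seconds); it is an illustration of the scoring idea, not the project's evaluation harness.

```python
# Sketch: per-system mean latency, and consistency = share of repeated
# queries whose response was identical on every run. Log format is assumed.
from collections import defaultdict
from statistics import mean

def score_runs(runs: list[dict]) -> dict:
    latency = defaultdict(list)
    responses = defaultdict(lambda: defaultdict(set))  # system -> query -> distinct responses
    for r in runs:
        latency[r["system"]].append(r["seconds"])
        responses[r["system"]][r["query"]].add(r["response"])
    report = {}
    for system, per_query in responses.items():
        consistent = sum(1 for q in per_query if len(per_query[q]) == 1)
        report[system] = {
            "mean_latency_s": round(mean(latency[system]), 3),
            "consistency": round(consistent / len(per_query), 2),
        }
    return report

runs = [
    {"system": "copilot", "query": "q1", "response": "42", "seconds": 1.2},
    {"system": "copilot", "query": "q1", "response": "42", "seconds": 1.0},
    {"system": "rag", "query": "q1", "response": "42", "seconds": 2.1},
    {"system": "rag", "query": "q1", "response": "41", "seconds": 1.9},
]
print(score_runs(runs))
```

Accuracy, Relevance, and Interpretability, by contrast, need human (or LLM-assisted) judgment against the ground-truth data.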
- Short-term memory (session-level) only.
- No authentication or user profiles.
- No long-term memory persistence.
- Test environment; read-only data access.
- Focus: query comprehension, reasoning, and grounded accuracy.
| Layer | Technology |
|---|---|
| Data | Azure SQL Database |
| Analytics | Power BI, Power BI Copilot |
| LLM Framework | LangChain / LangGraph |
| Model | OpenAI GPT |
| Embeddings | OpenAI Embeddings |
| Search | Azure AI Search |
| Backend | FastAPI |
| UI | Chainlit |
| Hosting | Azure Container Apps |
| CI/CD | GitHub Actions |
| Agent Platform | LangGraph Cloud |
- Data Preparation: Generate synthetic data and create SQL Views.
- Semantic Model Creation: Build Power BI dataset with relationships.
- Containerized RAG System:
  - Deploy FastAPI and Chainlit via Azure Container Apps.
  - Auto-deploy via GitHub Actions (OIDC authentication).
- Agent Deployment: Deploy the RAG agent to LangGraph Cloud.
- Execute predefined test queries across all three systems.
- Capture query, response, and latency results.
- Score each system on Accuracy, Relevance, and Consistency.
- Document observations and visualize results in Power BI.
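The capture step above can be sketched as a small benchmark loop: time each predefined query, record query/response/latency per system, and emit a CSV that Power BI can ingest for the results dashboard. The `ask` callable is a hypothetical stand-in for each system's query interface.

```python
# Sketch of the capture step. `ask` is a hypothetical per-system callable;
# the CSV columns are illustrative assumptions.
import csv
import io
import time

def run_benchmark(ask, queries: list[str], system: str) -> list[dict]:
    """Run each query, timing the call and recording the response."""
    results = []
    for q in queries:
        start = time.perf_counter()
        response = ask(q)
        results.append({
            "system": system,
            "query": q,
            "response": response,
            "latency_s": round(time.perf_counter() - start, 3),
        })
    return results

def to_csv(results: list[dict]) -> str:
    """Serialize results to CSV for Power BI ingestion."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["system", "query", "response", "latency_s"])
    writer.writeheader()
    writer.writerows(results)
    return buf.getvalue()

results = run_benchmark(lambda q: f"answer to {q}", ["total sales 2024"], "rag")
print(to_csv(results))
```

Running the same query list through all three systems yields directly comparable rows for the scoring step.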
- To be added in the final stage of RAG testing.