Advanced AI-Powered Document Analysis with Multimodal RAG Capabilities
Cortex AI Hub integrates multiple Large Language Models (LLMs) with a sophisticated multimodal Retrieval-Augmented Generation (RAG) system, enabling you to extract insights from text, visual content, and video transcripts.
NEW: Premium Dark Theme UI with Glassmorphism - Modern, sleek interface with neon green accents, smooth animations, and frosted glass effects!
- Video Transcript Extraction: Automatically fetch YouTube video transcripts
- AI-Powered Summaries: Generate comprehensive video summaries with key takeaways
- Interactive Chat: Ask questions about video content using RAG technology
- Hybrid Search: Semantic + keyword search across video transcripts
- Real-Time Analysis: Instant insights from any YouTube video
- Visual Content Understanding: Analyze images, charts, graphs, and infographics
- Unified Text-Image Search: Search across both textual and visual content
- Context-Aware Analysis: Enhanced understanding with specialized prompts
- Persistent Storage: Efficient multimodal embeddings with pickle storage
- Free & Local: Uses open-source models (BLIP, BLIP-2, GIT)
- Hybrid Search: Combines semantic vector search with BM25 keyword search
- Multi-Document Support: Upload PDFs or provide URLs
- Persistent Vector Database: ChromaDB-powered storage
- Accurate Citations: Source-linked responses with references
- Real-Time Research: ArXiv, Wikipedia, and Tavily web search tools
- Current Information: Up-to-date news and research insights
- Instant Responses: Fast, context-aware answers
- Text-to-Speech: Read-aloud feature using Edge TTS (en-US-AriaNeural voice)
- Glassmorphic Dark Theme: Sleek dark interface with frosted glass effects
- Smooth Animations: Hover effects, transitions, and micro-animations
- Modern Typography: Inter font family with gradient text effects
- Responsive Design: Works beautifully on all screen sizes
- Neon Accents: Eye-catching neon green highlights
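Hybrid search means fusing two independently ranked result lists, one from semantic vector search and one from BM25 keyword search. One standard way to fuse them is Reciprocal Rank Fusion (RRF); the sketch below is illustrative (the function name is not from the codebase, and the app may instead use weighted score blending, e.g. via LangChain's EnsembleRetriever):

```python
def rrf_fuse(semantic_ranked, keyword_ranked, k=60):
    """Fuse two ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores 1 / (k + rank + 1) per list it appears in; documents
    ranked highly by BOTH retrievers float to the top of the fused list.
    """
    scores = {}
    for ranked in (semantic_ranked, keyword_ranked):
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# A doc that tops both rankings ("a" here) wins the fused ranking
fused = rrf_fuse(["a", "b", "c"], ["a", "c", "d"])
```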
| Model | Provider | Best For |
|---|---|---|
| llama-3.3-70b-versatile | Meta | Complex reasoning, analysis |
| llama-3.1-8b-instant | Meta | Quick queries, fast responses |
| meta-llama/llama-guard-4-12b | Meta | Safety and content moderation |
| openai/gpt-oss-120b | OpenAI | Complex analysis tasks |
| openai/gpt-oss-20b | OpenAI | Balanced performance |
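Selecting one of the Groq-hosted models above might be wired up as in the sketch below, using LangChain's Groq integration. `GROQ_MODELS` and `make_llm` are illustrative names, not from the codebase, and instantiating `ChatGroq` requires a `GROQ_API_KEY` in the environment:

```python
# Illustrative registry of the Groq-hosted models listed above
GROQ_MODELS = {
    "llama-3.3-70b-versatile": "complex reasoning, analysis",
    "llama-3.1-8b-instant": "quick queries, fast responses",
    "meta-llama/llama-guard-4-12b": "safety and content moderation",
    "openai/gpt-oss-120b": "complex analysis tasks",
    "openai/gpt-oss-20b": "balanced performance",
}

def make_llm(model_id: str, temperature: float = 0.0):
    """Instantiate a Groq-backed chat model (requires GROQ_API_KEY)."""
    if model_id not in GROQ_MODELS:
        raise ValueError(f"Unknown model: {model_id}")
    from langchain_groq import ChatGroq  # lazy import; needs langchain-groq
    return ChatGroq(model=model_id, temperature=temperature)
```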
| Model | Description | Best For |
|---|---|---|
| BLIP | Quick image captioning | Speed, basic analysis |
| BLIP-2 | Advanced understanding | Complex visual content |
| GIT | Detailed descriptions | Charts, graphs, infographics |
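These vision models are typically loaded through the Hugging Face Transformers `image-to-text` pipeline. The checkpoint names below are common public checkpoints for each family, assumed for illustration, not necessarily the exact ones the app ships with:

```python
# Illustrative mapping of UI model names to Hugging Face checkpoints (assumed)
VISION_CHECKPOINTS = {
    "BLIP": "Salesforce/blip-image-captioning-base",
    "BLIP-2": "Salesforce/blip2-opt-2.7b",
    "GIT": "microsoft/git-base-coco",
}

def caption_image(image_path: str, model_name: str = "BLIP") -> str:
    """Caption an image with the chosen vision model.

    Downloads model weights on first use; needs transformers + torch installed.
    """
    from transformers import pipeline  # lazy import
    captioner = pipeline("image-to-text", model=VISION_CHECKPOINTS[model_name])
    return captioner(image_path)[0]["generated_text"]
```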
Traditional RAG chatbot with document upload and multi-LLM selection
Enhanced multimodal interface with vision model selection and image analysis
AI-powered search agent with real-time research capabilities
Complete RAG chatbot workflow with document processing, hybrid search, and multi-LLM response generation
AI-powered search agent workflow with multi-tool research and intelligent orchestration
Enhanced multimodal workflow combining text and visual content analysis
- Python 3.11+
- Git
- API Keys: Groq and Tavily
1. Clone the repository

       git clone https://github.com/RobinMillford/Cortex-AI-Multi-Model-Insights-Hub.git
       cd Cortex-AI-Multi-Model-Insights-Hub

2. Set up the environment

       python -m venv venv
       source venv/bin/activate  # Windows: venv\Scripts\activate
       pip install -r requirements.txt

3. Configure API keys

       cp .env.template .env
       # Add your GROQ_API_KEY and TAVILY_API_KEY to .env

4. Run the application

       streamlit run Main_Page.py
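Apps structured like this usually load the keys from `.env` with python-dotenv's `load_dotenv()`; how exactly this project does it is not shown here. A minimal stdlib-only equivalent, for illustration:

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: KEY=VALUE lines, '#' comments, blanks ignored.

    Uses setdefault so variables already in the environment win.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```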
- Navigate to "YouTube Analyst" page
- Paste a YouTube URL in the sidebar
- Click "Analyze Video" to extract transcript
- View auto-generated summary
- Ask questions about the video content
- Get AI-powered insights with context from the transcript
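Under the hood, analyzing a video means turning the pasted URL into an 11-character video ID and fetching the transcript with the YouTube Transcript API. A sketch, with illustrative function names; note the exact fetch call varies across youtube-transcript-api versions (older releases use the classmethod shown, newer 1.x releases use an instance `fetch()` method):

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url: str) -> str:
    """Pull the video ID out of common YouTube URL shapes."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":          # short links: youtu.be/<id>
        return parsed.path.lstrip("/")
    return parse_qs(parsed.query)["v"][0]      # watch links: ?v=<id>

def fetch_transcript(url: str) -> str:
    """Join transcript segments into one string (needs youtube-transcript-api)."""
    from youtube_transcript_api import YouTubeTranscriptApi  # lazy import
    segments = YouTubeTranscriptApi.get_transcript(extract_video_id(url))
    return " ".join(seg["text"] for seg in segments)
```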
- Navigate to "Multimodal RAG" page
- Choose vision model (BLIP for speed, GIT for accuracy)
- Upload PDF with images/charts
- Enable "Extract and analyze images"
- Ask questions about text and visual content
- Go to "RAG Chatbot" page
- Upload PDFs or enter URLs
- Configure retrieval parameters
- Select LLM models for comparison
- Ask questions and get cited responses
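"Retrieval parameters" here typically means chunk size, chunk overlap, and how many chunks to retrieve per query. The sketch below shows the core chunking idea in plain Python; the app itself likely delegates this to a LangChain text splitter, so treat the function as illustrative:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200):
    """Split text into overlapping character windows for embedding.

    Overlap keeps sentences that straddle a boundary retrievable from
    both neighboring chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```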
- Visit "Search Agent" page
- Enter research queries
- Choose preferred LLM model
- Get real-time answers with sources
- Frontend: Streamlit with premium glassmorphic dark theme
- Backend: Python, LangChain/LangGraph
- Vector DB: ChromaDB (text embeddings)
- Embeddings: HuggingFace sentence-transformers
- Vision: BLIP, BLIP-2, GIT (Hugging Face Transformers)
- LLMs: Groq API (Meta Llama, OpenAI models)
- Search: Tavily, ArXiv, Wikipedia APIs
- Video: YouTube Transcript API
- Text-to-Speech: Edge TTS (Microsoft Azure Neural Voices)
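The read-aloud feature maps naturally onto the edge-tts package's async API. A sketch, assuming that package (synthesis itself needs network access to Microsoft's service, so this only illustrates the call shape):

```python
import asyncio

VOICE = "en-US-AriaNeural"  # the voice named above

async def speak(text: str, out_path: str = "answer.mp3") -> str:
    """Synthesize speech to an MP3 file with Edge TTS (needs edge-tts installed)."""
    import edge_tts  # lazy import
    await edge_tts.Communicate(text, VOICE).save(out_path)
    return out_path

# e.g. asyncio.run(speak("Hello from Cortex AI Hub"))
```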
    ├── Main_Page.py            # App entry point with hero section
    ├── multimodal_helpers.py   # Multimodal processing utilities
    ├── helpers.py              # Text processing utilities
    ├── chain_setup.py          # LLM configuration
    ├── styles.py               # Premium dark theme CSS
    ├── config.py               # Model configurations
    ├── pages/
    │   ├── 1_RAG_Chatbot.py     # Traditional RAG interface
    │   ├── 2_Search_Agent.py    # Web search agent
    │   ├── 3_Multimodal_RAG.py  # Multimodal interface
    │   └── 4_YouTube_Analyst.py # YouTube video analysis (NEW!)
    ├── chroma_db/              # Text vector storage
    ├── multimodal_stores/      # Multimodal embeddings storage
    └── requirements.txt        # Python dependencies
- YouTube Integration: Transcript extraction with RAG-powered Q&A
- Two-Layer Vision: Vision models → descriptions, embeddings → search
- Hybrid Search: Semantic + BM25 for optimal retrieval
- Model Caching: Global cache prevents reloading
- Session Management: Streamlit state for persistence
- Glassmorphism UI: Backdrop blur and frosted glass effects
- Vision models cached globally
- Processed embeddings saved for reuse
- Lazy loading when needed
- Real-time progress feedback
- Efficient pickle-based storage
- Optimized ChromaDB collection naming
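The "processed embeddings saved for reuse" point comes down to a small pickle-backed store: compute once, persist, and skip reprocessing on the next load. A sketch with illustrative function names (the app's actual store layout under `multimodal_stores/` may differ):

```python
import pickle
from pathlib import Path

def save_store(embeddings: dict, path: str) -> None:
    """Persist computed embeddings so reprocessing can be skipped later."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as fh:
        pickle.dump(embeddings, fh)

def load_store(path: str):
    """Return cached embeddings if present, else None (caller reprocesses)."""
    p = Path(path)
    if not p.exists():
        return None
    with p.open("rb") as fh:
        return pickle.load(fh)
```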
- Glassmorphic Design: Frosted glass effects with backdrop blur
- Gradient Text Effects: Animated gradient titles
- Smooth Animations: Cubic-bezier transitions
- Neon Glow Effects: Interactive hover states
- Modern Typography: Inter font family
- Custom Scrollbars: Styled with gradient effects
- Enhanced Components: Buttons, inputs, expanders, and more
- YouTube Analyst: NEW feature for video transcript analysis and chat
- Text-to-Speech: Read-aloud feature in Search Agent using Edge TTS
- Glassmorphic UI: Complete redesign with frosted glass effects
- Inter Font: Modern typography with gradient text effects
- Enhanced Animations: Smooth cubic-bezier transitions
- Improved Components: All UI elements redesigned
- Updated Main Page: 2x2 grid layout for 4 tools
- CSS Centralization: Unified styles.py for consistency
- Premium Dark Theme: Complete UI overhaul with modern design
- Updated Model List: Added llama-guard-4-12b, removed deprecated models
- Dependency Cleanup: Removed pysqlite3-binary for better compatibility
- Enhanced Animations: Smooth transitions and hover effects
- Stats Section: Added visual statistics on main page
- Improved Navigation: Better sidebar organization
1. Fork the repository
2. Create a feature branch:

       git checkout -b feature/your-feature

3. Make changes and test locally
4. Commit and push:

       git commit -m "Add feature"
       git push origin feature/your-feature

5. Create a Pull Request
- Enhanced video analysis features
- New vision models or analysis techniques
- Better retrieval algorithms
- UI/UX improvements
- Analytics and metrics
- Testing and documentation
This project is licensed under the AGPL-3.0 License.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Hugging Face: Free open-source vision models
- Meta: Llama models and vision transformers
- Salesforce: BLIP vision models
- Microsoft: GIT vision model
- Groq: Fast LLM inference
- Streamlit: Amazing app framework
- Tavily: Advanced web search API
- YouTube Transcript API: Video transcript extraction

Made with ❤️ by Yamin Hossain