
Cortex AI: Multi-Model Insights Hub

🤖 Advanced AI-Powered Document Analysis with Multimodal RAG Capabilities

Cortex AI Hub integrates multiple Large Language Models (LLMs) with a multimodal Retrieval-Augmented Generation (RAG) system, so you can extract insights from text, visual content, and video transcripts.

✨ NEW: Premium Dark Theme UI with Glassmorphism - Modern, sleek interface with neon green accents, smooth animations, and frosted glass effects!


🌟 Key Features

📺 YouTube Analyst ⭐ NEW!

  • 🎬 Video Transcript Extraction: Automatically fetch YouTube video transcripts
  • 📝 AI-Powered Summaries: Generate comprehensive video summaries with key takeaways
  • 💬 Interactive Chat: Ask questions about video content using RAG technology
  • 🔍 Hybrid Search: Semantic + keyword search across video transcripts
  • ⚡ Real-Time Analysis: Instant insights from any YouTube video

🖼️ Multimodal RAG

  • 📊 Visual Content Understanding: Analyze images, charts, graphs, and infographics
  • 🔗 Unified Text-Image Search: Search across both textual and visual content
  • 🎯 Context-Aware Analysis: Enhanced understanding with specialized prompts
  • 💾 Persistent Storage: Efficient multimodal embeddings with pickle storage
  • 🆓 Free & Local: Uses open-source models (BLIP, BLIP-2, GIT)
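
The pickle-based persistence mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `save_store` and `load_store`, the record layout, and the file name are all assumptions.

```python
import pickle
import tempfile
from pathlib import Path


def save_store(store: dict, path: Path) -> None:
    """Write the store to disk, creating the parent folder if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(store, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_store(path: Path) -> dict:
    """Return the saved store, or an empty dict if nothing was saved yet."""
    if not path.exists():
        return {}
    with path.open("rb") as fh:
        return pickle.load(fh)


# Hypothetical record layout: one entry per extracted image.
store = {
    "report.pdf#page3-img1": {
        "description": "bar chart of quarterly revenue",
        "embedding": [0.12, -0.40, 0.88],
    }
}

# A temporary folder stands in for the multimodal_stores/ directory.
store_path = Path(tempfile.mkdtemp()) / "multimodal_stores" / "report.pkl"
save_store(store, store_path)
restored = load_store(store_path)
```

Pickle files should only be loaded from locations you control, since unpickling untrusted data can execute arbitrary code.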

🔍 Advanced Search & RAG

  • 🧠 Hybrid Search: Combines semantic vector search with BM25 keyword search
  • 📂 Multi-Document Support: Upload PDFs or provide URLs
  • 💾 Persistent Vector Database: ChromaDB-powered storage
  • ✅ Accurate Citations: Source-linked responses with references
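
To make the hybrid search idea concrete, here is a small pure-Python sketch that scores documents with BM25, scores them again with cosine similarity, and blends the two. The weighting scheme (`alpha`), the min-max normalization, and the toy vectors are assumptions for illustration; the README does not state how the project fuses the two rankings.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc against the tokenized query."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n or 1.0
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def min_max(values):
    """Rescale values to [0, 1] so the two score types are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(query_tokens, query_vec, docs_tokens, doc_vecs, alpha=0.5):
    """Return doc indices sorted by alpha * semantic + (1 - alpha) * keyword."""
    keyword = min_max(bm25_scores(query_tokens, docs_tokens))
    semantic = min_max([cosine(query_vec, v) for v in doc_vecs])
    fused = [alpha * s + (1 - alpha) * k for s, k in zip(semantic, keyword)]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


docs = [
    "chroma stores text embeddings",
    "bm25 ranks documents by keyword overlap",
    "streamlit renders the user interface",
]
docs_tokens = [d.split() for d in docs]
# Toy 3-d vectors standing in for sentence-transformer embeddings.
doc_vecs = [[0.9, 0.1, 0.0], [0.2, 0.9, 0.1], [0.0, 0.1, 0.9]]

ranking = hybrid_rank(
    "keyword ranking with bm25".split(), [0.1, 0.95, 0.05], docs_tokens, doc_vecs
)
```

Setting `alpha` closer to 1 favors semantic similarity; closer to 0 favors exact keyword matches.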

🤖 AI-Powered Search Agent

  • 🌐 Real-Time Research: ArXiv, Wikipedia, and Tavily web search tools
  • 📰 Current Information: Up-to-date news and research insights
  • ⚡ Instant Responses: Fast, context-aware answers
  • 🔊 Text-to-Speech: Read aloud feature using Edge TTS (en-US-AriaNeural voice)

🎨 Premium UI/UX

  • 🌙 Glassmorphic Dark Theme: Sleek dark interface with frosted glass effects
  • ✨ Smooth Animations: Hover effects, transitions, and micro-animations
  • 🎨 Modern Typography: Inter font family with gradient text effects
  • 📱 Responsive Design: Works beautifully on all screen sizes
  • 💫 Neon Accents: Eye-catching neon green highlights

🚀 Supported AI Models

| Model | Provider | Best For |
|---|---|---|
| llama-3.3-70b-versatile | Meta | Complex reasoning, analysis |
| llama-3.1-8b-instant | Meta | Quick queries, fast responses |
| meta-llama/llama-guard-4-12b | Meta | Safety and content moderation |
| openai/gpt-oss-120b | OpenAI | Complex analysis tasks |
| openai/gpt-oss-20b | OpenAI | Balanced performance |

🖼️ Vision Models

| Model | Description | Best For |
|---|---|---|
| BLIP | Quick image captioning | Speed, basic analysis |
| BLIP-2 | Advanced understanding | Complex visual content |
| GIT | Detailed descriptions | Charts, graphs, infographics |

📸 Application Screenshots

🤖 RAG Chatbot Interface

[Image: RAG Chatbot Interface] Traditional RAG chatbot with document upload and multi-LLM selection

🖼️ Multimodal RAG Interface

[Image: Multimodal RAG Interface] Enhanced multimodal interface with vision model selection and image analysis

🔍 Search Agent Interface

[Image: Search Agent Interface] AI-powered search agent with real-time research capabilities


🔄 System Architecture

📊 RAG Chatbot Workflow

[Image: RAG Chatbot Workflow] Complete RAG chatbot workflow with document processing, hybrid search, and multi-LLM response generation

🤖 Search Agent Workflow

[Image: Search Agent Workflow] AI-powered search agent workflow with multi-tool research and intelligent orchestration

🖼️ Multimodal RAG Workflow

[Image: Multimodal RAG Workflow] Enhanced multimodal workflow combining text and visual content analysis

🚀 Getting Started

📋 Prerequisites

  • Python 3.11+
  • Git
  • API Keys: Groq and Tavily

📥 Installation

  1. Clone Repository

    git clone https://github.com/RobinMillford/Cortex-AI-Multi-Model-Insights-Hub.git
    cd Cortex-AI-Multi-Model-Insights-Hub
  2. Set Up Environment

    python -m venv venv
    source venv/bin/activate  # Windows: venv\Scripts\activate
    pip install -r requirements.txt
  3. Configure API Keys

    cp .env.template .env
    # Add your GROQ_API_KEY and TAVILY_API_KEY to .env
  4. Run Application

    streamlit run Main_Page.py
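
Before launching, it can help to confirm that both keys from step 3 are visible to Python. The check below is a hedged sketch: the helper name `missing_keys` is hypothetical, and it only inspects the process environment, so it assumes the values from `.env` have already been exported or loaded (for example by a dotenv loader, if the project uses one).

```python
import os

# The two variable names come from the configuration step above.
REQUIRED_KEYS = ("GROQ_API_KEY", "TAVILY_API_KEY")


def missing_keys(env=os.environ):
    """Return the required key names that are unset or blank in env."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing in environment: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```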

🌐 Live Demo

🚀 Try it now


📖 Usage Guide

📺 YouTube Video Analysis ⭐ NEW!

  1. Navigate to "YouTube Analyst" page
  2. Paste a YouTube URL in the sidebar
  3. Click "Analyze Video" to extract transcript
  4. View auto-generated summary
  5. Ask questions about the video content
  6. Get AI-powered insights with context from the transcript
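
Transcript lookups are keyed by video ID rather than by full URL, so a step like the one below usually sits between pasting the URL and fetching the transcript. This is a sketch using only the standard library; `extract_video_id` is a hypothetical helper, the set of URL shapes it accepts is an assumption, and the sample IDs are made up.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url):
    """Return the video ID from common YouTube URL shapes, or None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    # Treat www.youtube.com and m.youtube.com like youtube.com.
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]

    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" and parsed.path == "/watch":
        # Standard links: https://www.youtube.com/watch?v=<id>
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    elif host == "youtube.com":
        # Path-based links: /embed/<id>, /shorts/<id>, /live/<id>
        parts = parsed.path.strip("/").split("/")
        known = ("embed", "shorts", "live")
        candidate = parts[1] if len(parts) >= 2 and parts[0] in known else ""
    else:
        candidate = ""
    return candidate or None


watch_id = extract_video_id("https://www.youtube.com/watch?v=abc123XYZ_-&t=42s")
short_id = extract_video_id("https://youtu.be/abc123XYZ_-?si=share")
```

Returning `None` for unrecognized URLs lets the caller show a validation message instead of requesting a transcript that cannot exist.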

🖼️ Multimodal Document Analysis

  1. Navigate to "Multimodal RAG" page
  2. Choose vision model (BLIP for speed, GIT for accuracy)
  3. Upload PDF with images/charts
  4. Enable "Extract and analyze images"
  5. Ask questions about text and visual content

📄 Traditional Document Chat

  1. Go to "RAG Chatbot" page
  2. Upload PDFs or enter URLs
  3. Configure retrieval parameters
  4. Select LLM models for comparison
  5. Ask questions and get cited responses
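
One common way to get cited responses is to number each retrieved chunk in the prompt and keep a matching reference list to show under the answer. The sketch below illustrates that pattern only; the function name and the `text`, `source`, and `page` keys are assumptions, and the project may format its citations differently.

```python
def build_cited_context(chunks):
    """Number each retrieved chunk and build a matching reference list.

    chunks: list of dicts with hypothetical keys "text", "source", "page".
    Returns (context, references) as two newline-joined strings.
    """
    context_lines, references = [], []
    for number, chunk in enumerate(chunks, start=1):
        context_lines.append(f"[{number}] {chunk['text']}")
        page = chunk.get("page")
        location = f", p. {page}" if page is not None else ""
        references.append(f"[{number}] {chunk['source']}{location}")
    return "\n".join(context_lines), "\n".join(references)


retrieved = [
    {"text": "Revenue grew 12% year over year.", "source": "report.pdf", "page": 3},
    {"text": "The product launched in March.", "source": "https://example.com/post"},
]
context, references = build_cited_context(retrieved)
```

The numbered context goes into the LLM prompt with an instruction to cite sources as `[n]`, and the reference list is displayed alongside the response.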

🔍 Research & Web Search

  1. Visit "Search Agent" page
  2. Enter research queries
  3. Choose preferred LLM model
  4. Get real-time answers with sources

🛠️ Technology Stack

  • Frontend: Streamlit with premium glassmorphic dark theme
  • Backend: Python, LangChain/LangGraph
  • Vector DB: ChromaDB (text embeddings)
  • Embeddings: HuggingFace sentence-transformers
  • Vision: BLIP, BLIP-2, GIT (Hugging Face Transformers)
  • LLMs: Groq API (Meta Llama, OpenAI models)
  • Search: Tavily, ArXiv, Wikipedia APIs
  • Video: YouTube Transcript API
  • Text-to-Speech: Edge TTS (Microsoft Azure Neural Voices)

📁 Project Structure

├── Main_Page.py                 # App entry point with hero section
├── multimodal_helpers.py        # Multimodal processing utilities
├── helpers.py                   # Text processing utilities
├── chain_setup.py               # LLM configuration
├── styles.py                    # Premium dark theme CSS
├── config.py                    # Model configurations
├── pages/
│   ├── 1_RAG_Chatbot.py        # Traditional RAG interface
│   ├── 2_Search_Agent.py       # Web search agent
│   ├── 3_Multimodal_RAG.py     # Multimodal interface
│   └── 4_YouTube_Analyst.py    # YouTube video analysis ⭐ NEW!
├── chroma_db/                   # Text vector storage
├── multimodal_stores/           # Multimodal embeddings storage
└── requirements.txt             # Python dependencies

🔧 Key Technical Features

🧠 Architecture Highlights

  • YouTube Integration: Transcript extraction with RAG-powered Q&A
  • Two-Layer Vision: Vision models → descriptions, embeddings → search
  • Hybrid Search: Semantic + BM25 for optimal retrieval
  • Model Caching: Global cache prevents reloading
  • Session Management: Streamlit state for persistence
  • Glassmorphism UI: Backdrop blur and frosted glass effects
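
The model caching point can be illustrated with a small module-level cache that loads each model once and reuses it afterwards. This is a sketch under assumptions: `get_model`, the cache dictionary, and the stand-in loader are hypothetical, and a real loader would load BLIP, BLIP-2, or GIT weights instead of returning a dict.

```python
import threading

# Module-level cache shared by every caller in the process.
_MODEL_CACHE = {}
_CACHE_LOCK = threading.Lock()


def get_model(name, loader):
    """Return the cached model for name, calling loader(name) only on first use."""
    with _CACHE_LOCK:
        if name not in _MODEL_CACHE:
            _MODEL_CACHE[name] = loader(name)
        return _MODEL_CACHE[name]


# Stand-in loader that records how often it is called.
load_calls = []


def fake_loader(name):
    load_calls.append(name)
    return {"name": name}


first = get_model("blip", fake_loader)
second = get_model("blip", fake_loader)  # served from the cache, no reload
```

In a Streamlit app, the `st.cache_resource` decorator offers similar load-once behavior across reruns; whether the project uses it or a hand-written cache is not stated here.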

⚡ Performance Optimizations

  • Vision models cached globally
  • Processed embeddings saved for reuse
  • Lazy loading: models are loaded only when first needed
  • Real-time progress feedback
  • Efficient pickle-based storage
  • Optimized ChromaDB collection naming
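
The README does not say how collection names are optimized. As one plausible approach, the sketch below derives a stable, conservative name from a document's file name or URL: lowercase letters, digits, dashes, and underscores only, starting and ending with a letter or digit, and at most 63 characters. These limits reflect commonly documented ChromaDB constraints and should be checked against your installed version; `collection_name_for` is a hypothetical helper.

```python
import hashlib
import re


def collection_name_for(source, max_len=63):
    """Build a deterministic, conservative collection name from a source string."""
    # Replace anything outside [a-z0-9_-] with a dash, then trim the edges.
    slug = re.sub(r"[^a-z0-9_-]+", "-", source.lower()).strip("-_")
    # A short hash keeps names unique even when slugs collide or get truncated.
    digest = hashlib.sha1(source.encode("utf-8")).hexdigest()[:8]
    slug = slug[: max_len - len(digest) - 1].rstrip("-_")
    return f"{slug}-{digest}" if slug else f"doc-{digest}"


name = collection_name_for("Annual Report (2024).pdf")
long_name = collection_name_for("https://example.com/" + "very-long-path/" * 20)
```

Because the name depends only on the source string, re-uploading the same document maps to the same collection and can reuse its stored embeddings.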

🎨 UI/UX Enhancements

  • Glassmorphic Design: Frosted glass effects with backdrop blur
  • Gradient Text Effects: Animated gradient titles
  • Smooth Animations: Cubic-bezier transitions
  • Neon Glow Effects: Interactive hover states
  • Modern Typography: Inter font family
  • Custom Scrollbars: Styled with gradient effects
  • Enhanced Components: Buttons, inputs, expanders, and more

📝 Recent Updates

✨ Version 3.0 (Latest)

  • 📺 YouTube Analyst: NEW feature for video transcript analysis and chat
  • 🔊 Text-to-Speech: Read aloud feature in Search Agent using Edge TTS
  • 🎨 Glassmorphic UI: Complete redesign with frosted glass effects
  • 🔤 Inter Font: Modern typography with gradient text effects
  • ✨ Enhanced Animations: Smooth cubic-bezier transitions
  • 🎯 Improved Components: All UI elements redesigned
  • 📊 Updated Main Page: 2x2 grid layout for 4 tools
  • 🔧 CSS Centralization: Unified styles.py for consistency

✨ Version 2.0

  • 🎨 Premium Dark Theme: Complete UI overhaul with modern design
  • 🤖 Updated Model List: Added llama-guard-4-12b, removed deprecated models
  • 🔧 Dependency Cleanup: Removed pysqlite3-binary for better compatibility
  • ✨ Enhanced Animations: Smooth transitions and hover effects
  • 📊 Stats Section: Added visual statistics on main page
  • 🎯 Improved Navigation: Better sidebar organization

🀝 Contributing

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/your-feature
  3. Make changes and test locally
  4. Commit and push: git commit -m "Add feature"
  5. Create Pull Request

🎯 Areas for Contribution

  • πŸ“Ί Enhanced video analysis features
  • πŸ–ΌοΈ New vision models or analysis techniques
  • πŸ” Better retrieval algorithms
  • 🎨 UI/UX improvements
  • πŸ“Š Analytics and metrics
  • πŸ§ͺ Testing and documentation

📝 License

This project is licensed under the AGPL-3.0 License.


🆘 Support


🙏 Acknowledgments

  • 🤗 Hugging Face: Free open-source vision models
  • 🦙 Meta: Llama models and vision transformers
  • 🔍 Salesforce: BLIP vision models
  • 🏢 Microsoft: GIT vision model
  • ⚡ Groq: Fast LLM inference
  • 🌐 Streamlit: Amazing app framework
  • 🔎 Tavily: Advanced web search API
  • 📺 YouTube Transcript API: Video transcript extraction

Made with ❤️ by Yamin Hossain
