RAG-Ingest: A tool for converting PDFs to markdown and indexing them for enhanced Retrieval Augmented Generation (RAG) capabilities.
Case study using dotfurther's Open Discover Platform with the RavenDB document store to rapidly create a full-text search/eDiscovery/information governance capable demonstration application.
Self-hosted RAG engine for AI coding assistants. Ingests technical docs & code repositories locally with structure-aware chunking. Serves grounded context via MCP to prevent hallucinations in software development workflows.
Drop-in RAG and AI agent toolkit for Laravel — parse, chunk, embed, search, chat.
A simple RAG toolkit.
A modular, self-hosted RAG pipeline for building a private, searchable personal knowledge base from PDFs and structured documents.
Extension of the original MemPalace with document ingestion, image assets, MCP tooling, and folder watch.
An AI analytics dashboard for research-lab analytics, collaboration, and email workflows, built with React and FastAPI.
Production-grade RAG backend for document ingestion and semantic retrieval using embeddings and Pinecone.
Production-grade RAG chatbot with a FastAPI + LangGraph backend (Pinecone vector search + Groq LLM + Tavily web fallback) and a Streamlit chat UI, secured via API key and observable in LangSmith.
n8n template for automated document ingestion into RAG systems. Extracts, chunks, and vectorizes local data into Supabase pgvector.
Self-hosted RAG prototype to ingest PDFs/HTML and chat with them via a local UI
An implementation of the GraphRAG pipeline (based on the 2024 paper "From Local to Global" by Edge et al.) for query-focused summarization of large text corpora.
A local AI-powered chat assistant that queries a persistent knowledge wiki built from your documents. Ingests PDFs, URLs, and markdown via Docling, extracts entities/concepts with LLMs, indexes in Qdrant for semantic search, and serves a chat UI—all running offline with Ollama. Inspired by Karpathy's LLM Wiki pattern.
Store millions of text chunks inside ultra-compact MP4 files, index them with local embeddings, and retrieve answers instantly for fully offline RAG with any LLM.
AI-powered RAG assistant for parents to get instant, context-aware answers on Brainwonders’ career counseling programs, pricing, and services. Built with Streamlit, LangChain, ChromaDB, and Google Gemma LLM for fast, multi-document retrieval and conversational Q&A.
Agentic RAG Chatbot using multi-agent architecture and Streamlit. Ingests PDFs, DOCX, PPTX, CSV, TXT, and Markdown files to provide contextually accurate answers with a persistent knowledge base. Supports multi-turn conversations, source citations, and dynamic document uploads.
Korean-language GraphRAG knowledge management system — hybrid search, two-stage ingestion, and edge LLM distillation.
Enterprise Document Ingestion with AI Embeddings — Multi-source ingestion pipeline with pgvector, GDPR compliance, and MCP server
Procurement intelligence and spend operating system for ingesting messy records, normalizing catalogs, surfacing savings/leakage, and giving operators proof-backed workflows, compliance controls, offers, operations, and pilot dashboards.
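Nearly every repository above implements the same core loop: ingest a document, chunk it, embed each chunk, store the vectors, and retrieve by similarity. The sketch below shows that pattern in miniature. It is illustrative only: the hash-based `embed()` is a deterministic stand-in for a real embedding model, and `MiniIndex` is an in-memory stand-in for a vector store such as pgvector, Pinecone, Qdrant, or ChromaDB.

```python
# Minimal ingest -> chunk -> embed -> search loop, mirroring the pipelines above.
# embed() is a toy stand-in for a real embedding model; names are illustrative.
import hashlib
import math


def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (structure-unaware)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str, dims: int = 64) -> list[float]:
    """Toy embedding: hash character trigrams into a fixed-size unit vector.
    A real pipeline would call an embedding model here instead."""
    vec = [0.0] * dims
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class MiniIndex:
    """In-memory stand-in for an external vector store."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, list[float]]] = []

    def ingest(self, text: str) -> int:
        """Chunk and embed a document; return the total number of stored chunks."""
        for c in chunk(text):
            self.rows.append((c, embed(c)))
        return len(self.rows)

    def search(self, query: str, k: int = 3) -> list[str]:
        """Return the k chunks with the highest cosine similarity to the query."""
        q = embed(query)
        scored = sorted(self.rows,
                        key=lambda row: -sum(a * b for a, b in zip(q, row[1])))
        return [text for text, _ in scored[:k]]
```

In a production system each piece is swapped out: a PDF parser (e.g. Docling) feeds `ingest`, a structure-aware splitter replaces `chunk`, a model produces the embeddings, and a persistent vector database backs `search` — but the data flow stays the same.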