RAG-Ingest: A tool for converting PDFs to markdown and indexing them for enhanced Retrieval Augmented Generation (RAG) capabilities.
Case study using dotfurther's Open Discover Platform with the RavenDB document store to rapidly create a full-text search/eDiscovery/information governance capable demonstration application.
Self-hosted RAG engine for AI coding assistants. Ingests technical docs & code repositories locally with structure-aware chunking. Serves grounded context via MCP to prevent hallucinations in software development workflows.
Drop-in RAG and AI agent toolkit for Laravel — parse, chunk, embed, search, chat.
A simple RAG toolkit.
A modular, self-hosted RAG pipeline for building a private, searchable personal knowledge base from PDFs and structured documents.
Extension of the original MemPalace with document ingestion, image assets, MCP tooling, and folder watch.
An AI analytics dashboard for research-lab analytics, collaboration, and email workflows, built with React and FastAPI.
Production-grade RAG backend for document ingestion and semantic retrieval using embeddings and Pinecone.
Production-grade RAG chatbot with a FastAPI + LangGraph backend (Pinecone vector search + Groq LLM + Tavily web fallback) and a Streamlit chat UI, secured via API key and observable in LangSmith.
n8n template for automated document ingestion into RAG systems. Extracts, chunks, and vectorizes local data into Supabase pgvector.
Self-hosted RAG prototype to ingest PDFs/HTML and chat with them via a local UI
An implementation of the GraphRAG pipeline (based on the 2024 paper "From Local to Global" by Edge et al.) for query-focused summarization of large text corpora.
A local AI-powered chat assistant that queries a persistent knowledge wiki built from your documents. Ingests PDFs, URLs, and markdown via Docling, extracts entities/concepts with LLMs, indexes in Qdrant for semantic search, and serves a chat UI—all running offline with Ollama. Inspired by Karpathy's LLM Wiki pattern.
Store millions of text chunks inside ultra-compact MP4 files, index them with local embeddings, and retrieve answers instantly for fully offline RAG with any LLM.
AI-powered RAG assistant for parents to get instant, context-aware answers on Brainwonders’ career counseling programs, pricing, and services. Built with Streamlit, LangChain, ChromaDB, and Google Gemma LLM for fast, multi-document retrieval and conversational Q&A.
Agentic RAG Chatbot using multi-agent architecture and Streamlit. Ingests PDFs, DOCX, PPTX, CSV, TXT, and Markdown files to provide contextually accurate answers with a persistent knowledge base. Supports multi-turn conversations, source citations, and dynamic document uploads.
Korean-language GraphRAG knowledge management system — hybrid search, two-stage ingestion, and edge LLM distillation.
Enterprise Document Ingestion with AI Embeddings — Multi-source ingestion pipeline with pgvector, GDPR compliance, and MCP server
Procurement intelligence and spend operating system for ingesting messy records, normalizing catalogs, surfacing savings/leakage, and giving operators proof-backed workflows, compliance controls, offers, operations, and pilot dashboards.
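Nearly every repository above implements the same core loop: ingest a document, chunk it, embed each chunk, store the vectors, and retrieve by similarity. The sketch below shows that pattern in miniature. It is illustrative only: the hash-based `embed()` is a deterministic stand-in for a real embedding model, and `MiniIndex` is an in-memory stand-in for a vector store such as pgvector, Pinecone, Qdrant, or ChromaDB.

```python
# Minimal ingest -> chunk -> embed -> search loop, mirroring the pipelines above.
# embed() is a toy stand-in for a real embedding model; names are illustrative.
import hashlib
import math


def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (structure-unaware)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str, dims: int = 64) -> list[float]:
    """Toy embedding: hash character trigrams into a fixed-size unit vector.
    A real pipeline would call an embedding model here instead."""
    vec = [0.0] * dims
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class MiniIndex:
    """In-memory stand-in for an external vector store."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, list[float]]] = []

    def ingest(self, text: str) -> int:
        """Chunk and embed a document; return the total number of stored chunks."""
        for c in chunk(text):
            self.rows.append((c, embed(c)))
        return len(self.rows)

    def search(self, query: str, k: int = 3) -> list[str]:
        """Return the k chunks with the highest cosine similarity to the query."""
        q = embed(query)
        scored = sorted(self.rows,
                        key=lambda row: -sum(a * b for a, b in zip(q, row[1])))
        return [text for text, _ in scored[:k]]
```

In a production system each piece is swapped out: a PDF parser (e.g. Docling) feeds `ingest`, a structure-aware splitter replaces `chunk`, a model produces the embeddings, and a persistent vector database backs `search` — but the data flow stays the same.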