Skip to content

rishivejani15/Intellifile

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

45 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🧠 IntelliFile

IntelliFile is an AI-powered, privacy-first desktop search engine and conversational document intelligence system.

It scans your local hardware, indexes your documents by their semantic meaning, and lets you search for files using natural language (e.g. searching "tax return last year" instantly finds the relevant document even if those exact words are not in the filename). Additionally, you can seamlessly open specific documents and chat directly with them via an integrated local LLM.

Everything runs 100% offline. Zero data leaves your machine.

Built beautifully with Electron + React (Frontend) and highly optimized with Python + FAISS + SQLite + Llama-cpp (Backend).


📌 Complete Overview: What We Built

  1. Full-Device Drive Indexing: We migrated from basic folder scanning to rapid, complete full-drive hierarchical indexing. IntelliFile aggressively crawls, extracts and categorizes PDFs, Word documents, Excel spreadsheets, Text formats and code structures across Windows drives.
  2. Hybrid Semantic Search Engine: A fusion search engine utilizing vector embeddings (semantic meaning), BM25 (exact keyword match), and file-path hierarchy matching.
  3. Isolated 'Chat with File' Pipeline: A private Retrieval-Augmented Generation (RAG) system. Users double-click a search result to securely isolate that document context, ingest it locally via Qwen models running under Llama-cpp, and perform deep Q&A interactions directly against the file's text with zero hallucinations.
  4. Smart Versioning & Diffing: Intelligent tracking of file changes with specific format engines (Excel Diff Engine, Word Diff Engine, etc.) allowing users to spot detailed alterations across file versions over time.

🎯 Accuracy Strategies

To ensure that IntelliFile returns accurate searches and perfectly aligned chat generations, we implemented rigorous ML accuracy loops spanning document ingestion, query ranking, and chat inference:

1. Hybrid Reciprocal Rank Fusion (RRF)

Relying strictly on Vector Similarity or keyword parsing alone creates blind spots. IntelliFile merges three paradigms into a mathematical Rank Fusion score to guarantee maximum precision:

  • Semantic Search (FAISS): Translates paragraphs into 384-dimensional dense vectors. Captures intent, tone, and synonymous meanings (e.g. searching "fiscal reporting" easily scores a local "2024_tax_filing.pdf" file).
  • Keyword Match (SQLite FTS5 - BM25): Vector embeddings fail on distinct jargon, raw code snippets, and strict alphanumeric IDs. BM25 algorithm guarantees that exact phrasing constraints return 100% hits.
  • Path & Filename Tiers: Documents whose literal filenames strongly match user queries are algorithmically boosted in weight, as filename intent naturally denotes file content strongly.

2. SOTA Embedding Model Selection

  • We bypassed massive, slow API-dependent models (like text-embedding-3-large) for the local BAAI/bge-small-en-v1.5 architecture.
  • Why? It generates 384-dimensional embeddings (yielding a ~4x smaller RAM and Disk footprint) while ranking at the absolute top of the MTEB (Massive Text Embedding Benchmark) for semantic retrieval against models ten times its size.

3. Contextual Document Chunking (Sentence-Aware)

  • Splitting documents arbitrarily breaks thoughts mid-sentence, destroying LLM reasoning in the RAG loop.
  • IntelliFile uses aggressive sentence-aware chunking boundaries (e.g. limiting to ~512 tokens) with a minimum 50-token trailing overlap. This ensures ideas persist across vector borders implicitly. We also isolate structural chunks depending on the source format (e.g., handling row/column data in Excel distinctively from narrative paragraphs in Word documents).

4. Pure Isolated Chat Context (No Cross-Talk)

  • A major flaw in desktop search AIs is "Context Bleed," where asking a question about an NDA accidentally pulls context from an employee handbook.
  • Our custom chat_store.py orchestrates Isolated RAG Extraction. When you double click to chat with a file, its semantic chunks are strictly temporarily housed in an isolated memory buffer. The Qwen Models (1.5b or 3b) receive precise, fenced contexts restricted exclusively to the target file. We then dynamically clear the history state if file targets change, achieving absolute zero cross-document hallucination.

⚡ Performance Optimizations

Indexing an entire hard drive and executing 3-Billion-Parameter models locally necessitates heavy systems optimization. We implemented the following breakthroughs to sustain 60FPS UI and microsecond delays:

1. Hardware-Accelerated Quantized LLMs

  • Dynamic Context Windowing (n_ctx=4096): Standard LLM endpoints easily OOM (Out-of-Memory) on local rigs. We strictly tune Llama.cpp to limit maximum token context parsing memory natively, ensuring large parsed document slices can reliably fit in context without crashing the system pipeline.
  • Massive Batching Processing (n_batch=512): Prompt evaluation speed is massively accelerated.
  • GPU Matrix Offloading (n_gpu_layers = -1): Utilizing Llama-cpp Python binaries, we completely delegate tensor math to your local GPU (CUDA, Metal, Vulkan) whenever available.
  • Maximized CPU Thread Binding (n_threads = max_cores): On hardware lacking discrete GPUs, the engine programmatically harnesses maximum logical core parallelization to brute-force batched executions.
  • GGUF Quantization (q4_k_m, q5_k_m format): Shrinks model VRAM footprints by 70%. A standard 6GB+ FP16 model fits into leaner ~2.5GB sizes on disk and RAM while maintaining 98% perplexity parity.

2. Deep-Engine Indexing Acceleration

  • Pytorch to ONNX Backend Parsing: The Sentence-Transformers models natively fall back to onnxruntime. Compiling embedder graph operators to ONNX creates up to a 3x–5x CPU processing structural speed boost compared to raw PyTorch evaluation graphs.
  • FAISS vs Heavy DB Servers: Rather than installing bloated Vector databases (like Milvus, Qdrant, or Postgres pgvector plugin), vector computations are housed dynamically via C++ Map integrations natively inside FAISS.
  • Indexed File Caps & Exclusions: Deep outlier files (like a 40,000 line debug log) throttle indexing pipelines. Our core explicitly intercepts arbitrary limits (text length caps, chunk caps in extractor.py) and systematically ignores known bottleneck temp directories (node_modules, .git, system caches).
  • Parallel Extraction Worker Pools: Local indexing handles document parses across independent background CPU pools to prevent synchronous I/O blocking.
  • Incremental Diff Indexing: First runs utilize full-device mappings, but daily usage is processed incrementally. The crypto_utils and SQLite state manager simply hash modified_time paths. We strictly parse Delta mutations on files, compressing indexing scans to the multi-millisecond range.
  • SQLite WAL Mode & Pragma Tuning: Memory cache sizes, Synchronous mechanisms, and Write-Ahead Logging (WAL) pragmas are strictly instantiated in db.py to allow non-blocking concurrent reads (UI search) during massive background indexing writes.

🛠️ Architecture Stack

Frontend

  • Electron: Cross-platform native shell rendering.
  • React.js: Component-driven UI (File Explorer, Chat Sidebar, Search Dashboard).
  • Service IPC: Asynchronous communication channels to Python core.

Backend Core

  • Python 3.11+: Multi-threaded ingestion application runtime.
  • Llama-cpp Python: Sub-system binding bringing quantization capability.
  • Sentence-Transformers: Integrated context embedding module.
  • FAISS: N-Dimensional similarity search backbone.
  • SQLite3 / FTS5: Local associative metadata structuring and highly polished Keyword Indexing.

🚀 Getting Started

Option 1: End-User Installation (Windows .exe)

The easiest way to use IntelliFile is to build and install the standalone executable.

  1. Ensure you have Node.js 18+ and Python 3.11+ installed.
  2. Run the automated build script from PowerShell in the project root:
    .\build.ps1
  3. Once the build completes, find the installer in frontend\dist\IntelliFile Setup 1.0.0.exe.
  4. Run the installer. On first launch, the app will automatically present an Offline Setup screen to download the necessary AI models directly to your AppData folder.

Option 2: Developer Setup (Running from Source)

1. System Prerequisites

  • Python 3.11+
  • Node.js 18+
  • C++ Build Tools/Visual Studio Redistributable (Mandatory for compiling Local Models on Windows).

2. Setup the Data Backend Engine

cd backend
python -m venv .venv
# On Windows powershell:
.venv\Scripts\activate  

# Install Requirements
pip install -r requirements.txt
pip install onnxruntime optimum

Note: Make sure to install llama-cpp-python with the correct hardware accelerated flags based on your environment (cuBLAS / CUDA).

3. Prepare Sub-Models

You can manually run the offline setup script to fetch the required models (BAAI/bge-small-en-v1.5 and qwen2.5-3b-instruct-q5_k_m.gguf):

python backend/setup_offline.py

(In production, the Electron frontend automatically runs a frozen version of this script on first launch.)

4. Launch Frontend App

cd frontend
npm install
npm start

Click Index Device to construct your initial baseline embeddings database, then begin securely searching and chatting natively offline!

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors