Infinite Context

Experimental context extension for local LLMs via hierarchical retrieval.

Status: Early-Stage Research Prototype / Under Active Development

This project explores extending LLM effective context using the Hierarchical Attention Tree (HAT) for retrieval-augmented memory. Current benchmarks measure retrieval recall on synthetic data — not end-to-end task accuracy. The "100% retrieval" figures below mean HAT finds the right chunks in controlled tests, not that the LLM produces correct answers 100% of the time. Real-world performance depends on many factors (query quality, chunk boundaries, model capability) that have not been rigorously evaluated.

This is a research prototype, not production-ready software. Use at your own risk. Rigorous benchmarking is in progress.

Infinite Context Architecture


Try It NOW (Pick Your Favorite)

Zero Install - Just Run

# Docker (one command, works everywhere)
docker run -it --rm --network host andrewmang/infinite-context

# Or with docker-compose for full stack
curl -O https://raw.githubusercontent.com/Lumi-node/infinite-context/main/docker-compose.yml
docker-compose up -d

Live Demo (No Install At All)

Try it on Hugging Face Spaces - See HAT in action right in your browser!

One-Line Installer

# Linux/macOS - installs everything automatically
curl -sSL https://raw.githubusercontent.com/Lumi-node/infinite-context/main/install.sh | bash

Install from Source

# Clone the repo
git clone https://github.com/Lumi-node/infinite-context
cd infinite-context

# Install Python package (recommended - full HAT support)
pip install maturin sentence-transformers
maturin develop --release

# Or build Rust CLI (benchmarks only)
cargo build --release

Retrieval Benchmarks (Synthetic Data)

Complexity Comparison

Model       Native Context   Addressable via HAT   Extension (retrieval only)
gemma3:1b   8K               11.3M+                1,413x
phi4        16K              11.3M+                706x
llama3.2    8K               11.3M+                1,413x

These figures represent the amount of stored text HAT can search through — not that the model "understands" all 11M tokens simultaneously. Retrieved chunks are injected into the model's native context window. End-to-end task accuracy (does the model answer correctly?) has not been formally benchmarked.
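The extension factors in the table are simply the ratio of the addressable store to each model's native window. A quick sanity check (assuming "8K" and "16K" mean 8,000 and 16,000 tokens; the table rounds 1,412.5 up to 1,413):

```python
# Extension factor = addressable tokens / native context window.
# Store size (11.3M+) and window sizes copied from the table above.
ADDRESSABLE = 11_300_000

models = {"gemma3:1b": 8_000, "phi4": 16_000, "llama3.2": 8_000}
for name, native in models.items():
    print(f"{name}: {native:,} native -> {ADDRESSABLE / native:,.1f}x addressable")
```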

Benchmark Results


The Problem

Local models like Gemma 3 (8K) and Phi 4 (16K) are powerful, but they forget everything outside their small context windows. Conventional RAG helps, but flat single-level retrieval often misses relevant chunks as the store grows, losing critical information.

The Solution

The Hierarchical Attention Tree (HAT) exploits the natural hierarchy of conversations:

HAT Tree Structure

Instead of scoring all chunks (O(n)), HAT runs an O(log n) beam search down the hierarchy, achieving high retrieval recall in synthetic benchmarks. Real-world accuracy depends on data structure, embedding quality, and query characteristics.

Beam Search Visualization
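The traversal can be illustrated with a toy hierarchy. This is an illustrative sketch only, not the actual HAT implementation: the node layout, scoring function, and beam width here are simplified assumptions.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class Node:
    """Toy tree node: internal nodes carry a summary embedding, leaves carry text."""
    def __init__(self, embedding, children=None, chunk=None):
        self.embedding = embedding
        self.children = children or []
        self.chunk = chunk

def beam_search(root, query, beam_width=2, k=3):
    """Descend level by level, keeping only the best `beam_width` nodes per
    level, so only O(beam_width * branching * depth) nodes are ever scored."""
    frontier, leaves = [root], []
    while frontier:
        kept = sorted(frontier, key=lambda n: -cosine(n.embedding, query))[:beam_width]
        frontier = []
        for node in kept:
            if node.children:
                frontier.extend(node.children)
            else:
                leaves.append(node)
    leaves.sort(key=lambda n: -cosine(n.embedding, query))
    return [leaf.chunk for leaf in leaves[:k]]

# Toy two-level hierarchy (session summaries -> chunks), 2-D embeddings
quantum = Node([1.0, 0.0], chunk="experiment showed 47% improvement")
notes   = Node([0.9, 0.1], chunk="lab notebook entry")
errand  = Node([0.0, 1.0], chunk="grocery list")
session_work = Node([1.0, 0.05], children=[quantum, notes])
session_life = Node([0.05, 1.0], children=[errand])
root = Node([0.5, 0.5], children=[session_work, session_life])

print(beam_search(root, [1.0, 0.0], beam_width=1, k=2))
# -> ['experiment showed 47% improvement']
```

With beam_width=1, only the work session is expanded; the unrelated session's chunks are never scored, which is where the savings come from.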


Detailed Setup

Docker Usage

# Pull and run immediately
docker run -it --rm --network host andrewmang/infinite-context

# Run benchmark
docker run -it --rm andrewmang/infinite-context infinite-context bench --chunks 100000

# Full stack with Ollama
docker-compose up -d
docker-compose exec infinite-context infinite-context chat --model gemma3:1b

Python API (Recommended - Full HAT Support)

The Python API uses real embeddings + HAT retrieval + Ollama. Note: This is experimental research software, not a production-ready system.

# Shell: install from the cloned repo
pip install maturin sentence-transformers
maturin develop --release

# Python:
from infinite_context import InfiniteContext

# Initialize - connects to Ollama
ctx = InfiniteContext(model="gemma3:1b")

# Add information (automatically embedded with sentence-transformers and indexed in HAT)
ctx.add("My name is Alex and I work on quantum computing.")
ctx.add("The latest experiment showed 47% improvement in coherence.")

# Chat - HAT retrieves relevant context, injects it into prompt, queries Ollama
response = ctx.chat("What were the quantum experiment results?")
print(response)  # Should reference the 47% improvement

# Save memory to disk
ctx.save("my_memory.hat")

# Load later
ctx = InfiniteContext.load("my_memory.hat", model="gemma3:1b")

Low-Level API

from infinite_context import HatIndex
from sentence_transformers import SentenceTransformer

# Setup
embedder = SentenceTransformer('all-MiniLM-L6-v2')
index = HatIndex.cosine(384)

# Add embeddings
embedding = embedder.encode("Important info", normalize_embeddings=True)
index.add(embedding.tolist())

# Query
query_emb = embedder.encode("What's important?", normalize_embeddings=True)
results = index.near(query_emb.tolist(), k=10)

# Persist
index.save("index.hat")
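One detail the snippet above leaves implicit is mapping the indices returned by `near` back to the stored texts. A common pattern is a parallel list. The sketch below substitutes a brute-force stand-in for `HatIndex` so it runs without the compiled extension; the real API may differ.

```python
import math

class ToyIndex:
    """Stand-in for HatIndex: brute-force cosine search over stored vectors."""
    def __init__(self):
        self.vectors = []

    def add(self, vec):
        self.vectors.append(vec)
        return len(self.vectors) - 1  # insertion index

    def near(self, query, k=3):
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.sqrt(sum(x * x for x in a)) *
                          math.sqrt(sum(y * y for y in b)))
        ranked = sorted(range(len(self.vectors)),
                        key=lambda i: -cos(self.vectors[i], query))
        return ranked[:k]

# Keep texts parallel to the index so result indices map back to chunks.
index, texts = ToyIndex(), []

def add_chunk(text, vec):
    texts.append(text)
    index.add(vec)

add_chunk("meeting notes", [1.0, 0.0])
add_chunk("shopping list", [0.0, 1.0])

hits = index.near([0.9, 0.1], k=1)
print([texts[i] for i in hits])  # -> ['meeting notes']
```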

Rust CLI (Benchmarks & Testing)

The Rust CLI is useful for benchmarking HAT performance and testing Ollama connectivity.

Note: For actual chat with HAT memory retrieval, use the Python API above.

# Build the CLI
cargo build --release

# Run HAT performance benchmark
./target/release/infinite-context bench --chunks 100000

# Test Ollama connection
./target/release/infinite-context test --model gemma3:1b

# List available models
./target/release/infinite-context models

System Requirements

  • Rust: 1.70+ (for CLI)
  • Python: 3.9+ (for Python API)
  • Ollama: Any version
  • RAM: 4GB minimum

Building from Source

git clone https://github.com/Lumi-node/infinite-context
cd infinite-context

# Rust CLI
cargo build --release
./target/release/infinite-context --help

# Python wheel
pip install maturin
maturin develop --release

Why This Exists

We're exploring whether local, hierarchical retrieval can meaningfully extend context for small LLMs — without sending data to cloud APIs.

Local Privacy

Design goals:

  • Local: Runs on your hardware, data stays on your machine
  • Free: No API costs
  • Fast retrieval: Sub-millisecond HAT queries in synthetic benchmarks
  • High retrieval recall: 100% on synthetic hierarchical test data (real-world accuracy not yet validated)

Model Compatibility

Note: This is a research project exploring an idea, not a finished product. The retrieval layer works well in controlled tests, but end-to-end quality (does the LLM actually give better answers?) needs rigorous evaluation. We are actively working on this.


Research

Based on the Hierarchical Attention Tree (HAT) algorithm. Key hypothesis: conversations naturally form hierarchies (sessions → documents → chunks), and exploiting this structure may enable O(log n) retrieval with high recall. Validating this hypothesis rigorously is ongoing work.
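A back-of-the-envelope check on that hypothesis: with branching factor b and beam width w, a descent scores about w * b nodes per level across log_b(n) levels, versus n for a flat scan. Here b = 32 and w = 4 are illustrative assumptions, not HAT's actual parameters:

```python
import math

n = 100_000   # stored chunks (the benchmark size used elsewhere in this README)
b, w = 32, 4  # assumed branching factor and beam width (illustrative only)

levels = math.ceil(math.log(n, b))  # tree depth
hat_comparisons = w * b * levels    # nodes scored per query, worst case
print(f"flat scan: {n:,} comparisons; hierarchical: ~{hat_comparisons} over {levels} levels")
```

Even with generous constants, the hierarchical descent scores a few hundred nodes where a flat scan scores all 100,000, which is what makes sub-millisecond queries plausible.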


License

MIT


Get Started in 10 Seconds

Method    Command                                                          Notes
Docker    docker run -it --rm --network host andrewmang/infinite-context   Full setup
Browser   Hugging Face Spaces                                              Try HAT live
Source    git clone ... && maturin develop --release                       Python API (recommended)

An experiment in local, hierarchical AI memory. Contributions and feedback welcome.
