# 🚀 SmartSearch
**Fault-Aware Async Ingestion + Semantic Retrieval Backend**
A production-style backend that ingests documents asynchronously, generates embeddings, and serves semantic search and RAG — with explicit guarantees for idempotency, crash recovery, and job lifecycle correctness.
This is not just another RAG demo.
It is a correctness-first ingestion and retrieval system designed to answer:
- What if the worker crashes mid-processing?
- What if Kafka replays messages?
- What if the database goes down?
- What if duplicate requests arrive?
This system is built to handle those scenarios deterministically, following four principles:
- Correctness over convenience
- Explicit state over hidden progress
- Failure-aware design
- Deterministic recovery
## Architecture
End-to-end flow: API → Kafka → Async Workers → pgvector → Vector Search

### Components
- **API Service**
  - Accepts ingestion requests
  - Produces messages to Kafka
  - Exposes search + RAG endpoints
- **Kafka**
  - Decouples ingestion from processing
  - Enables retry and replay
- **Worker**
  - Consumes messages
  - Generates embeddings
  - Writes to Postgres
- **Postgres + pgvector**
  - Stores embeddings
  - Enables similarity search
## 🟢 Stage 0 — Synchronous ingestion (baseline)
The API directly:
- generates embeddings
- stores them in the DB
❌ Problems:
- slow
- no retry
- no failure recovery
## 🟡 Stage 1 — Asynchronous ingestion
- API produces message to Kafka
- Worker processes asynchronously
✔️ Benefits:
- decoupling
- retry support
- resilience
## 🟠 Stage 2 — Job lifecycle state machine
Each request has explicit states: `PENDING → PROCESSING → READY | FAILED`
✔️ Guarantees:
- no hidden progress
- observable system state
- debuggable failures
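The lifecycle above can be sketched as a small state machine. This is an illustrative sketch, not the project's actual code; names like `JobState` are assumptions:

```java
import java.util.EnumSet;
import java.util.Set;

// Sketch of the job lifecycle: forward-only edges make progress monotonic.
enum JobState {
    PENDING, PROCESSING, READY, FAILED;

    // Allowed next states; READY and FAILED are terminal.
    Set<JobState> next() {
        switch (this) {
            case PENDING:    return EnumSet.of(PROCESSING);
            case PROCESSING: return EnumSet.of(READY, FAILED);
            default:         return EnumSet.noneOf(JobState.class);
        }
    }

    JobState transitionTo(JobState target) {
        if (!next().contains(target)) {
            throw new IllegalStateException(this + " -> " + target + " is not a legal transition");
        }
        return target;
    }
}

class LifecycleDemo {
    public static void main(String[] args) {
        JobState s = JobState.PENDING;
        s = s.transitionTo(JobState.PROCESSING);
        s = s.transitionTo(JobState.READY);
        System.out.println(s); // prints READY
        try {
            s.transitionTo(JobState.PENDING); // backward transition is rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected");
        }
    }
}
```

Rejecting backward transitions at the type/validation level is what makes "no hidden progress" checkable rather than a convention.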
## 🔵 Stage 3 — Idempotent ingestion
- Duplicate messages (Kafka replay / retries) are safe
- Enforced via:
  - unique constraints (e.g., docId + chunkId)
  - deterministic writes
✔️ Guarantees:
- no duplicate embeddings
- safe reprocessing
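A minimal sketch of the idempotency mechanism, using an in-memory map in place of the Postgres unique constraint on (docId, chunkId); all class and method names here are illustrative assumptions:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for the (docId, chunkId) unique constraint in Postgres.
class ChunkStore {
    private final Map<String, float[]> chunks = new ConcurrentHashMap<>();

    // Deterministic key: the same chunk always maps to the same row.
    private static String key(String docId, int chunkId) {
        return docId + ":" + chunkId;
    }

    /** Returns true if a new row was written, false if the chunk already existed. */
    boolean upsert(String docId, int chunkId, float[] embedding) {
        return chunks.putIfAbsent(key(docId, chunkId), embedding) == null;
    }

    int size() { return chunks.size(); }
}

class IdempotencyDemo {
    public static void main(String[] args) {
        ChunkStore store = new ChunkStore();
        float[] emb = {0.1f, 0.2f};
        System.out.println(store.upsert("doc-1", 0, emb)); // true: first write
        System.out.println(store.upsert("doc-1", 0, emb)); // false: Kafka replay is a no-op
        System.out.println(store.size());                  // 1: no duplicate embeddings
    }
}
```

In Postgres the same effect comes from a unique index on the (docId, chunkId) pair combined with a conflict-ignoring insert, so replaying a message converges to the same final state.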
## 🔴 Stage 4 — Failure handling + retries
- Worker retries failed jobs (bounded attempts)
- After retries are exhausted:
  - the job is marked FAILED
  - the message is sent to a dead-letter queue (DLQ)
✔️ Guarantees:
- no infinite retry loops
- bad data is isolated
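The bounded-retry policy can be sketched as follows. This is a simplified illustration; `MAX_ATTEMPTS` and the class names are assumptions, and a real worker would add backoff between attempts:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Simplified bounded-retry loop with a dead-letter queue (illustrative).
class RetryingWorker {
    static final int MAX_ATTEMPTS = 3;            // assumed bound, not the project's real config
    final Deque<String> deadLetterQueue = new ArrayDeque<>();

    /** Returns "READY" on success, "FAILED" after retry exhaustion. */
    String process(String message, Consumer<String> handler) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                handler.accept(message);
                return "READY";
            } catch (RuntimeException e) {
                // log and retry; a production worker would back off here
            }
        }
        deadLetterQueue.add(message);             // isolate the poison message
        return "FAILED";
    }
}

class RetryDemo {
    public static void main(String[] args) {
        RetryingWorker w = new RetryingWorker();
        System.out.println(w.process("good", m -> {}));  // READY
        System.out.println(w.process("poison",
            m -> { throw new RuntimeException("bad payload"); })); // FAILED
        System.out.println(w.deadLetterQueue);           // [poison]
    }
}
```

The hard cap on attempts is what rules out infinite retry loops; the DLQ keeps the bad message available for inspection without blocking the partition.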
## 🟣 Stage 5 — Observability
The system exposes:
- `/api/system/pressure`, reporting counts of:
  - pending jobs
  - processing jobs
  - failed jobs
  - ready jobs
- Logs covering:
  - job transitions
  - end-to-end latency
✔️ Guarantees:
- system is observable
- failures are visible
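As a sketch, the pressure endpoint can be thought of as a per-state job count; the response shape here is an assumption, not the actual API contract:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative: counts jobs per lifecycle state, as /api/system/pressure might report.
class PressureDemo {
    enum State { PENDING, PROCESSING, READY, FAILED }

    static Map<State, Long> pressure(List<State> jobs) {
        return jobs.stream().collect(
            Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<State> jobs = List.of(
            State.PENDING, State.PENDING, State.PROCESSING, State.READY, State.FAILED);
        System.out.println(pressure(jobs).get(State.PENDING)); // 2
    }
}
```

A growing PENDING count relative to READY is the backlog signal the staleness metric below is designed to surface.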
## Guarantees
**Correctness**
- At-least-once ingestion (via Kafka)
- Idempotent processing (no duplicate chunks)
- Deterministic job lifecycle

**Failure Safety**
- A worker crash does not corrupt state
- Partial processing is recoverable
- Retries are safe

**Data Integrity**
- No duplicate embeddings
- No partial chunk visibility
- Consistent DB state after recovery
## Invariants
- **Lifecycle:** `requestId` transitions monotonically: PENDING → PROCESSING → READY | FAILED (no backward transitions)
- **Idempotency:** Reprocessing the same request does not change the final DB state.
- **Visibility:** A document is searchable iff its state == READY.
- **Failure isolation:** A FAILED job does not corrupt other documents.
- **Recovery:** After a crash/restart, system state = last committed DB state + replay-safe Kafka processing.
| Failure Scenario | Expected Behavior |
|---|---|
| Worker crash mid-processing | Job is retried and completes successfully |
| Worker crash after DB write | Reprocessing occurs but duplicates are prevented |
| Kafka broker restart | Processing resumes with no data loss |
| Postgres outage | Worker retries; job eventually READY or FAILED |
| Poison message | Retries exhausted → FAILED + DLQ |
| Duplicate request | No duplicate embeddings created |
## Observability stack
SmartSearch includes a full observability stack using Prometheus and Grafana to monitor system behavior, performance, and reliability.
The system exposes metrics via Spring Boot Actuator and Micrometer:
- HTTP request rate and latency
- Ingestion pipeline metrics:
  - received, succeeded, failed, retries, DLQ
- Processing age (staleness metric):
  - measures how long a job waits before being processed
  - helps detect backlog and async system pressure
- Database metrics:
  - Hikari connection pool (active, idle, total)
  - DB write throughput
Grafana dashboards track:
- Requests/sec
- Ingest success/failure/retry counts
- Latency and DLQ traffic
- Success vs failure rates
- Retry behavior
- Worker health
- Processing age (key signal for backlog detection)
- Connection pool usage
- DB latency signals
- Write throughput
Run `docker compose up -d`, then access:
- Grafana: http://localhost:3000
- Prometheus: http://localhost:9090
## 🧪 Failure Proof (Reproducible Tests)
**T1 — Crash mid-processing**
- Submit document
- Kill worker during processing
✔️ Expected:
- Job resumes
- No duplicate chunks
**T2 — Crash after DB write**
- Kill worker after write but before commit acknowledgment
✔️ Expected:
- Reprocessing occurs
- No duplicates (idempotency holds)
**T3 — Kafka restart**
- Stop Kafka during ingestion
- Restart Kafka
✔️ Expected:
- Worker resumes
- No message loss
**T4 — Database outage**
- Stop Postgres
- Submit job
- Restart DB
✔️ Expected:
- Worker retries
- Job becomes READY or FAILED
**T5 — Poison message**
- Submit malformed document
✔️ Expected:
- Retries attempted
- Job marked FAILED
- Message sent to DLQ
## 🔍 API Endpoints (example)
**Ingestion**
```bash
POST /api/documents
```

**Search**
```bash
GET /api/search?q=...
```

**RAG**
```bash
POST /api/rag
```

**System Pressure**
```bash
GET /api/system/pressure
```

Most AI systems focus on:
- embeddings
- LLMs
- retrieval quality

This project focuses on what happens when the system breaks. That’s what differentiates:
- demos → production systems
- prototypes → infrastructure
## Design trade-offs
- **At-least-once vs exactly-once:** Chose at-least-once + idempotency for simplicity and robustness.
- **Kafka vs direct processing:** Kafka adds operational complexity but enables replay and durability.
- **Async vs sync ingestion:** Async improves resilience and scalability but introduces eventual consistency.
- **Postgres + pgvector vs dedicated vector DB:** Simpler stack and transactional guarantees, but not optimized for very large-scale vector search.
## Future work
- Exactly-once semantics (Kafka transactions)
- Distributed workers + partition-aware scaling
- Backpressure-aware scheduling
- Streaming ingestion
- BFT-style replicated ingestion pipeline (long-term vision)
This project sits at the intersection of:
- Distributed systems
- AI infrastructure
- Fault-tolerant pipelines

It is designed as a foundation layer for reliable AI systems.
This system reflects a broader focus on:
- fault tolerance
- correctness guarantees
- distributed system design
Similar to how storage engines ensure durability and consistency, this system ensures reliable AI data pipelines under failure.
