# AI Hallucination Detection using a Judge-Model Architecture
Quickstart • Architecture • Team Roles • Usage • Testing
GhostWire is an MVP tool that audits Large Language Model (LLM) outputs for hallucinations: statements that sound plausible but are factually incorrect or unsupported by the provided context. TL;DR: if your $200 AI tool goes cuckoo, GhostWire is the tool that uses the free tier to tell you that yes, it did go cuckoo. If a human did this we would send them to an asylum; since AI does it, we call it a "hallucination".
It uses a Judge-Model architecture built on the `google-genai` SDK:
- A Subject Model (e.g., Gemini 2.5 Flash) generates an answer to a given prompt.
- A Judge Model (e.g., Gemini 2.5 Pro / Flash) systematically evaluates the answer claim-by-claim against ground-truth context and returns a structured JSON verdict.
## Architecture

```mermaid
flowchart LR
    A[User Prompt + Context] --> B[Subject Model<br/>Gemini 2.5]
    B -->|Raw Answer| C[Judge Model<br/>Gemini 2.5]
    A -->|Context| C
    C -->|Strict JSON| D{Hallucination?}
    D -->|Yes| E[⚠️ Alert + Risk Level]
    D -->|No| F[✅ Verified]
    E --> G[Analytics & Dashboard]
    F --> G
```
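The two-step flow above can be sketched in plain Python. This is an illustrative sketch only: the prompt template and the `run_audit` / `call_subject` / `call_judge` names are assumptions made for this example, not the actual GhostwireEngine API, and the model calls are stubbed with lambdas so nothing here touches the network.

```python
import json

# Hypothetical judge prompt; the real template lives in src/core/engine.py.
JUDGE_TEMPLATE = (
    "You are an auditor. Compare the ANSWER to the CONTEXT claim by claim.\n"
    'Reply with strict JSON: {{"is_hallucination": <bool>, "risk_level": <1-5>}}.\n'
    "CONTEXT: {context}\nANSWER: {answer}"
)

def run_audit(prompt, context, call_subject, call_judge):
    """Subject model answers the prompt; judge model grades it against context."""
    answer = call_subject(prompt)
    verdict = call_judge(JUDGE_TEMPLATE.format(context=context, answer=answer))
    return {"raw_answer": answer, "audit_data": json.loads(verdict)}

# Stubbed example (no API calls): the lambdas stand in for google-genai calls.
result = run_audit(
    prompt="What year did the UN establish its permanent lunar base?",
    context="The United Nations has never established a permanent lunar base.",
    call_subject=lambda p: "The UN established a lunar base in 2047.",
    call_judge=lambda p: '{"is_hallucination": true, "risk_level": 4}',
)
print(result["audit_data"]["is_hallucination"])  # True
```

Injecting the model calls as callables is also what makes the pipeline testable offline, since the unit tests can pass in deterministic stubs.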
The Judge model reasons step by step via Chain-of-Thought (CoT) prompting and returns the following structure:
```json
{
  "is_hallucination": true,
  "confidence_score": 92,
  "claims": [
    {
      "text": "The UN established a permanent lunar base in 2047.",
      "status": "hallucination",
      "reason": "Not corroborated by the provided factual context."
    }
  ],
  "risk_level": 4,
  "auditor_notes": "The subject fabricated a historical date and exhibited unearned confidence."
}
```

## Project Structure

```text
ghostwire/
├── data/
│   ├── adversarial_prompts.json   # Adversarial test prompts
│   └── ground_truth_README.md     # Placeholder for ground-truth docs
├── src/
│   ├── core/
│   │   └── engine.py              # 🔧 GhostwireEngine (pipeline)
│   ├── retrieval/
│   │   └── vector_db.py           # 📚 RAG / Vector DB interface
│   ├── analytics/
│   │   └── scoring.py             # 📊 Hallucination metrics
│   └── ui/
│       └── dashboard.py           # 🖥️ Streamlit interactive dashboard
├── tests/
│   ├── test_engine.py             # ✅ Engine unit tests (mocked)
│   ├── test_scoring.py            # ✅ Scoring & analytics tests
│   └── make_test.py               # ✅ Native pipeline smoke tester
├── .env.example
├── .gitignore
├── requirements.txt
└── README.md
```
## Team Roles

| Role | Owner | Module | Description |
|---|---|---|---|
| 1 — Prompt Engineer | TBD | data/ | Designs adversarial prompts to stress-test models |
| 2 — Domain Expert / RAG Specialist | Zayed | src/retrieval/ | Curates ground-truth documents and manages the Vector Database (ChromaDB/FAISS) |
| 3 — Pipeline Architect | NOAH | src/core/ | Orchestrates the Subject → Judge pipeline architecture |
| 4 — Metrics Analyst | Nikitha | src/analytics/ | Analyzes hallucination rates, calibration gaps & risk |
| 5 — Frontend Developer | TBD | src/ui/ | Builds the Streamlit dashboard & Plotly data charts |
| 6 — Ethical Risk & Validation Lead | SAFIYA KN | src/analytics/ | Ensures pipeline validity and ethical safety constraints |
## Quickstart

### Prerequisites

- Python 3.10+
- A Google AI API key (Get one here)

### Installation

```bash
# Clone the repository
git clone https://github.com/your-org/ghostwire.git
cd ghostwire

# Create a virtual environment
python -m venv venv
venv\Scripts\activate       # Windows
# source venv/bin/activate  # macOS / Linux

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env and add your GOOGLE_API_KEY
```

In other words: it is like losing your hand in a car crash, going to see the doctor, and being told after a 30-minute examination that you have, in fact, lost your hand. Thank you, you may leave now. That is exactly what is going on here, just AI-style.
## Usage

```python
from src.core.engine import GhostwireEngine

engine = GhostwireEngine()
result = engine.run_audit(
    prompt="What year did the UN establish its permanent lunar base?",
    context="The United Nations has never established a permanent lunar base."
)

print(result['audit_data']['is_hallucination'])  # True
print(result['audit_data']['auditor_notes'])     # "The subject fabricated a historical date..."
print(result['audit_data']['risk_level'])        # 4
```

Run the visual interface and live evaluation portal:
```bash
streamlit run src/ui/dashboard.py
```

Generate an aggregate report from a batch of audits:

```python
from src.analytics.scoring import HallucinationScorer, AuditResult

# Assume `results` is a List[AuditResult] built from `engine.run_audit` responses
report = HallucinationScorer.generate_report(results)

print(f"Total Audits: {report['total_audits']}")
print(f"Hallucination Rate: {report['hallucination_rate_percent']:.2f}%")
print(f"Average Confidence: {report['average_confidence']:.2f}%")
print(f"Risk Distribution: {report['risk_distribution']}")
```

The report covers:

- Hallucination Rate — Percentage of audited responses flagged as hallucinations.
- Risk Distribution — Count of responses at each detected severity level (1-5).
- Average Confidence — Mean confidence of the AI judge across all responses.
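The metrics above reduce to a few lines of aggregation. The sketch below is illustrative, not the actual scoring module: the `summarize` function and the plain-dict verdict shape (mirroring the JSON verdict shown earlier) are assumptions for this example.

```python
from collections import Counter

def summarize(verdicts):
    """Aggregate a list of judge verdicts into report-style metrics."""
    total = len(verdicts)
    flagged = sum(1 for v in verdicts if v["is_hallucination"])
    return {
        "total_audits": total,
        "hallucination_rate_percent": 100.0 * flagged / total,
        "average_confidence": sum(v["confidence_score"] for v in verdicts) / total,
        "risk_distribution": dict(Counter(v["risk_level"] for v in verdicts)),
    }

# Toy batch of two verdicts:
report = summarize([
    {"is_hallucination": True, "confidence_score": 92, "risk_level": 4},
    {"is_hallucination": False, "confidence_score": 88, "risk_level": 1},
])
print(report["hallucination_rate_percent"])  # 50.0
```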
## Testing

Run the native smoke-test script from the console to verify the generation paths:

```bash
python tests/make_test.py
```

Or run the full unit test suite (no API key required):

```bash
python -m pytest tests/ -v
```

## Contributing

- Fork the repository.
- Create a feature branch: `git checkout -b feature/your-feature`.
- Work within your assigned module (see Team Roles).
- Write tests for new functionality.
- Submit a Pull Request with a clear description.
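For "write tests for new functionality", the pattern used by the mocked engine tests can be as simple as substituting a deterministic stand-in for the judge model so tests run offline. The `FakeJudge` class and its toy detection rule below are hypothetical, invented for illustration only.

```python
class FakeJudge:
    """Deterministic stand-in for the judge model (no API key needed)."""
    def evaluate(self, answer, context):
        # Toy rule: treat the known fabricated date as a hallucination marker.
        flagged = "2047" in answer
        return {"is_hallucination": flagged, "risk_level": 4 if flagged else 1}

def test_judge_flags_fabricated_date():
    verdict = FakeJudge().evaluate("The base opened in 2047.", "No base exists.")
    assert verdict["is_hallucination"] is True
    assert verdict["risk_level"] == 4

test_judge_flags_fabricated_date()  # passes silently when run directly
```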
## License

What license? This is basically a Claude / ChatGPT / Gemini mish-mash product, so iykyk. Just star the project and credit me, that's all.
## Acknowledgements

Thank you, GenAI, for helping write this README.md, and also for causing the gazillion bugs in this project. Thank you very much.