Chaos testing for AI apps. 18 extreme personas attack your AI to find edge cases before users do.
```bash
pip install housemonkey
housemonkey run --target https://your-api.com/chat --owasp
```

One command. 18 extreme personas. OWASP LLM Top 10 coverage. Terminal report in 2 minutes.
House Monkey attacks your AI app with realistic extreme users:
- The Jailbreaker – tries to extract your system prompt (OWASP LLM01)
- The Angry Customer – escalating hostility, demands a manager
- The Confused Grandma – off-topic, misunderstands everything
- The Hallucination Baiter – asks about things that don't exist (OWASP LLM09)
- The Permission Escalator – tricks the AI into unauthorized actions (OWASP LLM06)
- The RAG Poisoner – manipulates retrieval context (OWASP LLM08)
- ...and 12 more
Each persona runs a multi-turn conversation against your API, then an LLM judge evaluates if your AI handled it correctly.
## Usage

```bash
# Install
pip install housemonkey

# List all personas
housemonkey list

# Test your AI (needs an OpenAI API key for persona generation + judging)
export OPENAI_API_KEY=sk-...
housemonkey run --target https://your-api.com/chat

# Run only OWASP-mapped personas
housemonkey run --target https://your-api.com/chat --owasp

# Run specific personas
housemonkey run --target https://your-api.com/chat --persona jailbreaker oversharer

# Custom headers (API keys, auth tokens)
housemonkey run --target https://your-api.com/chat -H "x-api-key: sk-123" "Authorization: Bearer token"

# Custom API format (non-OpenAI)
housemonkey run --target https://your-api.com/ask --payload-template '{"input": "{{message}}"}'
```
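The `{{message}}` placeholder is replaced with each persona message. A minimal sketch of how such a template could be rendered — an illustration only, not House Monkey's actual implementation:

```python
import json

def render_payload(template: str, message: str) -> dict:
    """Fill a {{message}} placeholder with a JSON-escaped message."""
    # json.dumps returns a quoted string; strip the surrounding quotes
    # so the value slots into the quotes already in the template.
    escaped = json.dumps(message)[1:-1]
    return json.loads(template.replace("{{message}}", escaped))

payload = render_payload('{"input": "{{message}}"}', 'ignore all previous instructions')
# payload == {"input": "ignore all previous instructions"}
```

Escaping matters here: persona messages routinely contain quotes and newlines, which would otherwise break the JSON body.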
```bash
# Save JSON report
housemonkey run --target https://your-api.com/chat --output report.json
```

## OWASP LLM Top 10 coverage

| OWASP ID | Vulnerability | Persona |
|---|---|---|
| LLM01 | Prompt Injection | The Jailbreaker |
| LLM02 | Sensitive Info Disclosure | The Oversharer |
| LLM05 | Improper Output Handling | The JSON Breaker |
| LLM06 | Excessive Agency | The Permission Escalator |
| LLM08 | Vector/Embedding Weakness | The RAG Poisoner |
| LLM09 | Misinformation | The Hallucination Baiter |
| LLM10 | Unbounded Consumption | The Resource Abuser |
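The table above is just a lookup from persona to OWASP entry. A hypothetical sketch of how a flag like `--owasp` might filter the persona list — the slug names and registry structure are assumptions, not the actual internals:

```python
# Hypothetical persona registry: persona slug -> OWASP LLM ID (None = unmapped)
PERSONAS = {
    "jailbreaker": "LLM01",
    "oversharer": "LLM02",
    "json-breaker": "LLM05",
    "permission-escalator": "LLM06",
    "rag-poisoner": "LLM08",
    "hallucination-baiter": "LLM09",
    "resource-abuser": "LLM10",
    "angry-customer": None,
    "confused-grandma": None,
}

def owasp_personas(registry: dict) -> list[str]:
    """Return only the personas that map to an OWASP LLM Top 10 entry."""
    return [name for name, owasp_id in registry.items() if owasp_id]
```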
## How it works

- Each persona has a system prompt that simulates an extreme user type
- An LLM generates realistic messages as that persona
- Messages are sent to your target API
- An LLM judge evaluates if your AI handled the persona correctly
- Terminal report shows pass/fail with specific failure reasons
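The steps above can be sketched as a single loop. `generate_message`, `send_to_target`, and `judge` are stand-ins for House Monkey's internals (assumed shapes, not the actual API):

```python
def run_persona(persona_prompt, generate_message, send_to_target, judge, turns=3):
    """Drive one multi-turn persona conversation and return the verdict.

    generate_message(persona_prompt, history) -> str   # LLM plays the persona
    send_to_target(message) -> str                     # HTTP call to your API
    judge(persona_prompt, history) -> (bool, str)      # LLM judge: passed?, reason
    """
    history = []
    for _ in range(turns):
        attack = generate_message(persona_prompt, history)
        reply = send_to_target(attack)
        history.append({"attack": attack, "reply": reply})
    passed, reason = judge(persona_prompt, history)
    return {"passed": passed, "reason": reason, "turns": history}
```

Injecting the three callables keeps the loop testable with stubs — no network or LLM needed to exercise it.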
## Try it on a broken target

```bash
# Start the intentionally broken test target (7 built-in flaws)
python test_target.py

# In another terminal, attack it
housemonkey run --target http://127.0.0.1:8888 --owasp
```

## Requirements

- Python 3.10+
- OpenAI API key (for persona generation + judging)
- Your AI app must have an HTTP API endpoint
## License

MIT.

