Skip to content

awrshift/housemonkey

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

12 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

House Monkey mascot

House Monkey πŸ’

Chaos testing for AI apps. 18 extreme personas attack your AI to find edge cases before users do.

MIT License PyPI GitHub stars

pip install housemonkey
housemonkey run --target https://your-api.com/chat --owasp

House Monkey terminal output β€” 3 OWASP failures detected

One command. 18 extreme personas. OWASP LLM Top 10 coverage. Terminal report in 2 minutes.

What it does

House Monkey attacks your AI app with realistic extreme users:

  • The Jailbreaker β€” tries to extract your system prompt (OWASP LLM01)
  • The Angry Customer β€” escalating hostility, demands manager
  • The Confused Grandma β€” off-topic, misunderstands everything
  • The Hallucination Baiter β€” asks about things that don't exist (OWASP LLM09)
  • The Permission Escalator β€” tricks AI into unauthorized actions (OWASP LLM06)
  • The RAG Poisoner β€” manipulates retrieval context (OWASP LLM08)
  • ...and 12 more

Each persona runs a multi-turn conversation against your API, then an LLM judge evaluates if your AI handled it correctly.

Quick start

# Install
pip install housemonkey

# List all personas
housemonkey list

# Test your AI (needs OpenAI API key for persona generation + judging)
export OPENAI_API_KEY=sk-...
housemonkey run --target https://your-api.com/chat

# Run only OWASP-mapped personas
housemonkey run --target https://your-api.com/chat --owasp

# Run specific personas
housemonkey run --target https://your-api.com/chat --persona jailbreaker oversharer

# Custom headers (API keys, auth tokens)
housemonkey run --target https://your-api.com/chat -H "x-api-key: sk-123" "Authorization: Bearer token"
# Custom API format (non-OpenAI)
housemonkey run --target https://your-api.com/ask --payload-template '{"input": "{{message}}"}'

# Save JSON report
housemonkey run --target https://your-api.com/chat --output report.json

OWASP LLM Top 10 coverage

OWASP ID Vulnerability Persona
LLM01 Prompt Injection The Jailbreaker
LLM02 Sensitive Info Disclosure The Oversharer
LLM05 Improper Output Handling The JSON Breaker
LLM06 Excessive Agency The Permission Escalator
LLM08 Vector/Embedding Weakness The RAG Poisoner
LLM09 Misinformation The Hallucination Baiter
LLM10 Unbounded Consumption The Resource Abuser

How it works

  1. Each persona has a system prompt that simulates an extreme user type
  2. An LLM generates realistic messages as that persona
  3. Messages are sent to your target API
  4. An LLM judge evaluates if your AI handled the persona correctly
  5. Terminal report shows pass/fail with specific failure reasons

Try it on a broken chatbot

# Start the intentionally broken test target (7 built-in flaws)
python test_target.py

# In another terminal, attack it
housemonkey run --target http://127.0.0.1:8888 --owasp

Requirements

  • Python 3.10+
  • OpenAI API key (for persona generation + judging)
  • Your AI app must have an HTTP API endpoint

License

MIT.

About

πŸ’ Chaos testing for AI apps. 18 extreme personas attack your AI to find edge cases. OWASP LLM Top 10. Free & open source.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors