Chaos testing for AI apps. 18 extreme personas attack your AI to find edge cases before users do.
```bash
pip install housemonkey
housemonkey run --target https://your-api.com/chat --owasp
```

One command. 18 extreme personas. OWASP LLM Top 10 coverage. Terminal report in 2 minutes.
House Monkey attacks your AI app with realistic extreme users:
- The Jailbreaker – tries to extract your system prompt (OWASP LLM01)
- The Angry Customer – escalating hostility, demands a manager
- The Confused Grandma – off-topic, misunderstands everything
- The Hallucination Baiter – asks about things that don't exist (OWASP LLM09)
- The Permission Escalator – tricks the AI into unauthorized actions (OWASP LLM06)
- The RAG Poisoner – manipulates retrieval context (OWASP LLM08)
- ...and 12 more
Each persona runs a multi-turn conversation against your API, then an LLM judge evaluates if your AI handled it correctly.
## Usage

```bash
# Install
pip install housemonkey

# List all personas
housemonkey list

# Test your AI (needs an OpenAI API key for persona generation + judging)
export OPENAI_API_KEY=sk-...
housemonkey run --target https://your-api.com/chat

# Run only OWASP-mapped personas
housemonkey run --target https://your-api.com/chat --owasp

# Run specific personas
housemonkey run --target https://your-api.com/chat --persona jailbreaker oversharer

# Custom headers (API keys, auth tokens)
housemonkey run --target https://your-api.com/chat -H "x-api-key: sk-123" "Authorization: Bearer token"

# Custom API format (non-OpenAI)
housemonkey run --target https://your-api.com/ask --payload-template '{"input": "{{message}}"}'
```
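The `{{message}}` placeholder is replaced with each persona message. A minimal sketch of how such a template could be rendered — an illustration only, not House Monkey's actual implementation:

```python
import json

def render_payload(template: str, message: str) -> dict:
    """Fill a {{message}} placeholder with a JSON-escaped message."""
    # json.dumps returns a quoted string; strip the surrounding quotes
    # so the value slots into the quotes already in the template.
    escaped = json.dumps(message)[1:-1]
    return json.loads(template.replace("{{message}}", escaped))

payload = render_payload('{"input": "{{message}}"}', 'ignore all previous instructions')
# payload == {"input": "ignore all previous instructions"}
```

Escaping matters here: persona messages routinely contain quotes and newlines, which would otherwise break the JSON body.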
```bash
# Save JSON report
housemonkey run --target https://your-api.com/chat --output report.json
```

## OWASP LLM Top 10 coverage

| OWASP ID | Vulnerability | Persona |
|---|---|---|
| LLM01 | Prompt Injection | The Jailbreaker |
| LLM02 | Sensitive Info Disclosure | The Oversharer |
| LLM05 | Improper Output Handling | The JSON Breaker |
| LLM06 | Excessive Agency | The Permission Escalator |
| LLM08 | Vector/Embedding Weakness | The RAG Poisoner |
| LLM09 | Misinformation | The Hallucination Baiter |
| LLM10 | Unbounded Consumption | The Resource Abuser |
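The table above is just a lookup from persona to OWASP entry. A hypothetical sketch of how a flag like `--owasp` might filter the persona list — the slug names and registry structure are assumptions, not the actual internals:

```python
# Hypothetical persona registry: persona slug -> OWASP LLM ID (None = unmapped)
PERSONAS = {
    "jailbreaker": "LLM01",
    "oversharer": "LLM02",
    "json-breaker": "LLM05",
    "permission-escalator": "LLM06",
    "rag-poisoner": "LLM08",
    "hallucination-baiter": "LLM09",
    "resource-abuser": "LLM10",
    "angry-customer": None,
    "confused-grandma": None,
}

def owasp_personas(registry: dict) -> list[str]:
    """Return only the personas that map to an OWASP LLM Top 10 entry."""
    return [name for name, owasp_id in registry.items() if owasp_id]
```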
## How it works

- Each persona has a system prompt that simulates an extreme user type
- An LLM generates realistic messages as that persona
- Messages are sent to your target API
- An LLM judge evaluates if your AI handled the persona correctly
- Terminal report shows pass/fail with specific failure reasons
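The steps above can be sketched as a single loop. `generate_message`, `send_to_target`, and `judge` are stand-ins for House Monkey's internals (assumed shapes, not the actual API):

```python
def run_persona(persona_prompt, generate_message, send_to_target, judge, turns=3):
    """Drive one multi-turn persona conversation and return the verdict.

    generate_message(persona_prompt, history) -> str   # LLM plays the persona
    send_to_target(message) -> str                     # HTTP call to your API
    judge(persona_prompt, history) -> (bool, str)      # LLM judge: passed?, reason
    """
    history = []
    for _ in range(turns):
        attack = generate_message(persona_prompt, history)
        reply = send_to_target(attack)
        history.append({"attack": attack, "reply": reply})
    passed, reason = judge(persona_prompt, history)
    return {"passed": passed, "reason": reason, "turns": history}
```

Injecting the three callables keeps the loop testable with stubs — no network or LLM needed to exercise it.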
## Try it on a broken target

```bash
# Start the intentionally broken test target (7 built-in flaws)
python test_target.py

# In another terminal, attack it
housemonkey run --target http://127.0.0.1:8888 --owasp
```

## Requirements

- Python 3.10+
- OpenAI API key (for persona generation + judging)
- Your AI app must have an HTTP API endpoint
## License

MIT.

