pytest for LLM apps - Test for grounding failures, prompt injection, safety violations, and regressions
A hands-on exploration of DeepEval, an open-source framework for evaluating and red-teaming large language models (LLMs). This repository documents my work testing, benchmarking, and improving LLM reliability using custom prompts, metrics, and pipelines.
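To give a flavor of what pytest-style LLM testing looks like, here is a minimal sketch using DeepEval's public API. The question, answer, retrieval context, and 0.7 thresholds are illustrative placeholders, not this repo's actual tests; note that DeepEval's built-in metrics use an LLM as a judge, so a model API key (e.g. OPENAI_API_KEY) must be configured for them to run.

```python
# Minimal DeepEval + pytest sketch. Assumes `pip install deepeval` and an
# OPENAI_API_KEY in the environment (built-in metrics judge with an LLM).
# All inputs, outputs, and thresholds below are illustrative.
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase


def test_rag_answer_is_relevant_and_grounded():
    test_case = LLMTestCase(
        input="What is DeepEval?",
        # In a real suite this would be the output of the app under test.
        actual_output="DeepEval is an open-source framework for evaluating LLMs.",
        # Retrieved chunks the answer must be grounded in (faithfulness check).
        retrieval_context=[
            "DeepEval is an open-source LLM evaluation framework offering "
            "pytest-style assertions and red-teaming utilities."
        ],
    )
    assert_test(
        test_case,
        [
            AnswerRelevancyMetric(threshold=0.7),  # answer addresses the question
            FaithfulnessMetric(threshold=0.7),     # answer is grounded in context
        ],
    )
```

A file like this can be run with plain pytest, or with DeepEval's own runner (`deepeval test run <file>`) for richer per-metric reporting.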
This repo is my playground for experimenting with AutoGen: using it to build conversational agents, pipelines, and LLM tests.
Red-teaming a banking and finance LLM assistant
Integrating promptfoo into CI/CD pipelines to automatically evaluate prompts, test for security vulnerabilities, and ensure quality before deployment.
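For context, promptfoo evaluations are driven by a YAML config plus a CLI step in the pipeline. The sketch below is a hypothetical promptfooconfig.yaml, not this repo's actual setup; the prompt, provider, and assertions are all placeholders.

```yaml
# promptfooconfig.yaml -- illustrative only; prompt, provider, and
# assertions are placeholders, not this repo's real configuration.
prompts:
  - "You are a support assistant. Answer concisely: {{question}}"
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      question: "How do I reset my password?"
    assert:
      - type: contains
        value: "reset"
      - type: llm-rubric
        value: "Does not reveal internal system details or secrets"
```

In CI, a step such as `npx promptfoo@latest eval -c promptfooconfig.yaml` runs the suite; promptfoo exits non-zero when assertions fail, which is what lets the pipeline gate a deployment on evaluation results.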