
Real-World Debugging Examples

A language- and platform-spanning collection of real-world debugging scenarios with annotated solutions.

📚 This is the learning/reference version. Code contains BUG: comments and solution hints for teaching. For the sanitized benchmark version (no hints), see autonomous-software-debugging-benchmarks.

Quick Links

| Document | Purpose |
| --- | --- |
| CAPABILITIES.md | What "agentic debugging" means and what this suite covers |
| RUN_MODES.md | Environment requirements per project (headless vs IDE) |
| docs/SANITIZATION_GUIDE.md | How to prepare eval-mode copies without answer leakage |

Purpose

This repository contains real project files with intentional errors designed for learning real-world debugging patterns. Each project:

  • Contains realistic, non-trivial code (not toy examples)
  • Has errors that mirror real-world failure patterns
  • Produces a visible, satisfying result when fixed
  • Documents expected behavior without revealing solutions
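To make the learning/benchmark distinction concrete, here is a hypothetical example of what a learning-mode file might look like (invented for illustration, not taken from any project in the suite). The `BUG:` comment is the teaching hint that the sanitized benchmark copy would strip:

```python
# Hypothetical learning-mode project file (not from this suite).
# The BUG: comment marks the intentional error for teaching purposes;
# the sanitized benchmark version ships without it.

def average_order_value(orders):
    """Return the mean value of a list of order dicts."""
    total = sum(order["value"] for order in orders)
    # BUG: divides by a hard-coded count instead of len(orders),
    # so the result is wrong for any input that isn't exactly 10 orders.
    return total / 10


if __name__ == "__main__":
    orders = [{"value": 20.0}, {"value": 40.0}]
    print(average_order_value(orders))  # prints 6.0; the correct mean is 30.0
```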

What This Suite Tests

The 6 Capability Pillars

| Pillar | What It Proves | Where Tools Fail |
| --- | --- | --- |
| Static + Structural | Parsing, AST analysis, syntax repair | Missing file-to-file awareness |
| Runtime Failures | Execution awareness, environment reasoning | Stop at "suggestion" without re-execution |
| Test Failures | Intent reasoning, not just syntax repair | Can't align fixes to test intent |
| Multi-File / Cross-Layer | Agentic reasoning across boundaries | No coordinated multi-file fixes |
| Configuration & Infra | System-level understanding | Hallucinate config solutions |
| Hypothesis-Driven | Proactive reasoning, not reactive | No evidence gathering or confidence scoring |
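As a rough sketch of what the "Hypothesis-Driven" pillar asks for, the toy harness below (hypothetical, not part of the suite; the function and data names are invented for illustration) enumerates candidate root causes, scores each one by how much of the observed evidence it explains, and ranks them before any fix is attempted:

```python
# Hypothetical sketch of hypothesis-driven debugging: enumerate candidate
# causes, score each against observed clues, and rank by confidence.

def rank_hypotheses(symptom, hypotheses):
    """Score each hypothesis by the fraction of its predicted clues observed."""
    scores = {}
    for name, predicted_clues in hypotheses.items():
        matched = sum(1 for clue in predicted_clues if clue in symptom["clues"])
        scores[name] = matched / len(predicted_clues)
    # Highest-confidence hypothesis first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


symptom = {"clues": {"TypeError", "None value", "after refresh"}}
hypotheses = {
    "stale cache returns None": {"None value", "after refresh"},
    "wrong argument order": {"TypeError", "positional args"},
}
ranked = rank_hypotheses(symptom, hypotheses)
print(ranked[0][0])  # most plausible hypothesis, investigated first
```

The point of the pillar is exactly this ordering step: gather evidence and commit to the best-supported explanation, rather than reacting to the first error message.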

Language Coverage

| Category | Language | Why |
| --- | --- | --- |
| Dynamic | Python | Debugging + AI sweet spot |
| Web | JavaScript / TypeScript | Frontend + backend |
| Systems | Go | Type & compile rigor |
| Enterprise | Java | Real-world expectations |
| Mobile | Kotlin (Android), Swift (iOS) | Build system + platform constraints |
| Game | Unity (C#), C++ | Engine-aware reasoning, asset coordination |

Repository Structure

```
vybecoder-capability-suite/
├── python/
│   ├── static_structural/      # Import/export/syntax errors
│   ├── runtime_failure/        # Environment, null refs, type coercion
│   ├── test_failure/           # Failing tests revealing logic flaws
│   ├── multi_file_bug/         # Cross-module contract violations
│   └── hypothesis_debugging/   # Ambiguous symptoms, multiple causes
├── javascript/
│   ├── static_structural/      # Module errors, broken exports
│   ├── runtime_failure/        # Async bugs, undefined access
│   ├── test_failure/           # Jest tests revealing edge cases
│   ├── frontend_backend_mismatch/  # API contract drift
│   └── config_failure/         # Webpack, env, port issues
├── typescript/
│   ├── type_errors/            # Generic constraints, inference failures
│   └── async_failures/         # Promise chains, race conditions
├── java/
│   ├── dependency_issue/       # Maven/Gradle resolution
│   ├── logic_error/            # Off-by-one, state bugs
│   └── test_failure/           # JUnit revealing intent mismatch
├── go/
│   ├── runtime_panic/          # Nil pointer, slice bounds
│   └── concurrency_bug/        # Race conditions, deadlocks
├── kotlin_android/
│   ├── gradle_mismatch/        # Dependency version conflicts
│   ├── lifecycle_crash/        # Fragment/Activity lifecycle misuse
│   └── manifest_error/         # Missing permissions, components
├── swift_ios/
│   ├── optionals_crash/        # Force unwrap failures
│   ├── build_error/            # Missing Info.plist keys
│   └── ui_thread/              # Main thread violations
├── unity_csharp/
│   ├── lifecycle_bug/          # MonoBehaviour order issues
│   ├── serialization_error/    # Missing SerializeField
│   └── scene_mismatch/         # Asset-code desync
└── cpp_game/
    ├── linker_error/           # Undefined references
    ├── memory_issue/           # Safe memory bugs
    └── header_missing/         # Include path problems
```

How to Use This Suite

For Evaluation

  1. Point your debugging system at any project folder
  2. Observe: Does it identify the root cause?
  3. Observe: Does it execute and verify the fix?
  4. Observe: Does it produce the expected result?

Success Criteria

A debugging system demonstrates capability when it:

  • Localizes the error to specific file(s) and line(s)
  • Explains why the error occurs (not just what)
  • Fixes with minimal, targeted changes
  • Verifies by running the code/tests
  • Produces the documented expected output
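The "minimal, targeted changes" criterion can also be checked mechanically. One possible sketch, using only the standard library (the function name and any acceptance threshold you would pair it with are assumptions for illustration):

```python
# Hypothetical "minimal fix" check: count lines added or removed between
# the broken and fixed versions of a file.
import difflib


def changed_line_count(before: str, after: str) -> int:
    """Number of added or removed lines between two file versions."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


print(changed_line_count("a\nb\nc", "a\nB\nc"))  # one removal plus one addition: 2
```

A large count on a single-file, single-bug project is a signal that the system rewrote code instead of fixing it.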

What Success Looks Like

Each project's README describes:

  • What's broken (symptoms only)
  • Expected behavior when fixed
  • How to verify success

No solutions are provided. The debugger must reason independently.
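A project README following this symptoms-only convention might look like the hypothetical template below (the project name, commands, and output strings are invented, not copied from the suite):

```markdown
# inventory-sync (runtime_failure)

## Symptoms
Running `python main.py` crashes with a TypeError after the third sync cycle.

## Expected behavior when fixed
All five sync cycles complete and the script prints `sync complete: 5/5`.

## How to verify
Run `python main.py` and confirm the final line of output.
```

Note what is absent: no root cause, no file names, no hints about the fix.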

Difficulty Ratings

| Rating | Meaning |
| --- | --- |
| ⭐ | Single file, obvious error |
| ⭐⭐ | Multiple files or subtle bug |
| ⭐⭐⭐ | Cross-layer reasoning required |
| ⭐⭐⭐⭐ | Hypothesis generation needed |
| ⭐⭐⭐⭐⭐ | Platform + toolchain + code coordination |

Contributing

To add a new test case:

  1. Create a realistic, minimal project that does something useful
  2. Introduce a single, realistic failure pattern
  3. Document symptoms and expected success state
  4. Create INSTRUCTOR_NOTES.md with solution details (excluded from eval)
  5. Remove any BUG: comments before committing (or use scripts/sanitize.ps1)
  6. Tag with difficulty rating and capability pillar
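For step 5, a contributor on a platform without PowerShell might want a quick pre-commit check for leftover markers. The snippet below is a hypothetical helper (its name and file-extension list are assumptions), not a replacement for scripts/sanitize.ps1:

```python
# Hypothetical pre-commit check: scan a tree for leftover "BUG:" markers
# that must not ship in a benchmark copy.
from pathlib import Path

SOURCE_SUFFIXES = {".py", ".js", ".ts", ".java", ".go", ".kt", ".swift", ".cs", ".cpp", ".h"}


def find_bug_markers(root):
    """Yield (path, line_number) for every source line containing 'BUG:'."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            for number, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
                if "BUG:" in line:
                    yield path, number
```

An empty result from `find_bug_markers(".")` is a reasonable gate before committing a sanitized copy.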

Evaluation Mode

For fair benchmarking, use the sanitization script to create an answer-free copy:

```
.\scripts\sanitize.ps1 -SourceDir . -OutputDir ./eval
```

This strips BUG: comments, removes root-cause sections from READMEs, and excludes instructor notes.

License

MIT - Use freely for benchmarking, teaching, or tool evaluation.


This suite is maintained as part of the VybeCoder project but is designed for general use in evaluating any agentic debugging system.