
Real-World Debugging Examples

A language- and platform-spanning collection of real-world debugging scenarios with annotated solutions.

📚 This is the learning/reference version. Code contains BUG: comments and solution hints for teaching. For the sanitized benchmark version (no hints), see autonomous-software-debugging-benchmarks.

Quick Links

| Document | Purpose |
| --- | --- |
| CAPABILITIES.md | What "agentic debugging" means and what this suite covers |
| RUN_MODES.md | Environment requirements per project (headless vs IDE) |
| docs/SANITIZATION_GUIDE.md | How to prepare eval-mode copies without answer leakage |

Purpose

This repository contains real project files with intentional errors designed for learning real-world debugging patterns. Each project:

  • Contains realistic, non-trivial code (not toy examples)
  • Has errors that mirror real-world failure patterns
  • Produces a visible, satisfying result when fixed
  • Documents expected behavior without revealing solutions
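To make the learning/benchmark distinction concrete, here is a hypothetical example of what a learning-mode file might look like (invented for illustration, not taken from any project in the suite). The `BUG:` comment is the teaching hint that the sanitized benchmark copy would strip:

```python
# Hypothetical learning-mode project file (not from this suite).
# The BUG: comment marks the intentional error for teaching purposes;
# the sanitized benchmark version ships without it.

def average_order_value(orders):
    """Return the mean value of a list of order dicts."""
    total = sum(order["value"] for order in orders)
    # BUG: divides by a hard-coded count instead of len(orders),
    # so the result is wrong for any input that isn't exactly 10 orders.
    return total / 10


if __name__ == "__main__":
    orders = [{"value": 20.0}, {"value": 40.0}]
    print(average_order_value(orders))  # prints 6.0; the correct mean is 30.0
```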

What This Suite Tests

The 6 Capability Pillars

| Pillar | What It Proves | Where Tools Fail |
| --- | --- | --- |
| Static + Structural | Parsing, AST analysis, syntax repair | Missing file-to-file awareness |
| Runtime Failures | Execution awareness, environment reasoning | Stop at "suggestion" without re-execution |
| Test Failures | Intent reasoning, not just syntax repair | Can't align fixes to test intent |
| Multi-File / Cross-Layer | Agentic reasoning across boundaries | No coordinated multi-file fixes |
| Configuration & Infra | System-level understanding | Hallucinate config solutions |
| Hypothesis-Driven | Proactive reasoning, not reactive | No evidence gathering or confidence scoring |
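As a rough sketch of what the "Hypothesis-Driven" pillar asks for, the toy harness below (hypothetical, not part of the suite; the function and data names are invented for illustration) enumerates candidate root causes, scores each one by how much of the observed evidence it explains, and ranks them before any fix is attempted:

```python
# Hypothetical sketch of hypothesis-driven debugging: enumerate candidate
# causes, score each against observed clues, and rank by confidence.

def rank_hypotheses(symptom, hypotheses):
    """Score each hypothesis by the fraction of its predicted clues observed."""
    scores = {}
    for name, predicted_clues in hypotheses.items():
        matched = sum(1 for clue in predicted_clues if clue in symptom["clues"])
        scores[name] = matched / len(predicted_clues)
    # Highest-confidence hypothesis first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


symptom = {"clues": {"TypeError", "None value", "after refresh"}}
hypotheses = {
    "stale cache returns None": {"None value", "after refresh"},
    "wrong argument order": {"TypeError", "positional args"},
}
ranked = rank_hypotheses(symptom, hypotheses)
print(ranked[0][0])  # most plausible hypothesis, investigated first
```

The point of the pillar is exactly this ordering step: gather evidence and commit to the best-supported explanation, rather than reacting to the first error message.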

Language Coverage

| Category | Language | Why |
| --- | --- | --- |
| Dynamic | Python | Debugging + AI sweet spot |
| Web | JavaScript / TypeScript | Frontend + backend |
| Systems | Go | Type & compile rigor |
| Enterprise | Java | Real-world expectations |
| Mobile | Kotlin (Android), Swift (iOS) | Build system + platform constraints |
| Game | Unity (C#), C++ | Engine-aware reasoning, asset coordination |

Repository Structure

```
vybecoder-capability-suite/
├── python/
│   ├── static_structural/      # Import/export/syntax errors
│   ├── runtime_failure/        # Environment, null refs, type coercion
│   ├── test_failure/           # Failing tests revealing logic flaws
│   ├── multi_file_bug/         # Cross-module contract violations
│   └── hypothesis_debugging/   # Ambiguous symptoms, multiple causes
├── javascript/
│   ├── static_structural/      # Module errors, broken exports
│   ├── runtime_failure/        # Async bugs, undefined access
│   ├── test_failure/           # Jest tests revealing edge cases
│   ├── frontend_backend_mismatch/  # API contract drift
│   └── config_failure/         # Webpack, env, port issues
├── typescript/
│   ├── type_errors/            # Generic constraints, inference failures
│   └── async_failures/         # Promise chains, race conditions
├── java/
│   ├── dependency_issue/       # Maven/Gradle resolution
│   ├── logic_error/            # Off-by-one, state bugs
│   └── test_failure/           # JUnit revealing intent mismatch
├── go/
│   ├── runtime_panic/          # Nil pointer, slice bounds
│   └── concurrency_bug/        # Race conditions, deadlocks
├── kotlin_android/
│   ├── gradle_mismatch/        # Dependency version conflicts
│   ├── lifecycle_crash/        # Fragment/Activity lifecycle misuse
│   └── manifest_error/         # Missing permissions, components
├── swift_ios/
│   ├── optionals_crash/        # Force unwrap failures
│   ├── build_error/            # Missing Info.plist keys
│   └── ui_thread/              # Main thread violations
├── unity_csharp/
│   ├── lifecycle_bug/          # MonoBehaviour order issues
│   ├── serialization_error/    # Missing SerializeField
│   └── scene_mismatch/         # Asset-code desync
└── cpp_game/
    ├── linker_error/           # Undefined references
    ├── memory_issue/           # Safe memory bugs
    └── header_missing/         # Include path problems
```

How to Use This Suite

For Evaluation

  1. Point your debugging system at any project folder
  2. Observe: Does it identify the root cause?
  3. Observe: Does it execute and verify the fix?
  4. Observe: Does it produce the expected result?

Success Criteria

A debugging system demonstrates capability when it:

  • Localizes the error to specific file(s) and line(s)
  • Explains why the error occurs (not just what)
  • Fixes with minimal, targeted changes
  • Verifies by running the code/tests
  • Produces the documented expected output
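The "minimal, targeted changes" criterion can also be checked mechanically. One possible sketch, using only the standard library (the function name and any acceptance threshold you would pair it with are assumptions for illustration):

```python
# Hypothetical "minimal fix" check: count lines added or removed between
# the broken and fixed versions of a file.
import difflib


def changed_line_count(before: str, after: str) -> int:
    """Number of added or removed lines between two file versions."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


print(changed_line_count("a\nb\nc", "a\nB\nc"))  # one removal plus one addition: 2
```

A large count on a single-file, single-bug project is a signal that the system rewrote code instead of fixing it.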

What Success Looks Like

Each project's README describes:

  • What's broken (symptoms only)
  • Expected behavior when fixed
  • How to verify success

No solutions are provided. The debugger must reason independently.
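A project README following this symptoms-only convention might look like the hypothetical template below (the project name, commands, and output strings are invented, not copied from the suite):

```markdown
# inventory-sync (runtime_failure)

## Symptoms
Running `python main.py` crashes with a TypeError after the third sync cycle.

## Expected behavior when fixed
All five sync cycles complete and the script prints `sync complete: 5/5`.

## How to verify
Run `python main.py` and confirm the final line of output.
```

Note what is absent: no root cause, no file names, no hints about the fix.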

Difficulty Ratings

| Rating | Meaning |
| --- | --- |
| ⭐ | Single file, obvious error |
| ⭐⭐ | Multiple files or subtle bug |
| ⭐⭐⭐ | Cross-layer reasoning required |
| ⭐⭐⭐⭐ | Hypothesis generation needed |
| ⭐⭐⭐⭐⭐ | Platform + toolchain + code coordination |

Contributing

To add a new test case:

  1. Create a realistic, minimal project that does something useful
  2. Introduce a single, realistic failure pattern
  3. Document symptoms and expected success state
  4. Create INSTRUCTOR_NOTES.md with solution details (excluded from eval)
  5. Remove any BUG: comments before committing (or use scripts/sanitize.ps1)
  6. Tag with difficulty rating and capability pillar
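For step 5, a contributor on a platform without PowerShell might want a quick pre-commit check for leftover markers. The snippet below is a hypothetical helper (its name and file-extension list are assumptions), not a replacement for scripts/sanitize.ps1:

```python
# Hypothetical pre-commit check: scan a tree for leftover "BUG:" markers
# that must not ship in a benchmark copy.
from pathlib import Path

SOURCE_SUFFIXES = {".py", ".js", ".ts", ".java", ".go", ".kt", ".swift", ".cs", ".cpp", ".h"}


def find_bug_markers(root):
    """Yield (path, line_number) for every source line containing 'BUG:'."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            for number, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
                if "BUG:" in line:
                    yield path, number
```

An empty result from `find_bug_markers(".")` is a reasonable gate before committing a sanitized copy.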

Evaluation Mode

For fair benchmarking, use the sanitization script to create an answer-free copy:

```
.\scripts\sanitize.ps1 -SourceDir . -OutputDir ./eval
```

This strips BUG: comments, removes root-cause sections from READMEs, and excludes instructor notes.

License

MIT - Use freely for benchmarking, teaching, or tool evaluation.


This suite is maintained as part of the VybeCoder project but is designed for general use in evaluating any agentic debugging system.