Epistemic Question
Core Question: How do we measure the true productivity impact of AI coding agents, distinguishing between perceived speed and actual delivered value?
Why This Matters
Agile Agentic Analytics documents the "verification bottleneck and illusion of speed" failure mode: controlled evidence shows AI tools can slow experienced developers in familiar codebases, because time shifts into prompting, waiting, and verification, yet developers may still believe they are faster.
This creates a measurement paradox: traditional velocity metrics become less meaningful when agents can spike output, but what should replace them?
Related Framework
The Agile Agentic Analytics page recommends a balanced KPI set (a code sketch of how such a set might be tracked follows the list):
- DORA metrics (deployment frequency, lead time, change failure rate)
- Agent productivity metrics (cycle time, PR merge rate, thrash rate)
- Quality metrics (escaped defects, flaky tests, maintainability)
- Hallucination/plausibility-failure rate
- Cost and latency (tokens, CI minutes)
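A minimal sketch of how this balanced KPI set might be recorded per sprint and screened for the "illusion of speed" pattern. All field names and thresholds here are illustrative assumptions, not definitions from the Agile Agentic Analytics page:

```python
from dataclasses import dataclass

@dataclass
class SprintKPIs:
    """One sprint's balanced KPI snapshot (field names are illustrative)."""
    # DORA metrics
    deploys_per_week: float
    lead_time_hours: float
    change_failure_rate: float   # fraction of deploys causing incidents
    # Agent productivity metrics
    cycle_time_hours: float
    pr_merge_rate: float         # merged PRs / opened PRs
    thrash_rate: float           # PRs reworked repeatedly / merged PRs
    # Quality metrics
    escaped_defects: int
    flaky_test_rate: float
    # Agent-specific failure and cost
    hallucination_rate: float    # plausibility failures / agent outputs reviewed
    tokens_spent: int
    ci_minutes: float

def red_flags(k: SprintKPIs) -> list[str]:
    """Naive screen for velocity gains that coincide with quality regressions.
    Thresholds are placeholders a team would have to calibrate."""
    flags = []
    if k.pr_merge_rate > 0.9 and k.escaped_defects > 5:
        flags.append("high merge rate but defects escaping: possible rubber-stamp review")
    if k.thrash_rate > 0.3:
        flags.append("high thrash: output volume may be tool-induced toil")
    return flags
```

A record like this makes the gaming problem below concrete: any single field can be optimized in isolation, which is why the screen looks at pairs of metrics rather than one at a time.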
But two questions remain: how do we know we are measuring the right things, and how do we prevent these metrics from being gamed?
Open Questions to Explore
- Counterfactual measurement: How do we establish a reliable baseline for "what would have happened without agents"? (See the sketch after this list.)
- Value vs. volume: Should we measure lines of code generated, or business value delivered? If the latter, how?
- Learning curve effects: How long until teams reach "steady state" productivity, and how do we account for the transition period?
- Maintenance burden: How do we capture the hidden cost of reviewing and debugging AI-generated code, which may shift extra work onto senior developers?
- Long-term code health: Do current velocity gains come at the expense of technical debt accumulation?
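One way to approach the counterfactual question in the first bullet is a difference-in-differences comparison between teams that adopted agents and matched teams that did not. This is a sketch under the strong assumptions that a comparable control group exists and that both groups would have trended in parallel; all numbers are hypothetical:

```python
def difference_in_differences(
    treated_before: float, treated_after: float,
    control_before: float, control_after: float,
) -> float:
    """Estimated agent effect on a metric (e.g. lead time), netting out
    the trend the control group experienced over the same period.
    Assumes parallel trends between groups, which must be checked."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change

# Hypothetical lead-time example: the agent team improved by 10 hours,
# but the control team improved by 4 hours anyway, so only a 6-hour
# reduction is attributable to the agents.
effect = difference_in_differences(
    treated_before=40.0, treated_after=30.0,
    control_before=40.0, control_after=36.0,
)
print(effect)  # -6.0 hours of lead time attributable to agents
```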
Potential Research Directions
Success Criteria for Answering This Question
We will know we've made progress when we can:
- Define 3-5 leading indicators that predict sustainable productivity (not just short-term throughput)
- Establish measurement protocols that work across different team skill levels and codebases
- Distinguish between "tool-amplified productivity" and "tool-induced toil"
- Provide decision rules: "When should we increase agent autonomy?" vs. "When should we pull back?" (sketched in code below)
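A sketch of what such a decision rule could look like in code, reusing the illustrative SprintKPIs record from the earlier sketch. The core idea is that throughput gains only justify more autonomy when quality signals hold; the specific thresholds are placeholder assumptions, not recommendations:

```python
def autonomy_decision(current: SprintKPIs, baseline: SprintKPIs) -> str:
    """Toy decision rule: expand autonomy only when throughput gains are
    not paid for in quality; pull back when quality signals regress.
    Assumes the SprintKPIs dataclass defined above; thresholds are
    illustrative placeholders each team would calibrate."""
    quality_regressed = (
        current.escaped_defects > baseline.escaped_defects
        or current.hallucination_rate > baseline.hallucination_rate * 1.2
        or current.thrash_rate > baseline.thrash_rate * 1.2
    )
    faster = current.lead_time_hours < baseline.lead_time_hours * 0.9

    if quality_regressed:
        return "pull back: tighten review gates, reduce agent scope"
    if faster:
        return "expand: pilot higher autonomy on low-risk changes"
    return "hold: gather more sprints of data"
```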
Cross-References