
Epistemic Question: How do we measure the true productivity impact of AI coding agents? #1

@weisberg

Epistemic Question

Core Question: How do we measure the true productivity impact of AI coding agents, distinguishing between perceived speed and actual delivered value?

Why This Matters

Agile Agentic Analytics documents the "verification bottleneck and illusion of speed" failure mode: controlled evidence shows AI tools can slow experienced developers in familiar codebases because time shifts into prompting, waiting, and verification—yet developers may still believe they are faster.

This creates a measurement paradox: traditional velocity metrics become less meaningful when agents can spike output, but what should replace them?

Related Framework

The Agile Agentic Analytics page recommends a balanced KPI set (a computation sketch for two of these follows the list):

  • DORA metrics (deployment frequency, lead time, change failure rate)
  • Agent productivity metrics (cycle time, PR merge rate, thrash rate)
  • Quality metrics (escaped defects, flaky tests, maintainability)
  • Hallucination/plausibility-failure rate
  • Cost and latency (tokens, CI minutes)
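
To ground the list, here is a minimal sketch of how two of these KPIs might be computed from PR-level event data. Everything here is an assumption for illustration: the `PullRequest` record, its field names, and the thrash threshold are hypothetical stand-ins for whatever your tracker actually exports; the framework itself does not prescribe a schema.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class PullRequest:
    """Hypothetical PR record; field names are illustrative, not a real schema."""
    opened_at: datetime
    merged_at: datetime | None     # None if never merged
    deployed_at: datetime | None   # None if never deployed
    caused_incident: bool          # linked to a production incident?
    review_rounds: int             # times the PR bounced back for rework

def change_failure_rate(prs: list[PullRequest]) -> float:
    """DORA change failure rate: share of deployed changes tied to an incident."""
    deployed = [pr for pr in prs if pr.deployed_at is not None]
    if not deployed:
        return 0.0
    return sum(pr.caused_incident for pr in deployed) / len(deployed)

def thrash_rate(prs: list[PullRequest], max_rounds: int = 3) -> float:
    """Share of merged PRs needing more than `max_rounds` review cycles
    (one assumed operationalization of "thrash rate")."""
    merged = [pr for pr in prs if pr.merged_at is not None]
    if not merged:
        return 0.0
    return sum(pr.review_rounds > max_rounds for pr in merged) / len(merged)
```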

But the question remains: How do we know if we're measuring the right things? How do we prevent gaming these metrics?

Open Questions to Explore

  1. Counterfactual measurement: How do we establish a reliable baseline for "what would have happened without agents"?
  2. Value vs. volume: Should we measure lines of code generated, or business value delivered? If the latter, how?
  3. Learning curve effects: How long until teams reach "steady state" productivity, and how do we account for the transition period?
  4. Maintenance burden: How do we capture the hidden cost of reviewing/debugging AI-generated code that may increase work for senior developers? (See the ratio sketch after this list.)
  5. Long-term code health: Do current velocity gains come at the expense of technical debt accumulation?
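
Question 4 hints at a directly computable indicator. Below is a rough sketch of one possible "maintenance burden ratio", assuming rework hours on agent output can be logged separately from overall agent-assisted development hours; the definition and the threshold mentioned in the comment are assumptions, not validated findings.

```python
def maintenance_burden_ratio(rework_hours: float, dev_hours: float) -> float:
    """Hypothetical ratio: hours spent reviewing/debugging/reworking
    agent-generated code per hour of agent-assisted development.
    A sustained upward trend (say, past ~0.5, an unvalidated placeholder)
    would suggest work is shifting onto senior reviewers, not shrinking."""
    if dev_hours <= 0:
        raise ValueError("dev_hours must be positive")
    return rework_hours / dev_hours

# Example: 12 rework hours logged against 30 agent-assisted dev hours.
print(maintenance_burden_ratio(12.0, 30.0))  # 0.4
```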

Potential Research Directions

  • Survey existing empirical studies on AI coding assistant productivity (academic + industry)
  • Design A/B test framework for team-level productivity measurement
  • Develop "productivity toxicity" indicators (high velocity but declining quality)
  • Create rubric for "maintenance burden ratio" (time spent fixing AI outputs vs. manual coding baseline)
  • Explore causal inference methods for attribution (what % of productivity change is genuinely due to AI?); see the difference-in-differences sketch after this list
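
For the causal-inference direction, a standard first tool is difference-in-differences: compare the metric change in a team that adopted agents against a comparable team that did not. The sketch below assumes parallel pre-adoption trends; the weekly lead-time numbers are invented purely for illustration.

```python
import numpy as np

def diff_in_diff(treated_pre, treated_post, control_pre, control_post) -> float:
    """Difference-in-differences estimate of the agent effect on a metric,
    valid only under the parallel-trends assumption."""
    treated_delta = np.mean(treated_post) - np.mean(treated_pre)
    control_delta = np.mean(control_post) - np.mean(control_pre)
    return float(treated_delta - control_delta)

# Hypothetical weekly lead times (days) before/after an agent rollout.
effect = diff_in_diff(
    treated_pre=[5.1, 4.8, 5.3], treated_post=[4.0, 3.9, 4.2],
    control_pre=[5.0, 5.2, 4.9], control_post=[4.9, 5.1, 5.0],
)
print(f"Estimated agent effect on lead time: {effect:+.2f} days")
```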

Success Criteria for Answering This Question

We will know we've made progress when we can:

  1. Define 3-5 leading indicators that predict sustainable productivity (not just short-term throughput)
  2. Establish measurement protocols that work across different team skill levels and codebases
  3. Distinguish between "tool-amplified productivity" and "tool-induced toil"
  4. Provide decision rules: "When should we increase agent autonomy?" vs. "When should we pull back?" (an illustrative rule sketch follows this list)
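
As a first cut, the decision rules in item 4 might be expressed as thresholds over leading indicators like those sketched above. All thresholds below are placeholders to be calibrated per team, not empirically derived values.

```python
def autonomy_recommendation(change_failure_rate: float,
                            thrash_rate: float,
                            maintenance_burden: float) -> str:
    """Illustrative decision rule; thresholds are uncalibrated placeholders."""
    if change_failure_rate > 0.15 or maintenance_burden > 0.5:
        return "pull back: route agent output through stricter review"
    if thrash_rate < 0.10 and change_failure_rate < 0.05:
        return "expand autonomy on low-risk, well-tested components"
    return "hold steady and keep collecting evidence"

print(autonomy_recommendation(0.03, 0.08, 0.25))
# -> "expand autonomy on low-risk, well-tested components"
```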

Cross-References

Metadata


Labels

  • agile-agentic: Related to Agile Agentic Analytics and Scrum for AI teams
  • epistemic-question: Deep questions that guide knowledge base exploration and research
  • research-needed: Requires further research or literature review
