Agent Performance Report — February 22, 2026 #17764

@github-actions

Performance Summary

  • Agents analyzed: 26 distinct workflows (40 total runs, 48-hour window)
  • Non-IM success rate: 97% (30/31) ↑ from 89% last period
  • Overall quality score: 92/100 (→ stable, 20th consecutive zero-critical-issues period 🎉)
  • Overall effectiveness score: 88/100 (→ stable)
  • Total tokens: 36.6M | Estimated cost: ~$16.21
  • Total safe items: 6 (↓ from 14 — fewer actionable findings this period)
  • Critical issues: 0 (excluding P1 infrastructure)
  • Top performers: The Great Escapi, AI Moderator, CI Failure Doctor
  • P1 ongoing: Issue Monster (9/9 failures — infrastructure, not quality)

Critical Findings

⚠️ [P1] Issue Monster — 100% Failure Rate (9/9 runs)

The GH_AW_GITHUB_TOKEN secret remains unset. Issue Monster fails on every scheduled run (~30-minute cadence), which works out to roughly 50 failures per day. This is a pure infrastructure failure: the agent code and prompt are fine. Tracking issue: #17414 (open since Feb 21).

  • Impact: Inflates error statistics; skews overall success metrics
  • Fix: Set GH_AW_GITHUB_TOKEN repository secret
  • Priority: P1 — unchanged from previous periods
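
A preflight check along the following lines could surface the missing secret before the agent starts. This is a minimal sketch, not the workflow's actual startup code: it assumes the secret is exposed to the job as an environment variable with the same name, and `missing_secrets` is a hypothetical helper.

```python
from typing import Iterable, List, Mapping

# Secret name taken from the report. How the workflow actually receives it
# (for example as an environment variable) is an assumption of this sketch.
REQUIRED_SECRETS = ("GH_AW_GITHUB_TOKEN",)


def missing_secrets(env: Mapping[str, str],
                    required: Iterable[str] = REQUIRED_SECRETS) -> List[str]:
    """Return the required names that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real process environment.
print(missing_secrets({}))                               # ['GH_AW_GITHUB_TOKEN']
print(missing_secrets({"GH_AW_GITHUB_TOKEN": "ghs_x"}))  # []
```

In a real job the helper would be called with `os.environ`, and the run would stop early with a clear message if the returned list is non-empty.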

🛡️ Prompt Injection Attack — Detected and Blocked

The Great Escapi detected another injection attempt disguised as "security testing" (sandbox escape, DNS tunneling, network evasion, and reconnaissance instructions). The agent correctly filed a noop and took no action. Security posture remains excellent.

🔧 CI Failure Doctor — 4 Reactive Runs in 48 Hours

CI Failure Doctor ran 4 times in 48 hours (compared to 5 in ~7 hours yesterday). The continued reactive triggers suggest ongoing CI instability. The agent itself is performing well (4/4 success), but the underlying CI flakiness warrants attention.
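
Because the two counts cover windows of different length, it can help to normalize them to runs per hour before comparing. The sketch below uses only the figures quoted above; the 7-hour window is approximate in the report, so the result is indicative rather than exact.

```python
def runs_per_hour(runs: int, window_hours: float) -> float:
    """Normalize a run count to an hourly rate."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return runs / window_hours


current = runs_per_hour(4, 48)   # this period: 4 runs in 48 hours
previous = runs_per_hour(5, 7)   # yesterday: 5 runs in roughly 7 hours

print(f"current:  {current:.3f} runs/hour")   # current:  0.083 runs/hour
print(f"previous: {previous:.3f} runs/hour")  # previous: 0.714 runs/hour
```

On these numbers the hourly rate is lower than during yesterday's burst, so the concern is that CI failures keep recurring rather than that they are accelerating.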


View Agent Rankings & Detailed Scores

Top Performing Agents 🏆

| Rank | Agent | Quality | Effectiveness | Runs | Turns/run | Notes |
|---|---|---|---|---|---|---|
| 1 | The Great Escapi | 95/100 | 95/100 | 1 | 0 (noop) | Blocked prompt injection; security posture excellent |
| 2 | AI Moderator | 93/100 | 93/100 | 3 | 2 | 3/3 success, highest efficiency (~200K tokens/run, Codex) |
| 3 | CI Failure Doctor | 91/100 | 90/100 | 4 | ~5 | 4/4 success, reactive CI health responder |
| 4 | Daily Safe Outputs Conformance Checker | 90/100 | 89/100 | 1 | | 8.6m, clean run, Claude |
| 5 | Contribution Check | 89/100 | 88/100 | 1 | | 4.5m, clean, Copilot |
| 6 | Semantic Function Refactoring | 87/100 | 86/100 | 1 | | 7.4m, Claude |
| 7 | Smoke suite (×5) | 88/100 | 88/100 | 5 | | All pass: Copilot, Claude, Gemini, Project, Temp ID |

Agents Needing Improvement 📉

| Agent | Quality | Effectiveness | Issue |
|---|---|---|---|
| Issue Monster | N/A | 0/100 (infra) | 9/9 failures: GH_AW_GITHUB_TOKEN missing (#17414) |

Long-Running Agents (Monitor Efficiency)

| Agent | Duration | Engine | Notes |
|---|---|---|---|
| Chroma Issue Indexer | 19.4m | Copilot | Longest run this period; benchmark for regression |
| Daily Security Red Team Agent | 14.0m | Claude | Expected for deep analysis |
| Daily Safe Output Tool Optimizer | 11.1m | | Acceptable for optimizer |
| Release | 11.1m | | Expected |

View Effectiveness & Resource Metrics

Task Completion Rates (non-IM)

  • High completion (>90%): 25 workflows — all non-IM agents succeeded
  • Low completion (<50%): 1 workflow — Issue Monster (infrastructure failure only)

Resource Efficiency (48h window)

| Metric | Value |
|---|---|
| Total tokens | 36.6M |
| Estimated cost | $16.21 |
| Total turns | 304 |
| Avg run duration (non-IM) | 7.1m |
| Max run duration | 19.4m (Chroma Issue Indexer) |
| Safe items produced | 6 |
| Safe items/run | 0.19 |
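
The derived figures can be reproduced from the raw counts in this report. The sketch below assumes that "Safe items/run" is divided by the 31 non-IM runs (40 total minus 9 Issue Monster runs); the report does not state the denominator, but that choice matches the published 0.19. The cost per million tokens is an extra figure computed here and is not part of the original table.

```python
# Figures copied from the report. The choice of denominators is an assumption.
TOTAL_RUNS = 40
ISSUE_MONSTER_RUNS = 9
NON_IM_SUCCESSES = 30
SAFE_ITEMS = 6
TOTAL_TOKENS_MILLIONS = 36.6
ESTIMATED_COST_USD = 16.21


def derived_metrics() -> dict:
    """Recompute the report's ratios from its raw counts."""
    non_im_runs = TOTAL_RUNS - ISSUE_MONSTER_RUNS  # 31 runs
    return {
        "non_im_runs": non_im_runs,
        "non_im_success_rate": NON_IM_SUCCESSES / non_im_runs,
        "safe_items_per_run": SAFE_ITEMS / non_im_runs,
        "usd_per_million_tokens": ESTIMATED_COST_USD / TOTAL_TOKENS_MILLIONS,
    }


metrics = derived_metrics()
print(f"Non-IM success rate: {metrics['non_im_success_rate']:.0%}")     # 97%
print(f"Safe items per run:  {metrics['safe_items_per_run']:.2f}")      # 0.19
print(f"USD per 1M tokens:   {metrics['usd_per_million_tokens']:.2f}")  # 0.44
```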

Engine Distribution

| Engine | Runs | Notes |
|---|---|---|
| Copilot | ~11 | Issue Monster, Chroma, Workflow Skill Extractor, Plan, etc. |
| Claude | ~8 | Semantic Refactoring, Daily Checkers, Security Red Team, etc. |
| Codex | ~4 | AI Moderator (3), Agent Container Smoke |
| Mixed/smoke | ~5 | Smoke suite |

View Behavioral Patterns

Productive Patterns ✅

  • Security reflexes: The Great Escapi correctly noop'd prompt injection without false positives (2 consecutive periods)
  • Reactive CI healing: CI Failure Doctor triggers cleanly on CI failures with high success rate
  • Event-driven efficiency: AI Moderator processes issue events in 2 turns with minimal footprint
  • Smoke coverage: All 5 engine smoke tests passed (Copilot, Claude, Gemini, Project, Temp ID)

Patterns to Watch ⚠️

  • Issue Monster volume: 9 failures/48h generating noise in error aggregates — skews ecosystem metrics
  • Chroma duration creep: 19.4m is within acceptable range but should be monitored for upward drift
  • CI reactive frequency: 4 CI Failure Doctor runs in 48h suggests CI is not stable — root cause may lie outside agent ecosystem

Collaboration Patterns

  • Workflow Health Manager and Agent Performance Analyzer coordination is effective via shared-alerts.md
  • No conflicting outputs detected between orchestrators this period
  • Safe item volume reduction (6 vs 14) may indicate agents are correctly finding fewer actionable items (healthy) rather than reduced coverage

Recommendations

High Priority

  1. [P1] Set GH_AW_GITHUB_TOKEN secret — Resolves Issue Monster failures entirely

    • Issue #17414 open — escalate to repo admin
    • Impact: ~50 fewer daily error logs; success rate jumps from 77% to 97%+
  2. Investigate CI Instability — CI Failure Doctor running 4×/48h indicates systemic flakiness

    • Review CI workflow failure patterns to find root cause
    • Consider whether flakiness is increasing week-over-week

Medium Priority

  1. Benchmark Chroma Issue Indexer — 19.4m is the longest run; set a regression threshold (e.g., alert if >25m)
  2. Monitor safe item volume trend — 6 items this period vs. 14 last period; if trend continues, assess whether agent coverage is drifting
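
Both medium-priority checks are simple enough to automate. The sketch below is illustrative only: `over_threshold` and `percent_change` are hypothetical helpers, the durations are the ones listed under Long-Running Agents, and the 25-minute limit is the example threshold suggested above.

```python
from typing import Dict, List

# Durations in minutes, copied from the Long-Running Agents table.
DURATIONS_MIN: Dict[str, float] = {
    "Chroma Issue Indexer": 19.4,
    "Daily Security Red Team Agent": 14.0,
    "Daily Safe Output Tool Optimizer": 11.1,
    "Release": 11.1,
}


def over_threshold(durations: Dict[str, float],
                   threshold_min: float = 25.0) -> List[str]:
    """Return the agents whose duration exceeds the threshold, sorted by name."""
    return sorted(name for name, minutes in durations.items()
                  if minutes > threshold_min)


def percent_change(previous: float, current: float) -> float:
    """Relative change from `previous` to `current`, in percent."""
    if previous == 0:
        raise ValueError("previous must be non-zero")
    return (current - previous) / previous * 100


print(over_threshold(DURATIONS_MIN))    # [] (nothing exceeds 25m this period)
print(f"{percent_change(14, 6):.1f}%")  # -57.1% safe items vs. last period
```

A single period-over-period drop does not establish a trend, so the percentage is more useful once several periods are tracked.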

Low Priority

  1. Document prompt injection detection pattern — The Great Escapi's clean behavior is a model for other security-adjacent agents

Trends (6-period history)

| Period | Quality | Effectiveness | Success Rate | Critical Issues |
|---|---|---|---|---|
| Feb 22 | 92/100 | 88/100 | 97% (non-IM) | 0 ✅ |
| Feb 21 | 92/100 | 88/100 | 89% | 0 ✅ |
| (prior) | 91/100 | 85/100 | 71% | 0 ✅ |

Overall trend: stable quality, recovering success rate, persistent P1 infrastructure issue


Actions Taken This Run

  • ✅ Analyzed 40 workflow runs across 26 agents (48h window)
  • ✅ Verified P1 status: Issue Monster 9/9 failures; tracking issue #17414 ("[P1] Lockdown mode failing: GH_AW_GITHUB_TOKEN not configured — 5 workflows affected") is still open
  • ✅ Confirmed The Great Escapi blocked prompt injection (2nd confirmed detection)
  • ✅ Updated agent-performance-latest.md in shared memory
  • ✅ Updated shared-alerts.md with current period status
  • ℹ️ No new improvement issues created (no new quality failures detected)

Analysis period: February 21–22, 2026 (48-hour window, 40 runs)
Next report: February 23, 2026
References: §22281821807 · §22281624073 · §22281571358


Note: This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.

Tip: Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.

Generated by Agent Performance Analyzer - Meta-Orchestrator

  • expires on Feb 23, 2026, 5:35 PM UTC
