Problem
Some turns look like the model is lost when the surrounding evidence points to the environment, tool layer, or session lifecycle instead. Users and maintainers need a redacted way to separate model-quality failures from tool/runtime failures before triaging the report.
Evidence from maintainer-private local CodeWhale session logs, scanned 2026-05-24:
- 32 CodeWhale JSONL session files across 26 session ids were inspected.
- 207 tool calls exited non-zero in the inspected logs.
- 43 failures matched network or remote-service symptoms.
- 34 failures matched permission, sandbox, or approval symptoms.
- 36 failures matched missing-path or missing-binary symptoms.
- 16 started turns had no matching task_complete event in the inspected logs.
No prompts, raw tool outputs, secrets, absolute local paths, or user text are copied here. The point is the failure shape, not the private conversation content.
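As an illustration of how the unclosed-turn count could be reproduced, here is a minimal sketch. Only the task_complete event name comes from the logs described above; the `type` and `turn_id` fields and the `task_started` event name are assumptions made for this example and may not match the real CodeWhale JSONL schema.

```python
import json


def unclosed_turns(jsonl_lines):
    """Return ids of turns that started but never logged task_complete.

    Assumed (hypothetical) schema: one JSON object per line with a "type"
    field and a "turn_id" field; turns open with "task_started".
    """
    started, completed = set(), set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or older lines instead of failing
        if not isinstance(event, dict):
            continue
        turn_id = event.get("turn_id")
        if turn_id is None:
            continue
        if event.get("type") == "task_started":
            started.add(turn_id)
        elif event.get("type") == "task_complete":
            completed.add(turn_id)
    # Only ids are returned; no prompt or tool output is read or copied.
    return sorted(started - completed)
```

The function reads only event types and ids, so its output stays within the redaction rule stated above.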
Desired Behavior
CodeWhale should make this distinction visible and reusable:
- A redacted session-log analyzer can summarize failure categories from local JSONL logs.
- Tool receipts classify likely source: model, tool schema, command exit, network, sandbox/approval, missing dependency, timeout, background job, or unknown.
- /status, Activity Detail, handoff, or bug-report helpers can show a short "environment suspect" summary without exposing sensitive content.
- Failure summaries preserve enough source metadata for maintainers to find the private local evidence when they have access.
- Default public issue text must never include prompts, secrets, raw command output, full local paths, or conversation transcripts.
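One possible shape for the receipt classification is a small ordered rule table. The sketch below is an assumption-heavy starting point, not a proposed implementation: the function name `classify_receipt`, its inputs, and the text patterns are all hypothetical, and it covers only a subset of the categories listed above (model, tool schema, and background job would need signals beyond exit code and error text). The two exit-code fallbacks rely on common conventions: shells return 127 for "command not found" and GNU `timeout` returns 124 when the time limit is hit.

```python
import re

# Ordered rules: the first matching pattern wins. Patterns are illustrative only.
RULES = [
    ("network", re.compile(
        r"could not resolve|connection (refused|reset|timed out)|network is unreachable",
        re.I)),
    ("sandbox/approval", re.compile(
        r"permission denied|operation not permitted|sandbox|approval", re.I)),
    ("missing dependency", re.compile(
        r"command not found|no such file or directory", re.I)),
    ("timeout", re.compile(r"timed out|timeout", re.I)),
]


def classify_receipt(exit_code, error_text):
    """Return a likely failure source, or None when the tool call succeeded."""
    if exit_code == 0:
        return None
    for category, pattern in RULES:
        if pattern.search(error_text):
            return category
    if exit_code == 127:  # shell convention: command not found
        return "missing dependency"
    if exit_code == 124:  # GNU timeout convention: time limit reached
        return "timeout"
    return "command exit"  # non-zero exit with no recognizable symptom
```

The classifier inspects error text locally but returns only the category label, so the raw output never needs to leave the machine.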
Acceptance Criteria
- Synthetic session logs with non-zero tool exits, network errors, sandbox denials, missing binaries, and unclosed turn spans classify correctly.
- The classifier emits aggregate counts and redacted source handles by default.
- Activity Detail or an adjacent diagnostic surface can explain "this likely failed in the environment/tool layer" before the model is blamed.
- Bug-report export has a privacy-first mode that includes categories and timestamps but not raw content.
- Existing logs remain readable; no migration should be required for older JSONL sessions.
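To make "aggregate counts and redacted source handles" concrete, the following sketch shows one way a privacy-first summary could be assembled. Everything here is hypothetical: the `source_handle` and `summarize` names, the handle format, and the input tuple layout are invented for illustration. The handle hashes the session id with SHA-256 so that a maintainer who holds the private logs can recompute it and locate the line, while the public text carries neither the id nor any path.

```python
import hashlib
from collections import Counter


def source_handle(session_id, line_no):
    """Build a redacted pointer: truncated hash of the session id plus a line number."""
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()[:12]
    return f"s:{digest}#L{line_no}"


def summarize(failures):
    """Summarize classified failures without carrying any raw content.

    failures: iterable of (session_id, line_no, category, timestamp) tuples,
    a hypothetical intermediate format produced by an earlier classification pass.
    """
    counts = Counter()
    handles = []
    for session_id, line_no, category, timestamp in failures:
        counts[category] += 1
        handles.append({
            "category": category,
            "timestamp": timestamp,
            "source": source_handle(session_id, line_no),
        })
    return {"counts": dict(counts), "handles": handles}
```

An unsalted hash is a deliberate simplification. If session ids are short or guessable, a keyed hash kept with the private logs would be a safer choice.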
Related