Session logs: classify environment/tool failures before blaming the model

## Problem

Some turns look like the model is lost when the surrounding evidence points instead to the environment, the tool layer, or the session lifecycle. Users and maintainers need a way to separate model-quality failures from tool/runtime failures before triaging a report, without exposing private session content.

Evidence from maintainer-private local CodeWhale session logs, scanned 2026-05-24:

- 32 CodeWhale JSONL session files across 26 session ids were inspected.
- 207 tool calls exited non-zero in the inspected logs.
- 43 failures matched network or remote-service symptoms.
- 34 failures matched permission, sandbox, or approval symptoms.
- 36 failures matched missing-path or missing-binary symptoms.
- 16 started turns had no matching `task_complete` event in the inspected logs.

No prompts, raw tool outputs, secrets, absolute local paths, or user text are copied here. The point is the failure shape, not the private conversation content.
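The "started turn with no matching `task_complete`" count can be reproduced from metadata alone. The sketch below is a minimal illustration, not the actual CodeWhale log schema: the `type` and `turn_id` field names and the `task_started` event name are assumptions; only `task_complete` comes from the logs described above.

```python
import json
from typing import Iterable, List


def unclosed_turns(lines: Iterable[str]) -> List[str]:
    """Return ids of turns that started but never logged task_complete.

    Assumed (hypothetical) event shape: {"type": ..., "turn_id": ...}.
    Only event metadata is read; no prompt or tool output is touched.
    """
    open_turns = {}  # dict used as an insertion-ordered set
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate truncated or corrupt lines
        if not isinstance(event, dict):
            continue
        turn_id = event.get("turn_id")
        if turn_id is None:
            continue
        kind = event.get("type")
        if kind == "task_started":  # hypothetical start-event name
            open_turns[turn_id] = None
        elif kind == "task_complete":
            open_turns.pop(turn_id, None)
    return list(open_turns)
```

Skipping unparsable lines rather than raising keeps the scan usable on logs that were cut off mid-write, which is itself a session-lifecycle symptom.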

## Desired Behavior

CodeWhale should make this distinction visible and reusable:

- A redacted session-log analyzer can summarize failure categories from local JSONL logs.
- Tool receipts classify the likely failure source: model, tool schema, command exit, network, sandbox/approval, missing dependency, timeout, background job, or unknown.
- `/status`, Activity Detail, handoff, or bug-report helpers can show a short "environment suspect" summary without exposing sensitive content.
- Failure summaries preserve enough source metadata for maintainers to find the private local evidence when they have access.
- Default public issue text must never include prompts, secrets, raw command output, full local paths, or conversation transcripts.
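One way the receipt classification above could work is a small ordered rule table over the exit code and error text. This is a sketch under stated assumptions: the `FailureSource` enum, the `classify` function, and every pattern below are hypothetical illustrations, not an existing CodeWhale API, and the symptom strings are common shell and network error messages chosen for the example.

```python
import re
from enum import Enum
from typing import Optional


class FailureSource(Enum):
    # Subset of the categories listed above; names are illustrative.
    NETWORK = "network"
    SANDBOX_APPROVAL = "sandbox/approval"
    MISSING_DEPENDENCY = "missing dependency"
    TIMEOUT = "timeout"
    COMMAND_EXIT = "command exit"
    UNKNOWN = "unknown"


# Ordered rules: the first matching pattern wins, so "connection timed out"
# is reported as a timeout rather than a network failure.
_RULES = [
    (FailureSource.TIMEOUT, re.compile(r"timed out|timeout", re.I)),
    (FailureSource.NETWORK, re.compile(
        r"could not resolve host|connection refused|connection reset"
        r"|network is unreachable", re.I)),
    (FailureSource.SANDBOX_APPROVAL, re.compile(
        r"permission denied|operation not permitted|sandbox|approval", re.I)),
    (FailureSource.MISSING_DEPENDENCY, re.compile(
        r"command not found|no such file or directory", re.I)),
]


def classify(exit_code: Optional[int], stderr: str) -> Optional[FailureSource]:
    """Return a likely failure source, or None if the call succeeded."""
    if exit_code == 0:
        return None
    for source, pattern in _RULES:
        if pattern.search(stderr):
            return source
    # A non-zero exit with no recognized symptom is a plain command failure;
    # a missing exit code means we cannot tell.
    if exit_code is not None:
        return FailureSource.COMMAND_EXIT
    return FailureSource.UNKNOWN
```

The model, tool schema, and background job categories are left out on purpose: they probably need structured signals from the tool layer rather than text matching. The error text is only inspected locally; nothing but the resulting category would need to leave the machine.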

## Acceptance Criteria

- Synthetic session logs with non-zero tool exits, network errors, sandbox denials, missing binaries, and unclosed turn spans classify correctly.
- The classifier emits aggregate counts and redacted source handles by default.
- Activity Detail or an adjacent diagnostic surface can explain "this likely failed in the environment/tool layer" before the model is blamed.
- Bug-report export has a privacy-first mode that includes categories and timestamps but not raw content.
- Existing logs remain readable; no migration should be required for older JSONL sessions.
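The "aggregate counts and redacted source handles" criterion could be met along the following lines. This is only a sketch: the failure-record keys (`session_id`, `event_index`, `category`, `timestamp`), the handle format, and the locally held `salt` are all assumptions made for illustration.

```python
import hashlib
from collections import Counter
from typing import Any, Dict, List


def source_handle(session_id: str, event_index: int, salt: str) -> str:
    """Build an opaque handle for one event.

    The session id is hashed with a locally held salt, so the handle is
    stable for whoever has the logs and the salt, but does not reveal the
    session id in public text.
    """
    digest = hashlib.sha256(f"{salt}:{session_id}".encode("utf-8")).hexdigest()
    return f"s-{digest[:12]}#{event_index}"


def redacted_summary(failures: List[Dict[str, Any]], salt: str) -> Dict[str, Any]:
    """Summarize failures using only category, timestamp, and a handle.

    Any other keys on the input records (raw output, paths, prompts)
    are never copied into the result.
    """
    counts = Counter(f["category"] for f in failures)
    handles = [
        {
            "category": f["category"],
            "timestamp": f["timestamp"],
            "handle": source_handle(f["session_id"], f["event_index"], salt),
        }
        for f in failures
    ]
    return {"counts": dict(counts), "handles": handles}
```

Building the output from an explicit allowlist of fields, rather than deleting known-sensitive keys, means a new field added to the log format later stays private by default.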

## Related

- #1547 Activity Detail pager.
- #1889 PEEK-backed command receipts and continuity.
- #2009 background task waiting/yielding.
- #1641 tool-call fallback strategy.


Session logs: classify environment/tool failures before blaming the model #2022
