Integrate agent transcripts into the DAG (Phase C)#99
Conversation
Agent (sidechain) entries are now part of the DAG instead of being
partitioned out and reordered heuristically. _integrate_agent_entries()
parents agent roots to their anchor tool_result and assigns synthetic
session IDs ({sessionId}#agent-{agentId}) so they form separate
DAG-lines spliced at the correct junction point.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughIntegrates agent/sidechain transcript entries into the DAG by adding Changes
Sequence DiagramsequenceDiagram
participant Directory as Directory Scanner
participant Converter as Converter
participant Integrator as _integrate_agent_entries()
participant DAGBuilder as DAG Builder
participant Renderer as Renderer
Directory->>Converter: load_directory_transcripts()
Converter->>Converter: scan & collect all messages
Converter->>Integrator: _integrate_agent_entries(all_messages)
Integrator->>Integrator: build agentId → anchor UUID map
Integrator->>Integrator: set parentUuid on agent roots
Integrator->>Integrator: assign synthetic sessionIds ("#agent-")
Integrator-->>Converter: modified messages (in-place)
Converter->>DAGBuilder: build_dag_from_entries(all_messages, sidechain_uuids)
DAGBuilder->>DAGBuilder: construct DAG including agent child sessions
DAGBuilder-->>Converter: DAG structure
Converter->>Renderer: generate HTML/navigation
Renderer->>Renderer: filter out agent sessions from nav
Renderer->>Renderer: use effective parent session for agent messages
Renderer-->>Converter: rendered content grouped under parent sessions
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
test/test_dag_integration.py (1)
741-986: Add one branch-scoped agent regression.These cases only exercise agents anchored on ordinary session trunks. A case where the anchor lives inside a within-session fork would catch regressions where agent messages get regrouped under the base session instead of the branch pseudo-session.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/test_dag_integration.py` around lines 741 - 986, Add a new test (e.g., test_agent_branch_scoped_session) that creates a within-session fork and places an agent anchor inside one branch to ensure agent entries are attached to the branch pseudo-session rather than the base session: use _make_user_entry and _make_assistant_entry to build a main trunk (u1→a1), then create two divergent branch entries both referencing a1 (branch anchor u2_branch and a continuation a2_branch) and include agent sidechain entries with agent_id tied to u2_branch; write them with _write_jsonl and call load_directory_transcripts, then assert the synthetic agent session appears in tree.sessions with parent_session_id equal to the specific branch session id and attachment_uuid equal to the branch anchor's uuid (using tree.sessions, parent_session_id, and attachment_uuid) and that agent entries are ordered between the branch anchor and its continuation in the returned result.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@claude_code_log/renderer.py`:
- Around line 2025-2031: The code incorrectly derives agent parent scope by
slicing msg_session_id (msg_session_id.split("#agent-") ...) which loses branch
context; instead, use the session-tree/explicit render parent information (e.g.
prefer message.render_session_id or call the SessionTree method that resolves
the parent/anchor for a session) to compute agent_parent_session so branch
pseudo-sessions are preserved; update occurrences around
msg_session_id/agent_parent_session (and any logic that feeds
_reorder_session_template_messages) to stop splitting on "#agent-" and use the
SessionTree API or message.render_session_id to determine the true parent
session.
---
Nitpick comments:
In `@test/test_dag_integration.py`:
- Around line 741-986: Add a new test (e.g., test_agent_branch_scoped_session)
that creates a within-session fork and places an agent anchor inside one branch
to ensure agent entries are attached to the branch pseudo-session rather than
the base session: use _make_user_entry and _make_assistant_entry to build a main
trunk (u1→a1), then create two divergent branch entries both referencing a1
(branch anchor u2_branch and a continuation a2_branch) and include agent
sidechain entries with agent_id tied to u2_branch; write them with _write_jsonl
and call load_directory_transcripts, then assert the synthetic agent session
appears in tree.sessions with parent_session_id equal to the specific branch
session id and attachment_uuid equal to the branch anchor's uuid (using
tree.sessions, parent_session_id, and attachment_uuid) and that agent entries
are ordered between the branch anchor and its continuation in the returned
result.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a1c21610-c91a-40de-856a-11a147b04f98
📒 Files selected for processing (6)
claude_code_log/converter.pyclaude_code_log/dag.pyclaude_code_log/renderer.pydev-docs/dag.mdtest/test_dag_integration.pywork/phase-c-agent-transcripts.md
Use session hierarchy to determine the correct parent session for agent messages (may be a branch pseudo-session when the anchor is inside a within-session fork). Fallback to extracting from synthetic ID. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract `is_agent_session()` helper to `utils.py` and use it to filter agent sessions from: session metadata building, cache population, pagination session collection, and message grouping for pages. Agent messages in pagination are grouped under their parent session. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
claude_code_log/converter.py (1)
1431-1437:⚠️ Potential issue | 🟠 MajorFold agent entries into the parent session cache instead of skipping them.
These messages are grouped back under the parent session later when rendering/paginating, so dropping
#agent-...sessions here makes the cachedmessage_countand token totals too low for any session with agents. That skews page assignment and session/page stats away from what the UI actually renders. Coalesce synthetic agent sessions back to the parent session ID here, and mirror the same normalization in_build_session_data_from_messages().💡 Suggested fix
- session_id = getattr(message, "sessionId", "") - if not session_id or is_agent_session(session_id): + session_id = getattr(message, "sessionId", "") + if not session_id: continue + if is_agent_session(session_id): + session_id = session_id.split("#agent-")[0]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@claude_code_log/converter.py` around lines 1431 - 1437, The current branch skips any message whose sessionId is an agent session (is_agent_session(session_id)), which drops those messages from the session cache and undercounts message_count/tokens; instead, when processing session-level data in this block (and similarly inside _build_session_data_from_messages()), detect agent sessions and remap/coalesce them into their parent session id (e.g., derive parent_id from session_id rather than continue), then proceed to accumulate counts/tokens under that parent_id; update both this message-processing branch (where message and SummaryTranscriptEntry are checked) and the _build_session_data_from_messages() normalization so agent sessions are folded into the parent session consistently.
🧹 Nitpick comments (1)
test/test_dag_integration.py (1)
738-1072: Add a regression for session-metadata/pagination coalescing.These tests cover DAG/tree/rendering well, but the PR also changes cache/session aggregation and paginated grouping. A focused assertion around
_update_cache_with_session_data()or paginated HTML generation would catch cases where agent messages render inline but are omitted from the parent session’smessage_countor token totals.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/test_dag_integration.py` around lines 738 - 1072, Add a regression test that exercises _update_cache_with_session_data (via load_directory_transcripts) to ensure agent sidechain entries are included in the parent session's pagination/aggregation metadata: create a main session with an agent anchor and sidechain agent_entries (like existing tests), call load_directory_transcripts(tmp_path, silent=True) then call or verify the cache update path (_update_cache_with_session_data) ran, and assert that session_tree.sessions["s1"] (or the parent session object) has message_count and total token-related fields updated to include the agent messages (e.g., message_count increased by agent entries, total_tokens or similar totals reflect agent content) so paginated HTML generation will include agent messages in parent session aggregates.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@claude_code_log/converter.py`:
- Around line 1431-1437: The current branch skips any message whose sessionId is
an agent session (is_agent_session(session_id)), which drops those messages from
the session cache and undercounts message_count/tokens; instead, when processing
session-level data in this block (and similarly inside
_build_session_data_from_messages()), detect agent sessions and remap/coalesce
them into their parent session id (e.g., derive parent_id from session_id rather
than continue), then proceed to accumulate counts/tokens under that parent_id;
update both this message-processing branch (where message and
SummaryTranscriptEntry are checked) and the _build_session_data_from_messages()
normalization so agent sessions are folded into the parent session consistently.
---
Nitpick comments:
In `@test/test_dag_integration.py`:
- Around line 738-1072: Add a regression test that exercises
_update_cache_with_session_data (via load_directory_transcripts) to ensure agent
sidechain entries are included in the parent session's pagination/aggregation
metadata: create a main session with an agent anchor and sidechain agent_entries
(like existing tests), call load_directory_transcripts(tmp_path, silent=True)
then call or verify the cache update path (_update_cache_with_session_data) ran,
and assert that session_tree.sessions["s1"] (or the parent session object) has
message_count and total token-related fields updated to include the agent
messages (e.g., message_count increased by agent entries, total_tokens or
similar totals reflect agent content) so paginated HTML generation will include
agent messages in parent session aggregates.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3db4e5f1-dedf-4f81-9ffa-473893d3cb36
📒 Files selected for processing (4)
claude_code_log/converter.pyclaude_code_log/renderer.pyclaude_code_log/utils.pytest/test_dag_integration.py
Instead of dropping agent messages from session metadata, remap their synthetic session IDs to the parent session using get_parent_session_id(). This ensures message counts, token totals, and timestamps in pagination and cache include agent contributions. Add regression test verifying agent messages fold into parent session's message_count and token totals via _build_session_data_from_messages(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@claude_code_log/converter.py`:
- Around line 1649-1658: The fallback path still groups by raw sessionId and
thus leaks synthetic agent sessions; update _collect_project_sessions() (or the
process_projects_hierarchy() fallback call) to exclude any session_id where
is_agent_session(session_id) is true and to skip warmup_session_ids, mirroring
the filtering used earlier (session_id not in warmup_session_ids and not
is_agent_session(session_id)); ensure the grouping step uses the filtered
session_ids variable so pseudo-sessions like "#agent-..." are never emitted as
standalone project-index sessions.
- Around line 416-456: The agent anchoring logic in _integrate_agent_entries
only collects anchors from non-sidechain entries so nested agents (whose anchor
is in a sidechain) remain unanchored; update the agent_anchors build to include
any entry with an agentId (i.e., check msg.agentId regardless of
msg.isSidechain) and when multiple candidates exist prefer a non-sidechain
anchor over a sidechain one (so preserve an existing non-sidechain anchor and
only set/override when the current candidate is a better choice), keeping
references to BaseTranscriptEntry, agentId, isSidechain, agent_anchors,
parentUuid, and sessionId to locate the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 047fec33-895e-4754-b83c-c8d84df4dae8
📒 Files selected for processing (4)
claude_code_log/converter.pyclaude_code_log/renderer.pyclaude_code_log/utils.pytest/test_dag_integration.py
| def _integrate_agent_entries(messages: list[TranscriptEntry]) -> None: | ||
| """Parent agent entries and assign synthetic session IDs. | ||
|
|
||
| Agent (sidechain) entries share sessionId with their parent session | ||
| but form separate conversation threads. This function: | ||
|
|
||
| 1. Builds a map of agentId -> anchor UUID (the main-session User entry | ||
| whose agentId matches, i.e. the tool_result that references the agent) | ||
| 2. For each agent's root entry (parentUuid=None, isSidechain=True), | ||
| sets parentUuid to the anchor UUID | ||
| 3. Assigns a synthetic sessionId ("{sessionId}#agent-{agentId}") to all | ||
| agent entries so they form separate DAG-lines | ||
|
|
||
| Mutates entries in place (Pydantic v2 models are mutable by default). | ||
| """ | ||
| # Build agentId -> anchor UUID map from main-session entries | ||
| agent_anchors: dict[str, str] = {} | ||
| for msg in messages: | ||
| if not isinstance(msg, BaseTranscriptEntry): | ||
| continue | ||
| if msg.isSidechain: | ||
| continue | ||
| # Main-session entries with agentId reference an agent transcript | ||
| if msg.agentId: | ||
| agent_anchors[msg.agentId] = msg.uuid | ||
|
|
||
| if not agent_anchors: | ||
| return | ||
|
|
||
| # Process sidechain entries: parent roots and assign synthetic sessionIds | ||
| for msg in messages: | ||
| if not isinstance(msg, BaseTranscriptEntry): | ||
| continue | ||
| if not msg.isSidechain or not msg.agentId: | ||
| continue | ||
| agent_id = msg.agentId | ||
| # Assign synthetic session ID to separate from main session | ||
| msg.sessionId = f"{msg.sessionId}#agent-{agent_id}" | ||
| # Parent the root entry to the anchor | ||
| if msg.parentUuid is None and agent_id in agent_anchors: | ||
| msg.parentUuid = agent_anchors[agent_id] |
There was a problem hiding this comment.
Nested agent transcripts still won't get anchored.
load_transcript() already recurses into agent files, but agent_anchors is built only from non-sidechain entries. If an agent launches another agent, that inner anchor lives in a sidechain entry, so the nested root keeps parentUuid=None and becomes a separate root instead of attaching under the outer agent branch.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@claude_code_log/converter.py` around lines 416 - 456, The agent anchoring
logic in _integrate_agent_entries only collects anchors from non-sidechain
entries so nested agents (whose anchor is in a sidechain) remain unanchored;
update the agent_anchors build to include any entry with an agentId (i.e., check
msg.agentId regardless of msg.isSidechain) and when multiple candidates exist
prefer a non-sidechain anchor over a sidechain one (so preserve an existing
non-sidechain anchor and only set/override when the current candidate is a
better choice), keeping references to BaseTranscriptEntry, agentId, isSidechain,
agent_anchors, parentUuid, and sessionId to locate the change.
| # Find all unique session IDs (excluding warmup and agent sessions) | ||
| session_ids: set[str] = set() | ||
| for message in messages: | ||
| if hasattr(message, "sessionId"): | ||
| session_id: str = getattr(message, "sessionId") | ||
| if session_id and session_id not in warmup_session_ids: | ||
| if ( | ||
| session_id | ||
| and session_id not in warmup_session_ids | ||
| and not is_agent_session(session_id) | ||
| ): |
There was a problem hiding this comment.
The no-cache project-summary fallback still leaks synthetic agent sessions.
Filtering agent sessions out here fixes individual session files, but process_projects_hierarchy() falls back to _collect_project_sessions() when cached project data is missing, and that helper still groups by raw sessionId. In that path, #agent-... pseudo-sessions will still show up as standalone project-index sessions, which contradicts the inline-only behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@claude_code_log/converter.py` around lines 1649 - 1658, The fallback path
still groups by raw sessionId and thus leaks synthetic agent sessions; update
_collect_project_sessions() (or the process_projects_hierarchy() fallback call)
to exclude any session_id where is_agent_session(session_id) is true and to skip
warmup_session_ids, mirroring the filtering used earlier (session_id not in
warmup_session_ids and not is_agent_session(session_id)); ensure the grouping
step uses the filtered session_ids variable so pseudo-sessions like "#agent-..."
are never emitted as standalone project-index sessions.
Summary
_integrate_agent_entries()inconverter.pyparents agent roots to their anchortool_resultand assigns synthetic session IDs ({sessionId}#agent-{agentId}) to form separate DAG-linesDetails
How it works
agentId(from entry-level ortoolUseResult.agentId) become anchorsparentUuid=null) get theirparentUuidset to the anchor's UUIDsessionId(s1#agent-xyz) separates agents from their parent session in the DAGWhat's kept (conservative approach)
_reorder_sidechain_template_messages()— now effectively a no-op for integrated agents, kept as fallback_cleanup_sidechain_duplicates()— still needed for Task tool input/output dedupsidechain_uuidsinbuild_dag()— still needed for unloaded subagent files (e.g. aprompt_suggestion)Key constraint
entry.sessionIdon disk / in cache is never mutated. Synthetic IDs are only assigned in-memory duringload_directory_transcripts().Test plan
TestAgentDagIntegrationtests passtest_load_directory_with_sidechainsfor new DAG orderingty checkandruff checkclean-src-experiments-claude_p)🤖 Generated with Claude Code
Summary by CodeRabbit
Refactoring
Documentation
Tests