Skip to content

Integrate agent transcripts into the DAG (Phase C)#99

Open
cboos wants to merge 4 commits intodev/dagfrom
dev/dag-agents
Open

Integrate agent transcripts into the DAG (Phase C)#99
cboos wants to merge 4 commits intodev/dagfrom
dev/dag-agents

Conversation

@cboos
Copy link
Collaborator

@cboos cboos commented Mar 8, 2026

Summary

  • Agent (sidechain) entries are now included in the DAG instead of being partitioned out and heuristically reordered
  • New _integrate_agent_entries() in converter.py parents agent roots to their anchor tool_result and assigns synthetic session IDs ({sessionId}#agent-{agentId}) to form separate DAG-lines
  • Agent sessions appear inline (no separate session headers, no individual session files, no nav entries) — visual output unchanged
  • 4 new integration tests covering agent parenting, session tree structure, header suppression, and multi-agent ordering

Details

How it works

  1. Anchor discovery: Main-session entries with agentId (from entry-level or toolUseResult.agentId) become anchors
  2. Parenting: Agent root entries (parentUuid=null) get their parentUuid set to the anchor's UUID
  3. Session separation: Synthetic sessionId (s1#agent-xyz) separates agents from their parent session in the DAG
  4. Traversal: Existing DAG machinery (build_session_tree, traverse_session_tree) naturally handles agent sessions as child sessions at junction points

What's kept (conservative approach)

  • _reorder_sidechain_template_messages() — now effectively a no-op for integrated agents, kept as fallback
  • _cleanup_sidechain_duplicates() — still needed for Task tool input/output dedup
  • sidechain_uuids in build_dag() — still needed for unloaded subagent files (e.g. aprompt_suggestion)

Key constraint

entry.sessionId on disk / in cache is never mutated. Synthetic IDs are only assigned in-memory during load_directory_transcripts().

Test plan

  • 4 new TestAgentDagIntegration tests pass
  • Updated test_load_directory_with_sidechains for new DAG ordering
  • Full test suite (801 passed, 7 skipped)
  • Snapshot tests pass (no rendering regression)
  • ty check and ruff check clean
  • Visual check with real agent data (-src-experiments-claude_p)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Refactoring

    • Agent/sidechain messages are now integrated into parent sessions: they render nested under the parent task, no longer show separate session headers, and are excluded from session navigation and per-session exports.
  • Documentation

    • Updated docs to reflect DAG-based integration of agent transcripts and next-phase roadmap.
  • Tests

    • Expanded integration tests for agent transcript ordering, parent-child placement, and rendering/grouping behavior.

Agent (sidechain) entries are now part of the DAG instead of being
partitioned out and reordered heuristically. _integrate_agent_entries()
parents agent roots to their anchor tool_result and assigns synthetic
session IDs ({sessionId}#agent-{agentId}) so they form separate
DAG-lines spliced at the correct junction point.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link

coderabbitai bot commented Mar 8, 2026

📝 Walkthrough

Walkthrough

Integrates agent/sidechain transcript entries into the DAG by adding _integrate_agent_entries() in the converter, assigning synthetic session IDs for agents, wiring agent roots to anchor UUIDs, and updating DAG construction, renderer grouping/pagination, utilities, and tests so agent sessions are treated as child sessions of their parent sessions.

Changes

Cohort / File(s) Summary
Agent Integration Core
claude_code_log/converter.py
Added _integrate_agent_entries() to attach agent sidechain roots to main-session anchors, assign in-memory synthetic session IDs ({sessionId}#agent-{agentId}), use integrated all_messages for DAG building and non-DAG collections, and filter/exclude agent sessions across pagination, dedup, and per-session file generation.
DAG Structure
claude_code_log/dag.py
Added is_sidechain: bool = False to SessionDAGLine and updated docstrings to reflect "unloaded sidechain data" wording; no API signature breaks.
Renderer Adjustments
claude_code_log/renderer.py
Imported is_agent_session, get_parent_session_id, and get_project_display_name; skip agent sessions in navigation and per-session headers; compute effective parent session for agent messages so agent content renders under parent sessions.
Utilities
claude_code_log/utils.py
Added is_agent_session(session_id: str) and get_parent_session_id(session_id: str) helpers (note: duplicated definitions were introduced and should be deduplicated).
Tests
test/test_dag_integration.py
Extended _make_assistant_entry to accept agent_id, updated expectations, and added TestAgentDagIntegration coverage verifying agent anchoring, synthetic sessions, ordering, branch scenarios, and rendering behavior.
Docs & Roadmap
dev-docs/dag.md, work/phase-c-agent-transcripts.md
Updated Phase C status and roadmap, documenting _integrate_agent_entries() behavior, DAG splicing of agent lines, renderer handling, retained fallback helpers, and Phase D next steps.

Sequence Diagram

sequenceDiagram
    participant Directory as Directory Scanner
    participant Converter as Converter
    participant Integrator as _integrate_agent_entries()
    participant DAGBuilder as DAG Builder
    participant Renderer as Renderer

    Directory->>Converter: load_directory_transcripts()
    Converter->>Converter: scan & collect all messages
    Converter->>Integrator: _integrate_agent_entries(all_messages)
    Integrator->>Integrator: build agentId → anchor UUID map
    Integrator->>Integrator: set parentUuid on agent roots
    Integrator->>Integrator: assign synthetic sessionIds ("#agent-")
    Integrator-->>Converter: modified messages (in-place)
    Converter->>DAGBuilder: build_dag_from_entries(all_messages, sidechain_uuids)
    DAGBuilder->>DAGBuilder: construct DAG including agent child sessions
    DAGBuilder-->>Converter: DAG structure
    Converter->>Renderer: generate HTML/navigation
    Renderer->>Renderer: filter out agent sessions from nav
    Renderer->>Renderer: use effective parent session for agent messages
    Renderer-->>Converter: rendered content grouped under parent sessions
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Poem

🐰
I stitched sidechains to the main tree bright,
Anchors hummed and tiny IDs took flight,
Agent hops tucked neatly under boughs,
Parent sessions hold their curious brows,
Rabbit drums the DAG — all snug tonight.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title "Integrate agent transcripts into the DAG (Phase C)" accurately and clearly describes the main objective of the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dev/dag-agents

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@cboos
Copy link
Collaborator Author

cboos commented Mar 8, 2026

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Mar 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
test/test_dag_integration.py (1)

741-986: Add one branch-scoped agent regression.

These cases only exercise agents anchored on ordinary session trunks. A case where the anchor lives inside a within-session fork would catch regressions where agent messages get regrouped under the base session instead of the branch pseudo-session.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_dag_integration.py` around lines 741 - 986, Add a new test (e.g.,
test_agent_branch_scoped_session) that creates a within-session fork and places
an agent anchor inside one branch to ensure agent entries are attached to the
branch pseudo-session rather than the base session: use _make_user_entry and
_make_assistant_entry to build a main trunk (u1→a1), then create two divergent
branch entries both referencing a1 (branch anchor u2_branch and a continuation
a2_branch) and include agent sidechain entries with agent_id tied to u2_branch;
write them with _write_jsonl and call load_directory_transcripts, then assert
the synthetic agent session appears in tree.sessions with parent_session_id
equal to the specific branch session id and attachment_uuid equal to the branch
anchor's uuid (using tree.sessions, parent_session_id, and attachment_uuid) and
that agent entries are ordered between the branch anchor and its continuation in
the returned result.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@claude_code_log/renderer.py`:
- Around line 2025-2031: The code incorrectly derives agent parent scope by
slicing msg_session_id (msg_session_id.split("#agent-") ...) which loses branch
context; instead, use the session-tree/explicit render parent information (e.g.
prefer message.render_session_id or call the SessionTree method that resolves
the parent/anchor for a session) to compute agent_parent_session so branch
pseudo-sessions are preserved; update occurrences around
msg_session_id/agent_parent_session (and any logic that feeds
_reorder_session_template_messages) to stop splitting on "#agent-" and use the
SessionTree API or message.render_session_id to determine the true parent
session.

---

Nitpick comments:
In `@test/test_dag_integration.py`:
- Around line 741-986: Add a new test (e.g., test_agent_branch_scoped_session)
that creates a within-session fork and places an agent anchor inside one branch
to ensure agent entries are attached to the branch pseudo-session rather than
the base session: use _make_user_entry and _make_assistant_entry to build a main
trunk (u1→a1), then create two divergent branch entries both referencing a1
(branch anchor u2_branch and a continuation a2_branch) and include agent
sidechain entries with agent_id tied to u2_branch; write them with _write_jsonl
and call load_directory_transcripts, then assert the synthetic agent session
appears in tree.sessions with parent_session_id equal to the specific branch
session id and attachment_uuid equal to the branch anchor's uuid (using
tree.sessions, parent_session_id, and attachment_uuid) and that agent entries
are ordered between the branch anchor and its continuation in the returned
result.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a1c21610-c91a-40de-856a-11a147b04f98

📥 Commits

Reviewing files that changed from the base of the PR and between 2eeacec and a9f3c56.

📒 Files selected for processing (6)
  • claude_code_log/converter.py
  • claude_code_log/dag.py
  • claude_code_log/renderer.py
  • dev-docs/dag.md
  • test/test_dag_integration.py
  • work/phase-c-agent-transcripts.md

cboos and others added 2 commits March 8, 2026 18:20
Use session hierarchy to determine the correct parent session for agent
messages (may be a branch pseudo-session when the anchor is inside a
within-session fork). Fallback to extracting from synthetic ID.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract `is_agent_session()` helper to `utils.py` and use it to filter
agent sessions from: session metadata building, cache population,
pagination session collection, and message grouping for pages. Agent
messages in pagination are grouped under their parent session.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cboos
Copy link
Collaborator Author

cboos commented Mar 8, 2026

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Mar 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
claude_code_log/converter.py (1)

1431-1437: ⚠️ Potential issue | 🟠 Major

Fold agent entries into the parent session cache instead of skipping them.

These messages are grouped back under the parent session later when rendering/paginating, so dropping #agent-... sessions here makes the cached message_count and token totals too low for any session with agents. That skews page assignment and session/page stats away from what the UI actually renders. Coalesce synthetic agent sessions back to the parent session ID here, and mirror the same normalization in _build_session_data_from_messages().

💡 Suggested fix
-            session_id = getattr(message, "sessionId", "")
-            if not session_id or is_agent_session(session_id):
+            session_id = getattr(message, "sessionId", "")
+            if not session_id:
                 continue
+            if is_agent_session(session_id):
+                session_id = session_id.split("#agent-")[0]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claude_code_log/converter.py` around lines 1431 - 1437, The current branch
skips any message whose sessionId is an agent session
(is_agent_session(session_id)), which drops those messages from the session
cache and undercounts message_count/tokens; instead, when processing
session-level data in this block (and similarly inside
_build_session_data_from_messages()), detect agent sessions and remap/coalesce
them into their parent session id (e.g., derive parent_id from session_id rather
than continue), then proceed to accumulate counts/tokens under that parent_id;
update both this message-processing branch (where message and
SummaryTranscriptEntry are checked) and the _build_session_data_from_messages()
normalization so agent sessions are folded into the parent session consistently.
🧹 Nitpick comments (1)
test/test_dag_integration.py (1)

738-1072: Add a regression for session-metadata/pagination coalescing.

These tests cover DAG/tree/rendering well, but the PR also changes cache/session aggregation and paginated grouping. A focused assertion around _update_cache_with_session_data() or paginated HTML generation would catch cases where agent messages render inline but are omitted from the parent session’s message_count or token totals.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_dag_integration.py` around lines 738 - 1072, Add a regression test
that exercises _update_cache_with_session_data (via load_directory_transcripts)
to ensure agent sidechain entries are included in the parent session's
pagination/aggregation metadata: create a main session with an agent anchor and
sidechain agent_entries (like existing tests), call
load_directory_transcripts(tmp_path, silent=True) then call or verify the cache
update path (_update_cache_with_session_data) ran, and assert that
session_tree.sessions["s1"] (or the parent session object) has message_count and
total token-related fields updated to include the agent messages (e.g.,
message_count increased by agent entries, total_tokens or similar totals reflect
agent content) so paginated HTML generation will include agent messages in
parent session aggregates.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@claude_code_log/converter.py`:
- Around line 1431-1437: The current branch skips any message whose sessionId is
an agent session (is_agent_session(session_id)), which drops those messages from
the session cache and undercounts message_count/tokens; instead, when processing
session-level data in this block (and similarly inside
_build_session_data_from_messages()), detect agent sessions and remap/coalesce
them into their parent session id (e.g., derive parent_id from session_id rather
than continue), then proceed to accumulate counts/tokens under that parent_id;
update both this message-processing branch (where message and
SummaryTranscriptEntry are checked) and the _build_session_data_from_messages()
normalization so agent sessions are folded into the parent session consistently.

---

Nitpick comments:
In `@test/test_dag_integration.py`:
- Around line 738-1072: Add a regression test that exercises
_update_cache_with_session_data (via load_directory_transcripts) to ensure agent
sidechain entries are included in the parent session's pagination/aggregation
metadata: create a main session with an agent anchor and sidechain agent_entries
(like existing tests), call load_directory_transcripts(tmp_path, silent=True)
then call or verify the cache update path (_update_cache_with_session_data) ran,
and assert that session_tree.sessions["s1"] (or the parent session object) has
message_count and total token-related fields updated to include the agent
messages (e.g., message_count increased by agent entries, total_tokens or
similar totals reflect agent content) so paginated HTML generation will include
agent messages in parent session aggregates.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3db4e5f1-dedf-4f81-9ffa-473893d3cb36

📥 Commits

Reviewing files that changed from the base of the PR and between a9f3c56 and 000817f.

📒 Files selected for processing (4)
  • claude_code_log/converter.py
  • claude_code_log/renderer.py
  • claude_code_log/utils.py
  • test/test_dag_integration.py

Instead of dropping agent messages from session metadata, remap their
synthetic session IDs to the parent session using get_parent_session_id().
This ensures message counts, token totals, and timestamps in pagination
and cache include agent contributions.

Add regression test verifying agent messages fold into parent session's
message_count and token totals via _build_session_data_from_messages().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cboos
Copy link
Collaborator Author

cboos commented Mar 8, 2026

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Mar 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@claude_code_log/converter.py`:
- Around line 1649-1658: The fallback path still groups by raw sessionId and
thus leaks synthetic agent sessions; update _collect_project_sessions() (or the
process_projects_hierarchy() fallback call) to exclude any session_id where
is_agent_session(session_id) is true and to skip warmup_session_ids, mirroring
the filtering used earlier (session_id not in warmup_session_ids and not
is_agent_session(session_id)); ensure the grouping step uses the filtered
session_ids variable so pseudo-sessions like "#agent-..." are never emitted as
standalone project-index sessions.
- Around line 416-456: The agent anchoring logic in _integrate_agent_entries
only collects anchors from non-sidechain entries so nested agents (whose anchor
is in a sidechain) remain unanchored; update the agent_anchors build to include
any entry with an agentId (i.e., check msg.agentId regardless of
msg.isSidechain) and when multiple candidates exist prefer a non-sidechain
anchor over a sidechain one (so preserve an existing non-sidechain anchor and
only set/override when the current candidate is a better choice), keeping
references to BaseTranscriptEntry, agentId, isSidechain, agent_anchors,
parentUuid, and sessionId to locate the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 047fec33-895e-4754-b83c-c8d84df4dae8

📥 Commits

Reviewing files that changed from the base of the PR and between 000817f and 61b7ff1.

📒 Files selected for processing (4)
  • claude_code_log/converter.py
  • claude_code_log/renderer.py
  • claude_code_log/utils.py
  • test/test_dag_integration.py

Comment on lines +416 to +456
def _integrate_agent_entries(messages: list[TranscriptEntry]) -> None:
"""Parent agent entries and assign synthetic session IDs.

Agent (sidechain) entries share sessionId with their parent session
but form separate conversation threads. This function:

1. Builds a map of agentId -> anchor UUID (the main-session User entry
whose agentId matches, i.e. the tool_result that references the agent)
2. For each agent's root entry (parentUuid=None, isSidechain=True),
sets parentUuid to the anchor UUID
3. Assigns a synthetic sessionId ("{sessionId}#agent-{agentId}") to all
agent entries so they form separate DAG-lines

Mutates entries in place (Pydantic v2 models are mutable by default).
"""
# Build agentId -> anchor UUID map from main-session entries
agent_anchors: dict[str, str] = {}
for msg in messages:
if not isinstance(msg, BaseTranscriptEntry):
continue
if msg.isSidechain:
continue
# Main-session entries with agentId reference an agent transcript
if msg.agentId:
agent_anchors[msg.agentId] = msg.uuid

if not agent_anchors:
return

# Process sidechain entries: parent roots and assign synthetic sessionIds
for msg in messages:
if not isinstance(msg, BaseTranscriptEntry):
continue
if not msg.isSidechain or not msg.agentId:
continue
agent_id = msg.agentId
# Assign synthetic session ID to separate from main session
msg.sessionId = f"{msg.sessionId}#agent-{agent_id}"
# Parent the root entry to the anchor
if msg.parentUuid is None and agent_id in agent_anchors:
msg.parentUuid = agent_anchors[agent_id]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Nested agent transcripts still won't get anchored.

load_transcript() already recurses into agent files, but agent_anchors is built only from non-sidechain entries. If an agent launches another agent, that inner anchor lives in a sidechain entry, so the nested root keeps parentUuid=None and becomes a separate root instead of attaching under the outer agent branch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claude_code_log/converter.py` around lines 416 - 456, The agent anchoring
logic in _integrate_agent_entries only collects anchors from non-sidechain
entries so nested agents (whose anchor is in a sidechain) remain unanchored;
update the agent_anchors build to include any entry with an agentId (i.e., check
msg.agentId regardless of msg.isSidechain) and when multiple candidates exist
prefer a non-sidechain anchor over a sidechain one (so preserve an existing
non-sidechain anchor and only set/override when the current candidate is a
better choice), keeping references to BaseTranscriptEntry, agentId, isSidechain,
agent_anchors, parentUuid, and sessionId to locate the change.

Comment on lines +1649 to +1658
# Find all unique session IDs (excluding warmup and agent sessions)
session_ids: set[str] = set()
for message in messages:
if hasattr(message, "sessionId"):
session_id: str = getattr(message, "sessionId")
if session_id and session_id not in warmup_session_ids:
if (
session_id
and session_id not in warmup_session_ids
and not is_agent_session(session_id)
):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

The no-cache project-summary fallback still leaks synthetic agent sessions.

Filtering agent sessions out here fixes individual session files, but process_projects_hierarchy() falls back to _collect_project_sessions() when cached project data is missing, and that helper still groups by raw sessionId. In that path, #agent-... pseudo-sessions will still show up as standalone project-index sessions, which contradicts the inline-only behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claude_code_log/converter.py` around lines 1649 - 1658, The fallback path
still groups by raw sessionId and thus leaks synthetic agent sessions; update
_collect_project_sessions() (or the process_projects_hierarchy() fallback call)
to exclude any session_id where is_agent_session(session_id) is true and to skip
warmup_session_ids, mirroring the filtering used earlier (session_id not in
warmup_session_ids and not is_agent_session(session_id)); ensure the grouping
step uses the filtered session_ids variable so pseudo-sessions like "#agent-..."
are never emitted as standalone project-index sessions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant