Skip to content

fix(claude_agent_sdk): log ResultMessage metrices#225

Open
Mahhheshh (Mahhheshh) wants to merge 1 commit intobraintrustdata:mainfrom
Mahhheshh:claude_agent_sdk_imp
Open

fix(claude_agent_sdk): log ResultMessage metrices#225
Mahhheshh (Mahhheshh) wants to merge 1 commit intobraintrustdata:mainfrom
Mahhheshh:claude_agent_sdk_imp

Conversation

@Mahhheshh
Copy link
Copy Markdown
Contributor

Capture previously dropped ResultMessage fields in _handle_result.

ref 5 (#149)

@starfolkai
Copy link
Copy Markdown

starfolkai bot commented Apr 8, 2026

Drop into this review session: sfk devbox connect pr-225-braintrust-sdk-python-17a8 --attach

Found 2 issues:

  1. Test guards use hasattr but production code filters with if v is not None -- these have different semantics. When a ResultMessage attribute exists but has value None (e.g., stop_sequence is None when no stop sequence was triggered), hasattr returns True but the production filter excludes it from metadata. The assertion assert task_span.get("metadata", {}).get("stop_sequence") is not None would then fail. All five new guards (lines 175-184) should use getattr(result_message, ..., None) is not None to match the if v is not None filter in _handle_result (line 739).

assert task_span.get("metadata", {}).get("session_id") is not None
if hasattr(result_message, "stop_reason"):
assert task_span.get("metadata", {}).get("stop_reason") is not None
if hasattr(result_message, "stop_sequence"):
assert task_span.get("metadata", {}).get("stop_sequence") is not None
if hasattr(result_message, "total_cost_usd"):
assert task_span.get("metadata", {}).get("total_cost_usd") is not None
if hasattr(result_message, "duration_ms"):
assert task_span.get("metadata", {}).get("duration_ms") is not None
if hasattr(result_message, "duration_api_ms"):
assert task_span.get("metadata", {}).get("duration_api_ms") is not None
llm_spans = [s for s in spans if s["span_attributes"]["type"] == SpanTypeAttribute.LLM]

Production filter for reference:

result_metadata = {
k: v
for k, v in {
"num_turns": getattr(message, "num_turns", None),
"session_id": getattr(message, "session_id", None),
"stop_reason": getattr(message, "stop_reason", None),
"stop_sequence": getattr(message, "stop_sequence", None),
"total_cost_usd": getattr(message, "total_cost_usd", None),
"duration_ms": getattr(message, "duration_ms", None),
"duration_api_ms": getattr(message, "duration_api_ms", None),
}.items()
if v is not None
}

  1. The stop_sequence assertion is dead code -- the cassette for test_calculator_with_multiple_operations does not include stop_sequence in the result payload, so the guard is always False and the assertion at line 177-178 never executes. This field has zero test coverage. Consider either updating the cassette to include a result with a non-null stop_sequence, or adding a targeted unit test with a synthetic ResultMessage (the existing ResultMessage stub at line 1360 also needs updating -- it only defines num_turns and session_id).

assert task_span.get("metadata", {}).get("stop_reason") is not None
if hasattr(result_message, "stop_sequence"):
assert task_span.get("metadata", {}).get("stop_sequence") is not None
if hasattr(result_message, "total_cost_usd"):

Fake ResultMessage stub that needs the new fields:

class ResultMessage:
def __init__(
self,
*,
input_tokens: int = 1,
output_tokens: int = 1,
cache_creation_input_tokens: int = 0,
num_turns: int = 1,
session_id: str = "session-123",
):
self.usage = types.SimpleNamespace(
input_tokens=input_tokens,
output_tokens=output_tokens,
cache_creation_input_tokens=cache_creation_input_tokens,
)
self.num_turns = num_turns
self.session_id = session_id

@AbhiPrasad
Copy link
Copy Markdown
Member

please rebase! CI should be fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants