LLM-71: Add Attack success rate support to Notion summary parser by benglewis · Pull Request #104 · Hirundo-io/llm-behavior-eval

benglewis · 2026-02-19T11:28:51Z

Motivation

The Notion summary parser assumed only Accuracy (%) / Error (%) columns and failed when metric columns were renamed (emoji suffixes) or when attack-focused metrics were present or some metric columns were missing.
Support for the Attack success rate (%) metric family and the emoji-suffixed headers is required so summaries produced by current evaluators are parsed without raising ValueError.

Description

Extend parse_summary_brief to recognize Attack success rate (%) and Attack success rate (%) ⬇️ in addition to legacy and emoji-suffixed Accuracy (%) / Error (%) headers.
Change metric selection precedence to prefer Error first, then Attack success rate, then Accuracy when multiple metric families are present.
Relax validation so a file with Dataset plus at least one supported metric family (accuracy, error, or attack success rate) is accepted; preserve stripping and numeric parsing with clear ValueError on non-numeric values.
Add regression tests in hirundo_notion_tools/test_upload_to_notion.py covering attack-success-rate-only CSVs and mixed-priority resolution, and include tracking tag LLM-71 for issue tracking.

Testing

Ran formatter and linter: ruff format . (reformatted 2 files) and ruff check . (all checks passed).
Attempted static type check with basedpyright, which is not available in this environment (command not found).
Ran pytest hirundo_notion_tools/test_upload_to_notion.py, but collection failed due to a missing runtime dependency (typer) so the new tests could not be executed end-to-end in this environment.
Attempted to install project dependencies via uv pip install -e ., but network/PyPI access was blocked so dependency installation and full test execution were not possible.

Codex Task

Note

Low Risk
Small, well-tested parsing change limited to selecting/validating CSV metric columns; main risk is minor behavior changes if multiple metric columns are present.

Overview
Updates parse_summary_brief to accept additional metric header variants (emoji-suffixed Accuracy (%)/Error (%) and Attack success rate (%)), and relaxes validation so a CSV with Dataset plus any supported metric column is accepted.

Changes metric selection precedence to prefer Error, then Attack success rate, then Accuracy, and adds regression tests covering renamed headers, single-metric files, attack-success-rate-only files, and mixed-column precedence.

^{Written by Cursor Bugbot for commit f688c77. This will update automatically on new commits. Configure here.}

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.}

cursor · 2026-02-19T11:31:22Z

+            "Dataset" not in field_names
+            or not available_accuracy_columns
+            and not available_error_columns
+            and not available_attack_success_rate_columns


Missing parentheses in mixed and/or validation condition

Medium Severity

The validation condition mixes or and and without explicit parentheses. Python's operator precedence (and binds tighter than or) makes it evaluate correctly today, but the flat formatting strongly suggests all operators are at the same level. A future maintainer adding or reordering clauses could easily break the logic. Wrapping the and group in parentheses — e.g., "Dataset" not in field_names or (not available_accuracy_columns and not available_error_columns and not available_attack_success_rate_columns) — would make the intent unambiguous.

@codex Please fix this

Summary

Updated the mixed boolean validation check in parse_summary_brief to explicitly group the and clauses with parentheses, addressing the review concern about readability and future-maintenance risk while preserving current behavior. hirundo_notion_tools/upload_to_notion.pyL181-L185

Committed the fix with message: Clarify summary metric validation precedence (commit 7227375), and opened a follow-up PR via the make_pr tool titled “Follow-up: Parenthesize mixed validation condition in summary parser”.

Testing

⚠️ python - <<'PY' ... urllib.request.urlopen('https://duckduckgo.com/html/?q=python+and+or+precedence') ... PY (web search attempt failed due to network proxy/tunnel 403 in this environment).

✅ ruff format hirundo_notion_tools/upload_to_notion.py && ruff check hirundo_notion_tools/upload_to_notion.py

⚠️ pytest hirundo_notion_tools/test_upload_to_notion.py -q (fails during collection because typer is not installed in this environment).

View task →

Handle attack success rate columns in summary parser (LLM-71)

f688c77

benglewis added the codex label Feb 19, 2026 — with ChatGPT Codex Connector

cursor Bot reviewed Feb 19, 2026

View reviewed changes

benglewis changed the title ~~[LLM-71] Add Attack success rate support to Notion summary parser~~ LLM-71: Add Attack success rate support to Notion summary parser Feb 19, 2026

benglewis self-assigned this Feb 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LLM-71: Add Attack success rate support to Notion summary parser#104

LLM-71: Add Attack success rate support to Notion summary parser#104
benglewis wants to merge 1 commit into
mainfrom
codex/2026-02-19/linear-mention-llm-71-renamed-csv-columns-break-downstream

benglewis commented Feb 19, 2026 •

edited by cursor Bot

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Feb 19, 2026

Uh oh!

benglewis Feb 19, 2026

Uh oh!

chatgpt-codex-connector Bot Feb 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

benglewis commented Feb 19, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Description

Testing

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Feb 19, 2026

Choose a reason for hiding this comment

Missing parentheses in mixed and/or validation condition

Uh oh!

benglewis Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Feb 19, 2026

Choose a reason for hiding this comment

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

benglewis commented Feb 19, 2026 •

edited by cursor Bot

Loading