LLM-71: Add Attack success rate support to Notion summary parser#104
LLM-71: Add Attack success rate support to Notion summary parser#104benglewis wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| "Dataset" not in field_names | ||
| or not available_accuracy_columns | ||
| and not available_error_columns | ||
| and not available_attack_success_rate_columns |
There was a problem hiding this comment.
Missing parentheses in mixed and/or validation condition
Medium Severity
The validation condition mixes or and and without explicit parentheses. Python's operator precedence (and binds tighter than or) makes it evaluate correctly today, but the flat formatting strongly suggests all operators are at the same level. A future maintainer adding or reordering clauses could easily break the logic. Wrapping the and group in parentheses — e.g., "Dataset" not in field_names or (not available_accuracy_columns and not available_error_columns and not available_attack_success_rate_columns) — would make the intent unambiguous.
There was a problem hiding this comment.
Summary
- Updated the mixed boolean validation check in
parse_summary_briefto explicitly group theandclauses with parentheses, addressing the review concern about readability and future-maintenance risk while preserving current behavior. hirundo_notion_tools/upload_to_notion.pyL181-L185 - Committed the fix with message:
Clarify summary metric validation precedence(commit7227375), and opened a follow-up PR via themake_prtool titled “Follow-up: Parenthesize mixed validation condition in summary parser”.
Testing
⚠️ python - <<'PY' ... urllib.request.urlopen('https://duckduckgo.com/html/?q=python+and+or+precedence') ... PY(web search attempt failed due to network proxy/tunnel 403 in this environment).- ✅
ruff format hirundo_notion_tools/upload_to_notion.py && ruff check hirundo_notion_tools/upload_to_notion.py ⚠️ pytest hirundo_notion_tools/test_upload_to_notion.py -q(fails during collection becausetyperis not installed in this environment).


Motivation
Accuracy (%)/Error (%)columns and failed when metric columns were renamed (emoji suffixes) or when attack-focused metrics were present or some metric columns were missing.Attack success rate (%)metric family and the emoji-suffixed headers is required so summaries produced by current evaluators are parsed without raisingValueError.Description
parse_summary_briefto recognizeAttack success rate (%)andAttack success rate (%) ⬇️in addition to legacy and emoji-suffixedAccuracy (%)/Error (%)headers.Errorfirst, thenAttack success rate, thenAccuracywhen multiple metric families are present.Datasetplus at least one supported metric family (accuracy, error, or attack success rate) is accepted; preserve stripping and numeric parsing with clearValueErroron non-numeric values.hirundo_notion_tools/test_upload_to_notion.pycovering attack-success-rate-only CSVs and mixed-priority resolution, and include tracking tagLLM-71for issue tracking.Testing
ruff format .(reformatted 2 files) andruff check .(all checks passed).basedpyright, which is not available in this environment (command not found).pytest hirundo_notion_tools/test_upload_to_notion.py, but collection failed due to a missing runtime dependency (typer) so the new tests could not be executed end-to-end in this environment.uv pip install -e ., but network/PyPI access was blocked so dependency installation and full test execution were not possible.Codex Task
Note
Low Risk
Small, well-tested parsing change limited to selecting/validating CSV metric columns; main risk is minor behavior changes if multiple metric columns are present.
Overview
Updates
parse_summary_briefto accept additional metric header variants (emoji-suffixedAccuracy (%)/Error (%)andAttack success rate (%)), and relaxes validation so a CSV withDatasetplus any supported metric column is accepted.Changes metric selection precedence to prefer
Error, thenAttack success rate, thenAccuracy, and adds regression tests covering renamed headers, single-metric files, attack-success-rate-only files, and mixed-column precedence.Written by Cursor Bugbot for commit f688c77. This will update automatically on new commits. Configure here.