feat: strengthen CSV analysis safety, error handling, and robustness #7
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 91ecdaea92
    if not columns:
        raise AnalysisError("CSV header not found")

    rows = list(reader)
Normalize row keys after stripping CSV headers
`_parse_csv_text` trims header names into `columns` but returns raw `DictReader` rows unchanged, so padded headers like `" amount "` are later queried as `row.get("amount")` and treated as entirely missing in `summarize_rows`. This silently corrupts dtype/missing/stat outputs for otherwise valid CSV files that contain whitespace around header names; map row keys to the normalized header names before returning.
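One way the suggested fix might look is sketched below. This is not the PR's code: `parse_csv_text_normalized` is a hypothetical name, and the local `AnalysisError` class is a stand-in for the one defined in `bitnet_tools/analysis.py`. The idea is simply to re-key every row by the stripped header, so that lookups by the normalized name succeed.

```python
import csv
import io


class AnalysisError(ValueError):
    """Local stand-in for the PR's AnalysisError (assumption)."""


def parse_csv_text_normalized(text):
    """Return (columns, rows) where row keys match the stripped header names."""
    reader = csv.DictReader(io.StringIO(text))
    columns = [name.strip() for name in (reader.fieldnames or [])]
    if not columns:
        raise AnalysisError("CSV header not found")

    rows = []
    for raw_row in reader:
        # Re-key by the stripped header so row.get("amount") works even when
        # the file's header was " amount ". DictReader stores surplus cells
        # under the key None, which is skipped here.
        rows.append(
            {name.strip(): value for name, value in raw_row.items() if name is not None}
        )
    return columns, rows


columns, rows = parse_csv_text_normalized(" id , amount \n1,10\n2,20\n")
```

With this shape, `columns` and the keys of each row come from the same normalization step, so they cannot drift apart. Two headers that collapse to the same stripped name would still collide; the sketch does not handle that case.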
Motivation
- […] `ollama` invocations from CLI and web UI.
Description
- […] `AnalysisError` and explicit validation in `bitnet_tools/analysis.py` to surface errors for empty CSV content and missing headers, and trimmed/validated header names.
- […] (`utf-8-sig`, `utf-8`, `cp949`) and unified CSV text parsing with `csv.Sniffer` delimiter detection.
- […] (`int`, `float`, `date`, `string`) and expanded numeric stats to include `outlier_count` (IQR rule) alongside `count/mean/min/q1/median/q3/max/std` in `summarize_rows`.
- […] `AnalysisError` and core helpers in `bitnet_tools/__init__.py` and added CLI (`bitnet_tools/cli.py`) with `--timeout` for `ollama` and friendly error handling for analysis failures.
- […] (`bitnet_tools/web.py`) with payload size guard (`MAX_CSV_TEXT_CHARS`) and `ollama` timeout support, plus static UI assets (`ui/*.html/.js/.css`).
- […] `pyproject.toml` and expanded test coverage and test file (`tests/test_analysis.py`) to cover semicolon-delimited CSVs, empty input, mixed-type column behavior, and presence of `outlier_count`.
Testing
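The semicolon-delimited and empty-input cases listed for `tests/test_analysis.py` exercise the encoding fallback and `csv.Sniffer` detection named in the description. A minimal sketch of what such parsing might look like is shown here. `decode_csv_bytes` and `read_rows` are hypothetical names, the 4096-character sample and the candidate delimiter set are assumptions, and `ValueError` stands in for the PR's `AnalysisError`.

```python
import csv
import io

# Candidate encodings, in the order given in the PR description.
ENCODINGS = ("utf-8-sig", "utf-8", "cp949")


def decode_csv_bytes(data):
    """Return the first successful decode among the candidate encodings."""
    for encoding in ENCODINGS:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode CSV with any candidate encoding")


def read_rows(text):
    """Parse CSV text into dict rows, sniffing the delimiter when possible."""
    if not text.strip():
        raise ValueError("CSV content is empty")
    try:
        # Sample size and delimiter candidates are assumptions of this sketch.
        dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t|")
    except csv.Error:
        dialect = csv.excel  # fall back to plain comma-separated parsing
    return list(csv.DictReader(io.StringIO(text), dialect=dialect))


semicolon_rows = read_rows("id;value\n1;10\n2;20\n")
korean_text = decode_csv_bytes("가".encode("cp949"))
```

Trying `utf-8-sig` first strips a leading byte-order mark when one is present, and `cp949` only gets a chance once both UTF-8 variants have rejected the bytes.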
- […] `python -m pytest -q` which returned `5 passed`.
- […] `python -m bitnet_tools.cli analyze /tmp/next2.csv --question '요약' --out /tmp/next2_payload.json` and verified inferred `dtypes` were `{'id': 'int', 'value': 'int', 'dt': 'date'}` and `outlier_count` printed as `0.0`.
- […] `AnalysisError` is returned as a clear HTTP error when input is invalid.
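For readers checking the `outlier_count` value, the IQR rule it refers to can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `iqr_outlier_count` is a hypothetical name, the conventional 1.5 × IQR fences are assumed, and the quartile method (`inclusive`) is a guess, since the description does not say how `q1` and `q3` are computed. The sketch also returns an `int`, whereas the CLI run above printed `0.0`.

```python
import statistics


def iqr_outlier_count(values):
    """Count values outside [q1 - 1.5*IQR, q3 + 1.5*IQR]."""
    if len(values) < 2:
        return 0  # quartiles are not meaningful for fewer than two points
    # Quartile method is an assumption; other methods shift the fences slightly.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(1 for v in values if v < low or v > high)


no_outliers = iqr_outlier_count([1, 2, 3, 4, 5])
one_outlier = iqr_outlier_count([1, 2, 3, 4, 100])
```

Because the fences depend on the quartile definition, small columns can report different counts under different methods, which is worth pinning down in a test.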