feat: upgrade multi-CSV analysis with drift detection and group-ratio insights #18

Merged
rad1092 merged 2 commits into main from codex/evaluate-current-project-completion-level on Feb 14, 2026

rad1092 (Owner) commented Feb 14, 2026

Motivation

  • Move multi-file analysis beyond Excel-level summaries by adding distribution and comparison diagnostics so users can spot drift and subgroup composition quickly.
  • Provide actionable follow-ups by generating Pandas example code and richer markdown reports to speed up exploration and visualization.
  • Improve CLI ergonomics to enable grouped ratio analysis directly from multi-analyze so common workflows like 시도명 x 세차유형 (province name x car-wash type) are supported.

Description

  • Enhanced bitnet_tools/multi_csv.py to compute numeric outlier ratio (IQR-based), dominant_value_ratio, and optional group-target ratio tables, and added cross-file _schema_drift summarizing dtype changes, missing-ratio range, dominant-value range, and numeric mean range.
  • Extended bitnet_tools/analysis.py with a streaming summarize_reader to avoid full materialization and added build_markdown_report for single-file reports, plus robust numeric aggregation (count/mean/min/max).
  • Updated the CLI (bitnet_tools/cli.py): added the multi-analyze options --group-column and --target-column, added report, desktop, and doctor flows, and wired multi-analyze to produce JSON and Markdown outputs via analyze_multiple_csv.
  • Added small desktop UI (bitnet_tools/desktop.py + bitnet_desktop.pyw and BitNet_Desktop_Start.bat) and environment diagnostics (bitnet_tools/doctor.py), updated pyproject.toml to include bitnet-desktop, and expanded README examples to document the new grouped-ratio workflow.
  • Tests: added/updated tests in tests/test_analysis.py and tests/test_cli.py to cover schema drift, group-target ratios, CLI behavior, and report generation.
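
As a rough illustration of the per-column metrics named above, the following standard-library sketch shows one way an IQR-based outlier ratio, a dominant-value ratio, and a group-target ratio table could be computed. It is not the code in bitnet_tools/multi_csv.py: the function names, the 1.5 x IQR fence, the minimum sample size, and the row-as-dict input shape are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from statistics import quantiles


def iqr_outlier_ratio(values, k=1.5):
    """Share of values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper).

    Returns 0.0 when there are fewer than 4 values; that cutoff is an assumption.
    """
    if len(values) < 4:
        return 0.0
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = sum(1 for v in values if v < low or v > high)
    return outliers / len(values)


def dominant_value_ratio(values):
    """Share of the single most frequent value; 0.0 for an empty column."""
    if not values:
        return 0.0
    _, top_count = Counter(values).most_common(1)[0]
    return top_count / len(values)


def group_target_ratios(rows, group_column, target_column):
    """For each group value, the share of each target value within that group."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[group_column]][row[target_column]] += 1
    return {
        group: {target: n / sum(targets.values()) for target, n in targets.items()}
        for group, targets in counts.items()
    }


# Example input: rows as dicts, e.g. as produced by csv.DictReader.
rows = [
    {"city": "Seoul", "type": "auto"},
    {"city": "Seoul", "type": "auto"},
    {"city": "Seoul", "type": "self"},
    {"city": "Busan", "type": "self"},
]
ratios = group_target_ratios(rows, "city", "type")
```

Here `ratios["Seoul"]` maps `auto` to two thirds and `self` to one third, which is the kind of table the --group-column / --target-column options are described as producing.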

Testing

  • Ran pytest -q and all tests passed (13 passed).
  • Verified CLI help shows multi-analyze and new options with python -m bitnet_tools.cli --help.
  • Executed a sample multi-file run with python -m bitnet_tools.cli multi-analyze /tmp/g1.csv /tmp/g2.csv --question "확장분석" --group-column city --target-column type --out-json /tmp/g.json --out-report /tmp/g.md which produced the JSON and Markdown outputs successfully.

chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36f4f3f81d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread bitnet_tools/multi_csv.py
" if key in merged.columns and key in df.columns:\n"
" merged = merged.merge(df, on=key, how='outer', suffixes=('', '_r'))\n\n"
"missing_ratio = merged.isna().mean().sort_values(ascending=False)\n"
"print('결측 비율 상위:\n', missing_ratio.head(10))\n\n"

P1: Escape newlines in generated pandas example

The pandas snippet emitted by build_code_guidance is syntactically invalid because \n is embedded directly inside single-quoted literals, so the generated text contains a real line break before the closing quote (e.g., in print('결측 비율 상위:\n', ...)). When users copy the pandas_example block from multi-analyze output and run it, Python raises SyntaxError: unterminated string literal, which breaks the new “actionable code guidance” flow.
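
The failure mode can be reproduced in isolation. The sketch below is a simplified stand-in for the generated text, not the actual body of build_code_guidance; the variable names and the `is_valid_python` helper are hypothetical. It shows that an unescaped `\n` inside the generator's own string literal becomes a real line break in the emitted code, and that doubling the backslash keeps the escape intact.

```python
# Simplified stand-in for the generated snippet. The "\n" before the closing
# single quote is interpreted by THIS string literal, so the emitted code has
# a real line break inside print('...').
broken_snippet = (
    "missing_ratio = {'a': 0.5}\n"
    "print('결측 비율 상위:\n', missing_ratio)\n"
)

# Doubling the backslash emits the two characters "\" and "n", which the
# generated code then interprets as a newline escape when it runs.
fixed_snippet = (
    "missing_ratio = {'a': 0.5}\n"
    "print('결측 비율 상위:\\n', missing_ratio)\n"
)


def is_valid_python(source: str) -> bool:
    """Return True if the source text compiles as a Python module."""
    try:
        compile(source, "<pandas_example>", "exec")
    except SyntaxError:
        return False
    return True


broken_ok = is_valid_python(broken_snippet)  # False: string literal is cut by a line break
fixed_ok = is_valid_python(fixed_snippet)    # True
```

A `compile()` check of this kind on the generated pandas_example text could serve as a regression test, since it needs neither pandas nor any CSV input.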


Comment thread bitnet_tools/desktop.py
Comment on lines +148 to +149
csv_text = self.csv_text.get("1.0", "end").strip()
if self.csv_path:

P2: Keep Tk widget reads on the main thread

_analyze_async starts a worker thread, but _analyze immediately reads Tk widgets (self.question.get, self.csv_text.get) from that background thread; Tkinter operations are required to run on the UI thread, so this can intermittently raise TclError or freeze/crash the desktop app when users click Analyze. Capture UI state on the main thread (e.g., via after) before dispatching background work.
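
One common way to follow this advice is to snapshot widget values in the button callback, pass only plain Python values to the worker, and hand the result back through a queue that the UI thread polls with `after`. The sketch below assumes that pattern; the `AnalyzeController` class, its constructor arguments, and the 50 ms polling interval are hypothetical and do not mirror the actual structure of bitnet_tools/desktop.py. The widgets are passed in so that the class itself never imports tkinter.

```python
import queue
import threading


class AnalyzeController:
    """Hypothetical helper: reads widgets on the UI thread, computes off-thread."""

    def __init__(self, root, question_widget, csv_text_widget, analyze_fn, on_result):
        self.root = root                      # e.g. a tk.Tk instance
        self.question_widget = question_widget  # e.g. a tk.Entry
        self.csv_text_widget = csv_text_widget  # e.g. a tk.Text
        self.analyze_fn = analyze_fn          # pure function: (question, csv_text) -> result
        self.on_result = on_result            # UI-thread callback: (status, payload)
        self._results = queue.Queue()

    def analyze_async(self):
        # Called on the UI thread (for example as a button command),
        # so reading the widgets here is safe.
        question = self.question_widget.get().strip()
        csv_text = self.csv_text_widget.get("1.0", "end").strip()
        worker = threading.Thread(
            target=self._work, args=(question, csv_text), daemon=True
        )
        worker.start()
        # Schedule polling from the UI thread.
        self.root.after(50, self.poll)
        return worker

    def _work(self, question, csv_text):
        # Background thread: only plain Python values, no Tk calls.
        try:
            self._results.put(("ok", self.analyze_fn(question, csv_text)))
        except Exception as exc:
            self._results.put(("error", exc))

    def poll(self):
        # UI thread: deliver the result if it is ready, otherwise poll again.
        try:
            status, payload = self._results.get_nowait()
        except queue.Empty:
            self.root.after(50, self.poll)
            return
        self.on_result(status, payload)
```

In a real application the button would be bound with something like `command=controller.analyze_async`, and `on_result` would update the output widget. Because `poll` runs via `after`, that update also happens on the UI thread.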


@rad1092 rad1092 merged commit edc007d into main Feb 14, 2026
4 checks passed
@rad1092 rad1092 deleted the codex/evaluate-current-project-completion-level branch February 14, 2026 14:34