feat: upgrade multi-CSV analysis with drift detection and group-ratio insights #18
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 36f4f3f81d
" if key in merged.columns and key in df.columns:\n"
" merged = merged.merge(df, on=key, how='outer', suffixes=('', '_r'))\n\n"
"missing_ratio = merged.isna().mean().sort_values(ascending=False)\n"
"print('결측 비율 상위:\n', missing_ratio.head(10))\n\n"
Escape newlines in generated pandas example
The pandas snippet emitted by build_code_guidance is syntactically invalid because \n is embedded directly inside single-quoted literals, so the generated text contains a real line break before the closing quote (e.g., in print('결측 비율 상위:\n', ...)). When users copy the pandas_example block from multi-analyze output and run it, Python raises SyntaxError: unterminated string literal, which breaks the new “actionable code guidance” flow.
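The failure described above can be reproduced without pandas. The following is a minimal sketch, not the actual `build_code_guidance` code: `broken_source`, `fixed_source`, and `compiles` are hypothetical names, and the English label is a stand-in for the real message. It shows why an unescaped `\n` in the outer literal produces invalid generated code, while `\\n` keeps the escape sequence in the emitted text.

```python
# Hypothetical stand-ins for one generated line of the pandas example.
# In broken_source the outer literal turns \n into a real line break,
# so the emitted code has a newline inside the inner single-quoted string.
broken_source = "print('top missing ratios:\n', 1)\n"

# Doubling the backslash emits the two characters \ and n, so the
# generated code contains a valid escape sequence instead.
fixed_source = "print('top missing ratios:\\n', 1)\n"


def compiles(source):
    """Return True if the generated snippet is syntactically valid Python."""
    try:
        compile(source, "<pandas_example>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles(broken_source), compiles(fixed_source))
```

A check like `compiles` could also run in a unit test over the generated snippet, so that a regression in the string escaping is caught before users copy the code.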
csv_text = self.csv_text.get("1.0", "end").strip()
if self.csv_path:
Keep Tk widget reads on the main thread
_analyze_async starts a worker thread, but _analyze immediately reads Tk widgets (self.question.get, self.csv_text.get) from that background thread; Tkinter operations are required to run on the UI thread, so this can intermittently raise TclError or freeze/crash the desktop app when users click Analyze. Capture UI state on the main thread (e.g., via after) before dispatching background work.
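One way to follow this suggestion is to read all widget values before the worker thread starts and hand the worker plain data only. The sketch below is an assumption about how that could look, not the code in `bitnet_tools/desktop.py`: `analyze_async`, `read_inputs`, `analyze`, and `results` are hypothetical names, and the Tk wiring is shown only in comments so the block runs without a display.

```python
import queue
import threading


def analyze_async(read_inputs, analyze, results):
    """Read UI state on the calling thread, then analyze in a worker thread.

    read_inputs: callable returning a dict of plain values (called here,
                 on the caller's thread, which should be the Tk main thread).
    analyze:     callable that takes those values as keyword arguments.
    results:     queue.Queue that receives the analysis result.
    """
    inputs = read_inputs()  # widget .get() calls belong here, not in the worker

    def worker():
        # Only plain Python data is touched here; no widget access.
        results.put(analyze(**inputs))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


# Hypothetical wiring inside the Tk app (runs on the main thread):
#   analyze_async(
#       lambda: {"question": self.question.get(),
#                "csv_text": self.csv_text.get("1.0", "end").strip()},
#       run_analysis,
#       self.results,
#   )
#   self.root.after(100, self._poll_results)  # poll the queue, update widgets
```

Passing results back through a queue that the main thread polls with `after` keeps every widget update on the UI thread as well.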
Motivation
- `multi-analyze` should support common workflows like `시도명 x 세차유형` (province/city name x car-wash type).
Description
- `bitnet_tools/multi_csv.py`: computes numeric outlier ratio (IQR-based), `dominant_value_ratio`, and optional group-target ratio tables, and adds cross-file `_schema_drift` summarizing dtype changes, missing-ratio range, dominant-value range, and numeric mean range.
- `bitnet_tools/analysis.py`: adds a streaming `summarize_reader` to avoid full materialization and `build_markdown_report` for single-file reports, plus robust numeric aggregation (count/mean/min/max).
- CLI (`bitnet_tools/cli.py`): adds the `multi-analyze` options `--group-column` and `--target-column`, adds `report`, `desktop`, and `doctor` flows, and wires them up to produce JSON + Markdown outputs via `analyze_multiple_csv`.
- Desktop app (`bitnet_tools/desktop.py` + `bitnet_desktop.pyw` and `BitNet_Desktop_Start.bat`) and environment diagnostics (`bitnet_tools/doctor.py`); `pyproject.toml` updated to include `bitnet-desktop`, and README examples expanded to document the new grouped-ratio workflow.
- `tests/test_analysis.py` and `tests/test_cli.py`: cover schema drift, group-target ratios, CLI behavior, and report generation.
Testing
- Ran `pytest -q` and all tests passed (13 passed).
- Verified `multi-analyze` and the new options with `python -m bitnet_tools.cli --help`.
- Ran `python -m bitnet_tools.cli multi-analyze /tmp/g1.csv /tmp/g2.csv --question "확장분석" --group-column city --target-column type --out-json /tmp/g.json --out-report /tmp/g.md`, which produced the JSON and Markdown outputs successfully.
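For readers unfamiliar with the metrics named in the Description, here is a minimal standard-library sketch of an IQR-based outlier ratio, a dominant-value ratio, and a group-target ratio table. It is an illustration under stated assumptions, not the implementation in `bitnet_tools/multi_csv.py`: the function names, the 1.5 x IQR fence, and the "inclusive" quantile method are choices made for this example only.

```python
from collections import Counter, defaultdict
from statistics import quantiles


def iqr_outlier_ratio(values, k=1.5):
    """Share of values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is assumed)."""
    if len(values) < 2:
        return 0.0
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = sum(1 for v in values if v < low or v > high)
    return outliers / len(values)


def dominant_value_ratio(values):
    """Share of rows taken by the single most frequent value."""
    if not values:
        return 0.0
    return Counter(values).most_common(1)[0][1] / len(values)


def group_target_ratios(rows, group_column, target_column):
    """For each group, the share of each target value within that group."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[group_column]][row[target_column]] += 1
    return {
        group: {target: n / sum(c.values()) for target, n in c.items()}
        for group, c in counts.items()
    }


# Illustrative rows only; the column names mirror the CLI example above.
rows = [
    {"city": "Seoul", "type": "auto"},
    {"city": "Seoul", "type": "auto"},
    {"city": "Seoul", "type": "self"},
    {"city": "Busan", "type": "self"},
]
print(group_target_ratios(rows, "city", "type"))
```

The group-target table corresponds to what `--group-column city --target-column type` asks for in the command above: within each `city`, the fraction of rows for each `type`.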