Motivation
Implement recommendation C: run chart rendering out-of-band so that expensive chart generation does not block the web UI.
Provide richer multi-CSV analysis/reporting and local desktop convenience to improve user workflows for CSV profiling and BitNet prompts.
Improve streaming/low-memory summary computation so large files are profiled without materializing full value lists.
Surface a lightweight diagnostics command to validate local Ollama/BitNet availability and inform tuning decisions.
Description
Added an in-process async chart job queue in bitnet_tools/web.py with helpers submit_chart_job(files) and get_chart_job(job_id) and HTTP endpoints POST /api/charts/jobs and GET /api/charts/jobs/<job_id> (thread pool + per-job input/output dirs under .bitnet_cache/chart_jobs).
Introduced bitnet_tools/visualize.py to create sampled charts per CSV and integrated it into web/UI and CLI chart flows via create_multi_charts.
Implemented bitnet_tools/multi_csv.py for streaming, memory-bounded multi-file profiling, schema-drift detection, insights, caching, and markdown/report builders; exposed analyze_multiple_csv, build_multi_csv_markdown, and result_to_json.
Updated bitnet_tools/analysis.py to use streaming summarization (summarize_reader) with O(1) memory numeric aggregations and added build_markdown_report.
Added Windows desktop UI launch support in bitnet_tools/desktop.py and top-level bitnet_desktop.pyw, plus a doctor helper in bitnet_tools/doctor.py to collect environment info.
Enhanced CLI (bitnet_tools/cli.py) with commands: multi-analyze, report, desktop, and doctor, and wiring for chart generation and report output.
Extended web UI assets (bitnet_tools/ui/*) to support multi-CSV upload, dashboard rendering, and controls to run the new flows.
Added tests covering analysis, multi-file analysis, CLI flows, and the web chart job lifecycle (tests/*), plus a .gitignore and a pyproject script entry for the desktop command.
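To illustrate what constant-memory numeric aggregation can look like, here is a minimal sketch. It is not the project's summarize_reader: the helper name summarize_numeric_stream and the returned fields are assumptions made for this example, and Welford's online algorithm is used as one common way to obtain mean and standard deviation in a single pass without keeping the values.

```python
import csv
import io
import math


def summarize_numeric_stream(rows, column):
    """Single-pass count/min/max/mean/std for one column (hypothetical helper)."""
    count = 0
    missing = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean (Welford)
    lo = math.inf
    hi = -math.inf
    for row in rows:
        raw = (row.get(column) or "").strip()
        try:
            x = float(raw)
        except ValueError:
            missing += 1  # empty or non-numeric cell
            continue
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        lo = min(lo, x)
        hi = max(hi, x)
    if count == 0:
        return {"count": 0, "missing": missing}
    std = math.sqrt(m2 / (count - 1)) if count > 1 else 0.0
    return {"count": count, "missing": missing,
            "min": lo, "max": hi, "mean": mean, "std": std}


# Rows are consumed one at a time; no list of values is ever built.
sample = io.StringIO("a,b\n1,x\n2,y\n,z\n3,w\n")
stats = summarize_numeric_stream(csv.DictReader(sample), "a")
```

Only a handful of scalars are kept per column, so memory use does not depend on the number of rows. Order statistics such as the median cannot be computed exactly this way and would need sampling or a sketching structure.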
Testing
Ran the unit tests with pytest -q; all 24 passed.
Added tests/test_web.py to validate chart job submission, completion, and not-found behavior, along with tests/* coverage for the multi_csv, visualize, cli, and analysis reporting features; all of these pass.
Ran a simple micro-benchmark comparing analyze_multiple_csv(..., max_workers=1) against max_workers=4 on synthesized CSVs. workers=1 performed at least as well in this environment, which suggests that parallelism should be tuned to the data and the environment rather than by blindly increasing the worker count.
Sanitize chart job filenames before writing input files
_run_chart_job joins the client-provided name directly into job_input_dir and writes it, so a crafted value like ../../outside.csv escapes the job directory and writes to arbitrary writable paths. This is exploitable whenever the web API is exposed beyond fully trusted callers (for example, UI bound to 0.0.0.0), and can overwrite unrelated files on the host. Normalize/validate the filename (or resolve and enforce parent containment) before write_text.
Block path traversal in /api/multi-analyze file names
The /api/multi-analyze handler also writes name directly via Path(td) / name without sanitization, so a JSON request containing ../ segments can write outside the temporary directory. In deployments where this endpoint is reachable by untrusted input, that allows arbitrary file overwrite in writable locations; enforce basename-only names or verify the resolved path stays under td before writing.
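Both traversal findings call for the same kind of guard, so one possible shape is sketched here. The helper name safe_child_path is hypothetical and is not part of bitnet_tools; it combines the two mitigations named in the comments (basename-only names, then a resolved-path containment check) and assumes Python 3.9+ for Path.is_relative_to.

```python
import tempfile
from pathlib import Path


def safe_child_path(base_dir, client_name):
    """Return a path for client_name that stays under base_dir (hypothetical helper)."""
    # Treat backslashes as separators too, so Windows-style names are
    # reduced to their last component even on POSIX hosts.
    name = Path(str(client_name).replace("\\", "/")).name
    if name in ("", ".", ".."):
        raise ValueError(f"invalid file name: {client_name!r}")
    base = Path(base_dir).resolve()
    target = (base / name).resolve()
    # Defence in depth: also catches a symlink inside base that points elsewhere.
    if not target.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"path escapes base directory: {client_name!r}")
    return target


with tempfile.TemporaryDirectory() as td:
    target = safe_child_path(td, "../../outside.csv")
    target.write_text("a,b\n1,2\n", encoding="utf-8")
    stayed_inside = target.parent == Path(td).resolve()
    try:
        safe_child_path(td, "..")
        rejected = False
    except ValueError:
        rejected = True
```

With this approach a crafted name such as ../../outside.csv is written as outside.csv inside the job or temporary directory. Rejecting such names outright with an HTTP 400 is an equally reasonable policy; which one fits depends on how the UI reports upload errors.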
build_code_guidance currently embeds \n inside single-quoted print literals in the generated snippet, which becomes a real newline character when the outer Python string is built. The resulting pandas_example contains an unterminated string literal and cannot be executed as-is, so users copying the suggested code will hit a syntax error immediately.
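The failure mode can be reproduced in isolation. The literals below are made up for illustration and are not the actual text produced by build_code_guidance; they only show how a \n in the outer string ends up as a real line break inside the generated single-quoted literal, and how doubling the backslash avoids it.

```python
# \n is interpreted by the outer string, so the generated code contains a
# real newline in the middle of a single-quoted literal.
broken = "print('summary:\n', 1 + 1)"

# \\n keeps the two characters backslash + n in the generated code.
fixed = "print('summary:\\n', 1 + 1)"


def compiles(source):
    """Return True if source is syntactically valid Python."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False


broken_ok = compiles(broken)
fixed_ok = compiles(fixed)
```

A compile() check like this could also serve as a regression test for the generated pandas_example, since it detects syntax errors without needing pandas installed.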
Evict finished chart jobs from the in-memory registry
get_chart_job returns terminal results but never removes the corresponding future from _CHART_JOBS, while submit_chart_job keeps appending new entries. In a long-running server this creates unbounded growth of retained futures/results and steadily increases memory usage as chart requests accumulate; delete completed/failed jobs or add retention limits.