Releases: joeharris76/BenchBox
BenchBox 0.2.1
Full Changelog: v0.2.0...v0.2.1
BenchBox 0.2.0
Full Changelog: v0.1.5...v0.2.0
BenchBox 0.1.5
Added
- Textcharts standalone library - Extracted all 15 ASCII chart types and base rendering
primitives into an independenttextchartspackage underpackages/textcharts/. The library
has its ownpyproject.toml, README with chart gallery and API reference, and zero BenchBox
dependencies. Clean standalone names (BarChart,Histogram,Heatmap, etc.) are exported
alongside BenchBox-compatible aliases. BenchBox now depends on textcharts as a path dependency
with compatibility shims preserving existing import paths. - Open table format loading - Added runtime loading support for external table formats
(Delta Lake, Iceberg, Hudi) through adapter-levelload_tableimplementations for Spark
mixin platforms, cloud SQL platforms, and Snowflake/ClickHouse adapters. Format support is
gated on adapter configuration so it is only available on platforms that implement it. - Expanded format capability registry - Registered format capabilities for Hudi,
Presto/Trino, Snowflake, ClickHouse, Redshift, BigQuery, and Spark-based platforms including
cloud lakehouse variants (EMR, Dataproc, Glue, Fabric Spark, Synapse Spark, Dataproc
Serverless). Removed registrations for platforms without actual loading code (including
LakeSail for delta/iceberg/hudi). - Mutation testing - Added
mutmutmutation testing targeting 5 critical modules
(duckdb.py,adapter.py,runner.py,chart_generator.py,run.py) with a
make mutation-testMakefile target for manual quality reviews.
Fixed
- Format capability registry accuracy - Normalized platform display names to match registry
keys and removed platforms from format registrations where adapter code has no actual loading
implementation. - CLI recursive import - Fixed a circular lazy-import in the benchmarks module that caused a
RecursionErroron CLI startup. - CoffeeShop SA2 query - Corrected
group_bycolumn name from'name'to
'product_name'. - Textcharts API migration - Migrated to textcharts v0.1.2 API after breaking changes,
renamedASCII*-prefixed classes across 10 source and test files, removed 3 unused deprecated
factory imports, and regenerated golden snapshots for neutralized defaults. - pytest-xdist worker title patch - Tightened the xdist worker title monkeypatch to prevent
test pollution across parallel workers. - Comprehensive Windows CI compatibility - Fixed 80+ Windows test failures spanning path
separators (.as_posix()for forward-slash comparison), file encoding (encoding="utf-8"
forwrite_text()), numpy int32 overflow on 64-bit multiplication, Rich Console width on
headless CI, Python ABI tag format differences (.pydvs.so),Path.touch()vs
time.time()mocking, NTFS directoryst_sizereturning 0, Windows CWD locks preventing
temp directory cleanup, andshutil.copytreereplacing symlinks for TPC-DS template setup. - TPC-DS dsqgen Windows option prefix - Fixed dsqgen invocation on Windows where the binary
expects/option prefixes instead of-(OPTION_STARTinr_params.c), and switched to
relative paths to stay under dsqgen's 80-charPARAM_MAX_LENbuffer. - Missing
tpcds.idxdistribution file - Added the required TPC-DS distribution index file
to Windows binary packages (both x86_64 and ARM64). - Throughput test timer resolution - Used
time.perf_counter()for throughput duration
calculation to avoid zero-duration results from low-resolutiontime.time()on Windows.
Changed
- Test suite quality overhaul - Deleted 13 hollow coverage-theater test files and replaced
them with behavior-verifying tests for DuckDB, SQLite, and DataFusion adapters. Replaced mock
credential tests with real file-based tests. Strengthened 150is-not-Noneassertions across
5 test files, replaced hollowisinstanceassertions with behavioral checks, and swapped
MagicMockforSimpleNamespaceon attribute-only objects. Removed per-file coverage
enforcement in favor of a suite-wide 60% threshold. - ~316 rendering tests migrated to textcharts - Pure chart-rendering tests moved from
BenchBox's test suite to the standalone textcharts library, with shim import smoke tests
retained in BenchBox to verify re-export paths. - Pytest lane restructure - Converted test lanes from implicit timing heuristics to explicit
source markers with measured-timing-based rebucketing. Restored a lightweight fast lane,
serialized stress tests, re-laned cloud adapter tests toslow+cloud_import, and documented
pytest-xdist safety requirements. - Chart subtitle simplified - Migrated chart subtitle storage from a metadata dict to a
plain string, removing an unnecessary layer of indirection. - Verbose logging extracted - Moved verbose logging configuration from
run.pyinto a
dedicatedcli/verbose_logging.pymodule. - Visualization constants - Extracted magic numbers into named constants across
visualization modules.
Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#015---2026-03-10
BenchBox 0.1.4
Added
power_barchart type - Added a horizontal bar chart for TPC Power@Size comparisons.
Higher values are treated as better (opposite ofperformance_bar), powered by
summary.tpc_metrics.power_at_sizeand exposed inNormalizedResult.power_bartemplate coverage - Added toflagship,head_to_head,trends,
regression_triage, andexecutive_summary. The chart renders only when TPC metric data is
present and is skipped for non-TPC runs.- Driver-version-aware chart labeling - Multi-platform chart series labels and run summaries
now include driver version context so version comparisons stay explicit in rendered output. - Runtime ABI validation for isolated drivers - Added ABI compatibility checks to isolated
runtime discovery so driver auto-install paths fail fast with actionable validation errors
instead of late runtime crashes. - Presorted data-generation modes for table formats - Added
parquet-sortedoutput mode,
plusdelta-sortedandiceberg-sortedorganization paths with clustering primitives
(z-order, Hilbert, partition-aware sorting) andcluster-bytuning integration.
Fixed
- Query plan capture correctness and persistence - Fixed multiple plan-capture defects:
forwardingcapture_plansthroughRunConfig, DuckDB JSON plan parsing edge cases,
preservation ofquery_planthrough normalization, andshow-plan/compare-plans
loading through the standard result-file path. - SSB dot-notation query IDs -
--queriesnow accepts IDs likeQ2.1, and plan-oriented
CLI flows preserve dotted IDs instead of normalizing them away. - Result timing pipeline accuracy - Fixed datagen/load timing propagation end-to-end,
including per-table load timings intable_statistics, corrected load-phase duration keying,
datagen phase duration and manifest stats in metadata, and explicit total duration override
propagation in result builders. Data-only runs now correctly execute generation, and
force_regenerateis forwarded through CLI and runner paths. - ASCII visualization readability under skewed data - Fixed outlier handling across chart
types (bar, histogram, stacked, scatter, line, CDF, percentile ladder, heatmap), addressed
zero-heavy fallback truncation edge cases, improved natural query sorting and color cycling,
and raised effective render width cap from 120 to 400 characters. --quietoutput contract for automation - Quiet mode now emits only the bare result
filepath to stdout, removing decorative output that broke script parsing.- Runtime environment stability - Fixed interpreter targeting for driver auto-install,
correctedauto_install_usedstate propagation, and resolved SIGSEGV-class failures when
driver_auto_install=truereused an already-matching version. - Additional correctness fixes - Restored
ai_primitivesregistry resolution fallback,
corrected SQLiteforce_recreateoption handling, fixed SSB customer row-count expectation in
SSBRowCountStrategy, and resolved visualize command crashes / multi-series rendering issues.
Changed
- Plan-capture default now uses actual execution timing -
--capture-plansnow defaults to
EXPLAIN (ANALYZE, FORMAT JSON)behavior viaanalyze_plans=True, recording measured timing
in captured plans. Users can opt out withanalyze_plans: falsefor estimate-only capture. - Benchmark runtime/result internals harmonized - Refactored enhanced result construction to
use a shared factory path and aligned canonical runtime behavior for benchmarks like NYC Taxi
and TSBS DevOps. make test-allresource policy and parallelism - Resource-heavy tests are now serialized
to prevent machine stalls, while slow/performance suites are moved to a dedicated stress lane.
The test suite also replaces fixed sleeps with bounded polling, reduces fixture/harness
duplication, and shifts selected CLI/e2e coverage to in-process runners for faster execution.- CI quality gates tightened - Added required table-format integration coverage and promoted
doc checks (linkcheck, example validation, docstring coverage) plus security audit policy
controls to blocking CI behavior.
Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#014---2026-03-03
BenchBox 0.1.3
Added
- Driver version pinning —
--platform-option driver_version=X.Y.Zpins any platform's Python driver; pair withdriver_auto_install=trueto have BenchBox install it automatically viauv. Active driver version is shown in the run announcement line. - Bulk multi-shard table loading — new
load_table_bulk()interface onFileFormatHandleringests multi-shard tables in a single native call. DuckDB (CSV, Parquet) and ClickHouse Native are the first implementations; TPC-DS sharded runs are measurably faster. - Greyscale / no-color ASCII chart fallbacks — all seven ASCII chart types now use fill-pattern and glyph differentiation when color is unavailable (CI logs,
NO_COLOR, piped output). - Five new ASCII chart types — percentile ladder, stacked bar, sparkline table, CDF, rank table, and normalized speedup (log₂-scaled). All registered in the chart registry and accessible via CLI and MCP.
- Post-run summary charts — charts are automatically generated and displayed in the terminal after every benchmark run and included in MCP
run_benchmarkresponses. - Three new chart template bundles —
latency_deep_dive,regression_triage, andexecutive_summary. fabric-dwas a preferred CLI alias for thefabric_dwplatform.
Fixed
- Driver auto-install version switching — stale
sys.modulesand metadata caching could return the wrong driver version afterdriver_auto_installswapped versions; module cache is now invalidated on switch. - DataFrame cache path mismatch — DataFrame and SQL modes now share a flat directory layout, eliminating redundant data generation when switching modes on the same scale factor.
- ClickHouse zstd double-decompression —
ClickHouseNativeHandlerwas applying manual decompression on top of the driver's built-in decompression, corrupting data for compressed bulk loads. - Platform display names — corrected for Amazon Athena, Google Cloud Dataproc, Microsoft Azure platforms, and Databricks SQL.
- CLI warning when a platform option's default value is not in the declared choices list.
- Ranking normalization crash when all metric values are negative finite numbers.
- PySpark SIGINT handler hanging
pytest-xdistworkers. --validation-modeCLI prompt crash whenspec.defaultis not a string.
Changed
- Four platform drivers moved to optional extras — DuckDB (
benchbox[duckdb]), Polars (benchbox[polars]), ClickHouse Connect (benchbox[clickhouse-connect]), and psycopg2 (benchbox[postgresql]) are no longer hard dependencies. Usepip install benchbox[all]to restore the previous behaviour. - All user-facing terminal output in the run pipeline flows through
emit(), making--quietsuppression and output capture in tests consistent. BaseQueryCatalogMixinandTranslatableQueryMixinextracted from duplicate query-catalog implementations.
Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#0130---2026-02-23
BenchBox 0.1.2
Added
- DataFrame mode for all benchmarks — complete DataFrame query implementations across all 18 benchmarks including TPC-DS (102 queries), TPC-H (22 queries), SSB, ClickBench, NYC Taxi, TSBS DevOps, H2ODB, AMPLab, CoffeeShop, TPC-H Skew, and Data Vault. DataFrame platforms: Polars, DuckDB, DataFusion, PySpark, Pandas, Modin, Dask, and cuDF (GPU).
- ASCII chart visualizations — terminal-native ASCII rendering replacing Plotly HTML charts. Seven chart types: performance bar, distribution box, query heatmap, comparison bar, diverging bar, summary box, and query latency histogram. ANSI colors, Unicode box-drawing, and best/worst highlighting.
- 14 new SQL platform adapters — PostgreSQL, Trino, PrestoDB, Apache Spark, AWS Athena, Azure Synapse, Microsoft Fabric, Firebolt, MotherDuck, InfluxDB 3.x, TimescaleDB, ClickHouse Cloud, Onehouse Quanton, and managed Spark variants (EMR, Dataproc, Glue, Fabric Spark, Synapse Spark, Dataproc Serverless).
- Open table format support — Delta Lake, Apache Iceberg, Apache Hudi, DuckLake, and Vortex columnar format with format conversion orchestration and manifest v2 for multi-format tracking.
- Physical tuning DDL generation — platform-specific DDL generators for DuckDB, Snowflake, Redshift, BigQuery, ClickHouse, Firebolt, PostgreSQL, TimescaleDB, Trino/Presto/Athena, and Spark family with sort keys, partitioning, clustering, and compression.
- Query plan capture and comparison — plan parsers for DuckDB, PostgreSQL, Redshift, DataFusion, and SQLite. Comparison engine with regression detection, fingerprinting, historical tracking with flapping detection, and CLI visualization.
- Interactive CLI wizard — guided benchmark configuration with platform selection, tuning wizard, scale factor validation, phase/query selection, onboarding, and persistent preferences.
- TPC-DI benchmark — complete implementation across 4 phases: core schema, query suite, ETL pipeline, and validation/testing.
- Cross-platform comparison engine —
benchbox comparecommand with multi-platform analysis, SQL vs DataFrame comparison, and unified visualization. - Unified tuning configuration — YAML-based tuning system with per-platform DDL generation, write-time physical layout configuration, and dry-run preview.
- Cloud storage and deployment modes for S3/GCS/ADLS/DBFS with credential setup wizard and cost estimation.
- TPC compliance improvements: stream-aware validation, query permutations, warmup/measurement iterations, maintenance operations (RF1/RF2), and
--seedfor reproducibility. - New benchmarks: AI/ML Primitives, Metadata Primitives, Write Primitives, Transaction Primitives.
- MCP server:
suggest_chartsandgenerate_charttools;--queriesand--validation-modeCLI flags; tiered--help.
Fixed
- TPC-DS data generation reliability — segfaults with fractional scale factors, parallel generation errors, streaming compression, and chunked file handling.
- Cloud platform stability — credential refresh errors, schema creation ordering, UC Volume uploads, S3 key handling, and BigQuery/Snowflake/Redshift/Databricks adapter issues.
- Type safety — 150+ type errors resolved across production code with proper annotations and TYPE_CHECKING imports.
- SQL dialect translation — SQLGlot compatibility for DuckDB, ClickHouse, DataFusion, and Netezza; reserved keyword quoting and identifier case sensitivity.
- Security hardening: SQL injection prevention, parameterized queries, path traversal protection.
- CLI hanging in non-interactive mode, progress display precision,
--quietmode propagation. - TPC compliance: correct stream permutations, maintenance phase SQL execution, Power@Size calculation parity between SQL and DataFrame modes.
Changed
- Dropped Plotly HTML charts in favor of ASCII-only rendering.
- Lazy-load cloud platform adapters to speed up CLI startup and test suite.
- Optimized TPC-DS smoke tests with selective table generation.
Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#012---2026-02-09
v0.1.1
Fixed
- Critical: TPC-H/TPC-DS query templates missing from wheel distribution — BenchBox installed from PyPI could not run TPC-H or TPC-DS benchmarks because query template files were stored outside the package tree. Templates are now bundled inside
benchbox/_binaries/*/templates/with a resolution utility that checks the bundled location first and falls back to_sources/for development installs. - dsqgen path buffer overflow — TPC-DS query generation could fail on systems with long temp directory paths (e.g. macOS
/var/folders/...) due to dsqgen's internal 80-char path buffer. Fixed by using short symlinks in the temp directory. - Python 3.10 compatibility for CLI, version utilities, and
tomllibimports. - Windows CI test failures and cross-platform compatibility issues.
- DuckDB version compatibility in tests.
- MCP server: XSS prevention and strengthened path traversal checks.
Added
- MCP server: 7 new tools (
get_query_details,detect_regressions,get_performance_trends,aggregate_results,get_query_plan,export_results,export_summary) and 2 prompts. - GitHub Actions PyPI publishing with trusted publishers.
- Release automation:
--push,--auto-continue, CI validation integration, bidirectional sync. - Platform adoption tiers (
recommended,supported,experimental,preview) replacing booleanrecommendedfield.
Changed
- Minimum Python version explicitly documented as 3.10.
- MCP server refactored to use public API instead of CLI internals.
- Benchmark metadata centralized into single registry.
- Per-platform TPC-DS query template duplicates removed (530 files, ~4 MB saved from wheel).
- MANIFEST.in expanded to include TPC patches, EULAs, and compilation infrastructure for sdist users who build from source.
Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#011---2026-01-24
v0.1.0
Alpha Software: BenchBox is alpha software. APIs may change without notice, features may be incomplete, and production use is not recommended.
BenchBox v0.1.0 is the initial public release of the database benchmarking framework — making it simple to run industry-standard benchmarks on analytical databases, from embedded engines like DuckDB to cloud data warehouses like Snowflake and Databricks.
Benchmarks (18 total)
- TPC Standards: TPC-H (22 queries), TPC-DS (99 queries), TPC-DI
- Academic: SSB, AMPLab, JoinOrder (IMDB dataset)
- Industry: ClickBench, H2ODB, NYC Taxi, TSBS DevOps, CoffeeShop
- Data Modeling: TPC-H Data Vault
- BenchBox Primitives: Read Primitives, Write Primitives, Transaction Primitives
- Experimental: TPC-DS-OBT, TPC-Havoc, TPC-H Skew
SQL Platforms (16 total)
- Embedded: DuckDB, SQLite, DataFusion
- Cloud Data Warehouses: Snowflake, Databricks, BigQuery, Redshift, Azure Synapse
- Analytical Databases: ClickHouse, Trino, Presto, Firebolt, InfluxDB
- General Purpose: PostgreSQL, Spark, Athena
DataFrame Platforms (8 total)
Polars, DataFusion, DuckDB, PySpark, Pandas, Modin, Dask, cuDF (GPU)
Core Features
- Self-contained data generation (no external tools required)
- Automatic SQL dialect translation between platforms
- CLI with dry-run support, progress bars, and rich output
- Programmatic Python API for integration
- Result export in JSON, CSV, and HTML formats
Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#010---2026-01-10-initial-release