Releases · joeharris76/BenchBox

Textcharts standalone library - Extracted all 15 ASCII chart types and base rendering
primitives into an independent textcharts package under packages/textcharts/. The library
has its own pyproject.toml, README with chart gallery and API reference, and zero BenchBox
dependencies. Clean standalone names (BarChart, Histogram, Heatmap, etc.) are exported
alongside BenchBox-compatible aliases. BenchBox now depends on textcharts as a path dependency
with compatibility shims preserving existing import paths.
Open table format loading - Added runtime loading support for external table formats
(Delta Lake, Iceberg, Hudi) through adapter-level load_table implementations for Spark
mixin platforms, cloud SQL platforms, and Snowflake/ClickHouse adapters. Format support is
gated on adapter configuration so it is only available on platforms that implement it.
Expanded format capability registry - Registered format capabilities for Hudi,
Presto/Trino, Snowflake, ClickHouse, Redshift, BigQuery, and Spark-based platforms including
cloud lakehouse variants (EMR, Dataproc, Glue, Fabric Spark, Synapse Spark, Dataproc
Serverless). Removed registrations for platforms without actual loading code (including
LakeSail for delta/iceberg/hudi).
Mutation testing - Added mutmut mutation testing targeting 5 critical modules
(duckdb.py, adapter.py, runner.py, chart_generator.py, run.py) with a
make mutation-test Makefile target for manual quality reviews.

Fixed

Format capability registry accuracy - Normalized platform display names to match registry
keys and removed platforms from format registrations where adapter code has no actual loading
implementation.
CLI recursive import - Fixed a circular lazy-import in the benchmarks module that caused a
RecursionError on CLI startup.
CoffeeShop SA2 query - Corrected group_by column name from 'name' to
'product_name'.
Textcharts API migration - Migrated to textcharts v0.1.2 API after breaking changes,
renamed ASCII*-prefixed classes across 10 source and test files, removed 3 unused deprecated
factory imports, and regenerated golden snapshots for neutralized defaults.
pytest-xdist worker title patch - Tightened the xdist worker title monkeypatch to prevent
test pollution across parallel workers.
Comprehensive Windows CI compatibility - Fixed 80+ Windows test failures spanning path
separators (.as_posix() for forward-slash comparison), file encoding (encoding="utf-8"
for write_text()), numpy int32 overflow on 64-bit multiplication, Rich Console width on
headless CI, Python ABI tag format differences (.pyd vs .so), Path.touch() vs
time.time() mocking, NTFS directory st_size returning 0, Windows CWD locks preventing
temp directory cleanup, and shutil.copytree replacing symlinks for TPC-DS template setup.
TPC-DS dsqgen Windows option prefix - Fixed dsqgen invocation on Windows where the binary
expects / option prefixes instead of - (OPTION_START in r_params.c), and switched to
relative paths to stay under dsqgen's 80-char PARAM_MAX_LEN buffer.
Missing tpcds.idx distribution file - Added the required TPC-DS distribution index file
to Windows binary packages (both x86_64 and ARM64).
Throughput test timer resolution - Used time.perf_counter() for throughput duration
calculation to avoid zero-duration results from low-resolution time.time() on Windows.

Changed

Test suite quality overhaul - Deleted 13 hollow coverage-theater test files and replaced
them with behavior-verifying tests for DuckDB, SQLite, and DataFusion adapters. Replaced mock
credential tests with real file-based tests. Strengthened 150 is-not-None assertions across
5 test files, replaced hollow isinstance assertions with behavioral checks, and swapped
MagicMock for SimpleNamespace on attribute-only objects. Removed per-file coverage
enforcement in favor of a suite-wide 60% threshold.
~316 rendering tests migrated to textcharts - Pure chart-rendering tests moved from
BenchBox's test suite to the standalone textcharts library, with shim import smoke tests
retained in BenchBox to verify re-export paths.
Pytest lane restructure - Converted test lanes from implicit timing heuristics to explicit
source markers with measured-timing-based rebucketing. Restored a lightweight fast lane,
serialized stress tests, re-laned cloud adapter tests to slow+cloud_import, and documented
pytest-xdist safety requirements.
Chart subtitle simplified - Migrated chart subtitle storage from a metadata dict to a
plain string, removing an unnecessary layer of indirection.
Verbose logging extracted - Moved verbose logging configuration from run.py into a
dedicated cli/verbose_logging.py module.
Visualization constants - Extracted magic numbers into named constants across
visualization modules.

Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#015---2026-03-10

Assets 2

03 Mar 04:02

github-actions

v0.1.4

48d731e

BenchBox 0.1.4

Added

power_bar chart type - Added a horizontal bar chart for TPC Power@Size comparisons.
Higher values are treated as better (opposite of performance_bar), powered by
summary.tpc_metrics.power_at_size and exposed in NormalizedResult.
power_bar template coverage - Added to flagship, head_to_head, trends,
regression_triage, and executive_summary. The chart renders only when TPC metric data is
present and is skipped for non-TPC runs.
Driver-version-aware chart labeling - Multi-platform chart series labels and run summaries
now include driver version context so version comparisons stay explicit in rendered output.
Runtime ABI validation for isolated drivers - Added ABI compatibility checks to isolated
runtime discovery so driver auto-install paths fail fast with actionable validation errors
instead of late runtime crashes.
Presorted data-generation modes for table formats - Added parquet-sorted output mode,
plus delta-sorted and iceberg-sorted organization paths with clustering primitives
(z-order, Hilbert, partition-aware sorting) and cluster-by tuning integration.

Fixed

Query plan capture correctness and persistence - Fixed multiple plan-capture defects:
forwarding capture_plans through RunConfig, DuckDB JSON plan parsing edge cases,
preservation of query_plan through normalization, and show-plan / compare-plans
loading through the standard result-file path.
SSB dot-notation query IDs - --queries now accepts IDs like Q2.1, and plan-oriented
CLI flows preserve dotted IDs instead of normalizing them away.
Result timing pipeline accuracy - Fixed datagen/load timing propagation end-to-end,
including per-table load timings in table_statistics, corrected load-phase duration keying,
datagen phase duration and manifest stats in metadata, and explicit total duration override
propagation in result builders. Data-only runs now correctly execute generation, and
force_regenerate is forwarded through CLI and runner paths.
ASCII visualization readability under skewed data - Fixed outlier handling across chart
types (bar, histogram, stacked, scatter, line, CDF, percentile ladder, heatmap), addressed
zero-heavy fallback truncation edge cases, improved natural query sorting and color cycling,
and raised effective render width cap from 120 to 400 characters.
--quiet output contract for automation - Quiet mode now emits only the bare result
filepath to stdout, removing decorative output that broke script parsing.
Runtime environment stability - Fixed interpreter targeting for driver auto-install,
corrected auto_install_used state propagation, and resolved SIGSEGV-class failures when
driver_auto_install=true reused an already-matching version.
Additional correctness fixes - Restored ai_primitives registry resolution fallback,
corrected SQLite force_recreate option handling, fixed SSB customer row-count expectation in
SSBRowCountStrategy, and resolved visualize command crashes / multi-series rendering issues.

Changed

Plan-capture default now uses actual execution timing - --capture-plans now defaults to
EXPLAIN (ANALYZE, FORMAT JSON) behavior via analyze_plans=True, recording measured timing
in captured plans. Users can opt out with analyze_plans: false for estimate-only capture.
Benchmark runtime/result internals harmonized - Refactored enhanced result construction to
use a shared factory path and aligned canonical runtime behavior for benchmarks like NYC Taxi
and TSBS DevOps.
make test-all resource policy and parallelism - Resource-heavy tests are now serialized
to prevent machine stalls, while slow/performance suites are moved to a dedicated stress lane.
The test suite also replaces fixed sleeps with bounded polling, reduces fixture/harness
duplication, and shifts selected CLI/e2e coverage to in-process runners for faster execution.
CI quality gates tightened - Added required table-format integration coverage and promoted
doc checks (linkcheck, example validation, docstring coverage) plus security audit policy
controls to blocking CI behavior.

Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#014---2026-03-03

Assets 4

24 Feb 00:28

joeharris76

v0.1.3

5697344

BenchBox 0.1.3

Added

Driver version pinning — --platform-option driver_version=X.Y.Z pins any platform's Python driver; pair with driver_auto_install=true to have BenchBox install it automatically via uv. Active driver version is shown in the run announcement line.
Bulk multi-shard table loading — new load_table_bulk() interface on FileFormatHandler ingests multi-shard tables in a single native call. DuckDB (CSV, Parquet) and ClickHouse Native are the first implementations; TPC-DS sharded runs are measurably faster.
Greyscale / no-color ASCII chart fallbacks — all seven ASCII chart types now use fill-pattern and glyph differentiation when color is unavailable (CI logs, NO_COLOR, piped output).
Five new ASCII chart types — percentile ladder, stacked bar, sparkline table, CDF, rank table, and normalized speedup (log₂-scaled). All registered in the chart registry and accessible via CLI and MCP.
Post-run summary charts — charts are automatically generated and displayed in the terminal after every benchmark run and included in MCP run_benchmark responses.
Three new chart template bundles — latency_deep_dive, regression_triage, and executive_summary.
fabric-dw as a preferred CLI alias for the fabric_dw platform.

Fixed

Driver auto-install version switching — stale sys.modules and metadata caching could return the wrong driver version after driver_auto_install swapped versions; module cache is now invalidated on switch.
DataFrame cache path mismatch — DataFrame and SQL modes now share a flat directory layout, eliminating redundant data generation when switching modes on the same scale factor.
ClickHouse zstd double-decompression — ClickHouseNativeHandler was applying manual decompression on top of the driver's built-in decompression, corrupting data for compressed bulk loads.
Platform display names — corrected for Amazon Athena, Google Cloud Dataproc, Microsoft Azure platforms, and Databricks SQL.
CLI warning when a platform option's default value is not in the declared choices list.
Ranking normalization crash when all metric values are negative finite numbers.
PySpark SIGINT handler hanging pytest-xdist workers.
--validation-mode CLI prompt crash when spec.default is not a string.

Changed

Four platform drivers moved to optional extras — DuckDB (benchbox[duckdb]), Polars (benchbox[polars]), ClickHouse Connect (benchbox[clickhouse-connect]), and psycopg2 (benchbox[postgresql]) are no longer hard dependencies. Use pip install benchbox[all] to restore the previous behaviour.
All user-facing terminal output in the run pipeline flows through emit(), making --quiet suppression and output capture in tests consistent.
BaseQueryCatalogMixin and TranslatableQueryMixin extracted from duplicate query-catalog implementations.

Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#0130---2026-02-23

Assets 2

24 Feb 00:31

joeharris76

v0.1.2

d5b1135

BenchBox 0.1.2

Added

DataFrame mode for all benchmarks — complete DataFrame query implementations across all 18 benchmarks including TPC-DS (102 queries), TPC-H (22 queries), SSB, ClickBench, NYC Taxi, TSBS DevOps, H2ODB, AMPLab, CoffeeShop, TPC-H Skew, and Data Vault. DataFrame platforms: Polars, DuckDB, DataFusion, PySpark, Pandas, Modin, Dask, and cuDF (GPU).
ASCII chart visualizations — terminal-native ASCII rendering replacing Plotly HTML charts. Seven chart types: performance bar, distribution box, query heatmap, comparison bar, diverging bar, summary box, and query latency histogram. ANSI colors, Unicode box-drawing, and best/worst highlighting.
14 new SQL platform adapters — PostgreSQL, Trino, PrestoDB, Apache Spark, AWS Athena, Azure Synapse, Microsoft Fabric, Firebolt, MotherDuck, InfluxDB 3.x, TimescaleDB, ClickHouse Cloud, Onehouse Quanton, and managed Spark variants (EMR, Dataproc, Glue, Fabric Spark, Synapse Spark, Dataproc Serverless).
Open table format support — Delta Lake, Apache Iceberg, Apache Hudi, DuckLake, and Vortex columnar format with format conversion orchestration and manifest v2 for multi-format tracking.
Physical tuning DDL generation — platform-specific DDL generators for DuckDB, Snowflake, Redshift, BigQuery, ClickHouse, Firebolt, PostgreSQL, TimescaleDB, Trino/Presto/Athena, and Spark family with sort keys, partitioning, clustering, and compression.
Query plan capture and comparison — plan parsers for DuckDB, PostgreSQL, Redshift, DataFusion, and SQLite. Comparison engine with regression detection, fingerprinting, historical tracking with flapping detection, and CLI visualization.
Interactive CLI wizard — guided benchmark configuration with platform selection, tuning wizard, scale factor validation, phase/query selection, onboarding, and persistent preferences.
TPC-DI benchmark — complete implementation across 4 phases: core schema, query suite, ETL pipeline, and validation/testing.
Cross-platform comparison engine — benchbox compare command with multi-platform analysis, SQL vs DataFrame comparison, and unified visualization.
Unified tuning configuration — YAML-based tuning system with per-platform DDL generation, write-time physical layout configuration, and dry-run preview.
Cloud storage and deployment modes for S3/GCS/ADLS/DBFS with credential setup wizard and cost estimation.
TPC compliance improvements: stream-aware validation, query permutations, warmup/measurement iterations, maintenance operations (RF1/RF2), and --seed for reproducibility.
New benchmarks: AI/ML Primitives, Metadata Primitives, Write Primitives, Transaction Primitives.
MCP server: suggest_charts and generate_chart tools; --queries and --validation-mode CLI flags; tiered --help.

Fixed

TPC-DS data generation reliability — segfaults with fractional scale factors, parallel generation errors, streaming compression, and chunked file handling.
Cloud platform stability — credential refresh errors, schema creation ordering, UC Volume uploads, S3 key handling, and BigQuery/Snowflake/Redshift/Databricks adapter issues.
Type safety — 150+ type errors resolved across production code with proper annotations and TYPE_CHECKING imports.
SQL dialect translation — SQLGlot compatibility for DuckDB, ClickHouse, DataFusion, and Netezza; reserved keyword quoting and identifier case sensitivity.
Security hardening: SQL injection prevention, parameterized queries, path traversal protection.
CLI hanging in non-interactive mode, progress display precision, --quiet mode propagation.
TPC compliance: correct stream permutations, maintenance phase SQL execution, Power@Size calculation parity between SQL and DataFrame modes.

Changed

Dropped Plotly HTML charts in favor of ASCII-only rendering.
Lazy-load cloud platform adapters to speed up CLI startup and test suite.
Optimized TPC-DS smoke tests with selective table generation.

Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#012---2026-02-09

Assets 2

07 Feb 14:09

joeharris76

v0.1.1

e0438e9

v0.1.1

Fixed

Critical: TPC-H/TPC-DS query templates missing from wheel distribution — BenchBox installed from PyPI could not run TPC-H or TPC-DS benchmarks because query template files were stored outside the package tree. Templates are now bundled inside benchbox/_binaries/*/templates/ with a resolution utility that checks the bundled location first and falls back to _sources/ for development installs.
dsqgen path buffer overflow — TPC-DS query generation could fail on systems with long temp directory paths (e.g. macOS /var/folders/...) due to dsqgen's internal 80-char path buffer. Fixed by using short symlinks in the temp directory.
Python 3.10 compatibility for CLI, version utilities, and tomllib imports.
Windows CI test failures and cross-platform compatibility issues.
DuckDB version compatibility in tests.
MCP server: XSS prevention and strengthened path traversal checks.

Added

MCP server: 7 new tools (get_query_details, detect_regressions, get_performance_trends, aggregate_results, get_query_plan, export_results, export_summary) and 2 prompts.
GitHub Actions PyPI publishing with trusted publishers.
Release automation: --push, --auto-continue, CI validation integration, bidirectional sync.
Platform adoption tiers (recommended, supported, experimental, preview) replacing boolean recommended field.

Changed

Minimum Python version explicitly documented as 3.10.
MCP server refactored to use public API instead of CLI internals.
Benchmark metadata centralized into single registry.
Per-platform TPC-DS query template duplicates removed (530 files, ~4 MB saved from wheel).
MANIFEST.in expanded to include TPC patches, EULAs, and compilation infrastructure for sdist users who build from source.

Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#011---2026-01-24

Assets 2

19 Jan 16:25

joeharris76

v0.1.0

8f1720e

v0.1.0

Alpha Software: BenchBox is alpha software. APIs may change without notice, features may be incomplete, and production use is not recommended.

BenchBox v0.1.0 is the initial public release of the database benchmarking framework — making it simple to run industry-standard benchmarks on analytical databases, from embedded engines like DuckDB to cloud data warehouses like Snowflake and Databricks.

Benchmarks (18 total)

TPC Standards: TPC-H (22 queries), TPC-DS (99 queries), TPC-DI
Academic: SSB, AMPLab, JoinOrder (IMDB dataset)
Industry: ClickBench, H2ODB, NYC Taxi, TSBS DevOps, CoffeeShop
Data Modeling: TPC-H Data Vault
BenchBox Primitives: Read Primitives, Write Primitives, Transaction Primitives
Experimental: TPC-DS-OBT, TPC-Havoc, TPC-H Skew

SQL Platforms (16 total)

Embedded: DuckDB, SQLite, DataFusion
Cloud Data Warehouses: Snowflake, Databricks, BigQuery, Redshift, Azure Synapse
Analytical Databases: ClickHouse, Trino, Presto, Firebolt, InfluxDB
General Purpose: PostgreSQL, Spark, Athena

DataFrame Platforms (8 total)

Polars, DataFusion, DuckDB, PySpark, Pandas, Modin, Dask, cuDF (GPU)

Core Features

Self-contained data generation (no external tools required)
Automatic SQL dialect translation between platforms
CLI with dry-run support, progress bars, and rich output
Programmatic Python API for integration
Result export in JSON, CSV, and HTML formats

Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#010---2026-01-10-initial-release

Assets 2

Releases: joeharris76/BenchBox

BenchBox 0.2.1

Uh oh!

BenchBox 0.2.0

Uh oh!

BenchBox 0.1.5

Added

Fixed

Changed

Uh oh!

BenchBox 0.1.4

Added

Fixed

Changed

Uh oh!

BenchBox 0.1.3

Added

Fixed

Changed

Uh oh!

BenchBox 0.1.2

Added

Fixed

Changed

Uh oh!

v0.1.1

Fixed

Added

Changed

Uh oh!

v0.1.0

Benchmarks (18 total)

SQL Platforms (16 total)

DataFrame Platforms (8 total)

Core Features

Uh oh!