Releases: joeharris76/BenchBox

BenchBox 0.2.1

27 Apr 07:27

BenchBox 0.2.0

02 Apr 01:16

BenchBox 0.1.5

11 Mar 13:06

Added

  • Textcharts standalone library - Extracted all 15 ASCII chart types and base rendering
    primitives into an independent textcharts package under packages/textcharts/. The library
    has its own pyproject.toml, README with chart gallery and API reference, and zero BenchBox
    dependencies. Clean standalone names (BarChart, Histogram, Heatmap, etc.) are exported
    alongside BenchBox-compatible aliases. BenchBox now depends on textcharts as a path dependency
    with compatibility shims preserving existing import paths.
  • Open table format loading - Added runtime loading support for external table formats
    (Delta Lake, Iceberg, Hudi) through adapter-level load_table implementations for Spark
    mixin platforms, cloud SQL platforms, and Snowflake/ClickHouse adapters. Format support is
    gated on adapter configuration so it is only available on platforms that implement it.
  • Expanded format capability registry - Registered format capabilities for Hudi,
    Presto/Trino, Snowflake, ClickHouse, Redshift, BigQuery, and Spark-based platforms including
    cloud lakehouse variants (EMR, Dataproc, Glue, Fabric Spark, Synapse Spark, Dataproc
    Serverless). Removed registrations for platforms without actual loading code (including
    LakeSail for delta/iceberg/hudi).
  • Mutation testing - Added mutmut mutation testing targeting 5 critical modules
    (duckdb.py, adapter.py, runner.py, chart_generator.py, run.py) with a
    make mutation-test Makefile target for manual quality reviews.

Fixed

  • Format capability registry accuracy - Normalized platform display names to match registry
    keys and removed platforms from format registrations where adapter code has no actual loading
    implementation.
  • CLI recursive import - Fixed a circular lazy-import in the benchmarks module that caused a
    RecursionError on CLI startup.
  • CoffeeShop SA2 query - Corrected group_by column name from 'name' to
    'product_name'.
  • Textcharts API migration - Migrated to textcharts v0.1.2 API after breaking changes,
    renamed ASCII*-prefixed classes across 10 source and test files, removed 3 unused deprecated
    factory imports, and regenerated golden snapshots for neutralized defaults.
  • pytest-xdist worker title patch - Tightened the xdist worker title monkeypatch to prevent
    test pollution across parallel workers.
  • Comprehensive Windows CI compatibility - Fixed 80+ Windows test failures spanning path
    separators (.as_posix() for forward-slash comparison), file encoding (encoding="utf-8"
    for write_text()), numpy int32 overflow on 64-bit multiplication, Rich Console width on
    headless CI, Python ABI tag format differences (.pyd vs .so), Path.touch() vs
    time.time() mocking, NTFS directory st_size returning 0, Windows CWD locks preventing
    temp directory cleanup, and shutil.copytree replacing symlinks for TPC-DS template setup.
  • TPC-DS dsqgen Windows option prefix - Fixed dsqgen invocation on Windows where the binary
    expects / option prefixes instead of - (OPTION_START in r_params.c), and switched to
    relative paths to stay under dsqgen's 80-char PARAM_MAX_LEN buffer.
  • Missing tpcds.idx distribution file - Added the required TPC-DS distribution index file
    to Windows binary packages (both x86_64 and ARM64).
  • Throughput test timer resolution - Used time.perf_counter() for throughput duration
    calculation to avoid zero-duration results from low-resolution time.time() on Windows.
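
The timer fix above can be illustrated with a minimal sketch. The measure_throughput helper and its arguments are hypothetical and not BenchBox APIs. The point is only that time.perf_counter() is a monotonic, high-resolution clock, so a short run is far less likely to report a zero duration than with time.time(), whose resolution can be coarse on Windows.

```python
import time


def measure_throughput(run_queries, query_count):
    """Return (duration_seconds, queries_per_second) for one throughput run.

    Hypothetical helper for illustration; not part of BenchBox.
    """
    start = time.perf_counter()  # monotonic, high-resolution clock
    run_queries()
    duration = time.perf_counter() - start
    # Guard against a zero duration so the division below cannot fail.
    qps = query_count / duration if duration > 0 else float("inf")
    return duration, qps


def fake_workload():
    # Stand-in for executing a batch of benchmark queries.
    sum(i * i for i in range(10_000))


duration, qps = measure_throughput(fake_workload, query_count=10)
```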

Changed

  • Test suite quality overhaul - Deleted 13 hollow coverage-theater test files and replaced
    them with behavior-verifying tests for DuckDB, SQLite, and DataFusion adapters. Replaced mock
    credential tests with real file-based tests. Strengthened 150 is-not-None assertions across
    5 test files, replaced hollow isinstance assertions with behavioral checks, and swapped
    MagicMock for SimpleNamespace on attribute-only objects. Removed per-file coverage
    enforcement in favor of a suite-wide 60% threshold.
  • ~316 rendering tests migrated to textcharts - Pure chart-rendering tests moved from
    BenchBox's test suite to the standalone textcharts library, with shim import smoke tests
    retained in BenchBox to verify re-export paths.
  • Pytest lane restructure - Converted test lanes from implicit timing heuristics to explicit
    source markers with measured-timing-based rebucketing. Restored a lightweight fast lane,
    serialized stress tests, re-laned cloud adapter tests to slow+cloud_import, and documented
    pytest-xdist safety requirements.
  • Chart subtitle simplified - Migrated chart subtitle storage from a metadata dict to a
    plain string, removing an unnecessary layer of indirection.
  • Verbose logging extracted - Moved verbose logging configuration from run.py into a
    dedicated cli/verbose_logging.py module.
  • Visualization constants - Extracted magic numbers into named constants across
    visualization modules.
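
The MagicMock-to-SimpleNamespace swap mentioned above is easiest to see side by side. This is a generic sketch using only the standard library; the row_count attribute is an illustrative name, not taken from the BenchBox test suite.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock

# A MagicMock fabricates any attribute on access, so a typo goes unnoticed.
mock_result = MagicMock()
mock_result.row_count = 42
typo_value = mock_result.row_cout  # no error: returns another MagicMock

# A SimpleNamespace only exposes the attributes it was given.
plain_result = SimpleNamespace(row_count=42)
try:
    plain_result.row_cout
    typo_detected = False
except AttributeError:
    typo_detected = True
```

For objects that only carry attributes, the stricter stand-in turns a silent misspelling into a test failure.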

Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#015---2026-03-10

BenchBox 0.1.4

03 Mar 04:02

Added

  • power_bar chart type - Added a horizontal bar chart for TPC Power@Size comparisons.
    Higher values are treated as better (opposite of performance_bar), powered by
    summary.tpc_metrics.power_at_size and exposed in NormalizedResult.
  • power_bar template coverage - Added to flagship, head_to_head, trends,
    regression_triage, and executive_summary. The chart renders only when TPC metric data is
    present and is skipped for non-TPC runs.
  • Driver-version-aware chart labeling - Multi-platform chart series labels and run summaries
    now include driver version context so version comparisons stay explicit in rendered output.
  • Runtime ABI validation for isolated drivers - Added ABI compatibility checks to isolated
    runtime discovery so driver auto-install paths fail fast with actionable validation errors
    instead of late runtime crashes.
  • Presorted data-generation modes for table formats - Added parquet-sorted output mode,
    plus delta-sorted and iceberg-sorted organization paths with clustering primitives
    (z-order, Hilbert, partition-aware sorting) and cluster-by tuning integration.

Fixed

  • Query plan capture correctness and persistence - Fixed multiple plan-capture defects:
    forwarding capture_plans through RunConfig, DuckDB JSON plan parsing edge cases,
    preservation of query_plan through normalization, and show-plan / compare-plans
    loading through the standard result-file path.
  • SSB dot-notation query IDs - --queries now accepts IDs like Q2.1, and plan-oriented
    CLI flows preserve dotted IDs instead of normalizing them away.
  • Result timing pipeline accuracy - Fixed datagen/load timing propagation end-to-end,
    including per-table load timings in table_statistics, corrected load-phase duration keying,
    datagen phase duration and manifest stats in metadata, and explicit total duration override
    propagation in result builders. Data-only runs now correctly execute generation, and
    force_regenerate is forwarded through CLI and runner paths.
  • ASCII visualization readability under skewed data - Fixed outlier handling across chart
    types (bar, histogram, stacked, scatter, line, CDF, percentile ladder, heatmap), addressed
    zero-heavy fallback truncation edge cases, improved natural query sorting and color cycling,
    and raised effective render width cap from 120 to 400 characters.
  • --quiet output contract for automation - Quiet mode now emits only the bare result
    filepath to stdout, removing decorative output that broke script parsing.
  • Runtime environment stability - Fixed interpreter targeting for driver auto-install,
    corrected auto_install_used state propagation, and resolved SIGSEGV-class failures when
    driver_auto_install=true reused an already-matching version.
  • Additional correctness fixes - Restored ai_primitives registry resolution fallback,
    corrected SQLite force_recreate option handling, fixed SSB customer row-count expectation in
    SSBRowCountStrategy, and resolved visualize command crashes / multi-series rendering issues.
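
Natural sorting of query IDs, including dotted SSB-style IDs such as Q2.1, can be sketched as follows. The natural_key helper is hypothetical and only illustrates the idea of comparing digit runs as integers; it is not BenchBox's actual sort key.

```python
import re


def natural_key(query_id: str):
    """Split an ID into text and integer parts so Q2 sorts before Q10."""
    # re.split with a capture group alternates text and digit runs,
    # so items at the same position always have comparable types.
    parts = re.split(r"(\d+)", query_id)
    return [int(p) if p.isdigit() else p.lower() for p in parts]


ids = ["Q10", "Q2.1", "Q1", "Q2", "Q2.10", "Q2.2"]
ordered = sorted(ids, key=natural_key)
```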

Changed

  • Plan-capture default now uses actual execution timing - --capture-plans now defaults to
    EXPLAIN (ANALYZE, FORMAT JSON) behavior via analyze_plans=True, recording measured timing
    in captured plans. Users can opt out with analyze_plans: false for estimate-only capture.
  • Benchmark runtime/result internals harmonized - Refactored enhanced result construction to
    use a shared factory path and aligned canonical runtime behavior for benchmarks like NYC Taxi
    and TSBS DevOps.
  • make test-all resource policy and parallelism - Resource-heavy tests are now serialized
    to prevent machine stalls, while slow/performance suites are moved to a dedicated stress lane.
    The test suite also replaces fixed sleeps with bounded polling, reduces fixture/harness
    duplication, and shifts selected CLI/e2e coverage to in-process runners for faster execution.
  • CI quality gates tightened - Added required table-format integration coverage and promoted
    doc checks (linkcheck, example validation, docstring coverage) plus security audit policy
    controls to blocking CI behavior.
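
Replacing a fixed sleep with bounded polling, as described in the test-suite change above, typically looks like the sketch below. The wait_until helper, its defaults, and the ready example are assumptions for illustration rather than code from the BenchBox test harness.

```python
import time


def wait_until(condition, timeout=2.0, interval=0.01):
    """Poll `condition` until it is truthy or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example: a flag that becomes true on the third check.
calls = {"n": 0}


def ready():
    calls["n"] += 1
    return calls["n"] >= 3


became_ready = wait_until(ready)
never_ready = wait_until(lambda: False, timeout=0.05)
```

The test returns as soon as the condition holds, and the timeout bounds the worst case instead of every run paying the full sleep.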

Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#014---2026-03-03

BenchBox 0.1.3

24 Feb 00:28

Added

  • Driver version pinning: --platform-option driver_version=X.Y.Z pins any platform's Python driver; pair with driver_auto_install=true to have BenchBox install it automatically via uv. Active driver version is shown in the run announcement line.
  • Bulk multi-shard table loading — new load_table_bulk() interface on FileFormatHandler ingests multi-shard tables in a single native call. DuckDB (CSV, Parquet) and ClickHouse Native are the first implementations; TPC-DS sharded runs are measurably faster.
  • Greyscale / no-color ASCII chart fallbacks — all seven ASCII chart types now use fill-pattern and glyph differentiation when color is unavailable (CI logs, NO_COLOR, piped output).
  • Six new ASCII chart types: percentile ladder, stacked bar, sparkline table, CDF, rank table, and normalized speedup (log₂-scaled). All registered in the chart registry and accessible via CLI and MCP.
  • Post-run summary charts — charts are automatically generated and displayed in the terminal after every benchmark run and included in MCP run_benchmark responses.
  • Three new chart template bundles: latency_deep_dive, regression_triage, and executive_summary.
  • fabric-dw as a preferred CLI alias for the fabric_dw platform.
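
The no-color fallback described above can be sketched as a two-step decision: detect whether color is available, then differentiate series by glyph when it is not. The function names and glyph choices are hypothetical. The detection follows the public NO_COLOR convention (a present, non-empty variable disables color) plus a terminal check, which may not match BenchBox's exact rules.

```python
import os
import sys


def color_enabled(env=None, stream=None) -> bool:
    """Decide whether ANSI color should be emitted."""
    env = os.environ if env is None else env
    stream = sys.stdout if stream is None else stream
    if env.get("NO_COLOR"):  # present and non-empty disables color
        return False
    # Pipes and most CI logs are not terminals, so fall back to no color.
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


def bar(value: int, series_index: int, use_color: bool) -> str:
    """Render a bar; without color, series differ by fill glyph instead."""
    if use_color:
        codes = ["\033[34m", "\033[33m"]  # blue, yellow
        return f"{codes[series_index % 2]}{'█' * value}\033[0m"
    glyphs = ["█", "▒"]
    return glyphs[series_index % 2] * value
```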

Fixed

  • Driver auto-install version switching — stale sys.modules and metadata caching could return the wrong driver version after driver_auto_install swapped versions; module cache is now invalidated on switch.
  • DataFrame cache path mismatch — DataFrame and SQL modes now share a flat directory layout, eliminating redundant data generation when switching modes on the same scale factor.
  • ClickHouse zstd double-decompression: ClickHouseNativeHandler was applying manual decompression on top of the driver's built-in decompression, corrupting data for compressed bulk loads.
  • Platform display names — corrected for Amazon Athena, Google Cloud Dataproc, Microsoft Azure platforms, and Databricks SQL.
  • CLI warning when a platform option's default value is not in the declared choices list.
  • Ranking normalization crash when all metric values are negative finite numbers.
  • PySpark SIGINT handler hanging pytest-xdist workers.
  • --validation-mode CLI prompt crash when spec.default is not a string.
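
The module-cache invalidation mentioned in the driver auto-install fix can be sketched with the standard library alone. The purge_module_cache helper and the fake_driver module name are hypothetical; the sketch covers only sys.modules and the import finder caches, not any package-metadata caching BenchBox may also reset.

```python
import importlib
import sys
import types


def purge_module_cache(package: str) -> list[str]:
    """Drop `package` and its submodules from sys.modules.

    The next `import package` then re-reads the files on disk, which
    matters after a different version has been installed in-process.
    """
    stale = [
        name for name in sys.modules
        if name == package or name.startswith(package + ".")
    ]
    for name in stale:
        del sys.modules[name]
    importlib.invalidate_caches()  # reset the import finders' caches
    return stale


# Demonstration with a stand-in module (hypothetical driver name).
sys.modules["fake_driver"] = types.ModuleType("fake_driver")
sys.modules["fake_driver.client"] = types.ModuleType("fake_driver.client")
removed = purge_module_cache("fake_driver")
```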

Changed

  • Four platform drivers moved to optional extras — DuckDB (benchbox[duckdb]), Polars (benchbox[polars]), ClickHouse Connect (benchbox[clickhouse-connect]), and psycopg2 (benchbox[postgresql]) are no longer hard dependencies. Use pip install benchbox[all] to restore the previous behaviour.
  • All user-facing terminal output in the run pipeline flows through emit(), making --quiet suppression and output capture in tests consistent.
  • BaseQueryCatalogMixin and TranslatableQueryMixin extracted from duplicate query-catalog implementations.

Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#0130---2026-02-23

BenchBox 0.1.2

24 Feb 00:31

Added

  • DataFrame mode for all benchmarks — complete DataFrame query implementations across all 18 benchmarks including TPC-DS (102 queries), TPC-H (22 queries), SSB, ClickBench, NYC Taxi, TSBS DevOps, H2ODB, AMPLab, CoffeeShop, TPC-H Skew, and Data Vault. DataFrame platforms: Polars, DuckDB, DataFusion, PySpark, Pandas, Modin, Dask, and cuDF (GPU).
  • ASCII chart visualizations — terminal-native ASCII rendering replacing Plotly HTML charts. Seven chart types: performance bar, distribution box, query heatmap, comparison bar, diverging bar, summary box, and query latency histogram. ANSI colors, Unicode box-drawing, and best/worst highlighting.
  • 14 new SQL platform adapters — PostgreSQL, Trino, PrestoDB, Apache Spark, AWS Athena, Azure Synapse, Microsoft Fabric, Firebolt, MotherDuck, InfluxDB 3.x, TimescaleDB, ClickHouse Cloud, Onehouse Quanton, and managed Spark variants (EMR, Dataproc, Glue, Fabric Spark, Synapse Spark, Dataproc Serverless).
  • Open table format support — Delta Lake, Apache Iceberg, Apache Hudi, DuckLake, and Vortex columnar format with format conversion orchestration and manifest v2 for multi-format tracking.
  • Physical tuning DDL generation — platform-specific DDL generators for DuckDB, Snowflake, Redshift, BigQuery, ClickHouse, Firebolt, PostgreSQL, TimescaleDB, Trino/Presto/Athena, and Spark family with sort keys, partitioning, clustering, and compression.
  • Query plan capture and comparison — plan parsers for DuckDB, PostgreSQL, Redshift, DataFusion, and SQLite. Comparison engine with regression detection, fingerprinting, historical tracking with flapping detection, and CLI visualization.
  • Interactive CLI wizard — guided benchmark configuration with platform selection, tuning wizard, scale factor validation, phase/query selection, onboarding, and persistent preferences.
  • TPC-DI benchmark — complete implementation across 4 phases: core schema, query suite, ETL pipeline, and validation/testing.
  • Cross-platform comparison engine: benchbox compare command with multi-platform analysis, SQL vs DataFrame comparison, and unified visualization.
  • Unified tuning configuration — YAML-based tuning system with per-platform DDL generation, write-time physical layout configuration, and dry-run preview.
  • Cloud storage and deployment modes for S3/GCS/ADLS/DBFS with credential setup wizard and cost estimation.
  • TPC compliance improvements: stream-aware validation, query permutations, warmup/measurement iterations, maintenance operations (RF1/RF2), and --seed for reproducibility.
  • New benchmarks: AI/ML Primitives, Metadata Primitives, Write Primitives, Transaction Primitives.
  • MCP server: suggest_charts and generate_chart tools; --queries and --validation-mode CLI flags; tiered --help.

Fixed

  • TPC-DS data generation reliability — segfaults with fractional scale factors, parallel generation errors, streaming compression, and chunked file handling.
  • Cloud platform stability — credential refresh errors, schema creation ordering, UC Volume uploads, S3 key handling, and BigQuery/Snowflake/Redshift/Databricks adapter issues.
  • Type safety — 150+ type errors resolved across production code with proper annotations and TYPE_CHECKING imports.
  • SQL dialect translation — SQLGlot compatibility for DuckDB, ClickHouse, DataFusion, and Netezza; reserved keyword quoting and identifier case sensitivity.
  • Security hardening: SQL injection prevention, parameterized queries, path traversal protection.
  • CLI hanging in non-interactive mode, progress display precision, --quiet mode propagation.
  • TPC compliance: correct stream permutations, maintenance phase SQL execution, Power@Size calculation parity between SQL and DataFrame modes.
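
For reference, the TPC-H specification defines Power@Size as 3600 × SF divided by the geometric mean of the 22 query timings and 2 refresh timings from the power run. The sketch below implements that published formula; the function name and argument layout are assumptions, and it is not a copy of BenchBox's calculation.

```python
import math


def power_at_size(query_seconds, refresh_seconds, scale_factor):
    """TPC-H style Power@Size: 3600 * SF / geometric mean of all timings."""
    timings = list(query_seconds) + list(refresh_seconds)
    if not timings or any(t <= 0 for t in timings):
        raise ValueError("all timings must be positive")
    # Geometric mean via logs avoids overflow/underflow of a long product.
    log_mean = sum(math.log(t) for t in timings) / len(timings)
    return 3600.0 * scale_factor / math.exp(log_mean)


# 22 query timings and 2 refresh timings, all 2 seconds, at SF=1.
value = power_at_size([2.0] * 22, [2.0] * 2, scale_factor=1)
```

Because the metric depends only on the timings and the scale factor, SQL and DataFrame runs fed the same inputs should report the same value.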

Changed

  • Dropped Plotly HTML charts in favor of ASCII-only rendering.
  • Lazy-load cloud platform adapters to speed up CLI startup and test suite.
  • Optimized TPC-DS smoke tests with selective table generation.

Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#012---2026-02-09

v0.1.1

07 Feb 14:09

Fixed

  • Critical: TPC-H/TPC-DS query templates missing from wheel distribution — BenchBox installed from PyPI could not run TPC-H or TPC-DS benchmarks because query template files were stored outside the package tree. Templates are now bundled inside benchbox/_binaries/*/templates/ with a resolution utility that checks the bundled location first and falls back to _sources/ for development installs.
  • dsqgen path buffer overflow — TPC-DS query generation could fail on systems with long temp directory paths (e.g. macOS /var/folders/...) due to dsqgen's internal 80-char path buffer. Fixed by using short symlinks in the temp directory.
  • Python 3.10 compatibility for CLI, version utilities, and tomllib imports.
  • Windows CI test failures and cross-platform compatibility issues.
  • DuckDB version compatibility in tests.
  • MCP server: XSS prevention and strengthened path traversal checks.
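
A path traversal check of the kind referred to above can be sketched with pathlib. The resolve_inside helper and the directory names are hypothetical; the sketch shows the general resolve-then-compare technique, not the MCP server's actual code.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve `user_path` under `base_dir`, rejecting escapes like '../'."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"path escapes {base}: {user_path!r}")
    return candidate


# A path that stays under the base directory is accepted.
safe = resolve_inside("results", "run_01/result.json")
```

Resolving both paths before comparing them is what catches inputs such as ../secret, which a plain string-prefix check can miss.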

Added

  • MCP server: 7 new tools (get_query_details, detect_regressions, get_performance_trends, aggregate_results, get_query_plan, export_results, export_summary) and 2 prompts.
  • GitHub Actions PyPI publishing with trusted publishers.
  • Release automation: --push, --auto-continue, CI validation integration, bidirectional sync.
  • Platform adoption tiers (recommended, supported, experimental, preview) replacing boolean recommended field.

Changed

  • Minimum Python version explicitly documented as 3.10.
  • MCP server refactored to use public API instead of CLI internals.
  • Benchmark metadata centralized into single registry.
  • Per-platform TPC-DS query template duplicates removed (530 files, ~4 MB saved from wheel).
  • MANIFEST.in expanded to include TPC patches, EULAs, and compilation infrastructure for sdist users who build from source.

Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#011---2026-01-24

v0.1.0

19 Jan 16:25

Alpha Software: BenchBox is alpha software. APIs may change without notice, features may be incomplete, and production use is not recommended.

BenchBox v0.1.0 is the initial public release of the database benchmarking framework — making it simple to run industry-standard benchmarks on analytical databases, from embedded engines like DuckDB to cloud data warehouses like Snowflake and Databricks.

Benchmarks (18 total)

  • TPC Standards: TPC-H (22 queries), TPC-DS (99 queries), TPC-DI
  • Academic: SSB, AMPLab, JoinOrder (IMDB dataset)
  • Industry: ClickBench, H2ODB, NYC Taxi, TSBS DevOps, CoffeeShop
  • Data Modeling: TPC-H Data Vault
  • BenchBox Primitives: Read Primitives, Write Primitives, Transaction Primitives
  • Experimental: TPC-DS-OBT, TPC-Havoc, TPC-H Skew

SQL Platforms (16 total)

  • Embedded: DuckDB, SQLite, DataFusion
  • Cloud Data Warehouses: Snowflake, Databricks, BigQuery, Redshift, Azure Synapse
  • Analytical Databases: ClickHouse, Trino, Presto, Firebolt, InfluxDB
  • General Purpose: PostgreSQL, Spark, Athena

DataFrame Platforms (8 total)

Polars, DataFusion, DuckDB, PySpark, Pandas, Modin, Dask, cuDF (GPU)

Core Features

  • Self-contained data generation (no external tools required)
  • Automatic SQL dialect translation between platforms
  • CLI with dry-run support, progress bars, and rich output
  • Programmatic Python API for integration
  • Result export in JSON, CSV, and HTML formats

Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#010---2026-01-10-initial-release