Quick navigation to find the right example for your needs
This index helps you discover examples organized by your goal, experience level, platform, and benchmark type. Start here to find exactly what you need without digging through directories.
| I want to... | Start here | Time | Difficulty |
|---|---|---|---|
| Run my first benchmark | `getting_started/local/duckdb_tpch_power.py` | 2 min | ⭐ Easy |
| Preview queries without running | `getting_started/cloud/bigquery_tpch_dry_run.py` | 1 min | ⭐ Easy |
| Test specific queries only | `getting_started/intermediate/duckdb_tpch_query_subset.py` | 3 min | ⭐⭐ Medium |
| Benchmark a cloud database | `getting_started/cloud/databricks_tpch_power.py` | 5 min | ⭐⭐ Medium |
| Use Jupyter notebooks | `notebooks/databricks_benchmarking.ipynb` | 10 min | ⭐⭐ Medium |
| Compare platforms | See Multi-Platform Comparison | 30 min | ⭐⭐⭐ Advanced |
| Set up CI/CD testing | See CI/CD Examples | 15 min | ⭐⭐⭐ Advanced |
| Create custom benchmark | See Custom Benchmarks | 60 min | ⭐⭐⭐ Advanced |
Goal: Get first successful benchmark run and understand basics
Start here:
- Hello World (2 min): `getting_started/local/duckdb_tpch_power.py`
  - Zero configuration required
  - Generates data and runs TPC-H power test
  - Perfect for learning the workflow
- Try Different Benchmark (3 min): `getting_started/local/duckdb_tpcds_power.py`
  - Same workflow, different queries
  - See TPC-DS complexity
- Minimal Dataset (1 min): `duckdb_coffeeshop.py`
  - Tiny dataset for quick experimentation
  - Great for testing integrations
Next Steps: Move to Intermediate once comfortable with local examples
Goal: Use cloud platforms, understand configuration, run targeted tests
Recommended path:
- Query Subset (5 min): `getting_started/intermediate/duckdb_tpch_query_subset.py`
  - Run specific queries instead of the full suite
  - Faster iteration for testing
- Cloud Dry-Run (3 min): `getting_started/cloud/bigquery_tpch_dry_run.py`
  - Preview without spending cloud credits
  - Validate configuration
- Cloud Execution (10 min): `getting_started/cloud/databricks_tpch_power.py`
  - Run on a real cloud platform
  - Set up authentication
- Interactive Notebooks (15 min): `notebooks/databricks_benchmarking.ipynb`
  - Explore results interactively
  - Visualize performance
Next Steps: Try tuning configurations, explore advanced workflows
Goal: Production workflows, CI/CD, multi-platform comparison, custom benchmarks
Recommended path:
- Unified Runner (varies): `unified_runner.py`
  - Multi-platform, multi-benchmark runner
  - Full configuration support
  - Production-ready patterns
- Tuning Configurations (varies): `tunings/`
  - Compare tuned vs baseline performance
  - Platform-specific optimizations
- Advanced Patterns: See PATTERNS.md
  - Multi-platform comparison
  - Result analysis and comparison
  - CI/CD integration
  - Custom benchmark development
Goal: Understand performance characteristics before adoption
Path:
- Quick Test (5 min): `getting_started/local/duckdb_tpch_power.py` with `--scale 0.01`
- Realistic Scale (30 min): Same script with `--scale 1.0`
- Multiple Benchmarks (60 min): Try TPC-H, TPC-DS, ClickBench
- Platform Comparison: Run the same benchmark on multiple platforms using `unified_runner.py`
Resources:
- Platform comparison: `notebooks/`
- Tuning configurations: `tunings/`
- Configuration examples: `config/`
Goal: Catch performance regressions in CI/CD
Path:
- Baseline Run: Save current performance to JSON
- Test Run: Run benchmark on new code
- Comparison: Use result comparison utilities
- CI Integration: See CI/CD examples (coming soon)
Key Features:
- `--dry-run` flag for fast validation
- `--scale 0.01` for quick CI runs
- JSON export for result storage
- Comparison utilities
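The Comparison step can be sketched with the stdlib alone. The JSON field names (`results`, `query`, `seconds`) and the 10% threshold below are illustrative assumptions, not benchbox's actual export schema:

```python
import json

# Assumed result shape: {"results": [{"query": "Q1", "seconds": 1.2}, ...]}
def load_timings(path):
    """Load per-query timings from an exported JSON result file."""
    with open(path) as f:
        data = json.load(f)
    return {r["query"]: r["seconds"] for r in data["results"]}

def find_regressions(baseline, candidate, threshold=1.10):
    """Return {query: slowdown} for queries more than threshold-times slower."""
    return {
        q: round(candidate[q] / baseline[q], 2)
        for q in baseline
        if q in candidate and candidate[q] > baseline[q] * threshold
    }

# Made-up timings standing in for two exported runs.
baseline = {"Q1": 1.00, "Q6": 0.40, "Q12": 0.80}
candidate = {"Q1": 1.05, "Q6": 0.90, "Q12": 0.78}
print(find_regressions(baseline, candidate))  # → {'Q6': 2.25}
```

In a CI job, a non-empty result would fail the build (e.g. `sys.exit(1)`).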
Goal: Validate performance before migrating platforms
Path:
- Current Platform: Run benchmarks on source database
- Target Platform Dry-Run: Preview on destination
- Target Platform Test: Run on destination (small scale)
- Full Migration Test: Run at production scale
- Compare Results: Analyze differences
Resources:
- Multi-platform notebooks: `notebooks/`
- Platform configs: `config/`
- Tuning examples: `tunings/`
Goal: Find best configuration for your workload
Path:
- Baseline: Run with no tuning (`tunings/{platform}/*_notuning.yaml`)
- Tuned: Run with optimizations (`tunings/{platform}/*_tuned.yaml`)
- Compare: Analyze performance differences
- Customize: Create custom tuning configuration
- Iterate: Test variations
Resources:
- Tuning guide: `tunings/README.md`
- Example configurations: `tunings/duckdb/`, `tunings/databricks/`
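The Compare step boils down to a ratio between the two runs. A minimal sketch, assuming per-query timings in seconds (the numbers here are made up):

```python
# Hypothetical timings from a notuning run and a tuned run of the same queries.
notuning = {"Q1": 2.40, "Q6": 0.80, "Q12": 1.60}
tuned = {"Q1": 1.20, "Q6": 0.50, "Q12": 1.60}

# Per-query speedup: >1.0 means the tuned configuration is faster.
for query in sorted(notuning):
    print(f"{query}: {notuning[query] / tuned[query]:.2f}x")

# Overall speedup across the whole run.
print(f"overall: {sum(notuning.values()) / sum(tuned.values()):.2f}x")
```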
Goal: Fast validation during development
Path:
- Use `getting_started/intermediate/duckdb_tpch_query_subset.py`
- Run 2-3 representative queries
- Verify results quickly
Tips:
- Use `--queries 1,6` for fastest tests
- Use `--scale 0.01` for minimal data
- Use `--dry-run` to skip execution
Goal: Understand TPC benchmarks and methodology
Path:
- TPC-H: `getting_started/local/duckdb_tpch_power.py`
  - Start with the power test (22 sequential queries)
  - Learn about scale factors
- TPC-DS: `getting_started/local/duckdb_tpcds_power.py`
  - More complex queries
  - Larger schema
- Documentation: See docs/BENCHMARKS.md
Examples:
- Basic: `getting_started/local/duckdb_tpch_power.py`
- TPC-DS: `getting_started/local/duckdb_tpcds_power.py`
- Query subset: `getting_started/intermediate/duckdb_tpch_query_subset.py`
- CoffeeShop: `duckdb_coffeeshop.py`
Tuning Configs: tunings/duckdb/ (24 configurations)
Prerequisites: None (DuckDB installed automatically)
Examples: Via unified_runner.py with --platform sqlite
Prerequisites: None (SQLite built into Python)
Examples:
- Getting started: `getting_started/cloud/databricks_tpch_power.py`
- Notebook: `notebooks/databricks_benchmarking.ipynb`
- Configuration: `config/databricks.yaml`
Tuning Configs: tunings/databricks/ (10 configurations)
Prerequisites:
- `DATABRICKS_TOKEN` environment variable
- `DATABRICKS_HOST` environment variable
- `databricks-sql-connector` package
Examples:
- Dry-run: `getting_started/cloud/bigquery_tpch_dry_run.py`
- Notebook: `notebooks/bigquery_benchmarking.ipynb`
- Configuration: `config/bigquery.yaml`
Prerequisites:
- `GOOGLE_APPLICATION_CREDENTIALS` or `BIGQUERY_PROJECT`
- `google-cloud-bigquery` package
Examples:
- Notebook: `notebooks/snowflake_benchmarking.ipynb`
- Configurations: `config/snowflake.yaml`, `config/snowflake_keypair.yaml`
Prerequisites:
- Snowflake credentials (username/password or key pair)
- `snowflake-connector-python` package
Examples:
- Notebook: `notebooks/redshift_benchmarking.ipynb`
- Configurations: `config/redshift.yaml`, `config/redshift_with_keys.yaml`
Prerequisites:
- AWS credentials
- `redshift-connector` package
Examples:
- Notebook: `notebooks/clickhouse_benchmarking.ipynb`
- Configurations: `config/clickhouse.yaml`, `config/clickhouse_cloud.yaml`
Prerequisites:
- ClickHouse server access
- `clickhouse-driver` package
The following platforms are supported via unified_runner.py and platform adapters:
| Platform | Usage | Prerequisites |
|---|---|---|
| PostgreSQL | `--platform postgresql` | PostgreSQL server, `psycopg2` |
| Trino | `--platform trino` | Trino cluster, `trino` package |
| Presto | `--platform presto` | Presto cluster, `presto-python-client` |
| Apache Spark | `--platform spark` | Spark cluster, `pyspark` |
| AWS Athena | `--platform athena` | AWS credentials, `pyathena` |
| Firebolt | `--platform firebolt` | Firebolt account, `firebolt-sdk` |
| Azure Synapse | `--platform azure_synapse` | Azure credentials |
| DataFusion | `--platform datafusion` | `datafusion` package |
| Polars | `--platform polars` | `polars` package |
| CUDF (GPU) | `--platform cudf` | NVIDIA GPU, `cudf` package |
See Platform Documentation for detailed setup guides.
Best for: General analytical query performance, decision support
Examples:
- Beginner: `getting_started/local/duckdb_tpch_power.py` - Power test (22 queries)
- Intermediate: `getting_started/intermediate/duckdb_tpch_query_subset.py` - Select specific queries
- Cloud: `getting_started/cloud/databricks_tpch_power.py` - Databricks execution
- Advanced: `unified_runner.py` with `--benchmark tpch --phases power,throughput`
Scale Factors: 0.01 (tiny), 0.1 (small), 1.0 (standard), 10+ (large)
Tuning: tunings/duckdb/tpch_tuned.yaml, tunings/databricks/tpch_tuned.yaml
Best for: Complex analytical workloads, decision support, BI reporting
Examples:
- Beginner: `getting_started/local/duckdb_tpcds_power.py` - Power test
- Advanced: `unified_runner.py` with `--benchmark tpcds`
Scale Factors: 0.01 (tiny), 0.1 (small), 1.0 (standard), 10+ (large)
Tuning: tunings/duckdb/tpcds_tuned.yaml, tunings/databricks/tpcds_tuned.yaml
Best for: ETL performance, data warehouse loading, data transformation
Examples:
- Advanced: `unified_runner.py` with `--benchmark tpcdi`
Note: TPC-DI is more complex and requires specific setup. See documentation for details.
Best for: Star schema queries, data warehouse analytics, BI workloads
Examples:
- Advanced: `unified_runner.py` with `--benchmark ssb`
Tuning: tunings/duckdb/ssb_tuned.yaml, tunings/databricks/ssb_tuned.yaml
Best for: Real-time analytics, web analytics, time-series data
Examples:
- Advanced: `unified_runner.py` with `--benchmark clickbench`
Tuning: tunings/duckdb/clickbench_tuned.yaml
Best for: Real-world data patterns, transportation analytics, geospatial queries
Examples:
- Beginner: `getting_started/local/duckdb_nyctaxi.py` - Real taxi data analytics
- Advanced: `unified_runner.py` with `--benchmark nyctaxi`
Scale Factors: 0.01 (tiny), 0.1 (small), 1.0 (30M rows), 10+ (large)
Best for: Infrastructure monitoring, time-series databases, DevOps workloads
Examples:
- Beginner: `getting_started/local/duckdb_tsbs_devops.py` - DevOps metrics
- Advanced: `unified_runner.py` with `--benchmark tsbs_devops`
Scale Factors: 0.01 (10 hosts), 0.1 (10 hosts), 1.0 (100 hosts)
Best for: Enterprise data warehouse patterns, Hub-Link-Satellite modeling
Examples:
- Advanced: `unified_runner.py` with `--benchmark datavault`
Note: TPC-H Data Vault uses Data Vault 2.0 modeling derived from TPC-H schema.
- TPC-DS-OBT (One Big Table): Via `unified_runner.py` with `--benchmark tpcds_obt` - Denormalized analytics
- TPC-H Skew: Via `unified_runner.py` with `--benchmark tpch_skew` - Non-uniform data distributions
- TPC-H Havoc: Via `unified_runner.py` with `--benchmark tpchavoc` - Query optimizer stress testing
- AMPLab Big Data Benchmark: Via `unified_runner.py` with `--benchmark amplab`
- H2O.ai Database Benchmark: Via `unified_runner.py` with `--benchmark h2odb`
- Join Order Benchmark: Via `unified_runner.py` with `--benchmark joinorder`
- Read Primitives: Via `unified_runner.py` with `--benchmark read_primitives`
- Write Primitives: Via `unified_runner.py` with `--benchmark write_primitives`
- Transaction Primitives: Via `unified_runner.py` with `--benchmark transaction_primitives`
- CoffeeShop (Minimal): `duckdb_coffeeshop.py`
Examples that support --dry-run:
- All `getting_started/` examples
- `unified_runner.py`
- Cloud examples
Usage: python example.py --dry-run ./preview
Outputs:
- Configuration summary (JSON/YAML)
- Generated SQL queries
- Schema DDL
- Estimated resource usage
Directory: tunings/
Available platforms:
- DuckDB: 24 configurations (tuned vs notuning for each benchmark)
- Databricks: 10 configurations (tuned vs notuning)
Usage: Via unified_runner.py with --tuning tuned|notuning
See: tunings/README.md for detailed guide
Example: getting_started/intermediate/duckdb_tpch_query_subset.py
Usage: --queries 1,6,12 to run specific queries only
Use cases:
- Quick smoke tests
- Debugging specific queries
- Performance profiling
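A comma-separated flag like `--queries 1,6,12` is easy to handle with stdlib `argparse`; this is a generic sketch of the pattern, not the project's actual CLI code:

```python
import argparse

def query_list(value):
    """Parse '1,6,12' into a sorted list of query numbers."""
    return sorted(int(q) for q in value.split(","))

parser = argparse.ArgumentParser()
parser.add_argument("--queries", type=query_list, default=None,
                    help="comma-separated query numbers (default: run all)")

args = parser.parse_args(["--queries", "12,1,6"])
print(args.queries)  # → [1, 6, 12]
```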
Supported formats:
- JSON (default)
- CSV
- HTML
Usage: Via unified_runner.py with --formats json,csv,html
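The JSON and CSV formats need only the stdlib to produce or consume; the row layout below is illustrative, not benchbox's actual schema:

```python
import csv
import io
import json

# Made-up result rows standing in for a benchmark run.
rows = [
    {"query": "Q1", "seconds": 1.21, "rows": 4},
    {"query": "Q6", "seconds": 0.38, "rows": 1},
]

# JSON export: one document holding all results.
json_text = json.dumps({"results": rows}, indent=2)

# CSV export: header line plus one line per query.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["query", "seconds", "rows"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()

print(json_text)
print(csv_text)
```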
Directory: notebooks/
Available platforms:
- Databricks
- BigQuery
- Snowflake
- Redshift
- ClickHouse
See: notebooks/README.md for details
- Getting Started: ../docs/usage/getting-started.md
- Examples Guide: ../docs/usage/examples.md
- CLI Reference: ../docs/usage/cli-quick-start.md
- Benchmarks: ../docs/BENCHMARKS.md
- Platform Selection: ../docs/platforms/platform-selection-guide.md
- Configuration: ../docs/usage/configuration.md
- Main README: README.md - Overview and quick navigation
- Common Patterns: PATTERNS.md - Reusable workflow patterns
- Getting Started: getting_started/README.md - Beginner examples
- Notebooks Guide: notebooks/README.md - Interactive examples
- Tuning Guide: tunings/README.md - Performance optimization
- Dry-Run Samples: dry_run_samples/README.md - Example outputs
- Troubleshooting: ../docs/usage/troubleshooting.md
- GitHub Issues: github.com/joeharris76/benchbox/issues
- < 5 minutes: Local examples with `--scale 0.01`, dry-run mode
- 5-15 minutes: Query subset, cloud dry-run, small scale benchmarks
- 15-60 minutes: Full power test at scale 0.1, multiple benchmarks
- 1+ hours: Large scale tests, multi-platform comparison, throughput tests
- ⭐ Easy: `getting_started/local/` - Zero config, local execution
- ⭐⭐ Medium: `getting_started/cloud/`, `notebooks/` - Cloud platforms, configuration
- ⭐⭐⭐ Advanced: `unified_runner.py`, custom workflows - Full features, production patterns
- Learning: Start with local examples, progress to cloud
- Testing: Use query subset, dry-run mode, small scales
- Benchmarking: Use realistic scales, tuning configs, multiple benchmarks
- Production: Use `unified_runner.py`, CI/CD integration, result export
Can't find what you need? Check PATTERNS.md for common workflow patterns or README.md for additional navigation options.