DuckPond is a file machine.
DuckPond is a timeseries database.
DuckPond is a site generator.
DuckPond is a telemetry system.
DuckPond is local-first.
DuckPond is transactional.
DuckPond is replicated.
DuckPond is small.
DuckPond is a file system tool to help organize process files and file-based processes. DuckPond lets you place your CSV, Parquet, and JSON files into application-specific paths, then pattern-match and query them using pre-built file factories or ad-hoc SQL statements.
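As a rough sketch of that workflow, the script below writes a small made-up CSV, reads it in host mode, then imports it into a pond and reads it back. The sample data, the /tmp/duckpond-demo directory, and the /demo pond path are hypothetical. The command forms are the ones listed under "Common commands" in this README, and the script assumes a pond location is already configured for pond init.

```shell
#!/bin/sh
# Sketch only: the sample data and all paths here are hypothetical.
set -eu

demo_dir=/tmp/duckpond-demo
mkdir -p "$demo_dir"

# Write a tiny CSV of made-up readings to work with.
cat > "$demo_dir/readings.csv" <<'EOF'
timestamp,well_depth_m
2024-01-01T00:00:00Z,12.4
2024-01-01T01:00:00Z,12.6
EOF

if command -v pond >/dev/null 2>&1; then
    # Host mode: read the local CSV directly, no pond required.
    pond cat "host+csv://$demo_dir/readings.csv" --format=table

    # Pond mode: assumes the pond location is already configured
    # and that no pond exists there yet.
    pond init
    pond mkdir /demo
    pond copy "host://$demo_dir/readings.csv" /demo/readings.csv
    pond list '/**'
    pond cat /demo/readings.csv
else
    echo "pond not found on PATH; skipping pond commands" >&2
fi
```

If the pond binary is not installed, the script still creates the sample file and skips the pond commands, so it is safe to run before setting anything up.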
DuckPond is built in Rust, using the Apache DataFusion embedded query engine and the Delta Lake transaction system. DuckPond reads and writes local storage, cloud storage, git repositories, REST APIs, and more.
DuckPond's abstract file system has first-class support for tabular, time-series, and multi-version file data. File and directory factory instances can be registered to create dynamic, derivative file content. DuckPond includes built-in factories for combining, joining, and reducing timeseries.
DuckPond is useful for data collection, analysis, and monitoring in small industrial settings.
DuckPond is built by the Caspar Water System for small water systems everywhere.
The author is Joshua MacDonald, an open-source lead and software engineer at Microsoft, working in telemetry systems. Joshua is a member of the OpenTelemetry technical committee and co-founder of the OpenTelemetry-Arrow project, which is bringing a high-performance telemetry pipeline in Rust to OpenTelemetry.
Caspar Water uses DuckPond for telemetry, monitoring, its public portal, and more.
The Noyo Center for Marine Sciences in Fort Bragg, CA was DuckPond's first user. There, DuckPond gathers water quality data for the center's public portal.
See docs/cli-reference.md for the complete command reference. Common commands:
pond init # Create a new pond
pond list '/**' # List all entries
pond cat /path/to/file # Read a file
pond cat --sql "SELECT * FROM source WHERE ..." /path # Query a table
pond copy host:///local/file /pond/path # Import a file
pond copy host+series:///data.parquet /pond/series # Import time-series
pond mkdir /dir # Create a directory
pond mknod <factory> /path --config-path config.yaml # Install a factory
pond run /path/to/factory <command> # Execute a factory
pond log # Transaction history

Host mode (no pond required):
pond cat host+csv:///tmp/data.csv --format=table # Query a local CSV
pond run host+remote:///config.yaml list-ponds # Browse S3 backups
pond run host+sitegen:///site.yaml build ./dist # Generate a site

Tests live in testsuite/tests/ as numbered shell scripts. Each test runs in a fresh Docker container with the pond binary:
make test-image # Build the test Docker image
make integration # Run all tests (skips browser tests)
make integration-all # Run all tests including browser
# Run a single test
cd testsuite && ./run-test.sh 201
# Run interactively (explore in container)
cd testsuite && ./run-test.sh --interactive

Apache-2.0. See LICENSES/ for details.
