This document provides step-by-step instructions to install and set up PyDuck for both users and developers.
PyDuck is a high-performance Python library for machine learning data preprocessing built on DuckDB and designed with a Pandas‑like API.
Ensure you have Python 3.7 or above installed.
Check your version:
python3 --version
If Python is not installed, download it from: https://www.python.org/downloads/
Also make sure pip is available (the command below installs it if it is missing):
python3 -m ensurepip --upgrade
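The two prerequisite checks above can also be scripted. The following is a minimal sketch using only the standard library; the helper names python_is_supported and pip_is_available are illustrative and are not part of PyDuck.

```python
import importlib.util
import sys

# Minimum interpreter version stated in this guide.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


def pip_is_available():
    """Return True if this interpreter can locate the pip module."""
    return importlib.util.find_spec("pip") is not None


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("pip available:", pip_is_available())
```

Run it with the same interpreter you plan to install PyDuck into, since pip availability is checked per interpreter.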
To install the latest PyPI version:
pip install pyduck
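To install from source instead (for developers), clone the repository and install its dependencies:
git clone https://github.com/your-username/pyduck.git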
cd pyduck
pip install -r requirements.txt
Dependencies include:
- duckdb
- pandas
- numpy
- pytest (for testing)
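To confirm that the dependencies listed above are importable in the current environment, a small check along these lines may help. It is a sketch, not part of the PyDuck repository; the helper name missing_modules is hypothetical, and the module names are taken from the list above.

```python
import importlib.util

# Import names of the dependencies listed in this guide.
REQUIRED_MODULES = ["duckdb", "pandas", "numpy", "pytest"]


def missing_modules(names):
    """Return the subset of names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

The check only looks the modules up without importing them, so it stays fast and does not fail on a partially broken install; it reports presence, not version compatibility.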
Then, from the outer pyduck/ directory, install the package in editable mode:
pip install -e .
To verify that everything is working, run the test suite:
python3 -m pytest testing/
To run the benchmarks, go to the eval folder:
cd eval
Then run the benchmarking sequences defined in each Tester.py file:
python3 test_all.py