Standards for all experiments under `experiments/`.
Each experiment is a self-contained unit with its own:
- `pyproject.toml` and `uv.lock` for dependencies
- `.python-version` for the pinned Python version
- `.venv/` virtual environment (gitignored)
- `.dvc/` configuration for data tracking
Do not share dependencies between experiments. This ensures each is reproducible independently.
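Put together, a typical experiment tree looks roughly like this (assembled from the files and directories mentioned throughout this document; exact contents vary per experiment):

```
experiments/<experiment-name>/
├── .python-version      # pinned Python version
├── pyproject.toml       # dependencies and tool config
├── uv.lock              # locked dependency versions (committed)
├── .venv/               # virtual environment (gitignored)
├── .dvc/                # DVC config, including the S3 remote
├── dvc.yaml             # pipeline stages
├── params.yaml          # hyperparameters
├── configs/             # YAML config files
├── data/                # Kedro-style layers, tracked by DVC
├── notebooks/           # exploration and visualization only
├── scripts/             # pipeline stage entry points
└── src/                 # reusable logic imported by notebooks and scripts
```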
- Use uv exclusively for dependency management
- Pin the Python version in `.python-version` (e.g., `3.11`)
- Pin dependency versions in `pyproject.toml` (e.g., `torch>=2.2,<2.3`)
- Use `uv run` to execute scripts within the project environment
- Run `uv sync` (or `make install`) after cloning or updating deps
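In practice, the dependency workflow is a few commands (the dependency and script below are illustrative):

```bash
uv sync                          # create/refresh .venv from uv.lock (or use make install)
uv add "torch>=2.2,<2.3"         # add a pinned dependency; updates pyproject.toml and uv.lock
uv run python scripts/train.py   # execute inside the project environment
```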
- Use ruff for both linting and formatting
- Configure ruff in `pyproject.toml` (the template provides a default config)
- Run `make lint` and `make format` before committing
- Recommended ruff rules: `["E", "F", "I", "W", "UP", "B", "SIM"]`
Notebooks are for exploration and visualization only. Move any reusable logic into `src/` modules and import it from the notebook.
- No committed outputs: Always clear cell outputs before committing. The `nbstripout` git filter enforces this automatically — `make install` sets it up.
- Linting: `make lint` and `make format` cover both `.py` files and notebooks (via `nbqa ruff`).
- Keep notebooks focused: One analysis per notebook. Split large explorations into separate files.
- Naming: Use descriptive lowercase-kebab-case with an optional numeric prefix (e.g., `01-eda-smoke-crops.ipynb`, `02-model-comparison.ipynb`).
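Under the hood, the setup and notebook-lint steps amount to something like the following (an assumption about what the Makefile wraps — the template's Makefile is authoritative):

```bash
uv run nbstripout --install    # register the git filter that strips outputs on commit
uv run nbqa ruff notebooks/    # run the same ruff rules over notebook cells (invocation may vary by nbqa version)
```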
- Use DVC for all data and model artifacts
- Never commit large files (datasets, weights, images) directly to git
- Organize data using Kedro-style layers in `data/` (see README.md)
- Track data with `dvc add data/` and commit the `.dvc` files to git
- Configure a DVC remote in `.dvc/config` using the convention: `s3://pyro-vision-rd/dvc/experiments/<experiment-name>/`
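For a new artifact, the flow looks like this (the artifact path and the remote name `storage` are illustrative):

```bash
# Track the artifact; writes best.pt.dvc and updates the directory's .gitignore
uv run dvc add data/06_models/best.pt
git add data/06_models/best.pt.dvc data/06_models/.gitignore

# Once per experiment: point the default remote at the S3 prefix
uv run dvc remote add -d storage \
    s3://pyro-vision-rd/dvc/experiments/<experiment-name>/
git add .dvc/config

# Upload tracked data to the remote
uv run dvc push
```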
All experiments that use the Pyronear sequential dataset must import it from pyro-dataset via `dvc import` with a pinned version tag. This ensures every experiment tracks exactly which dataset version it uses and where it came from.
```bash
# Train/val splits
uv run dvc import https://github.com/pyronear/pyro-dataset \
    data/processed/sequential_train_val/train \
    -o data/01_raw/datasets/train --rev v2.2.0
uv run dvc import https://github.com/pyronear/pyro-dataset \
    data/processed/sequential_train_val/val \
    -o data/01_raw/datasets/val --rev v2.2.0

# Test split (for leaderboard/evaluation)
uv run dvc import https://github.com/pyronear/pyro-dataset \
    data/processed/sequential_test \
    -o data/01_raw/datasets/test --rev v2.2.0
```

- Version tag: Always use `--rev <tag>` (e.g. `v2.2.0`), never a branch name
- Output path: `data/01_raw/datasets/{train,val,test}`
- Frozen imports: DVC imports are frozen by default — they won't re-check the remote on `dvc repro`
- Collaborators: After cloning, just run `uv run dvc pull` to fetch data from the experiment's S3 remote
- Commit `.dvc` files: The resulting `.dvc` files (e.g. `data/01_raw/datasets/train.dvc`) must be committed to git
If your experiment needs to preprocess the imported data (e.g. truncate sequences to N frames), import to `data/01_raw/datasets_full/` and add a pipeline stage that writes to `data/01_raw/datasets/`:
```bash
# Import full dataset
uv run dvc import https://github.com/pyronear/pyro-dataset \
    data/processed/sequential_train_val/train \
    -o data/01_raw/datasets_full/train --rev v2.2.0
```

Then add a truncate stage in `dvc.yaml` (see template for example; a sketch follows below). This keeps downstream stages unchanged, since they already depend on `data/01_raw/datasets/`.
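A hedged sketch of such a stage — the script name, flags, and parameter are illustrative, and the template's version is authoritative:

```yaml
stages:
  truncate:
    cmd: >-
      uv run python scripts/truncate.py
      --input-dir data/01_raw/datasets_full/train
      --output-dir data/01_raw/datasets/train
      --max-frames ${truncate.max_frames}
    deps:
      - scripts/truncate.py
      - data/01_raw/datasets_full/train
    params:
      - truncate.max_frames
    outs:
      - data/01_raw/datasets/train
```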
Define your ML pipeline as stages in `dvc.yaml`. The template provides a commented-out scaffold — uncomment and adapt the stages you need.
A typical pipeline has four stages: prepare → split → train → evaluate. Each stage declares its command, dependencies, parameters, and outputs so DVC can track lineage and skip unchanged steps.
- Commands: Always use `uv run python scripts/<stage>.py` to run within the project environment
- CLI arguments: Thread all inputs, outputs, and parameters directly to scripts via flags (e.g., `--input-dir`, `--seed ${train.seed}`). Scripts should not read paths or config files themselves — receive everything from the command line.
- Parameters: Store hyperparameters in `params.yaml` at the project root. Reference them in `dvc.yaml` with `${group.key}` interpolation and list them under the `params:` field so DVC tracks them (see the sketch after this list).
- Data paths: Follow the Kedro-style layers in `data/` (raw → intermediate → model_input → models → reporting)
- Metrics: Declare metrics files with `cache: false` so they are always readable in the working tree (e.g., `data/08_reporting/metrics.json`)
- Plots: Use the `plots:` field for visualization outputs (e.g., `data/08_reporting/plots/`)
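Putting these conventions together, a minimal train stage might look like this (script name, paths, and parameter names are illustrative):

```yaml
# params.yaml
train:
  seed: 42
  learning_rate: 0.001
```

```yaml
# dvc.yaml
stages:
  train:
    cmd: >-
      uv run python scripts/train.py
      --input-dir data/05_model_input
      --output-path data/06_models/model.pt
      --metrics-path data/08_reporting/metrics.json
      --seed ${train.seed}
      --learning-rate ${train.learning_rate}
    deps:
      - scripts/train.py
      - data/05_model_input
    params:
      - train.seed
      - train.learning_rate
    outs:
      - data/06_models/model.pt
    metrics:
      - data/08_reporting/metrics.json:
          cache: false
```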
```bash
uv run dvc repro           # run the full pipeline (skips up-to-date stages)
uv run dvc repro train     # run up to and including the train stage
uv run dvc params diff     # compare parameter changes
uv run dvc metrics show    # display current metrics
uv run dvc plots show      # render plots
```

```bash
uv run dvc exp run -S train.learning_rate=0.001   # run with modified param
uv run dvc exp show                               # compare experiment results
uv run dvc exp apply <exp-name>                   # apply best experiment
```

Every experiment must be reproducible from a clean checkout. Requirements:
- Fixed random seeds: Set and document seeds for all sources of randomness (Python, NumPy, PyTorch/TF); see the sketch after this list
- Explicit configs: Use YAML config files in `configs/` — no magic numbers in code
- Pinned deps: Always commit `uv.lock`
- Logged hyperparameters: Record all hyperparameters alongside metrics
- Versioned data: Use DVC hashes to track exact data versions used
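A minimal seeding sketch, assuming a PyTorch project (adapt to the frameworks actually in use; the helper name is illustrative):

```python
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Seed every source of randomness the experiment relies on."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds CPU and CUDA RNGs on recent PyTorch
    torch.cuda.manual_seed_all(seed)  # explicit, for multi-GPU runs
```

Call it once at the top of every pipeline script, passing the seed threaded in from `params.yaml` via the CLI (e.g., `--seed ${train.seed}`).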
Use DVC experiments or MLflow to track runs. For each experiment, record:
- Model architecture and key design choices
- Hyperparameters (learning rate, batch size, augmentations, etc.)
- Data version (DVC hash of the training data)
- Metrics (see Benchmarking below)
- Hardware used (GPU model, training time)
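If you use MLflow, recording these fields might look like the following sketch (run name, parameter names, and the `train_and_evaluate` helper are hypothetical placeholders):

```python
import mlflow

with mlflow.start_run(run_name="temporal-smoke-classifier"):
    # Hyperparameters and key design choices
    mlflow.log_params({"arch": "resnet50", "learning_rate": 1e-3, "batch_size": 32})
    # Data version: the pinned dataset tag (or the DVC hash from dvc.lock)
    mlflow.set_tag("data_version", "v2.2.0")
    # Hardware used
    mlflow.set_tag("gpu", "NVIDIA T4")

    metrics = train_and_evaluate()  # hypothetical: returns a dict of metric values
    mlflow.log_metrics(metrics)
```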
Use standardized metrics relevant to the Pyronear use case:
| Metric | Description |
|---|---|
| Recall @ FPR | Detection recall at various false positive rates |
| Time-to-detection | Seconds from smoke onset to first alert |
| Inference latency | Milliseconds per frame (specify hardware) |
| Model size | Parameter count and FLOPs |
Compare against the current production baseline (YOLOv8 small) when applicable.
- Experiment directories: lowercase-kebab-case (e.g., `temporal-smoke-classifier`, `background-subtraction-baseline`)
- Python packages/modules: snake_case
- Config files: descriptive names (e.g., `train_resnet50_lr1e3.yaml`)
Each experiment README must include:
- Objective — What problem this project addresses
- Approach — Method and architecture choices
- Data — What data is used and how to obtain it
- Results — Key metrics and comparison to baselines
- How to Reproduce — Step-by-step instructions from clone to results