This file contains guidelines and patterns for maintaining and extending the nflreadpy package.
nflreadpy is a Python port of the R package nflreadr, providing access to NFL data from nflverse repositories. The package uses modern Python conventions with Polars DataFrames, intelligent caching, and comprehensive type hints.
- Python 3.10+ (not 3.9 - it's nearly end of life)
- Polars as the default DataFrame library (not pandas - much faster for large datasets)
- uv for package management and build system (not pip/setuptools)
- Ruff for both linting and formatting (not Black + separate linter)
- Modular design with separate `load_*.py` files matching nflreadr's structure
- Separated utility modules (`utils_date.py`, not monolithic `utils.py`)
- Source layout: `src/nflreadpy/` (not flat package structure)
When adding new load functions, follow this pattern:
- File Structure: Create `src/nflreadpy/load_[function_name].py`
- Import Pattern: `from .utils_date import get_current_season` (not `from .utils`)
- Season Logic:
  - Use `get_current_season()` for game data
  - Use `get_current_season(roster=True)` for roster/depth chart data
- URL Structure: Check the actual nflreadr R files for correct paths
- Format Preference: Default to Parquet, explicitly use CSV only where needed
- Default: Parquet (fastest, most efficient)
- Fallback: CSV (when Parquet unavailable)
- Never: RDS (R-specific format, not readable by Python)
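
The season logic above distinguishes game data from roster data. As a rough sketch of what such a cutoff rule might look like, the following assumes the rule used by nflreadr's `most_recent_season()`: the season rolls over on the Thursday after Labor Day for game data and on March 15 for roster data. The function names `season_opener` and `current_season_sketch` are hypothetical, and the real `get_current_season()` may differ in detail.

```python
from datetime import date, timedelta


def season_opener(year: int) -> date:
    """Thursday after Labor Day (first Monday of September). Assumed cutoff."""
    sept_first = date(year, 9, 1)
    # date.weekday() returns 0 for Monday, so this is the offset to the first Monday
    days_to_monday = (0 - sept_first.weekday()) % 7
    labor_day = sept_first + timedelta(days=days_to_monday)
    return labor_day + timedelta(days=3)


def current_season_sketch(today: date, roster: bool = False) -> int:
    """Return the season implied by `today` (hypothetical stand-in)."""
    # Roster data is assumed to roll over on March 15, game data at the opener
    cutoff = date(today.year, 3, 15) if roster else season_opener(today.year)
    return today.year if today >= cutoff else today.year - 1
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the rule easy to test around the cutoff dates.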
Always validate season ranges with appropriate minimum years:

    # Validate seasons
    current_season = get_current_season()
    for season in seasons:
        if not isinstance(season, int) or season < MIN_YEAR or season > current_season:
            raise ValueError(f"Season must be between {MIN_YEAR} and {current_season}")

- Update `tests/test_integration.py` when adding new functions
- Add import tests for all new load functions
- Update the `expected_exports` list in `test_all_exports()`
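
The season check shown earlier could be factored into a small helper so that each load function and its tests share one implementation. This is a minimal sketch, not the package's actual code: `validate_seasons` and its parameters are hypothetical names, and the current season is passed in rather than looked up so the example stays self-contained.

```python
from __future__ import annotations


def validate_seasons(
    seasons: list[int], min_year: int, current_season: int
) -> list[int]:
    """Raise ValueError unless every season lies in [min_year, current_season]."""
    for season in seasons:
        # bool is a subclass of int in Python, so reject it explicitly
        is_int = isinstance(season, int) and not isinstance(season, bool)
        if not is_int or season < min_year or season > current_season:
            raise ValueError(
                f"Season must be between {min_year} and {current_season}"
            )
    return seasons
```

A test for a new load function can then assert both the accepted range and the `ValueError` raised for seasons outside it.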
- Import Paths: Use `from .utils_date import get_current_season`, not `from .utils import`
- Data Formats: Don't default to CSV globally - only use it where Parquet isn't available
- R Dependencies: Don't try to read RDS files - they're an R-specific format
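
The format rules (Parquet first, CSV as a fallback, never RDS) can be sketched as a small selection function. The `DataFormat` enum here is a stand-in defined only for this example, and `pick_format` is a hypothetical helper; the package's own enum and download logic may be shaped differently.

```python
from enum import Enum


class DataFormat(Enum):
    """Stand-in enum for this sketch; RDS is deliberately absent."""

    PARQUET = "parquet"
    CSV = "csv"


def pick_format(
    available: set[str], preference: DataFormat = DataFormat.PARQUET
) -> DataFormat:
    """Choose a readable format from the file extensions a source offers."""
    if preference.value in available:
        return preference
    # Fall back in priority order: Parquet first, then CSV
    for candidate in (DataFormat.PARQUET, DataFormat.CSV):
        if candidate.value in available:
            return candidate
    raise ValueError("No Python-readable format available (RDS is not supported)")
```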
Common nflverse data URL patterns:
- Seasonal data: `{repo}/releases/download/{category}/{name}_{season}.{format}`
- Static data: `{repo}/releases/download/{category}/{name}.{format}`
- CSV files: Use `format_preference=DataFormat.CSV` for known CSV-only sources
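
The two URL patterns above differ only in the optional season suffix, which a single helper can express. This is an illustrative sketch: `build_release_url` is a hypothetical name, and the repository, category, and file names in the usage lines are placeholders rather than confirmed nflverse paths.

```python
from __future__ import annotations


def build_release_url(
    repo: str, category: str, name: str, fmt: str, season: int | None = None
) -> str:
    """Fill in the seasonal or static release URL pattern."""
    # Seasonal files carry a `_{season}` suffix; static files do not
    stem = name if season is None else f"{name}_{season}"
    return f"{repo.rstrip('/')}/releases/download/{category}/{stem}.{fmt}"


# Placeholder values for illustration only
seasonal_url = build_release_url(
    "https://example.com/org/repo", "games", "schedule", "parquet", season=2023
)
static_url = build_release_url(
    "https://example.com/org/repo", "players", "players", "csv"
)
```

As noted in the load-function pattern, the real paths should still be checked against the corresponding nflreadr R file.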
nflreadpy/
├── src/nflreadpy/
│ ├── __init__.py # Main exports
│ ├── config.py # Configuration management
│ ├── cache.py # Caching system
│ ├── downloader.py # HTTP client and data fetching
│ ├── utils_date.py # Date utilities (separated)
│ └── load_*.py # Individual load functions
├── tests/ # Test suite
├── pyproject.toml # uv + modern Python packaging
└── README.md # User documentation
# Install dependencies
uv sync --dev
# Format code
uv run ruff format
# Lint code
uv run ruff check --fix
# Type check
uv run mypy src
# Run tests
uv run pytest
# Serve docs site for local development
uv run mkdocs serve
# Build docs site
uv run mkdocs build
# Build package
uv build

- Research: Check the corresponding R file in nflreadr for URL patterns
- Validate: Ensure minimum season years are correct
- Test: Add to integration tests
- Export: Add to `__init__.py` imports and `__all__`
- Document: Update README.md if it's a major function
- Polars is much faster than pandas for large NFL datasets
- Use filesystem caching by default (configurable via environment)
- Implement progress bars for large downloads
- Batch multiple seasons efficiently with `pl.concat()`
This package aims to provide a modern, fast, and maintainable Python interface to NFL data while preserving API compatibility with the original nflreadr R package.