This repository is a teaching project where you refactor a Jupyter notebook into a reusable Python package (diamonds).
- Python: installed and managed with `pyenv`
- Virtual environments: managed with `pyenv-virtualenv`
- direnv: installed and enabled in your shell
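
As a quick sanity check of the prerequisites above, the following minimal sketch looks for the command-line tools on your `PATH`. The names `REQUIRED_TOOLS` and `find_tools` are hypothetical and not part of this repository. `pyenv-virtualenv` is a pyenv plugin rather than a standalone command, so this sketch does not try to detect it.

```python
import shutil

# Tools this README expects to be available in your shell (assumed list).
REQUIRED_TOOLS = ("pyenv", "direnv")


def find_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its location on PATH, or None if it is not found."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```

A `NOT FOUND` entry only means the executable is not on the `PATH` of the current shell; it may still be installed but not initialised in your shell configuration.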
- Fork this repository and clone it into your project directory. Then create a new branch for your productionizing-ml project.

      gh repo fork vivadata/diamonds
      git clone git@github.com:<your-username>/diamonds.git
      cd diamonds
      # Create a new branch and switch to it
      git checkout -b <your-username>-productionizing-ml
- Create a new virtual environment for this project.

      pyenv virtualenv 3.11 diamonds
- Tell this directory to use that virtual environment.

      pyenv local diamonds

- Check that Python now points to the virtualenv.

      which python
      python --version
- Install the package in development mode. NB: the `-e` flag installs the package in editable mode, so changes to the source code take effect without reinstalling.

      pip install -e .
- Allow direnv in this directory (only once).

      direnv allow
- Create an `.envrc` file at the root of the project so the virtualenv is activated automatically when you `cd` into the directory.

      echo 'dotenv' > .envrc
      direnv allow
- Leave and re-enter the project directory and confirm that the virtualenv is automatically activated.

      cd ..
      cd diamonds
      which python
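
The `which python` checks in the steps above can also be done from inside Python. The sketch below is one possible way to do it and assumes the virtualenv was created with the standard `venv` mechanism (which is what makes `sys.prefix` differ from `sys.base_prefix`). The function name `describe_interpreter` is hypothetical.

```python
import sys
from pathlib import Path


def describe_interpreter():
    """Return basic facts about the interpreter that is currently running."""
    return {
        "executable": sys.executable,
        "version": f"{sys.version_info.major}.{sys.version_info.minor}",
        # In a venv-style environment, prefix and base_prefix differ.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # Last path component of the environment, e.g. the virtualenv name.
        "env_name": Path(sys.prefix).name,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the setup worked, you would expect `in_virtualenv` to be `True` and `env_name` to match the name passed to `pyenv virtualenv`.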
- Explore the notebook: open `notebooks/Exploration.ipynb`.
- Identify responsibilities:
  - data loading and cleaning,
  - feature engineering,
  - model training and evaluation,
  - prediction.
- Refactor into the package:
  - move data-related code into `src/diamonds/data.py`,
  - move model-related code into `src/diamonds/model.py`,
  - centralize constants/paths in `src/diamonds/params.py`,
  - implement model saving/loading in `src/diamonds/registry.py`.
- Train the model:
  - update the `src/diamonds/train.py` script so that it trains the model and saves it in the `models` directory,
  - run `python -m src.diamonds.train` to train the model and save it in the `models` directory.
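
To make the split of responsibilities above more concrete, here is a minimal sketch of how `params.py`, `registry.py` and `train.py` could fit together. It is not the reference solution: every name (`MODELS_DIR`, `MODEL_FILENAME`, `fit_mean_model`, `save_model`, `load_model`, `train`) is hypothetical, serialization with `pickle` is an assumption, and a plain dict stands in for the real estimator from the notebook. The three parts are shown in one block for brevity; in the package they would live in separate modules.

```python
import pickle
from pathlib import Path

# --- params.py: constants and paths in one place (assumed values) ---
MODELS_DIR = Path("models")
MODEL_FILENAME = "model.pkl"


# --- model.py: hypothetical stand-in for the notebook's real training code ---
def fit_mean_model(prices):
    """'Train' a trivial model that only remembers the mean price."""
    return {"mean_price": sum(prices) / len(prices)}


# --- registry.py: model saving and loading ---
def save_model(model, models_dir=MODELS_DIR, filename=MODEL_FILENAME):
    """Pickle any picklable model object and return the path it was written to."""
    models_dir = Path(models_dir)
    models_dir.mkdir(parents=True, exist_ok=True)
    path = models_dir / filename
    with path.open("wb") as f:
        pickle.dump(model, f)
    return path


def load_model(models_dir=MODELS_DIR, filename=MODEL_FILENAME):
    """Load a model previously written by save_model."""
    path = Path(models_dir) / filename
    with path.open("rb") as f:
        return pickle.load(f)


# --- train.py: train, then persist through the registry ---
def train(prices, models_dir=MODELS_DIR):
    """Fit the stand-in model on the given prices and save it."""
    model = fit_mean_model(prices)
    return save_model(model, models_dir=models_dir)
```

In a real `train.py`, `train` would typically be called under an `if __name__ == "__main__":` guard with data loaded through `data.py`, so that `python -m src.diamonds.train` runs the whole pipeline. Keeping the paths as parameters with defaults from `params.py` also makes the functions easy to test against a temporary directory.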