Diamonds: from notebook to package

This repository is a teaching project where you refactor a Jupyter notebook into a reusable Python package (diamonds).

Prerequisites

  • Python: installed and managed with pyenv
  • Virtual environments: managed with pyenv-virtualenv
  • direnv: installed and enabled in your shell

1. Clone the repository

  1. Fork this repository and clone your fork into your project directory. Then create a new branch for your productionizing-ml project.

    gh repo fork vivadata/diamonds
    git clone git@github.com:<your-username>/diamonds.git
    cd diamonds
    # Create a new branch and switch to it
    git checkout -b <your-username>-productionizing-ml

2. Create and activate a virtual environment (pyenv-virtualenv)

  1. Create a new virtual environment for this project.

    pyenv virtualenv 3.11 diamonds
  2. Tell this directory to use that virtual environment.

    pyenv local diamonds
  3. Check that Python now points to the virtualenv.

    which python
    python --version
  4. Install the package in editable (development) mode. The -e flag links the installed package to your source tree, so changes to the code take effect without reinstalling.

    pip install -e .
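To double-check steps 3 and 4 from inside Python, a minimal sketch such as the following can help. It assumes the virtualenv was created with the standard venv mechanism and that the package is importable as diamonds; the function names are illustrative only and are not part of the repository.

```python
import importlib.util
import sys
from typing import Optional


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the Python installation it was built from.
    return sys.prefix != sys.base_prefix


def package_location(name: str = "diamonds") -> Optional[str]:
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside virtualenv:", in_virtualenv())
    # After `pip install -e .`, this should point into your working copy.
    print("diamonds imported from:", package_location())
```

If the last line prints None, the editable install most likely did not run in the environment you are currently using.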

3. Configure direnv

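  1. Create an .envrc file at the root of the project. The dotenv directive tells direnv to load the variables defined in a .env file whenever you cd into the directory; the virtualenv itself is selected by the .python-version file that pyenv local wrote in the previous section.

    echo 'dotenv' > .envrc
  2. Allow direnv to load this .envrc (needed once, and again whenever the file changes).

    direnv allow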
  3. Leave and re-enter the project directory and confirm that the virtualenv is automatically activated.

    cd ..
    cd diamonds
    which python
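Since the .envrc above uses direnv's dotenv directive, which loads a .env file, one way to confirm that direnv is active is to check that the variables listed in .env are present in the current environment. The sketch below assumes a .env made of simple KEY=VALUE lines (no quoting or multi-line values); the helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Iterable, List, Mapping


def dotenv_keys(lines: Iterable[str]) -> List[str]:
    """Extract variable names from simple KEY=VALUE lines."""
    keys = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        keys.append(line.split("=", 1)[0].strip())
    return keys


def missing_keys(lines: Iterable[str],
                 environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the keys defined in the .env lines but absent from `environ`."""
    return [key for key in dotenv_keys(lines) if key not in environ]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        print("not exported:", missing_keys(env_file.read_text().splitlines()))
    else:
        print("no .env file in", Path.cwd())
```

An empty list means every variable from .env reached the shell that launched Python.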

4. Next steps

  • Explore the notebook: open notebooks/Exploration.ipynb.
  • Identify responsibilities:
    • data loading and cleaning,
    • feature engineering,
    • model training and evaluation,
    • prediction.
  • Refactor into the package:
    • move data-related code into src/diamonds/data.py,
    • move model-related code into src/diamonds/model.py,
    • centralize constants/paths in src/diamonds/params.py,
    • implement model saving/loading in src/diamonds/registry.py.
  • Train the model:
    • update the script src/diamonds/train.py so that it trains the model and saves it in the models directory,
    • run python -m src.diamonds.train to execute it.
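As a starting point for the params.py and registry.py steps above, here is a minimal sketch of centralized paths plus model saving and loading. Everything in it is an assumption made for illustration: the constant names, the save_model and load_model signatures, the timestamped file naming, and the use of pickle are not taken from the repository. A scikit-learn project might prefer joblib instead.

```python
import pickle
from datetime import datetime
from pathlib import Path
from typing import Any, Optional

# --- candidate content for src/diamonds/params.py (hypothetical names) ---
PROJECT_ROOT = Path.cwd()             # assumes you run from the repo root
MODELS_DIR = PROJECT_ROOT / "models"  # the "models directory" from the steps
MODEL_PREFIX = "diamonds_model"


# --- candidate content for src/diamonds/registry.py (hypothetical API) ---
def save_model(model: Any, models_dir: Path = MODELS_DIR) -> Path:
    """Pickle `model` into `models_dir` under a timestamped file name."""
    models_dir.mkdir(parents=True, exist_ok=True)
    # Zero-padded timestamps sort alphabetically in chronological order.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    path = models_dir / f"{MODEL_PREFIX}_{stamp}.pkl"
    with path.open("wb") as fh:
        pickle.dump(model, fh)
    return path


def load_model(models_dir: Path = MODELS_DIR) -> Optional[Any]:
    """Load the most recently saved model, or return None if none exists."""
    candidates = sorted(models_dir.glob(f"{MODEL_PREFIX}_*.pkl"))
    if not candidates:
        return None
    with candidates[-1].open("rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    # A plain dict stands in for a trained model here.
    saved_to = save_model({"coef": [1.0, 2.0]})
    print("saved to:", saved_to)
    print("reloaded:", load_model())
```

In train.py you would then call save_model on the fitted estimator. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.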
