this document contains the complete step-by-step workflow for setting up and deploying the mlops project from scratch.
- 1. setup project repository
- 2. setup mlflow on dagshub
- 3. run ml experiments
- 4. setup dvc project
- 5. complete ml pipeline
- 6. create dvc pipeline
- 7. create flask app
- 8. create dagshub token
- 9. add tests and scripts
- 10. github actions
- 11. containerization
- 12. setup aws services
- 13. run cicd pipeline
- 14. eks cluster setup
- 15. deploy on eks
- cleanup
## 1. setup project repository

```bash
# create a github repo, clone it locally
git clone <your-repo-url>
cd <repo-name>

# install uv if you don't have it
pip install uv

# initialize project
uv init

# create virtual environment
uv venv .venv

# activate virtual env
.venv/Scripts/activate        # Windows
# source .venv/bin/activate   # Linux/Mac

# install cookiecutter (used to create projects from templates)
pip install cookiecutter
```

- rename folders and files to align with the current project structure
- git add, commit, and push changes
```bash
git add .
git commit -m "initial project setup"
git push
```

## 2. setup mlflow on dagshub

- create a new dagshub repo and connect it to github
- copy mlflow tracking remote url and code snippet
mlflow tracking remote:

```
https://dagshub.com/aashu-0/MLOps_Learning_Project.mlflow
```

using mlflow tracking:

```python
import dagshub
import mlflow

dagshub.init(repo_owner='aashu-0', repo_name='MLOps_Learning_Project', mlflow=True)

with mlflow.start_run():
    mlflow.log_param('parameter name', 'value')
    mlflow.log_metric('metric name', 1)
```

- click on the "go to mlflow ui" button and explore the mlflow ui
```bash
uv add dagshub mlflow
```

## 3. run ml experiments

- run the experiments notebook in the `notebooks/` folder
- create new experiments and decide on:
- machine learning model (in our case: logistic regression)
- feature engineering approach (tfidf vectorizer)
- hyperparameters to use
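the chosen setup (tf-idf features + logistic regression) can be sketched as below. this is a hedged illustration only — the toy texts and labels are hypothetical stand-ins for the real dataset, and in the actual notebook the resulting metrics are logged to mlflow.

```python
# minimal sketch of the chosen experiment: tf-idf vectorizer + logistic regression.
# the texts/labels are illustrative placeholders, not the project's data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

texts = ["good product", "bad service", "great quality", "terrible support",
         "loved it", "hated it", "excellent value", "awful experience"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels)

# feature engineering decision: tf-idf vectorizer
vectorizer = TfidfVectorizer(max_features=1000)
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# model decision: logistic regression (hyperparameters are per-experiment choices)
model = LogisticRegression(C=1.0, max_iter=1000)
model.fit(X_train_vec, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test_vec))
print(f"accuracy: {accuracy:.2f}")
```

in the notebook, `accuracy` (and any other metrics) would be logged with `mlflow.log_metric` inside an `mlflow.start_run()` block as shown earlier.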
## 4. setup dvc project

```bash
dvc init
```

- login to the aws console
- create an iam user with permission policies → AdministratorAccess
- create an s3 bucket (name: mshrashu-dvc-storage)

```bash
# install dvc with s3 support and the aws cli
uv add "dvc[s3]" awscli

# configure aws credentials
aws configure
```

provide:

- AWS Access Key ID → <your-access-key>
- AWS Secret Access Key → <your-secret-key>
- Default region → <your-aws-region>
- Default output → json
```bash
# add s3 remote
dvc remote add -d s3remote s3://mshrashu-dvc-storage

# verify remote
dvc remote list

# remove remote (if needed)
dvc remote remove <name>

# push data to remote
dvc push
```

alternatively, use a local folder as the dvc remote:

```bash
# create a local folder
mkdir local_s3

# add local remote
dvc remote add -d mylocal local_s3
```

## 5. complete ml pipeline

create the entire ml pipeline under the `src/` folder:
- create a `logger/` folder with logging configuration

`data/data_ingestion.py`
- load data from source
- preprocess it
- split into train and test
- save to the `./dataset/raw/` folder

`data/data_preprocessing.py`
- extra cleaning and text normalization steps on the ingested data
- save preprocessed data in the `./dataset/interim/` folder

`features/feature_engineering.py`
- apply tf-idf to the text data
- save processed data in the `./dataset/processed/` directory

`model/model_building.py`
- build and save a logistic regression model using the training data

`model/model_evaluation.py`
- evaluate the trained model using test data
- log metrics to mlflow

`model/register_model.py`
- register the trained model to the mlflow model registry
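the shape of the ingestion stage can be sketched as below. this is a hedged, simplified version — the in-memory rows, the 80/20 split ratio, and the light lowercase preprocessing are illustrative assumptions, not the project's actual data or config.

```python
# hedged sketch of the data_ingestion.py stage: load -> preprocess -> split -> save.
# rows, split ratio, and preprocessing are illustrative assumptions.
import csv
import random
from pathlib import Path

def ingest(rows, out_dir="dataset/raw", test_size=0.2, seed=42):
    """split (text, label) rows into train/test and write them as csv files."""
    # light preprocessing: strip whitespace, lowercase the text
    rows = [(text.strip().lower(), label) for text, label in rows]
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_size)
    test, train = rows[:n_test], rows[n_test:]

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, split in (("train.csv", train), ("test.csv", test)):
        with open(out / name, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["text", "label"])
            writer.writerows(split)
    return len(train), len(test)

n_train, n_test = ingest([("Good product", 1), ("Bad service", 0),
                          ("Great quality", 1), ("Terrible support", 0),
                          ("Loved it", 1)])
print(n_train, n_test)  # 4 1
```

the other stages follow the same pattern: read from the previous stage's folder, transform, and write to the next folder so dvc can track each stage's inputs and outputs.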
## 6. create dvc pipeline

- create a `dvc.yaml` file (pipeline definition)
- create a `params.yaml` file (pipeline parameters)
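a hedged sketch of what `dvc.yaml` might contain for the stages above — the stage commands, parameter keys, and dependency paths are assumptions based on the `src/` layout, not the project's actual file:

```yaml
# illustrative dvc.yaml fragment -- paths and params are assumptions
stages:
  data_ingestion:
    cmd: python src/data/data_ingestion.py
    deps:
      - src/data/data_ingestion.py
    params:
      - data_ingestion.test_size
    outs:
      - dataset/raw
  feature_engineering:
    cmd: python src/features/feature_engineering.py
    deps:
      - src/features/feature_engineering.py
      - dataset/interim
    outs:
      - dataset/processed
```

each stage declares its command, dependencies, and outputs, so `dvc repro` can skip stages whose inputs haven't changed.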
```bash
# reproduce dvc pipeline by running all stages
dvc repro

# commit changes
git add .
git commit -m "add dvc pipeline"
git push

# push data to dvc remote
dvc push
```

## 7. create flask app

```bash
uv add flask
```

- create a directory `flask_app/`
- write the html, css and `app.py`
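a hedged sketch of how `app.py` might start — the route names and the placeholder prediction logic are assumptions; the real app loads the vectorizer and model from the mlflow registry and renders html templates.

```python
# hedged sketch of flask_app/app.py: a minimal prediction endpoint.
# route names and placeholder logic are assumptions, not the real app.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/", methods=["GET"])
def home():
    return "mlops project flask app"

@app.route("/predict", methods=["POST"])
def predict():
    text = request.get_json(force=True).get("text", "")
    # in the real app: features = vectorizer.transform([text])
    #                  prediction = model.predict(features)[0]
    prediction = 1 if "good" in text.lower() else 0  # placeholder logic
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    # the docker run commands below map host port 8888 to this port
    app.run(host="0.0.0.0", port=5000)
```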
## 7.1 generate a minimal requirements.txt for the flask app

why? during containerization we only build an image of the flask app, so bundling the whole project's dependencies would needlessly increase the size of our docker image.

how?

```bash
# install pipreqs
uv pip install pipreqs

# navigate to flask_app directory
cd flask_app

# generate a minimal requirements.txt
pipreqs . --force
```

## 8. create dagshub token

- go to dagshub → user settings
- under "manage personal access tokens", generate a new token
- save the token: `mlops_test(<token_name>): <your-dagshub-token>`
- add this token to github secrets with the name `DAGSHUB_TOKEN`
## 9. add tests and scripts

`tests/test_flask_app.py`
- unit tests for the flask application

`tests/test_model.py`
- tests for loading and validating the ml model from the mlflow registry

`scripts/promote_model.py`
- script to promote a model from the @Candidate alias to the @Champion alias in the mlflow model registry
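the core of the promotion script can be sketched as below. in the real script, `client` would be an `mlflow.tracking.MlflowClient()` pointed at the dagshub-hosted registry; the model name here is an assumed placeholder.

```python
# hedged sketch of scripts/promote_model.py -- model name is an assumption;
# the real script constructs mlflow.tracking.MlflowClient() against dagshub.
MODEL_NAME = "mlops-project-model"  # assumed registry name

def promote(client, model_name=MODEL_NAME):
    """point @Champion at the version currently holding the @Candidate alias."""
    candidate = client.get_model_version_by_alias(model_name, "Candidate")
    client.set_registered_model_alias(model_name, "Champion", candidate.version)
    return candidate.version
```

`get_model_version_by_alias` and `set_registered_model_alias` are the mlflow client calls for alias-based promotion; taking the client as a parameter keeps the function easy to unit-test with a stub.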
## 10. github actions

- add `.github/workflows/cicd.yaml`
- configure automated testing, building, and deployment
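a hedged skeleton of what `cicd.yaml` might look like — job and step names are assumptions, and only the testing portion is shown; the build/push/deploy stages from the later sections would be added as further steps.

```yaml
# illustrative cicd.yaml skeleton -- job/step names are assumptions
name: cicd

on:
  push:
    branches: [main]

jobs:
  build-test-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: install dependencies
        run: pip install -r requirements.txt
      - name: run pipeline and tests
        env:
          DAGSHUB_TOKEN: ${{ secrets.DAGSHUB_TOKEN }}
        run: |
          dvc repro
          python -m unittest discover tests
```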
## 11. containerization

```bash
# open docker desktop

# in root directory, run:
docker build -t mlops-project:latest .

# run a container
docker run -p 8888:5000 -e DAGSHUB_TOKEN=<your-dagshub-token> mlops-project:latest
```

create a repo on docker hub, then:
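the `docker build` step assumes a Dockerfile in the project root; a minimal sketch might look like the following — the base image, file layout, and start command are assumptions, not the project's actual Dockerfile.

```dockerfile
# hedged Dockerfile sketch -- base image and paths are assumptions
FROM python:3.11-slim

WORKDIR /app

# only the flask app and its trimmed requirements.txt (from pipreqs)
# are copied, keeping the image small
COPY flask_app/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY flask_app/ .

EXPOSE 5000
CMD ["python", "app.py"]
```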
```bash
# tag the image
docker tag mlops-project:latest aashu0/mlops-project:latest

# push to docker hub
docker push aashu0/mlops-project:latest

# delete images locally (to verify the pull)
docker rmi mlops-project:latest
docker rmi aashu0/mlops-project:latest

# pull from docker hub
docker pull aashu0/mlops-project:latest

# run a container from pulled image
docker run -p 8888:5000 -e DAGSHUB_TOKEN=<your-dagshub-token> aashu0/mlops-project:latest
```

## 12. setup aws services

add the following github secrets:
- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
- `AWS_ACCOUNT_ID`
- `AWS_REGION`
- `ECR_REPOSITORY`
- add the `AmazonEC2ContainerRegistryFullAccess` policy to the iam user's permissions policies
## 13. run cicd pipeline

```bash
# run cicd till stage "push docker image to ecr"
git add .
git commit -m "add cicd pipeline"
git push
```

## 14. eks cluster setup

verify you have installed:
- aws cli: command line tool to interact with aws services
- kubectl: command line tool for kubernetes
- eksctl: command line utility for amazon eks service
```bash
# check versions
aws --version
kubectl version --client
eksctl version
```

installation references:

- aws cli: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
- kubectl: `choco install kubernetes-cli -y`
- eksctl: `choco install eksctl -y`
- chocolatey setup: https://docs.chocolatey.org/en-us/choco/setup/
```powershell
aws configure

eksctl create cluster `
  --name mlops-cluster `
  --region eu-north-1 `
  --nodegroup-name standard-workers `
  --node-type t3.small `
  --nodes 1 `
  --nodes-min 1 `
  --nodes-max 1 `
  --managed
```

once the cluster is created, eksctl automatically updates the kubectl config file
```bash
# update kubectl config manually (if needed)
aws eks --region eu-north-1 update-kubeconfig --name mlops-cluster

# list clusters
aws eks list-clusters

# check cluster status
aws eks --region eu-north-1 describe-cluster --name mlops-cluster --query "cluster.status"

# inspect cluster resources
kubectl get nodes
kubectl get namespaces
kubectl get pods
kubectl get svc

# delete cluster (if needed)
eksctl delete cluster --name mlops-cluster --region eu-north-1

# verify
eksctl get cluster --region eu-north-1
```

## 15. deploy on eks

- add the next stages in `cicd.yaml`
- create `deployment.yaml` and `service.yaml`
- edit the security group for nodes
- add inbound rule for port 5000 to access the app
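the `deployment.yaml`/`service.yaml` pair might sketch out as below — the resource names match the `kubectl` commands used elsewhere in this doc, but the image, replica count, and secret key are assumptions.

```yaml
# hedged sketch -- names match the kubectl commands in this doc;
# image, replicas, and secret key are assumptions
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlops-project-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlops-project
  template:
    metadata:
      labels:
        app: mlops-project
    spec:
      containers:
        - name: mlops-project
          image: aashu0/mlops-project:latest
          ports:
            - containerPort: 5000
          env:
            - name: DAGSHUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: dagshub-secret
                  key: DAGSHUB_TOKEN
---
apiVersion: v1
kind: Service
metadata:
  name: mlops-project-service
spec:
  type: LoadBalancer
  selector:
    app: mlops-project
  ports:
    - port: 5000
      targetPort: 5000
```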
```bash
# get the service external ip
kubectl get svc mlops-project-service
```

access the app:

```bash
# browser
http://<external-ip>:5000

# or via terminal
curl http://<external-ip>:5000
```

## cleanup

```bash
# delete deployment
kubectl delete deployment mlops-project-deployment

# delete service
kubectl delete service mlops-project-service

# delete the env variable secret
kubectl delete secret dagshub-secret

# delete cluster
eksctl delete cluster --name mlops-cluster --region eu-north-1

# verify cluster deletion
eksctl get cluster --region eu-north-1
```

- delete artifacts from ecr
- delete artifacts from s3
- validate if cloudformation stacks are deleted
general tips:

- always ensure docker desktop is running before building images
- keep aws credentials secure and never commit them to git
- regularly backup your dvc remote storage
- monitor aws costs, especially for eks clusters
- delete resources when not in use to avoid unnecessary charges
end of workflow