AI/ML Operations - Claims Fraud Detection

A production-grade machine learning operations (MLOps) platform for automating the training, deployment, and inference of a claims fraud detection model. This project demonstrates best practices for ML model lifecycle management on Azure.

📋 Project Overview

This solution builds an end-to-end MLOps pipeline that:

  • Trains a machine learning model for detecting fraudulent insurance claims
  • Evaluates model performance on held-out test data
  • Deploys the trained model as a scalable REST API
  • Automates the entire workflow using CI/CD pipelines
  • Manages infrastructure as code using Terraform

The model uses logistic regression to classify claims as fraudulent or legitimate based on claim features such as insured age, coverage details, reserve amounts, and risk scores.
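To illustrate the classification setup, here is a minimal, stdlib-only logistic-regression sketch on two of the claim features named above. This is not the project's training code (src/train_claims.py uses scikit-learn's LogisticRegression); the toy data and hyperparameters are invented for the example.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights by plain stochastic gradient descent on the log-loss.
    X: list of feature vectors, y: list of 0/1 fraud labels."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Return (label, probability) for one claim's feature vector."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return (1 if p >= 0.5 else 0), p

# Toy data: [fraud_score, geo_risk_score] -> fraud label
X = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
y = [0, 0, 1, 1]
w, b = train_logistic(X, y)
label, prob = predict(w, b, [0.85, 0.8])
```

The real model is trained on the full feature set listed above, but the decision rule is the same: a weighted sum of claim features pushed through a sigmoid, thresholded at 0.5.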

🏗️ Architecture

Data (Claims) → Train Pipeline → Model → API Container → ACR (Registry) → Azure Container Apps
                      ↓
             Azure ML Workspace
                      ↓
              Compute Cluster

Key Components:

  • Azure ML Workspace: Manages experiments, compute clusters, and datastores
  • Azure Container Registry (ACR): Stores containerized inference API images
  • Azure Container Apps: Hosts the scoring API for real-time predictions
  • Key Vault: Securely manages credentials and secrets
  • Storage Account: Stores training data and model artifacts
  • Log Analytics: Provides monitoring and diagnostics
  • Azure Pipelines: Orchestrates CI/CD workflows

📁 Project Structure

ai-ml-ops/
├── api-inference/          # FastAPI scoring service
│   ├── conda.yaml         # API dependencies
│   ├── Dockerfile         # Container image for deployment
│   └── score-api.py       # FastAPI application
├── data/                   # Training datasets
│   └── Claims-2026.csv    # Claims dataset with fraud labels
├── devOps/                 # CI/CD configuration
│   ├── azure-pipelines.yml # Main build/deployment pipeline
│   ├── infra-pipeline.yml  # Infrastructure deployment
│   └── steps/              # Pipeline step templates
├── env/                    # Environment specifications
│   ├── conda-prep.yml     # Data preparation environment
│   └── conda-train.yml    # Model training environment
├── infra/                  # Infrastructure as Code (Terraform)
│   ├── machine_learning_workspace.tf
│   ├── container_app.tf
│   ├── acr.tf
│   ├── key_vault.tf
│   ├── storage_account.tf
│   └── variables.tf
├── ml-pipelines/           # Azure ML job definitions
│   └── claims-training.yml # Training pipeline job
└── src/                    # Python source code
    ├── train_claims.py    # Model training script
    ├── split_claims.py    # Train/test data splitting
    └── score.py           # Batch scoring utility

🚀 Getting Started

Prerequisites

  • Azure Subscription with appropriate permissions
  • Azure CLI
  • Terraform v1.6 or later
  • Python 3.8+ with pip
  • Docker (for local API testing)
  • Azure ML CLI extension (az extension add -n ml)

Local Setup

  1. Clone the repository:

    git clone <repository-url>
    cd ai-ml-ops
  2. Install Python dependencies:

    # For training environment
    conda env create -f env/conda-train.yml
    conda activate claims-train
    
    # For data preparation
    conda env create -f env/conda-prep.yml
    conda activate claims-prep
    
    # For API
    conda env create -f api-inference/conda.yaml
  3. Azure Authentication:

    az login
    az account set --subscription <subscription-id>

Running the Training Pipeline Locally

# Prepare data
python src/split_claims.py \
  --input_data data/Claims-2026.csv \
  --train_output ./data/train \
  --test_output ./data/test

# Train the model
python src/train_claims.py \
  --train_data ./data/train \
  --test_data ./data/test \
  --model_dir ./model
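For a sense of what the split step does, here is a hedged sketch of a src/split_claims.py-style function, assuming the input is a CSV with a header row. The real script's argument handling and output layout may differ (e.g. it may use scikit-learn's train_test_split); the file name claims.csv inside each output directory is an assumption.

```python
import csv
import random
from pathlib import Path

def split_claims(input_csv: str, train_dir: str, test_dir: str,
                 test_ratio: float = 0.2, seed: int = 42):
    """Shuffle rows deterministically and write train/test CSVs.
    Returns (n_train, n_test)."""
    with open(input_csv, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_ratio)
    splits = {test_dir: rows[:n_test], train_dir: rows[n_test:]}
    for out_dir, subset in splits.items():
        Path(out_dir).mkdir(parents=True, exist_ok=True)
        with open(Path(out_dir) / "claims.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(subset)
    return len(splits[train_dir]), len(splits[test_dir])
```

A fixed seed keeps the split reproducible across pipeline runs, so evaluation numbers are comparable between training jobs.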

Running the Scoring API Locally

# Build Docker image
docker build -t claims-model:latest -f api-inference/Dockerfile .

# Run container
docker run -p 5001:5001 claims-model:latest

# Test the API
curl -X POST http://localhost:5001/claim/predict-fraud \
  -H "Content-Type: application/json" \
  -d '{
    "reported_lag_days": 45,
    "insured_age": 35,
    "sum_insured": 50000,
    "deductible_amount": 1000,
    "coverage_limit": 100000,
    "initial_reserve_amount": 5000,
    "current_reserve_amount": 3000,
    "paid_amount": 2000,
    "expense_paid_amount": 500,
    "recovery_amount": 0,
    "outstanding_amount": 1000,
    "fraud_score": 0.3,
    "geo_risk_score": 0.2
  }'

🏗️ Infrastructure Deployment

The infrastructure is defined in Terraform and manages all Azure resources.

Deploying Infrastructure

cd infra
terraform init -backend-config="resource_group=$RESOURCE_GROUP" \
               -backend-config="storage_account_name=$STORAGE_ACCOUNT" \
               -backend-config="container_name=$CONTAINER_NAME"

# Select environment
terraform plan -var-file terraform-vars/dev.terraform.tfvars -out=tfplan

# Apply configuration
terraform apply tfplan

Environments

  • dev.terraform.tfvars - Development environment (CPU-based compute)
  • tst.terraform.tfvars - Test environment
  • prod.terraform.tfvars - Production environment

🔄 CI/CD Pipelines

Azure Pipelines Workflow

The CI/CD pipeline triggers automatically on commits to the main branch that touch ML code or infrastructure.

Pipeline Stages:

  1. Train - Submits training job to Azure ML
  2. Wait - Monitors job completion
  3. Build - Builds containerized API image
  4. Push - Pushes image to ACR
  5. Deploy - Updates Container Apps with new image
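The Wait stage boils down to polling the submitted training job until it reaches a terminal state. A dependency-free sketch of that loop is below; the actual pipeline presumably shells out to `az ml job show` or uses the Azure ML SDK, so `get_status` here is a stand-in for whatever returns the job's current state.

```python
import time
from typing import Callable

TERMINAL_STATES = {"Completed", "Failed", "Canceled"}

def wait_for_job(get_status: Callable[[], str],
                 poll_seconds: float = 30.0,
                 timeout_seconds: float = 3600.0) -> str:
    """Poll a job's status until it is terminal or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("training job did not finish in time")
```

Returning the terminal status (rather than raising on "Failed") lets the pipeline decide whether to proceed to the Build stage or fail the run.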

Manual Pipeline Execution

# Trigger training pipeline
az pipelines run --name azure-pipelines --branch main

# Check pipeline status
az pipelines build list

📊 API Endpoints

POST /claim/predict-fraud

Predicts the fraud probability for a single claim.

Request:

{
  "reported_lag_days": 45,
  "insured_age": 35,
  "sum_insured": 50000,
  ...
}

Response:

{
  "prediction": 0,
  "probability": 0.35
}

GET /health

Health check endpoint.

Response:

{
  "status": "ok",
  "model_loaded": true
}
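The request and response shapes above can be captured as typed models. The sketch below uses stdlib dataclasses purely for illustration; the actual service in score-api.py presumably defines pydantic models for FastAPI's validation.

```python
from dataclasses import dataclass, asdict

@dataclass
class ClaimFeatures:
    """Input schema for POST /claim/predict-fraud."""
    reported_lag_days: int
    insured_age: int
    sum_insured: float
    deductible_amount: float
    coverage_limit: float
    initial_reserve_amount: float
    current_reserve_amount: float
    paid_amount: float
    expense_paid_amount: float
    recovery_amount: float
    outstanding_amount: float
    fraud_score: float
    geo_risk_score: float

@dataclass
class FraudPrediction:
    """Response schema: 0/1 label plus fraud probability."""
    prediction: int
    probability: float

# Same example payload as the curl request above.
claim = ClaimFeatures(45, 35, 50000, 1000, 100000, 5000, 3000,
                      2000, 500, 0, 1000, 0.3, 0.2)
payload = asdict(claim)
```

Keeping the schema in one typed definition means missing or misnamed fields fail fast at the API boundary instead of deep inside the model's scoring code.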

🧪 Testing & Validation

Unit Tests for Scoring API

The project includes comprehensive unit tests for the FastAPI scoring service covering 40+ test scenarios.

Test Files:

  • api-inference/test_score_api.py - Main unit test suite (40+ tests across 6 test classes)
  • api-inference/conftest.py - Pytest fixtures and configuration
  • api-inference/pytest.ini - Pytest configuration with markers and coverage settings

Running API Tests Locally:

# Install test dependencies
pip install pytest httpx

# Navigate to API directory
cd api-inference

# Run all tests
pytest test_score_api.py -v

# Run with coverage report
pytest test_score_api.py --cov=. --cov-report=html

# Run specific test class
pytest test_score_api.py::TestHealthEndpoint -v

# Run with detailed output
pytest test_score_api.py -vv --tb=long
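To give a flavor of the individual tests, here is a self-contained sketch of two of them. It stubs the prediction function instead of importing the real app (the actual suite in test_score_api.py exercises the FastAPI service, most likely via TestClient); the stub's field names and scoring rule are invented for the example.

```python
# Stub standing in for the scoring logic in score-api.py.
def predict_fraud(payload: dict) -> dict:
    required = {"insured_age", "fraud_score", "geo_risk_score"}
    missing = required - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    prob = min(1.0, 0.5 * payload["fraud_score"] + 0.5 * payload["geo_risk_score"])
    return {"prediction": int(prob >= 0.5), "probability": prob}

def test_prediction_response_format():
    out = predict_fraud({"insured_age": 35, "fraud_score": 0.3, "geo_risk_score": 0.2})
    assert set(out) == {"prediction", "probability"}
    assert out["prediction"] in (0, 1)
    assert 0.0 <= out["probability"] <= 1.0

def test_missing_field_rejected():
    try:
        predict_fraud({"insured_age": 35})
    except ValueError as exc:
        assert "missing fields" in str(exc)
    else:
        raise AssertionError("expected a validation error")
```

The same pattern, asserting on response keys, value ranges, and rejection of malformed input, covers most of the validation and response-format categories listed below.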

Test Coverage:

  • Health Endpoint - API responsiveness and model loading status
  • Prediction Endpoint - Fraud prediction accuracy and response format
  • Input Validation - Required fields, data types, boundary values
  • Edge Cases - Zero values, extreme values, float precision
  • API Structure - Endpoints exist, correct HTTP methods, proper error codes
  • Response Format - JSON content type, required fields, data types

Test Classes:

  • TestHealthEndpoint - 3 tests for health check validation
  • TestPredictionEndpoint - 6 tests for fraud prediction validation
  • TestInputValidation - 7 tests for input schema and field validation
  • TestEdgeCases - 3 tests for boundary conditions and extreme values
  • TestAPIEndpoints - 6 tests for endpoint availability and HTTP methods
  • TestContentType - 2 tests for response content type validation

Infrastructure Validation

cd infra/tests
# Run infrastructure validation tests
./validate.tests.ps1

Model Testing

The training pipeline includes automatic train/test split and model evaluation using:

  • ROC-AUC Score - Model discrimination ability
  • Accuracy Score - Overall classification accuracy
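For intuition, both metrics can be computed by hand. The stdlib-only sketch below uses the rank (Mann-Whitney U) formulation of ROC-AUC; the training script itself presumably calls scikit-learn's metrics functions.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a randomly chosen fraudulent
    claim scores higher than a randomly chosen legitimate one."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(y_true, scores, threshold=0.5):
    """Fraction of claims whose thresholded score matches the label."""
    preds = [int(s >= threshold) for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)

y = [0, 0, 1, 1]           # true fraud labels
s = [0.1, 0.4, 0.35, 0.8]  # model's fraud probabilities
auc = roc_auc(y, s)   # 0.75: one (fraud, legit) pair is mis-ordered
acc = accuracy(y, s)  # 0.75: 0.35 falls below the 0.5 threshold
```

The example shows why both numbers are reported: AUC measures ranking quality independent of the threshold, while accuracy depends on where the 0.5 cutoff lands.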

📝 Key Technologies

  • ML Framework: scikit-learn (Logistic Regression)
  • API: FastAPI
  • Cloud Platform: Microsoft Azure
  • Infrastructure: Terraform
  • Container Registry: Azure Container Registry (ACR)
  • ML Platform: Azure Machine Learning
  • CI/CD: Azure Pipelines
  • Monitoring: Azure Log Analytics & Application Insights
  • IaC Providers: azurerm, azuread, azapi

🔐 Security

  • Credentials stored in Azure Key Vault
  • Container images stored in private ACR
  • Role-based access control (RBAC) for Azure resources
  • Secrets management through environment variables

🤝 Contributing

  1. Create a feature branch from main
  2. Make changes to code or infrastructure
  3. Test locally before pushing
  4. Submit pull request for review
  5. Pipeline will automatically validate and deploy after merge

📄 License

This project is licensed under the MIT License - see LICENSE file for details.

About

Implemented a claims-model training workflow in Azure Machine Learning, deployed the inference container to Azure Container Apps with Entra ID-based RBAC, and integrated the secured endpoint into a RAG pipeline for agentic automation.
