A production-grade machine learning operations (MLOps) platform for automating the training, deployment, and inference of a claims fraud detection model. This project demonstrates best practices for ML model lifecycle management on Azure.
This solution builds an end-to-end MLOps pipeline that:
- Trains a machine learning model for detecting fraudulent insurance claims
- Evaluates model performance on held-out test data
- Deploys the trained model as a scalable REST API
- Automates the entire workflow using CI/CD pipelines
- Manages infrastructure as code using Terraform
The model uses logistic regression to classify claims as fraudulent or legitimate based on claim features such as insured age, coverage details, reserve amounts, and risk scores.
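As a minimal sketch of this approach, the snippet below trains a logistic regression classifier on synthetic data; the column names mirror a few of the claim fields above, but the real training logic lives in `src/train_claims.py`.

```python
# Sketch of the modelling approach: logistic regression over numeric
# claim features. All values here are synthetic stand-ins for the
# Claims-2026.csv columns.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Columns: insured_age, sum_insured, fraud_score, geo_risk_score
X = np.column_stack([
    rng.integers(18, 90, n),
    rng.uniform(1_000, 200_000, n),
    rng.uniform(0, 1, n),
    rng.uniform(0, 1, n),
])
# Toy labels: claims with a high fraud_score are more often fraudulent.
y = (X[:, 2] + 0.1 * rng.normal(size=n) > 0.7).astype(int)

# Scale features so the solver converges despite very different ranges.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# Fraud probability for a single claim.
proba = model.predict_proba(X[:1])[0, 1]
```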
```text
Data (Claims) → Train Pipeline → Model → API Container → Azure Container Apps
                      │                       │
                      ▼                       ▼
             Azure ML Workspace         ACR (Registry)
                      │
                      ▼
              Compute Cluster
```
Key Components:
- Azure ML Workspace: Manages experiments, compute clusters, and datastores
- Azure Container Registry (ACR): Stores containerized inference API images
- Azure Container Apps: Hosts the scoring API for real-time predictions
- Key Vault: Securely manages credentials and secrets
- Storage Account: Stores training data and model artifacts
- Log Analytics: Provides monitoring and diagnostics
- Azure Pipelines: Orchestrates CI/CD workflows
```text
ai-ml-ops/
├── api-inference/           # FastAPI scoring service
│   ├── conda.yaml           # API dependencies
│   ├── Dockerfile           # Container image for deployment
│   └── score-api.py         # FastAPI application
├── data/                    # Training datasets
│   └── Claims-2026.csv      # Claims dataset with fraud labels
├── devOps/                  # CI/CD configuration
│   ├── azure-pipelines.yml  # Main build/deployment pipeline
│   ├── infra-pipeline.yml   # Infrastructure deployment
│   └── steps/               # Pipeline step templates
├── env/                     # Environment specifications
│   ├── conda-prep.yml       # Data preparation environment
│   └── conda-train.yml      # Model training environment
├── infra/                   # Infrastructure as Code (Terraform)
│   ├── machine_learning_workspace.tf
│   ├── container_app.tf
│   ├── acr.tf
│   ├── key_vault.tf
│   ├── storage_account.tf
│   └── variables.tf
├── ml-pipelines/            # Azure ML job definitions
│   └── claims-training.yml  # Training pipeline job
└── src/                     # Python source code
    ├── train_claims.py      # Model training script
    ├── split_claims.py      # Train/test data splitting
    └── score.py             # Batch scoring utility
```
- Azure Subscription with appropriate permissions
- Azure CLI
- Terraform v1.6 or later
- Python 3.8+ with pip
- Docker (for local API testing)
- Azure ML CLI extension (`az extension add -n ml`)
1. Clone the repository:

   ```shell
   git clone <repository-url>
   cd ai-ml-ops
   ```

2. Install Python dependencies:

   ```shell
   # For training environment
   conda env create -f env/conda-train.yml
   conda activate claims-train

   # For data preparation
   conda env create -f env/conda-prep.yml
   conda activate claims-prep

   # For API
   conda env create -f api-inference/conda.yaml
   ```

3. Authenticate with Azure:

   ```shell
   az login
   az account set --subscription <subscription-id>
   ```
```shell
# Prepare data
python src/split_claims.py \
  --input_data data/Claims-2026.csv \
  --train_output ./data/train \
  --test_output ./data/test

# Train the model
python src/train_claims.py \
  --train_data ./data/train \
  --test_data ./data/test \
  --model_dir ./model
```
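`src/split_claims.py` presumably implements something like the following sketch; the `is_fraud` label column name and the output file name are assumptions for illustration, not the script's actual interface.

```python
# Hypothetical sketch of split_claims.py: read the labelled claims CSV
# and write stratified train/test partitions to two output directories.
import os

import pandas as pd
from sklearn.model_selection import train_test_split


def split_claims(input_data: str, train_output: str, test_output: str,
                 test_size: float = 0.2, seed: int = 42) -> None:
    df = pd.read_csv(input_data)
    # "is_fraud" is an assumed label column name; stratify on it if present
    # so both partitions keep the same fraud/legitimate ratio.
    train_df, test_df = train_test_split(
        df, test_size=test_size, random_state=seed,
        stratify=df["is_fraud"] if "is_fraud" in df else None,
    )
    os.makedirs(train_output, exist_ok=True)
    os.makedirs(test_output, exist_ok=True)
    train_df.to_csv(os.path.join(train_output, "claims.csv"), index=False)
    test_df.to_csv(os.path.join(test_output, "claims.csv"), index=False)
```

Stratifying on the label matters here because fraud datasets are typically imbalanced; a plain random split could leave the test set with almost no positive examples.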
```shell
# Build Docker image
docker build -t claims-model:latest -f api-inference/Dockerfile .

# Run container
docker run -p 5001:5001 claims-model:latest
```
```shell
# Test the API
curl -X POST http://localhost:5001/claim/predict-fraud \
  -H "Content-Type: application/json" \
  -d '{
    "reported_lag_days": 45,
    "insured_age": 35,
    "sum_insured": 50000,
    "deductible_amount": 1000,
    "coverage_limit": 100000,
    "initial_reserve_amount": 5000,
    "current_reserve_amount": 3000,
    "paid_amount": 2000,
    "expense_paid_amount": 500,
    "recovery_amount": 0,
    "outstanding_amount": 1000,
    "fraud_score": 0.3,
    "geo_risk_score": 0.2
  }'
```

The infrastructure is defined in Terraform and manages all Azure resources.
```shell
cd infra

terraform init -backend-config="resource_group=$RESOURCE_GROUP" \
  -backend-config="storage_account_name=$STORAGE_ACCOUNT" \
  -backend-config="container_name=$CONTAINER_NAME"

# Select environment
terraform plan -var-file terraform-vars/dev.terraform.tfvars -out=tfplan

# Apply configuration
terraform apply tfplan
```

- `dev.terraform.tfvars` - Development environment (CPU-based compute)
- `tst.terraform.tfvars` - Test environment
- `prod.terraform.tfvars` - Production environment
The CI/CD pipeline is triggered automatically on changes to the main branch that affect ML code or infrastructure.
Pipeline Stages:
- Train - Submits training job to Azure ML
- Wait - Monitors job completion
- Build - Builds containerized API image
- Push - Pushes image to ACR
- Deploy - Updates Container Apps with new image
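The Wait stage can be approximated by polling the submitted job with the Azure ML CLI. A hedged sketch: the job, resource group, and workspace names are placeholders, and the exact terminal-state list is an assumption.

```python
# Sketch of the "Wait" stage: poll an Azure ML job until it reaches a
# terminal state. job_status shells out to the az CLI; all names passed
# in are placeholders for values the pipeline would supply.
import subprocess
import time

TERMINAL_STATES = {"Completed", "Failed", "Canceled"}  # assumed set


def is_terminal(status: str) -> bool:
    return status in TERMINAL_STATES


def job_status(job_name: str, resource_group: str, workspace: str) -> str:
    """Return the current status string of an Azure ML job via the az CLI."""
    out = subprocess.run(
        ["az", "ml", "job", "show", "--name", job_name,
         "--resource-group", resource_group, "--workspace-name", workspace,
         "--query", "status", "--output", "tsv"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def wait_for_job(job_name: str, resource_group: str, workspace: str,
                 poll_seconds: int = 30) -> str:
    """Block until the job finishes, returning its final status."""
    while True:
        status = job_status(job_name, resource_group, workspace)
        if is_terminal(status):
            return status
        time.sleep(poll_seconds)
```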
```shell
# Trigger training pipeline
az pipelines run --name azure-pipelines --branch main

# Check pipeline status
az pipelines build list
```

`POST /claim/predict-fraud`

Predicts fraud probability for a claim.
Request:

```json
{
  "reported_lag_days": 45,
  "insured_age": 35,
  "sum_insured": 50000,
  ...
}
```

Response:
```json
{
  "prediction": 0,
  "probability": 0.35
}
```

Health check endpoint.
Response:

```json
{
  "status": "ok",
  "model_loaded": true
}
```

The project includes comprehensive unit tests for the FastAPI scoring service, covering 40+ test scenarios.
Test Files:
- `api-inference/test_score_api.py` - Main unit test suite (40+ tests across 6 test classes)
- `api-inference/conftest.py` - Pytest fixtures and configuration
- `api-inference/pytest.ini` - Pytest configuration with markers and coverage settings
Running API Tests Locally:
```shell
# Install test dependencies
pip install pytest pytest-cov httpx

# Navigate to API directory
cd api-inference

# Run all tests
pytest test_score_api.py -v

# Run with coverage report
pytest test_score_api.py --cov=. --cov-report=html

# Run specific test class
pytest test_score_api.py::TestHealthEndpoint -v

# Run with detailed output
pytest test_score_api.py -vv --tb=long
```

Test Coverage:
- Health Endpoint - API responsiveness and model loading status
- Prediction Endpoint - Fraud prediction accuracy and response format
- Input Validation - Required fields, data types, boundary values
- Edge Cases - Zero values, extreme values, float precision
- API Structure - Endpoints exist, correct HTTP methods, proper error codes
- Response Format - JSON content type, required fields, data types
Test Classes:
- `TestHealthEndpoint` - 3 tests for health check validation
- `TestPredictionEndpoint` - 6 tests for fraud prediction validation
- `TestInputValidation` - 7 tests for input schema and field validation
- `TestEdgeCases` - 3 tests for boundary conditions and extreme values
- `TestAPIEndpoints` - 6 tests for endpoint availability and HTTP methods
- `TestContentType` - 2 tests for response content type validation
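As an illustration of the kind of check these classes perform, here is a hypothetical reconstruction of one response-schema assertion; the actual suite lives in `test_score_api.py` and may differ.

```python
# Hypothetical reconstruction of a response-schema check along the lines
# of TestPredictionEndpoint (not the actual test code).
def check_prediction_response(body: dict) -> None:
    # The documented response contains these two fields.
    assert {"prediction", "probability"} <= set(body)
    assert body["prediction"] in (0, 1)           # binary fraud label
    assert 0.0 <= body["probability"] <= 1.0      # valid probability


# The documented example response passes the check.
check_prediction_response({"prediction": 0, "probability": 0.35})
```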
```powershell
cd infra/tests

# Run infrastructure validation tests
./validate.tests.ps1
```

The training pipeline includes an automatic train/test split and model evaluation using:
- ROC-AUC Score - Model discrimination ability
- Accuracy Score - Overall classification accuracy
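Both metrics can be computed with scikit-learn. A toy sketch (the labels and probabilities below are made up, not model output):

```python
# Evaluation sketch: ROC-AUC on predicted probabilities, accuracy on
# thresholded labels. Values are illustrative toy data.
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]                 # held-out fraud labels
y_proba = [0.1, 0.4, 0.8, 0.65, 0.2, 0.9]   # predicted fraud probabilities
y_pred = [int(p >= 0.5) for p in y_proba]   # threshold at 0.5

auc = roc_auc_score(y_true, y_proba)  # ranking quality, threshold-free
acc = accuracy_score(y_true, y_pred)  # fraction classified correctly
# On this toy data every positive outranks every negative, so both are 1.0.
```

ROC-AUC is the more informative of the two on imbalanced fraud data, since accuracy can look high even when the model misses most fraudulent claims.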
- ML Framework: scikit-learn (Logistic Regression)
- API: FastAPI
- Cloud Platform: Microsoft Azure
- Infrastructure: Terraform
- Container Registry: Azure Container Registry (ACR)
- ML Platform: Azure Machine Learning
- CI/CD: Azure Pipelines
- Monitoring: Azure Log Analytics & Application Insights
- IaC Providers: azurerm, azuread, azapi
- Credentials stored in Azure Key Vault
- Container images stored in private ACR
- Role-based access control (RBAC) for Azure resources
- Secrets management through environment variables
- Create a feature branch from `main`
- Make changes to code or infrastructure
- Test locally before pushing
- Submit a pull request for review
- The pipeline will automatically validate and deploy after merge
This project is licensed under the MIT License - see LICENSE file for details.