
Dexory Technical Task - Warehouse Intelligence System

Applied AI Engineer Technical Assessment
Author: Alessandro Alati
Date: November 2025

πŸ“‹ Project Overview

This project analyzes 10 days of warehouse scan data to extract actionable intelligence about inventory accuracy, error patterns, and operational issues. It includes a complete data pipeline, exploratory analysis, anomaly detection, and a containerized REST API.


🎯 Completed Tasks

βœ… Point 1: Data Engineering Pipeline

  • Ingests 10 days of warehouse scan data (350K+ records)
  • Merges with warehouse layout (33K+ locations)
  • Outputs clean Parquet dataset with spatial features
  • Includes comprehensive unit tests
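The core of the pipeline is a left join of scan records onto the layout, followed by spatial feature extraction. A minimal sketch of that step (the column names and location-name format here are illustrative assumptions, not the task's actual schema):

```python
import pandas as pd

def merge_scans_with_layout(scans: pd.DataFrame, layout: pd.DataFrame) -> pd.DataFrame:
    """Left-join scan records onto the warehouse layout and derive a
    simple spatial feature (shelf level parsed from the location name)."""
    # validate="m:1" guards against accidental row duplication from the join
    merged = scans.merge(layout, on="location", how="left", validate="m:1")
    # Example spatial feature: numeric shelf level from names like "AZ-01-03"
    merged["shelf_level"] = merged["location"].str.split("-").str[-1].astype(int)
    return merged

# Tiny illustrative inputs
scans = pd.DataFrame({"location": ["AZ-01-01", "AZ-01-03"], "status": ["ok", "error"]})
layout = pd.DataFrame({"location": ["AZ-01-01", "AZ-01-03"], "aisle": ["AZ 1", "AZ 1"]})
df = merge_scans_with_layout(scans, layout)
```

The real script writes the merged frame to Parquet (`df.to_parquet(...)`) via pyarrow.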

βœ… Point 2: Exploratory Data Analysis (EDA)

  • WHAT: Daily accuracy trends and error type breakdown
  • WHERE: Spatial hotspots (shelf levels, aisles, height correlations)
  • WHEN: Velocity analysis (fast-moving locations)
  • Generates 8 publication-quality visualizations
  • Statistical validation (chi-square tests, correlations)

βœ… Point 3: Anomaly Detection

  • Composite risk scoring model (error severity + operational impact)
  • Identifies Top 20 most problematic locations
  • Transparent, explainable scoring system
  • Output: Ranked CSV with actionable metrics

βœ… Point 4: Scalable & Containerized API

  • FastAPI application with 4 endpoints
  • Docker containerization with docker-compose
  • Interactive Swagger documentation
  • Health checks and error handling

βœ… Point 5: Error Prediction Model

  • Random Forest classifier (zero-to-one model)
  • Predicts high/low error risk from static features only
  • Works on new warehouses with no scan history
  • 65% accuracy (a 30% relative improvement over the 50% baseline)


πŸ“ Project Structure

dexory-technical-task/
β”œβ”€β”€ core_scripts/               # Main analysis scripts
β”‚   β”œβ”€β”€ data_pipeline.py        # Data ingestion and cleaning
β”‚   β”œβ”€β”€ eda.py                  # Exploratory data analysis
β”‚   β”œβ”€β”€ anomaly_detection.py    # Top 20 problematic locations
β”‚   β”œβ”€β”€ error_prediction.py     # ML model for error prediction
β”‚   └── test_pipeline.py        # Unit tests
β”‚
β”œβ”€β”€ API/
β”‚   └── warehouse-api/          # FastAPI application
β”‚       β”œβ”€β”€ app/main.py         # API endpoints
β”‚       β”œβ”€β”€ Dockerfile          # Container definition
β”‚       β”œβ”€β”€ docker-compose.yml  # Docker orchestration
β”‚       └── requirements.txt    # API dependencies
β”‚
β”œβ”€β”€ data_models/
β”‚   β”œβ”€β”€ technical-task-data/    # Raw input data (10 days)
β”‚   └── output/                 # Processed data & models
β”‚       β”œβ”€β”€ warehouse_data.parquet
β”‚       β”œβ”€β”€ top_20_problematic.csv
β”‚       β”œβ”€β”€ error_predictor.pkl
β”‚       └── eda_plots/          # Analysis visualizations
β”‚
└── README.md                   # This file

πŸš€ Quick Start

Prerequisites

  • Python 3.11+
  • Docker Desktop (for API)

1. Install Dependencies

pip install -r requirements.txt

2. Run the Complete Pipeline

# Step 1: Process data (Point 1)
cd core_scripts
python data_pipeline.py

# Step 2: Run EDA (Point 2)
python eda.py

# Step 3: Detect anomalies (Point 3)
python anomaly_detection.py

# Step 4: Train prediction model (Point 5)
python error_prediction.py

3. Launch the API (Point 4)

# Navigate to API folder
cd ../API/warehouse-api

# Run with Docker
docker-compose up --build

# Access API at:
# http://localhost:8000/docs

πŸ“Š Key Results

Inventory Accuracy

  • Mean Accuracy: 75.07%
  • Range: 3.09% - 99.30%
  • Most Common Error: Unknown item found (3.83%)

Spatial Insights

  • Ground shelves: 6.24% error rate (highest)
  • High shelves: 1.37% error rate (lowest)
  • Most problematic aisle: AZ 1 (10.64% error rate)
  • Significant correlation: Shelf level affects error rate (p < 0.001)
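The shelf-level significance claim comes from a chi-square test on a contingency table of shelf level vs. scan outcome. A sketch with made-up counts chosen to mirror the reported rates (the real test uses the actual per-level tallies):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = shelf level (ground, mid, high),
# columns = (errors, correct scans)
table = np.array([
    [624, 9376],   # ground -> ~6.2% error rate
    [300, 9700],   # mid
    [137, 9863],   # high   -> ~1.4% error rate
])
chi2, p, dof, expected = chi2_contingency(table)
# A small p here rejects "error rate is independent of shelf level"
```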

Velocity Analysis

  • High-velocity locations: ~650 (2% of total)
  • Static locations: 25,234 (75% of total)
  • Finding: High-velocity locations have significantly more errors

Anomaly Detection

  • Top 20 problematic locations identified
  • Scoring factors: Error rate (40%), Operational impact (30%), Error severity (20%), Spatial context (10%)
  • Highest risk score: 0.52 (Location with 18.5% error rate + high velocity)

Prediction Model

  • Algorithm: Random Forest (balanced class weights)
  • Accuracy: 65% (vs 50% baseline)
  • High Error Recall: 73% (catches 73% of problematic locations)
  • Key features: Shelf height, position, aisle location

πŸ”Œ API Endpoints

The API serves analysis results and predictions:

Endpoint                     Description
GET /health                  System health check
GET /warehouse/anomalies     Top 20 problematic locations
GET /warehouse/stats         Daily accuracy trends & error breakdown
GET /location/{name}         Detailed location analysis + prediction

Interactive docs: http://localhost:8000/docs

See API/warehouse-api/README.md for detailed API documentation.


πŸ§ͺ Testing

Run unit tests:

cd core_scripts
pytest test_pipeline.py -v

Test coverage includes:

  • Data loading and validation
  • Merge operations (no data loss)
  • Feature extraction
  • Edge case handling
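The "no data loss" check can be expressed as a small pytest-style test: a many-to-one left join must keep exactly one output row per scan record and match every location. This is a sketch of the idea, not a copy of `test_pipeline.py`:

```python
import pandas as pd

def test_merge_preserves_all_scan_records():
    """A left join onto the layout must not drop or duplicate scan rows."""
    scans = pd.DataFrame({"location": ["A1", "A2", "A2"],
                          "status": ["ok", "error", "ok"]})
    layout = pd.DataFrame({"location": ["A1", "A2"], "aisle": ["AZ 1", "AZ 1"]})
    merged = scans.merge(layout, on="location", how="left", validate="m:1")
    assert len(merged) == len(scans)        # no rows lost or duplicated
    assert merged["aisle"].notna().all()    # every scan matched the layout
```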

πŸ“ˆ Visualizations

The EDA writes its visualizations to data_models/output/eda_plots/, including:

  1. daily_accuracy.png - Accuracy trends over 10 days
  2. status_breakdown.png - Overall status distribution
  3. substatus_breakdown.png - Top 15 error types
  4. spatial_hotspots.png - Error rates by shelf level and aisle
  5. fast_moving_locations.png - Top 20 highest velocity locations
  6. problematic_locations.png - Top 20 risk scores
  7. error_prediction_model.png - Model performance metrics

πŸ’‘ Key Insights

Operational Recommendations

  1. Ground-Level Shelves Need Attention

    • Despite easy access, ground shelves have highest error rates (6.24%)
    • Hypothesis: Rushing, picking interference, or label damage
    • Action: Investigate workflows for ground-level operations
  2. Aisle AZ 1 Requires Investigation

    • 10.64% error rate (2.7x warehouse average)
    • May indicate: lighting issues, layout problems, or label quality
    • Action: On-site audit of physical conditions
  3. Fast-Moving Locations = Higher Risk

    • Positive correlation between velocity and error rate
    • More handling = more opportunities for errors
    • Action: Implement more frequent audits for high-velocity locations
  4. Predictive Model Enables Proactive Management

    • Can identify high-risk locations before they accumulate errors
    • Works on new warehouses (zero-to-one capability)
    • Action: Deploy for ongoing monitoring and early intervention

πŸ› οΈ Technologies Used

  • Data Processing: pandas, numpy, pyarrow
  • Validation: pydantic
  • Machine Learning: scikit-learn (Random Forest)
  • Visualization: matplotlib, seaborn
  • Statistical Analysis: scipy
  • API: FastAPI, uvicorn
  • Containerization: Docker, docker-compose
  • Testing: pytest

πŸ“ Model Justification

Why Composite Risk Scoring (Point 3)?

Chosen over unsupervised methods (Isolation Forest, DBSCAN) because:

  • βœ… Transparent and explainable to stakeholders
  • βœ… Incorporates domain knowledge (error severity weights)
  • βœ… Tunable based on business priorities
  • βœ… Every component can be validated independently
  • βœ… Produces actionable insights

Formula:

Risk Score = 0.40 × Error_Rate + 
             0.30 × Operational_Impact + 
             0.20 × Error_Severity + 
             0.10 × Spatial_Context
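In code, the score is just a weighted sum over per-location components normalized to [0, 1]. A sketch (the `_norm` column names are assumptions for illustration):

```python
import pandas as pd

def risk_score(df: pd.DataFrame) -> pd.Series:
    """Composite risk score: weighted sum of four components in [0, 1]."""
    return (0.40 * df["error_rate_norm"]
            + 0.30 * df["operational_impact_norm"]
            + 0.20 * df["error_severity_norm"]
            + 0.10 * df["spatial_context_norm"])

locations = pd.DataFrame({
    "error_rate_norm":         [1.0, 0.2],
    "operational_impact_norm": [0.8, 0.1],
    "error_severity_norm":     [0.5, 0.0],
    "spatial_context_norm":    [0.4, 0.3],
})
scores = risk_score(locations)
```

Because each component is bounded by 1 and the weights sum to 1, the score itself stays in [0, 1], which makes the ranking directly comparable across locations.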

Why Random Forest (Point 5)?

Chosen for zero-to-one prediction because:

  • βœ… Handles mixed feature types (numeric + categorical)
  • βœ… Robust to class imbalance (with class_weight='balanced')
  • βœ… Provides feature importances (interpretability)
  • βœ… No feature scaling required
  • βœ… Proven performance on tabular data

Alternatives considered: Logistic Regression (too simple) and XGBoost (overkill for this data size)
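The zero-to-one setup can be sketched as follows: only static location features go in, and `class_weight='balanced'` compensates for the minority high-error class. The data here is synthetic and the feature names are assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for static features: (shelf_level, position, aisle_id)
X = rng.integers(0, 10, size=(1000, 3)).astype(float)
# Imbalanced target loosely tied to shelf level: lower shelves riskier
y = (rng.random(1000) < 0.15 + 0.05 * (X[:, 0] < 3)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0)
clf.fit(X_train, y_train)
importances = clf.feature_importances_  # interpretability: which features matter
```

Because only static features are used, the fitted model can score locations in a brand-new warehouse before any scan history exists.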


πŸ“„ Requirements

See requirements.txt for complete dependencies.

Core packages:

pandas>=2.0.0
scikit-learn>=1.3.0
fastapi>=0.104.0
uvicorn>=0.24.0

Built with FastAPI β€’ Docker β€’ Python 3.11 πŸš€
