AI-Generated Text Detection using Machine Learning & Production-Ready MLOps
AITextGuard is an end-to-end NLP system designed to detect AI-generated text using a stacked machine learning approach.
The project demonstrates how to design, train, evaluate, containerize, monitor, and deploy a production-style ML system using modern MLOps practices.
The entire system can be launched locally with a single command using Docker Compose.
- Hugging Face Space: https://huggingface.co/spaces/Sahil4818/ai-text-guard
- GitHub Repository: https://github.com/Sip4818/AICheatTextGuard
With the increasing use of large language models, distinguishing between human-written and AI-generated text has become critical for:
- Academic integrity
- Content moderation
- Plagiarism detection
- Information reliability
AITextGuard explores a machine learning–based approach combining statistical features and transformer embeddings to classify AI-generated text.
The system runs as a multi-service Docker setup:
User → Streamlit UI → FastAPI Backend → Redis (Cache)
↓
Prometheus (Monitoring)
- Data ingestion from Google Cloud Storage (GCS)
- Schema validation
- Train–test split
- Feature engineering:
  - Statistical text features
  - Transformer-based sentence embeddings
- Model training:
  - Level-1: Logistic Regression, XGBoost
  - Level-2: Meta Logistic Regression (Stacking)
- Hyperparameter tuning (Optuna)
- Experiment tracking (MLflow)
- Model evaluation (ROC-AUC)
- Approved model stored in cloud storage
Training is reproducible using DVC.
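To make the two-level design concrete, here is a minimal, dependency-free sketch of how a level-2 meta logistic regression might combine the two level-1 probabilities at prediction time. The function names, weights, and bias are hypothetical illustration values, not parameters of the trained AITextGuard model. In a typical stacking setup, the meta-learner's coefficients would be fit on held-out level-1 predictions.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def stacked_probability(p_logreg: float, p_xgb: float,
                        weights=(2.0, 3.0), bias=-2.5) -> float:
    """Combine two level-1 probabilities with a level-2 logistic regression.

    `weights` and `bias` are made-up values for illustration (assumption);
    a real meta-model would learn them from level-1 outputs.
    """
    score = bias + weights[0] * p_logreg + weights[1] * p_xgb
    return sigmoid(score)


# Both base models lean towards "AI-generated" in the first call
# and towards "human-written" in the second.
high = stacked_probability(0.9, 0.8)
low = stacked_probability(0.1, 0.2)
```

The meta-model sees only the level-1 outputs, so it learns how much to trust each base model rather than re-learning the text features.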
- Text submitted via Streamlit UI
- FastAPI backend processes request
- Feature transformation applied
- Stacked model predicts probability
- Result returned via REST API
- Prediction cached in Redis (10-minute TTL)
- Prometheus collects performance metrics
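The caching step in the flow above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual backend code: `TTLCache` is an in-memory stand-in for Redis so the example runs without a server, and `cache_key`, `predict_with_cache`, and the `prediction:` key prefix are hypothetical names.

```python
import hashlib
import time

CACHE_TTL_SECONDS = 600  # 10-minute TTL, as stated above


class TTLCache:
    """In-memory stand-in for Redis with per-key expiry (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock  # injectable clock makes expiry easy to test

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop the entry and report a miss
            return None
        return value


def cache_key(text: str) -> str:
    """Hash the input so arbitrarily long texts map to fixed-size keys."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"prediction:{digest}"


def predict_with_cache(text, model_fn, cache):
    """Return (probability, was_cached); call the model only on a cache miss."""
    key = cache_key(text)
    cached = cache.get(key)
    if cached is not None:
        return cached, True
    probability = model_fn(text)
    cache.set(key, probability, CACHE_TTL_SECONDS)
    return probability, False
```

With a real Redis server, the same pattern would use a client call that sets the value together with an expiry, for example redis-py's `set(key, value, ex=600)`.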
- End-to-end ML pipeline
- Stacked ensemble learning
- Transformer-based embeddings
- Reproducible training with DVC
- Experiment tracking with MLflow
- FastAPI inference API
- Redis caching
- Prometheus monitoring
- Fully containerized architecture
- CI/CD with GitHub Actions
- Automated Docker image builds with model injection
Programming
- Python
Machine Learning & NLP
- Scikit-learn
- XGBoost
- Sentence Transformers
Data & MLOps
- Pandas
- NumPy
- Optuna
- MLflow
- DVC
Backend & API
- FastAPI
Monitoring
- Prometheus
Caching
- Redis
Deployment & DevOps
- Docker
- Docker Compose
- GitHub Actions
- Docker Hub
Cloud
- Google Cloud Storage (Model artifacts)
- Hugging Face Spaces (Live demo)
Features
- Text statistics
- Punctuation distribution
- Sentence embeddings
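As a rough illustration of the first two feature groups, the sketch below computes a few text statistics and a normalized punctuation distribution using only the standard library. The specific features and the names `text_statistics` and `punctuation_distribution` are assumptions chosen for the example; the project's actual feature set may differ. Sentence embeddings are omitted because they require a pretrained Sentence Transformers model.

```python
import string
from collections import Counter


def text_statistics(text: str) -> dict:
    """Simple length- and vocabulary-based statistics for one text."""
    words = text.split()
    # Naive sentence split: treat '.', '!' and '?' as sentence terminators.
    normalized = text.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    n_words = len(words)
    return {
        "char_count": len(text),
        "word_count": n_words,
        "sentence_count": len(sentences),
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "unique_word_ratio": len({w.lower() for w in words}) / n_words if n_words else 0.0,
    }


def punctuation_distribution(text: str) -> dict:
    """Share of each ASCII punctuation mark among all punctuation in the text."""
    counts = Counter(ch for ch in text if ch in string.punctuation)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()} if total else {}


sample = "Hello, world! Is this AI? Yes, it is."
stats = text_statistics(sample)
dist = punctuation_distribution(sample)
```

In a full pipeline, these values would be concatenated with the embedding vector to form the input to the level-1 models.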
Architecture
- Level-1: Logistic Regression + XGBoost
- Level-2: Meta Logistic Regression
Evaluation Metric
- ROC-AUC
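ROC-AUC can be read as the probability that a randomly chosen AI-generated text receives a higher score than a randomly chosen human-written one. The sketch below computes it directly from that definition, with ties counted as half; the function name `roc_auc` is chosen for illustration. A real pipeline would more likely call a library routine such as scikit-learn's `roc_auc_score`.

```python
def roc_auc(labels, scores):
    """Pairwise ROC-AUC: fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 1 for AI-generated and 0 for human-written text;
    `scores` holds the predicted probability of the positive class.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration. Because the metric depends only on how the scores are ranked, it does not require choosing a classification threshold.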
On every push to main:
- Authenticate with Google Cloud
- Download trained model from GCS
- Build backend image (model included)
- Build UI image
- Push images to Docker Hub
- Docker
- Docker Compose
git clone https://github.com/Sip4818/AICheatTextGuard.git
cd AICheatTextGuard
docker compose up