A robust, production-ready pipeline for classifying human exercises like HammerCurl, DeadLift, LegExtension, and ChestFlyMachine from video input using pose landmarks, CNNs, and LSTM-based models.
- 🎥 Upload video and get back predicted action and confidence.
- 🧍‍♂️ Pose estimation powered by MediaPipe.
- 🧠 Deep Learning models using ConvLSTM, LSTM, and CNNs.
- 🔌 Built as a Flask API for easy integration.
- 📊 Inference via pose, image, or both (configurable).
- 🗂️ Modular structure for future expansion (e.g. camera, live webcam, etc.).
```
Pose_Classification/
├── api/                    # Core Flask API
│   ├── api.py              # Flask server and endpoints
│   ├── Poseclassifier.py   # Pose-based action classification logic
│   ├── detect_pose.py      # MediaPipe wrapper for pose extraction
│   ├── results/            # Stores temporary video uploads and outputs
│   └── README.md           # Detailed API usage
│
├── models/                 # Pretrained Keras/TensorFlow models
│   ├── final/              # Combined (pose + image) models
│   ├── img/                # CNN models for raw frame-based classification
│   ├── pose/               # LSTM models for pose-only classification
│   └── README.md           # Explanation of model formats and limitations
│
├── requirements.txt        # Python dependencies
└── README.md               # (You're here!)
```
```bash
git clone https://github.com/<your-username>/Pose_Classification.git
cd Pose_Classification
conda create --name pose-env python=3.10 -y
conda activate pose-env
pip install -r requirements.txt
```

Optional: on some Linux distros, if `mediapipe` errors out during installation:

```bash
pip install mediapipe --no-cache-dir
```

```bash
cd api
python api.py
```

The server will start at:

```
http://127.0.0.1:5000
```
- Method: `POST`
- Content-Type: `multipart/form-data`
| Key | Type | Required | Description |
|---|---|---|---|
| `video` | file | ✅ | Input video file (`.mp4`, `.avi`) |
| `orientation` | string | ❌ | One of `PORTRAIT`, `LANDSCAPE-LEFT`, etc. |
```bash
curl -X POST -F "video=@test.mp4" -F "orientation=LANDSCAPE-LEFT" http://localhost:5000/api/infer
```

Example response:

```json
{
  "action": "DeadLift",
  "confidence": [0.02, 0.93, 0.03, 0.02]
}
```

Refer to `api/README.md` for a detailed explanation of the inference flow, frame processing, and pose detection.
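The same request can also be sent from Python. Below is a minimal sketch using the third-party `requests` library; the endpoint URL and form-field names come from the curl example above, while the function names (`build_payload`, `infer`) and the default timeout are illustrative, not part of the API:

```python
API_URL = "http://127.0.0.1:5000/api/infer"  # default address from `python api.py`

def build_payload(video_path, orientation="LANDSCAPE-LEFT"):
    """Assemble the multipart fields expected by /api/infer.

    Helper name is illustrative; the field names ("video", "orientation")
    match the curl example above.
    """
    files = {"video": open(video_path, "rb")}
    data = {"orientation": orientation}
    return files, data

def infer(video_path, orientation="LANDSCAPE-LEFT", url=API_URL):
    """POST a video and return the parsed JSON response.

    Expected shape (per the example response): {"action": ..., "confidence": [...]}.
    """
    import requests  # third-party (pip install requests); imported lazily so
                     # build_payload stays usable without it

    files, data = build_payload(video_path, orientation)
    try:
        resp = requests.post(url, files=files, data=data, timeout=300)
        resp.raise_for_status()
        return resp.json()
    finally:
        files["video"].close()  # always release the file handle
```

Inference on a long video can take a while, hence the generous timeout; adjust it to your clip lengths.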
- Models are trained using sequences of pose landmarks or video frames.
- Based on LSTM, BiLSTM, ConvLSTM2D, and CNN architectures.
- Cannot be converted to ONNX or TFLite due to dynamic time-step layers.
Refer to `models/README.md` for a full model breakdown.
```python
label_map = ['HammerCurl', 'DeadLift', 'LegExtension', 'ChestFlyMachine']
```

Add more by retraining the models (training pipeline not included here).
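The `confidence` array in the API response lines up with this label order, so the predicted action is the index of the highest score. A small sketch (the sample scores are taken from the example response above; `top_prediction` is an illustrative helper, not part of the codebase):

```python
label_map = ['HammerCurl', 'DeadLift', 'LegExtension', 'ChestFlyMachine']

def top_prediction(confidence):
    """Map a confidence vector onto the label list: returns (label, score)."""
    i = max(range(len(confidence)), key=confidence.__getitem__)  # argmax
    return label_map[i], confidence[i]

# Scores from the example /api/infer response
print(top_prediction([0.02, 0.93, 0.03, 0.02]))  # -> ('DeadLift', 0.93)
```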
- 📸 The API shows live OpenCV video frames while processing. Press `q` to exit early.
- 🧼 Videos are auto-deleted or renamed based on the prediction for minimal disk usage.
- 🧠 Switch between `mode='pose'`, `'action'`, and `'both'` in `Pose_Classifier`.
Main packages (full list in `requirements.txt`):

- `tensorflow`
- `mediapipe`
- `flask`
- `opencv-python`
- `numpy`
- Add training pipeline and scripts
- Add support for live webcam input
- Extend to more activities and datasets
- Convert models to ONNX-friendly formats with simplified architectures
This project is provided under an open license. Modify and use freely, but attribution is appreciated.
Feel free to fork and submit pull requests. Create issues for bugs or enhancement requests.
Made with ❤️ by M. Hassan Ibrar