This module is a Flask-based video action classification API that uses pose detection and a pretrained model to predict exercises like DeadLift, HammerCurl, etc., from video input.
```
Pose_Classification/api/
├── api.py             # Main Flask API server
├── Poseclassifier.py  # Pose-based action classification logic
├── detect_pose.py     # MediaPipe-based pose landmark extractor
└── results/           # Output directory for saved/renamed videos
```
The API:
- Receives a video.
- Optionally rotates the video based on device orientation.
- Extracts pose landmarks from frames using MediaPipe.
- Feeds the landmarks to a deep learning model for classification.
- Returns the predicted action class and confidence scores.
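The steps above can be sketched as a minimal Flask endpoint. This is a simplification for illustration only: the real `api.py` rotates frames, runs pose detection, and calls the model where the placeholder comment sits.

```python
# Minimal sketch of the /api/infer request handling (a simplification;
# the real api.py runs pose detection and the model where the placeholder is).
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/api/infer", methods=["POST"])
def infer():
    video = request.files.get("video")
    orientation = request.form.get("orientation", "LANDSCAPE-LEFT")
    if video is None:
        return jsonify({"error": "missing 'video' file"}), 400
    # ... rotate frames per `orientation`, extract pose landmarks, classify ...
    return jsonify({"action": "HammerCurl",
                    "confidence": [0.95, 0.03, 0.01, 0.01]})  # placeholder result
```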
```shell
pip install flask numpy opencv-python mediapipe
cd Pose_Classification/api
python api.py
```

You'll see Flask start at:

👉 http://127.0.0.1:5000
Classifies the action from an uploaded video.

Parameters:

- `video`: the MP4 video file.
- `orientation` (optional): one of `PORTRAIT`, `LANDSCAPE-LEFT` (default/no rotation), `LANDSCAPE-RIGHT`, `PORTRAIT-UPSIDEDOWN`.
```shell
curl -X POST -F "video=@your_video.mp4" -F "orientation=LANDSCAPE-LEFT" http://127.0.0.1:5000/api/infer
```

Example response:

```json
{
  "action": "HammerCurl",
  "confidence": [0.95, 0.03, 0.01, 0.01]
}
```
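The same request can be made from Python with `requests` (assumed installed; the helper name `classify_video` is ours, not part of the project):

```python
import requests

def classify_video(path, orientation="LANDSCAPE-LEFT",
                   url="http://127.0.0.1:5000/api/infer"):
    """POST a video file to the API and return the parsed JSON prediction."""
    with open(path, "rb") as f:
        resp = requests.post(url, files={"video": f},
                             data={"orientation": orientation})
    resp.raise_for_status()
    return resp.json()
```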
- Handles video upload & orientation rotation.
- Uses OpenCV to read & display video.
- Calls `detect_pose.mediapipe_detection()` to get pose landmarks, then `pose_model.predict()` to classify the action from the pose sequence.
- Returns the predicted label and scores as JSON.
- Saves the processed video to `results/`.
Defines the `Pose_Classifier` class, which handles all model-related logic:
- `__init__(self, label_map, mode='both')`: Initializes the classifier with a label list and a mode:
  - `'pose'`: pose-based classification only
  - `'action'`: image-based (frame) classification only
  - `'both'`: combines both
- `predict(self, img_sequence=None, pose_sequence=None)`: Takes a sequence of 30 pose vectors and returns:
  - `predicted_probs`: class probabilities
  - `predicted_label`: class with the highest probability
- `__get_class_labels(self, probs)`: Converts model logits into a label.
- `__draw_action_list(self, image, predicted_action)`: Overlays the predicted action + menu on the video frame.
- `concat_frames_horizontally(self, img, predicted_action)`: Merges frames side by side with a label overlay (for UI/visualization).
🔁 This file wraps a trained model and makes predictions given processed sequences.
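For illustration, the label-selection step behind `__get_class_labels` typically reduces to an argmax over the class probabilities. A sketch (our own helper, not the project's code; it assumes the model outputs one probability per class, in `label_map` order):

```python
import numpy as np

# Label order taken from the label map listed later in this README.
LABEL_MAP = ['HammerCurl', 'DeadLift', 'LegExtension', 'ChestFlyMachine']

def probs_to_label(probs, label_map=LABEL_MAP):
    """Return the label whose probability (or logit) is highest."""
    return label_map[int(np.argmax(probs))]
```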
Contains the `DetectPose` class, responsible for detecting human keypoints.

- `mediapipe_detection(image, draw=True)`: Uses MediaPipe to detect human pose landmarks in a given frame. Returns:
  - `img`: the frame with the pose drawn
  - `pose`: a list/vector of keypoints for use in classification
  - `visibility`: keypoint visibility scores (optional)
🧠 This is the feature extractor that converts raw video frames to pose sequences.
After classification:
- The predicted action is returned in the response.
- The input video is saved to `results/{PredictedAction}_{RandomID}.mp4`.
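A sketch of that naming scheme (using `uuid4` for the random ID, which is our assumption; the project may generate the ID differently):

```python
import os
import uuid

def result_path(predicted_action, results_dir="results"):
    """Build a results/{PredictedAction}_{RandomID}.mp4 path, creating the directory."""
    os.makedirs(results_dir, exist_ok=True)
    return os.path.join(results_dir,
                        f"{predicted_action}_{uuid.uuid4().hex[:8]}.mp4")
```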
- A valid prediction requires ~30 frames with a detectable pose.
- The current label map supports: `['HammerCurl', 'DeadLift', 'LegExtension', 'ChestFlyMachine']`
- If pose detection fails, the API responds with:

```json
{ "action": "", "confidence": [], "error": "No pose detected in the video" }
```
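Because the error payload keeps the same `action`/`confidence` keys, a caller can branch on the `error` field. A small client-side sketch (the helper name is ours):

```python
def handle_response(payload):
    """Turn an /api/infer JSON payload into a human-readable summary."""
    if payload.get("error"):
        return f"Inference failed: {payload['error']}"
    return f"Predicted {payload['action']} ({max(payload['confidence']):.2f})"
```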
- Keep input videos short (~2–3 seconds).
- Good lighting and body visibility improve accuracy.
- You can expand the `label_map` and model to include more actions.