In Chapter 3, we structured our video frame data into fixed-length sequences. Now we need to tell the model what each sequence represents, e.g., 'hello' or 'thanks'. But models don't understand text; they need numbers.

We must convert action names like 'hello' into numerical labels the model can work with. This process involves two main steps:

- Label Mapping: assigning a unique number to each action name (e.g., 'hello' -> 0, 'thanks' -> 1, 'iloveyou' -> 2).
- Label Encoding: converting these single numerical labels into One-Hot Encoding, the standard and efficient format for training classification models.

Once X (input sequences) and y (one-hot encoded labels) are ready, we'll split the data into training and testing sets so we can detect overfitting and measure how well the model generalizes.
Create a mapping from action names to numbers:
```python
import numpy as np

# Actions to detect - defined in previous chapters
actions = np.array(['hello', 'thanks', 'iloveyou'])
label_map = {label: num for num, label in enumerate(actions)}
```

✅ Output:

```python
{'hello': 0, 'thanks': 1, 'iloveyou': 2}
```

Now, whenever we encounter the action name 'hello', we know its numerical label is 0, 'thanks' is 1, and 'iloveyou' is 2.
Add these labels to your dataset while loading sequences:

```python
labels.append(label_map[action])
```

After loading all the data, the labels list will contain a sequence of these numerical labels, one for each collected sequence of keypoints. For instance, if you collected 30 sequences for 'hello', 30 for 'thanks', and 30 for 'iloveyou', your labels list might look like [0, 0, ..., 0, 1, 1, ..., 1, 2, 2, ..., 2].
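For context, here is a minimal sketch of the full loading loop. It assumes the DATA_PATH, no_sequences, and sequence_length constants and the per-frame .npy files from the data collection chapter; adjust the names to match your own setup:

```python
import os
import numpy as np

sequences, labels = [], []
for action in actions:
    for sequence in range(no_sequences):           # e.g., 30 sequences per action
        window = []
        for frame_num in range(sequence_length):   # e.g., 30 frames per sequence
            # Each frame was saved as a .npy file of 1662 keypoint values
            res = np.load(os.path.join(DATA_PATH, action, str(sequence), f"{frame_num}.npy"))
            window.append(res)
        sequences.append(window)
        labels.append(label_map[action])  # numerical label for this sequence

X = np.array(sequences)  # shape: (90, 30, 1662)
```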
We then convert this list into a NumPy array:

```python
# After the loading loop...
y = np.array(labels)
print(y.shape)  # (90,)
```

This y array has a shape of (90,), meaning it's a 1D array containing 90 single numbers (0, 1, or 2), one for each of the 90 sequences we loaded.
Convert the numeric labels to one-hot vectors for training:

```python
from tensorflow.keras.utils import to_categorical

y = to_categorical(labels).astype(int)
print(y.shape)  # (90, 3)
```

- Label 0 ('hello') becomes the vector [1, 0, 0]
- Label 1 ('thanks') becomes the vector [0, 1, 0]
- Label 2 ('iloveyou') becomes the vector [0, 0, 1]
The length of the one-hot vector is equal to the total number of classes (3 in our case).
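A quick sanity check ties the two steps together (assuming the first loaded sequence was a 'hello' sequence, as in the loading loop above):

```python
# The first sequence was collected for 'hello', so:
print(labels[0])                  # 0
print(y[0])                       # [1 0 0]
print(len(y[0]) == len(actions))  # True: vector length equals class count
```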
| Action | Label | One-Hot |
|---|---|---|
| 'hello' | 0 | [1, 0, 0] |
| 'thanks' | 1 | [0, 1, 0] |
| 'iloveyou' | 2 | [0, 0, 1] |
During inference, the model outputs something like:

```python
res = [0.05, 0.90, 0.05]
predicted_label = np.argmax(res)             # 1
predicted_action = actions[predicted_label]  # 'thanks'
```

For example, if the model outputs [0.05, 0.90, 0.05]:

- The highest probability (0.90) is at index 1.
- The numerical label is therefore 1.
- Looking at our actions array (['hello', 'thanks', 'iloveyou']), the element at index 1 is 'thanks'. So the prediction is 'thanks'.
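Once the model from the next chapter is trained, the same decoding applies to real predictions. A minimal sketch, assuming a trained Keras model named model:

```python
# Hypothetical usage with the trained LSTM (see the next chapter):
# model.predict returns one row of class probabilities per input sequence.
res = model.predict(X_test)        # shape: (5, 3)
print(actions[np.argmax(res[0])])  # e.g., 'thanks'
```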
Here's how the label mapping and encoding fit into our overall data preparation pipeline, along with the prediction decoding step:
```mermaid
sequenceDiagram
    participant AN as Action Names
    participant LM as label_map (dictionary)
    participant YR as Numerical Labels (y_raw)
    participant TC as to_categorical (function)
    participant YE as One-Hot Encoded Labels (y_encoded)
    participant ML as ML Model
    participant MO as Model Output (probabilities)
    participant AM as np.argmax (function)
    participant PL as Predicted Numerical Label
    participant AC as actions (array)
    participant PA as Predicted Action Name
    AN->>LM: Map 'hello' to 0, etc.
    loop For each sequence
        AN->>YR: Look up numerical label
    end
    YR->>TC: Convert to one-hot
    TC-->>YE: [1,0,0] etc.
    Note over YE: This is the target output (y) for training.
    ML->>MO: Predict on new sequence (X)
    MO->>AM: Find index of max probability
    AM-->>PL: Get numerical label (0, 1, or 2)
    PL->>AC: Look up action name by index
    AC-->>PA: Return 'hello' etc.
    Note over PA: This is displayed to the user.
```
Before training, we divide the data into two sets:

- Training set: the data the model learns from
- Testing set: unseen data used to evaluate performance

This split ensures the model generalizes well to new gestures instead of simply memorizing the training data.
Use train_test_split from scikit-learn:

```python
from sklearn.model_selection import train_test_split

# Assuming X and y (one-hot encoded labels) are already loaded
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05, random_state=123)
```

Let's break this down:

- test_size=0.05: reserves 5% of the data for testing
- random_state=123: makes the random shuffle reproducible across runs
```python
# After the train_test_split call...
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
# (85, 30, 1662) (85, 3) (5, 30, 1662) (5, 3)
```

This output shows:

- X_train: 85 sequences, each 30 frames long, with 1662 keypoints per frame.
- y_train: 85 corresponding one-hot encoded labels, each of size 3.
- X_test: 5 sequences (5% of 90, rounded up), with the same sequence and keypoint dimensions.
- y_test: 5 corresponding one-hot encoded labels.
| Set | X Shape | y Shape | Purpose |
|---|---|---|---|
| Train | (85, 30, 1662) | (85, 3) | Model training |
| Test | (5, 30, 1662) | (5, 3) | Model evaluation |
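With only 5 test sequences, the class balance of the test set can be uneven. A quick check (a sketch assuming y_test is one-hot encoded, as above):

```python
# Count how many test sequences belong to each action
test_label_nums = y_test.argmax(axis=1)
for num, action in enumerate(actions):
    print(action, int((test_label_nums == num).sum()))
```

If the distribution looks too skewed, passing stratify=y to train_test_split is one common way to keep the class proportions similar in both sets.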
Now we have our data neatly divided into the sets needed for training and evaluating our model.
Here's a simple diagram showing what happens during the data splitting process:
```mermaid
sequenceDiagram
    participant CD as Combined Data
    participant TTS as train_test_split
    participant SH as Shuffling
    participant SP as Splitting
    participant TR as Training Sets
    participant TE as Testing Sets
    participant You
    CD->>TTS: Pass X and y
    TTS->>SH: Randomly mix sequences & labels
    SH->>SP: Divide based on test_size
    SP->>TR: X_train, y_train (e.g., 95%)
    SP->>TE: X_test, y_test (e.g., 5%)
    TTS-->>You: Return the 4 sets
    Note over CD: X (sequences) + y (one-hot labels)
    Note over TR: Used to teach the model.
    Note over TE: Used ONLY to check performance on new data.
```
Based on our example with 90 sequences and a test_size of 0.05:
| Set Type | Purpose | Data (X) Shape | Labels (y) Shape | Number of Sequences |
|---|---|---|---|---|
| Training | Used to train the model | (85, 30, 1662) | (85, 3) | 85 |
| Testing | Used to evaluate the trained model | (5, 30, 1662) | (5, 3) | 5 |
| Total | | (90, 30, 1662) | (90, 3) | 90 |
This division is fundamental to building a robust machine learning model that doesn't just memorize, but genuinely learns the underlying patterns of the hand signs.
We've now:

- Mapped and encoded our labels ✅
- Split our dataset for training and evaluation ✅

Next up, we'll use this data to train a powerful neural network: the LSTM model!