This project predicts engagement levels from multimodal behavioral and physiological data, including transcripts, eye tracking, speech pauses, and brain signals.
It combines statistical features, time-series methods, and machine learning models to train and evaluate engagement classification across experimental conditions.
| Modality | Description | Location |
|---|---|---|
| Embeddings | Utterance-level vector representations extracted from transcripts with a Gemini embedding model (see the sketch below the table). | scripts/embeddings/ |
| Gaze | Eye-tracking features such as fixation duration, saccades, and gaze variability. | scripts/gaze/ |
| Pauses | Features based on speech activity timing: production, comprehension, pauses, and turn-taking gaps. | scripts/pauses/ |
| ROI | Neural activity features derived from brain region signals (Regions of Interest). | scripts/roi/ |
| Combined | Multimodal models combining two or more of the above modalities (e.g., ROI + Gaze). | scripts/combine/ |
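The exact embedding pipeline lives in scripts/embeddings/; as a hedged illustration only, the sketch below generates utterance embeddings with the google-generativeai client. The model name, output file name, and JSON layout are assumptions, not necessarily what this project used.

```python
# Hypothetical sketch of generating utterance embeddings with the Gemini API.
# Assumes the google-generativeai package and a GEMINI_API_KEY env var; the
# model name and JSON layout are guesses, not this project's exact setup.
import json
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

utterances = ["yeah, that makes sense", "wait, can you repeat that?"]
embeddings = [
    genai.embed_content(model="models/text-embedding-004", content=u)["embedding"]
    for u in utterances
]

# One JSON file per subject/run, mirroring data/raw/transcripts/embeddings/.
with open("sub-01_operator_run-01_embeddings.json", "w") as f:
    json.dump({"utterances": utterances, "embeddings": embeddings}, f)
```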
- Cross-validation is performed with a Leave-One-Subject-Out (LOSO) strategy (see the sketch after this list).
- Results (accuracy, classification reports, confusion matrices) are logged under results/cross_validation/.
- Feature importance for ROI-based models is saved under results/feature_importance/.
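A minimal sketch of the LOSO evaluation loop, using scikit-learn's LeaveOneGroupOut. The features, labels, and classifier below are synthetic placeholders; the real scripts under src/scripts/ load modality-specific features.

```python
# Minimal LOSO sketch using scikit-learn's LeaveOneGroupOut.
# X, y, and the subject IDs are synthetic placeholders, not this
# repository's actual data loading; the classifier is illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 16))          # one feature vector per sample
y = rng.integers(0, 2, size=120)        # binary engagement labels
subjects = np.repeat(np.arange(6), 20)  # subject ID for each sample

for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
    clf = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    y_pred = clf.predict(X[test_idx])
    held_out = subjects[test_idx][0]
    print(f"held-out sub-{held_out:02d} accuracy: "
          f"{accuracy_score(y[test_idx], y_pred):.3f}")
    print(classification_report(y[test_idx], y_pred, zero_division=0))
```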
- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```
- Prepare Data

  Ensure all required data files (gaze, embeddings, ROI, and pause data) are downloaded.
  Preprocessing (data alignment and feature extraction) is handled by the scripts in scripts/preprocess/; a sketch of the alignment step follows below.
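As a hedged sketch of the alignment idea (the actual logic lives in scripts/preprocess/data_align.py and related scripts), the pandas snippet below bins cleaned eye-tracking samples into fixed 1.2 s windows, the sample spacing noted for the resampled gaze features. The column names are assumptions.

```python
# Hedged sketch: bin cleaned eye-tracking samples into fixed 1.2 s windows.
# The column names "timestamp", "gaze_x", "gaze_y" are assumptions; check
# the actual CSV headers and scripts/preprocess/data_align.py.
import pandas as pd

gaze = pd.read_csv("data/raw/eyetracking/cleaned/cleaned_sub-01_run-01_eye_tracking.csv")
gaze["timestamp"] = pd.to_timedelta(gaze["timestamp"], unit="s")  # assumes seconds

# Average the gaze coordinates within each 1.2 s bin.
resampled = (
    gaze.set_index("timestamp")[["gaze_x", "gaze_y"]]
        .resample("1200ms")
        .mean()
        .reset_index()
)
resampled.to_csv("sub-01_run-01_tracking_resampled.csv", index=False)
```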
Project structure:

```
.
├── results                                  # accuracy, classification reports, confusion matrices, feature importance
│   ├── 25-05-10_minirocket_sample20.txt     # dummy
│   ├── ...
│   ├── cross_validation                     # all results are recorded here
│   │   ├── 02-06-25_embeddings&roi&gaze.txt
│   │   ├── ...
│   ├── feature_importance                   # which features contribute most
│   │   ├── new_roi
│   │   │   ├── roi_feature_importance_sub-[11].csv
│   │   │   ├── ...
│   │   └── roi
│   │       ├── roi_feature_importance_sub-[11].csv
│   │       ├── ...
│   └── model.txt                            # hyperparameter tuning results
├── data                                     # uploaded to OneDrive; includes some raw and preprocessed data
│   ├── mapping
│   │   └── conditions.csv                   # condition-to-label mapping
│   └── raw
│       ├── brain
│       │   ├── aligned
│       │   │   ├── sub-05_roi_data.csv
│       │   │   ├── ...
│       │   ├── new
│       │   │   ├── raw
│       │   │   │   ├── sub-05_roi_data.csv
│       │   │   │   ├── ...
│       │   │   └── train                    # used in training; timestamps added (new)
│       │   │       ├── sub-05_roi_data_with_timestamps.csv
│       │   │       ├── ...
│       │   └── train                        # used in training; timestamps added (old)
│       │       ├── sub-05_roi_data_with_timestamps.csv
│       │       ├── ...
│       ├── eyetracking
│       │   ├── cleaned
│       │   │   ├── cleaned_sub-01_run-01_eye_tracking.csv
│       │   │   ├── ...
│       │   ├── feature
│       │   │   ├── mutual_gaze              # aligned at 1.2 s per sample
│       │   │   │   ├── sub-05_run-01_tracking_resampled.csv
│       │   │   │   ├── ...
│       │   │   ├── offsets
│       │   │   │   ├── sub-15_run-01_tracking_resampled.csv
│       │   │   │   ├── sub-15_run-02_tracking_resampled.csv
│       │   │   │   └── sub-15_run-03_tracking_resampled.csv
│       │   │   ├── sub-05_run-01_tracking_resampled.csv
│       │   │   ├── ...
│       │   ├── mini_sampled
│       │   │   ├── cleaned_sub-01_run-01_eye_tracking.csv
│       │   │   ├── ...
│       │   └── sampled
│       │       ├── cleaned_sub-01_run-01_eye_tracking.csv
│       │       ├── ...
│       ├── pause
│       │   └── prod_comp_gaps_pauses_all_included.csv
│       ├── transcripts
│       │   ├── embeddings                   # the embeddings actually in use; generated with Gemini
│       │   │   ├── sub-01_operator_run-01_embeddings.json
│       │   │   ├── ...
│       │   ├── ignore
│       │   │   ├── sub-01_operator_run-01.csv
│       │   │   ├── ...
│       │   ├── operator
│       │   │   ├── sub-05_operator_run-01.csv
│       │   │   ├── ...
│       │   └── participant
│       │       ├── sub-05_participant_run-01.csv
│       │       ├── ...
│       └── video
│           ├── origin                       # original subject videos in use
│           │   ├── sub-05_run-01.mp4
│           │   ├── ...
│           ├── subjects_manually_calibrate  # offsets applied to inspect the expected gaze points
│           │   ├── gaze_overlay_sub-07_run-03.mp4
│           │   ├── ...
│           └── video_with_eyetracking       # videos visualizing eye tracking and head tracking
│               ├── gaze_overlay_sub-05_run-01.mp4
│               ├── ...
├── requirements.txt                         # Python library requirements
└── src                                      # source code
    ├── mapping
    │   └── conditions.csv                   # condition-to-label mapping
    └── scripts                              # training and evaluation scripts
        ├── combine                          # multimodal training
        │   ├── train_combine_gaze_embeddings_and_roi.py
        │   ├── train_combine_gaze_with_roi.py
        │   └── train_combine_roi_with_embeddings.py
        ├── embeddings                       # embeddings (transcripts) training
        │   ├── train_embeddings_only_oper.py
        │   ├── train_embeddings_only_par.py
        │   ├── train_minirocket_embeddings+.py
        │   └── train_minirocket_embeddings.py
        ├── gaze                             # gaze data training
        │   ├── test_gaze_minirocket.py
        │   ├── train_minirocket_gaze.py
        │   └── train_svm.py
        ├── pauses                           # pause and turn-taking data training
        │   └── train_prod_comp_gaps_pauses.py
        ├── preprocess                       # multimodal preprocessing and visualization scripts
        │   ├── brain_data_add_timestamp.py
        │   ├── data_align.py
        │   ├── eyetracking_visualization.py
        │   ├── feature_extraction.py
        │   ├── gaze_feature_extraction.py
        │   ├── gaze_feature_extraction_allfile.py
        │   ├── gaze_feature_extractionv2.py
        │   └── sample_data.py
        └── roi                              # brain (ROI) data training
            ├── train_minirocket_roi.py
            ├── train_roi.py
            └── train_roi_new.py
```
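Several training scripts are named after MiniRocket (e.g., src/scripts/roi/train_minirocket_roi.py). As a rough sketch of that kind of pipeline, assuming sktime's MiniRocketMultivariate and synthetic data in place of the real ROI series:

```python
# Minimal MiniRocket pipeline sketch (sktime + scikit-learn), reflecting the
# train_minirocket_*.py naming; the repository's real scripts may differ.
import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sktime.transformations.panel.rocket import MiniRocketMultivariate

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4, 300))  # (instances, channels, timepoints), e.g. ROI series
y = rng.integers(0, 2, size=60)    # engagement labels

# MiniRocket expands each series into thousands of convolutional features,
# which a linear classifier then separates.
minirocket = MiniRocketMultivariate(random_state=0)
X_feat = minirocket.fit_transform(X)

clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))
clf.fit(X_feat, y)
print("train accuracy:", clf.score(X_feat, y))
```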