This project is dedicated to exploring the relationship between EEG brainwave patterns and subjective cohesion during drumming sessions. The project utilizes machine learning models to uncover insights into how brain activity correlates with perceived social cohesion.
Initial models achieved above-chance classification of cohesive vs. non-cohesive pairs from dyadic EEG signals, suggesting some detectable signal in the data. However, instability and signs of overfitting mean the findings should be treated as exploratory rather than conclusive.
- Research Basis
- Repository Structure
- Machine Learning Models
- Cohesion Data Overview
- Data Cleaning
- Data Analysis
- Conclusions
- Setup & Installation
The central machine learning problem in this study is to develop predictive models that determine interpersonal cohesion from dyadic EEG signals. Given EEG recordings from two individuals engaged in a shared activity (specifically, a four-minute freestyle drumming session), the goal is to predict the pair's subjective rating of social cohesion, measured on a scale from 1 to 6. This is a supervised binary classification problem: the EEG signal features serve as input, and the self-reported cohesion scores, separated by a threshold value, act as binary labels (cohesive or not cohesive). The data was collected by the Social Neuroscience Lab at Bar-Ilan University.

Prior research has demonstrated that dyadic EEG patterns can provide insights into social and team dynamics. For example, Reinero et al. (2021) showed that EEG synchrony between individuals correlates with team performance, while Wang et al. (2024) found that it reflects emotional alignment. Ji et al. (2024) conducted a similarly structured study in which EEG signals from dyadic pairs, fed into a CNN model, were able to differentiate between friends and strangers.

Our preliminary analysis of the raw data showed that covariance between dyadic EEG signals was consistently near zero, suggesting that simple linear relationships were insufficient to capture meaningful patterns. This motivated the use of machine learning techniques to model the more complex, nonlinear dynamics we assume underlie social cohesion.

By taking a machine learning approach, our project seeks to model these neural synchrony patterns and establish a predictive link between brain activity and perceived social connection. A successful model could have broader applications in fields such as team formation, collaborative work, and even therapeutic interventions, offering an objective, neurobiological basis for assessing social cohesion.
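As a toy illustration of that covariance check, the per-pair cross-covariance can be computed as below. The array shapes and values are placeholders standing in for the real recordings, not the actual dataset.

```python
# Toy sketch of the preliminary covariance analysis: for each dyad,
# compute the covariance between the two participants' signals.
# Shapes and values are placeholders, not the real dataset.
import numpy as np

rng = np.random.default_rng(0)
pairs = rng.normal(size=(43, 2, 249))  # 43 pairs x 2 participants x 249 timepoints

# Cross-covariance between participant 1 and participant 2 for each pair
covariances = np.array([np.cov(p1, p2)[0, 1] for p1, p2 in pairs])
print(round(float(np.abs(covariances).mean()), 3))  # near zero for unrelated signals
```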
- Contains MATLAB scripts for EEG signal preprocessing. Key steps include:
- Bandpass filtering
- Bandwidth separation
- Timepoint segmentation
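The actual preprocessing lives in the MATLAB scripts. As a rough Python sketch of the filtering and segmentation steps, the following may help orient readers; the function names, the 30–80 Hz gamma band, and the 250 Hz sampling rate are all assumptions for illustration, not taken from the scripts.

```python
# Hypothetical Python equivalent of two preprocessing steps:
# bandpass filtering to an assumed gamma band, then timepoint segmentation.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_gamma(signal, fs, low=30.0, high=80.0, order=4):
    """Zero-phase Butterworth bandpass filter (assumed gamma band, 30-80 Hz)."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

def segment(signal, n_segments):
    """Average the signal into a fixed number of timepoints (e.g. 83, 249, 747)."""
    chunks = np.array_split(signal, n_segments)
    return np.array([c.mean() for c in chunks])

# Example: filter 4 minutes of synthetic EEG sampled at an assumed 250 Hz,
# then compress it to 249 timepoints.
fs = 250
t = np.arange(0, 240, 1 / fs)
eeg = np.sin(2 * np.pi * 40 * t) + np.sin(2 * np.pi * 10 * t)  # 40 Hz + 10 Hz
gamma = bandpass_gamma(eeg, fs)      # 10 Hz component is attenuated
features = segment(gamma, 249)
print(features.shape)  # (249,)
```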
The code is organised into two separate pipelines. The pilot_pipline is the initial pipeline created for the project, and facilitates both data restructuring and model training:
- `data_restructuring/`: Scripts detailing the restructuring process from the preprocessed data to the formats in `separated_pairs/` and `mixed_pairs/`. These are for reference only, as the restructured data is already included. The first pair is removed here, due to missing data in this pair.
- `cnn/`: Code for training convolutional neural networks (CNNs).
- `svm/`: Code for training support vector machine (SVM) models.
- `visuals/`: Visualisation scripts.
- `data/`: Data used by this pipeline.
The `dsen_pipeline/` (the main pipeline) contains:
- `data/`: The preprocessed EEG data split into 83, 249 and 747 time points, as well as `labels.csv`.
- `func/`: Functionality scripts called from the pipeline scripts, mainly `cnn_feature_extr_func.py`, `concatenate_pairs_func.py`, `model_training_func.py` and `analysis_func.py`.
- `pipeline/`: Scripts to run the pipeline: `main.py` produces the main model outputs from the input data, and `analysis_main.py` runs the analysis once the model outputs are saved.
- `saved_models/`: All models output by `main.py`, saved as `.pkl` files.
- `visualisations/`: Saved visualisations: confusion matrices and learning curves.
Repository structure:
dsen_pipeline/
├── data/ # Raw gamma data and labels
│ ├── raw_gammas_83.csv
│ ├── raw_gammas_249.csv
│ ├── raw_gammas_747.csv
│ └── labels.csv
├── func/ # Functions
│ ├── analysis_func.py
│ ├── cnn_feature_extr_func.py
│ ├── concatenate_pairs_func.py
│ └── model_training_func.py
├── pipeline/ # Main pipeline script
│ ├── main.py
│ └── analysis_main.py
├── saved_models/ # Pickled models saved
│ ├── Dataset [time_step] Time Step_[model].pkl
├── visualisations/ # Evaluation outputs
│ ├── [Dataset]_[Model]_confusion_matrix.png
│ ├── [Dataset]_[Model]_learning_curve.png
│ └── model_comparison.csv
Theoretical pipeline structure:
Three models were trained for this project (focusing on the DSEN_Pipeline), each on all three time separations of the EEG data (83, 249 and 747 timepoints):
- SVM (support vector machine)
- RF (random forest)
- MLP (multilayer perceptron)
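A minimal sketch of how these three model types can be trained with scikit-learn, assuming `X` holds flattened dyadic gamma features and `y` the binary cohesion labels. The placeholder data and the MLP layer sizes are illustrative; the SVM and RF hyperparameters follow those reported in the evaluation table below.

```python
# Illustrative training of the three model types on placeholder data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(86, 249))   # placeholder for the real feature matrix
y = rng.integers(0, 2, size=86)  # placeholder binary cohesion labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10)),
    "RF": RandomForestClassifier(n_estimators=100, max_depth=50, random_state=0),
    "MLP": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(256, 128, 64),
                                       activation="relu", max_iter=500,
                                       random_state=0)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, round(model.score(X_te, y_te), 3))
```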
The repository is designed for easy usage: no preprocessing or restructuring scripts need to be run before training the models, as all necessary data is already included.
- Averaged Cohesion Scores:
  - Scale: 1 (No Cohesion) to 6 (High Cohesion), converted to a binary label of 1 (Cohesion) or 0 (No Cohesion) using a threshold of 4.7: if (Cohesion_Score_Person_1 + Cohesion_Score_Person_2) / 2 > 4.7, then Cohesion = 1; otherwise 0.
  - Method: based on participant ratings averaged across pairs after the drumming sessions.
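The binarisation rule above can be sketched as follows; the column names and the pandas layout are assumptions, not the project's actual data format.

```python
# Sketch of the cohesion-label binarisation rule (threshold 4.7 on the
# pair-averaged score). Column names are hypothetical.
import pandas as pd

THRESHOLD = 4.7

df = pd.DataFrame({
    "cohesion_p1": [5.0, 3.0, 6.0],
    "cohesion_p2": [5.0, 4.0, 2.0],
})
df["mean_score"] = (df["cohesion_p1"] + df["cohesion_p2"]) / 2
df["cohesion"] = (df["mean_score"] > THRESHOLD).astype(int)
print(df["cohesion"].tolist())  # [1, 0, 0]
```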
- Raw EEG Data:
  - Participants: 98 individuals (49 pairs).
  - Session details: 4-minute drumming sessions.
- Preprocessed EEG Data (for the DSEN pipeline):
  - Gamma-bandwidth EEG data only, for each participant pair.
  - EEG signals combined into 83, 249 or 747 timepoints.
  - Participants: 98 individuals (49 pairs) → 88 individuals (44 pairs) after preprocessing → 86 individuals (43 pairs) after data restructuring.
  - EEG recording: collected during the 4-minute drumming session.
- Model Outputs: Accuracy, precision and F1 scores were used to assess which models and datasets performed best and warranted further analysis.
(for the dataset separated into 249 timesteps only, as this produced the best-performing models)
- Cross-Validation Results: For the best-performing models, a t-test of accuracy against baseline performance (0.512) was used to assess whether the models consistently performed better than chance across 5-fold validation.
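That significance check can be sketched as follows, with random placeholder data in place of the real features (so the p-value printed here is meaningless and will differ from the reported results).

```python
# Sketch of the cross-validation significance check: 5-fold CV accuracies
# t-tested against the 0.512 baseline. Placeholder data only.
import numpy as np
from scipy.stats import ttest_1samp
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

BASELINE = 0.512

rng = np.random.default_rng(0)
X = rng.normal(size=(86, 249))   # placeholder features
y = rng.integers(0, 2, size=86)  # placeholder labels

scores = cross_val_score(SVC(kernel="rbf", C=10), X, y, cv=5, scoring="accuracy")
t_stat, p_value = ttest_1samp(scores, BASELINE, alternative="greater")
print(scores.mean(), p_value)
```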
- MLP: Since the MLP model, trained and tested on the 249-timestep data, performed best on the headline metrics, we examined its evaluation first. Its learning curves showed very unsettled patterns across training-set sizes, and general instability.
- RF and SVM Support Vector Influence: Since the MLP model was unstable and not learning properly, we analysed the second-best model, the RF. Its training-set learning curve showed very obvious overfitting (consistently 100% training accuracy). Finally, we looked into the inputs that lay closest to the SVM's decision boundaries and were therefore weighted most heavily in the model's decision-making.
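The boundary-proximity inspection can be sketched with scikit-learn's `SVC.support_` attribute, which exposes the indices of the training samples that ended up as support vectors (again with placeholder data).

```python
# Sketch of identifying which training pairs lie closest to the SVM's
# decision boundary: fitted SVC models expose them via `support_`.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(86, 249))   # placeholder features
y = rng.integers(0, 2, size=86)  # placeholder labels

svm = SVC(kernel="rbf", C=10).fit(X, y)
influential_pairs = svm.support_  # indices of the support-vector samples
print(len(influential_pairs), "of", len(X), "pairs act as support vectors")
```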
| # | Model | Accuracy | Precision | F1 | Parameters | Conclusions of model evaluation |
|---|---|---|---|---|---|---|
| - | Baseline | 0.512 | - | - | - | - |
| 1 | MLP | 0.7444 | 0.7667 | 0.7243 | ReLU, 500 max iterations, (256, 128, 645) hidden layer sizes | No signs of learning and severe underfitting |
| 2 | RF | 0.7389 | 0.85 | 0.6467 | 100 n_estimators, 50 max depth | Clear signs of severe overfitting |
| 3 | SVM | 0.7194 | 0.7933 | 0.6554 | RBF, C = 10 | Unclear learning pattern |
Was it a fluke, or is the SVM truly differentiating between cohesive and non-cohesive pairs persistently over the k-fold validations?
Conclusion of SVM
The SVM model was statistically significant in its performance, with a p-value of 0.0075, and a confidence interval that did not dip into the chance-level performance.
We went on to further analyse the SVM model: which pairs did it have the most trouble differentiating into cohesive vs. non-cohesive?
Assumptions made:
- Cohesion can be reasonably approximated using a 4.7+ threshold on subjective scores.
- Gamma-frequency EEG signals are more informative for distinguishing dyads than other bands.
- The drumming task provides a socially relevant interaction context, but does not systematically bias the EEG.
Conclusions derived:
- The architecture of the MLP most successfully classifies socially cohesive vs. non-cohesive pairs above chance level based on their EEG signal data alone.
- Particular pairs are more relevant to the model's decision boundaries, but the nature of their relevance is unknown and would benefit from further inspection in the context of a larger dataset.
- Inconsistent performance across the other datasets and the lack of a pattern among the influential pairs suggest that the model may be overfitting to this dataset's statistical structure.
- Clone the repository:
git clone https://github.com/yourusername/MLCohesionEEGProject.git
cd MLCohesionEEGProject

- Run the `main.py` file in the `dsen_pipeline/pipeline` folder to see the accuracy results of the models and the visualisations.



