owkin/liposarcoma

Development of a federated learning model for the histological diagnosis of well-differentiated adipocytic tumors built from whole slide images of 2211 cases

Description

Codebase from which the results of this submission were obtained.

Codebase structure

The codebase follows this directory structure:

├── liposarcome
│   ├── dl                      <- Deep learning Python classes and scripts
│   │   ├── configs             <- Hydra configuration files for the deep learning training
│   │   ├── datasets            <- Classes for datasets
│   │   ├── models              <- Classes for models
│   │   ├── preprocessors       <- Pipeline for preprocessing features
│   │   ├── trainers            <- Implement fit and test
│   │   ├── utils               <- Misc utils (data splits, normalizations, etc.)
│   │   └── train.py            <- Script executed for an ML training. Implements the CV
│   ├── cli                     <- CLI Python scripts
│   ├── load_data               <- Read the raw data and send parsed data
│   │   ├── bergo_parsing       <- Formatting and checks for data from IB
│   │   ├── clb_parsing         <- Formatting and checks for data from CLB
│   │   ├── cohort_dataframe_generator.py     <- Functions issuing a df per center
│   │   └── debug_parsing       <- Functions to generate random clinical data (debug)
│   └── paths
│       └── paths.py            <- File to insert paths to use real data
├── .gitignore                         <- List of files/folders ignored by git
├── setup.py                           <- Package installation and dependencies
└── README.md

How to run

Install the dependencies

cd liposarcoma

# Create your conda environment
conda create -n liposarcome python=3.8
conda activate liposarcome

# Install the repo
uv pip install -e . -i https://pypi.org/simple

Train a model with a chosen experiment configuration from liposarcome/dl/configs/experiment/, where experiment_name is the name of one of the files in that folder (without the .yaml extension):

liposarcome-dl experiment=debug_l_vs_s_clinical_sklearn

⚠️ WARNING: the only configuration that runs out of the box is the debug config above, which uses random data. The other configurations require access to the center data, which remain private and are not shared.

Any parameter can be overridden from the command line as follows:

liposarcome-dl experiment=experiment_name trainer.max_epochs=20 dataset.batch_size=64
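Under the hood, Hydra resolves each key=value override into a dotted path in the nested config tree. The stdlib-only sketch below is illustrative of that resolution, not Hydra's actual implementation:

```python
def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Apply Hydra-style dotted key=value overrides to a nested config dict."""
    for override in overrides:
        dotted_key, raw_value = override.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        # Best-effort type coercion: int, then float, then plain string
        for cast in (int, float):
            try:
                raw_value = cast(raw_value)
                break
            except ValueError:
                continue
        node[leaf] = raw_value
    return config

config = {"trainer": {"max_epochs": 10}, "dataset": {"batch_size": 32}}
apply_overrides(config, ["trainer.max_epochs=20", "dataset.batch_size=64"])
```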

Feature extractors

This project employs four feature extractors, trained with the following self-supervised learning methods:

  • iBOT ViT Pancan: A Vision Transformer (ViT)-based feature extractor leveraging the iBOT framework (Zhou et al., 2021), trained on all available H&E-stained datasets from TCGA (Pan-cancer). Designed to capture broad, cross-cancer tissue representations.

  • iBOT ViT COAD: A domain-specific variant of the iBOT ViT model, trained exclusively on the TCGA COAD (colon adenocarcinoma) dataset for enhanced representation of colon tissue.

  • MoCo COAD: A feature extractor based on the Momentum Contrast (MoCo) framework (He et al., 2020), trained on TCGA COAD to learn colon-specific features via contrastive learning.

  • MoCo Cond: A condition-aware MoCo COAD model (Zhou et al., 2022) producing condition-sensitive tissue embeddings. The feature extractor is trained with the Momentum Contrast (MoCo) framework, with a constraint applied during the conditional sampling. This training aims to reduce batch-effect issues by forcing the extractor to discriminate between different images of the same slide (e.g., two images of the same specimen with different stainings).
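The conditional-sampling constraint can be illustrated with a small stdlib sketch: contrastive negatives for a query tile are drawn from the query's own slide, so slide-level shortcuts such as staining cannot solve the task and the extractor must discriminate on tissue content. The tile records and their field names here are hypothetical, not the project's actual data structures.

```python
import random

def sample_conditional_negatives(tiles, query, n_negatives, rng=random):
    """Draw contrastive negatives from the same slide as the query tile.

    Each tile is a dict with a hypothetical "slide_id" key. Restricting
    negatives to the query's own slide removes slide-level cues (staining,
    scanner), pushing the extractor to compare tissue content instead.
    """
    same_slide = [
        t for t in tiles
        if t["slide_id"] == query["slide_id"] and t is not query
    ]
    return rng.sample(same_slide, min(n_negatives, len(same_slide)))
```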


Tests

The codebase contains 4 unit tests that can be run with a call to pytest on the command line. Of note, the test called test_deepmil.py contains a very basic instantiation of the multimodal DeepMIL network, which is the main model investigated in this work. This test serves as an example for anyone wishing to reuse only the neural network.
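DeepMIL-style networks aggregate tile-level features into a single slide-level embedding with attention-weighted pooling. The actual multimodal network is not reproduced here; the pure-Python sketch below only illustrates the attention-pooling step, with the per-tile attention scores supplied directly (in the real model they come from a small learned network):

```python
import math

def attention_pool(bag, attn_scores):
    """Attention-based MIL pooling: softmax-weighted mean of tile features.

    bag:         list of tile feature vectors (lists of floats)
    attn_scores: one raw attention score per tile (hypothetical inputs;
                 a learned attention head produces them in practice)
    """
    # Numerically stable softmax over the per-tile scores
    m = max(attn_scores)
    exps = [math.exp(s - m) for s in attn_scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Attention-weighted sum of tile features -> slide-level embedding
    dim = len(bag[0])
    return [sum(a * tile[d] for a, tile in zip(alphas, bag)) for d in range(dim)]
```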

Use your own data

To use this codebase with your own data, the data files need to be structured in a tree identical to the mock data in tests/load_data/cohort_dataframe_generator/assets. Based on this structure, paths need to be inserted into the corresponding constants in liposarcome/paths/paths.py.

Additionally, the CohortDataFrameGenerator class defined in liposarcome/load_data/cohort_dataframe_generator.py will need to be updated by:

  • removing the NotImplementedError raised for a cohort name different from "debug",
  • defining parsing functions for the clinical data that deliver a correctly formatted dataframe. These functions are stored in liposarcome/load_data, where formatting examples for the IB and CLB centers are provided.
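A center-specific parsing function of the kind described above could look like the stdlib sketch below. The raw field names, the target schema, and the label set are hypothetical and must be adapted to each center's export format:

```python
def parse_center_clinical(raw_records):
    """Convert raw clinical rows into a unified schema (illustrative only).

    raw_records: list of dicts as exported by a hypothetical center.
    Returns rows with unified keys, raising on malformed input so that
    data issues surface at parsing time rather than during training.
    """
    parsed = []
    for row in raw_records:
        patient_id = str(row["PatientID"]).strip()
        if not patient_id:
            raise ValueError("missing patient identifier")
        diagnosis = row["Diagnosis"].strip().lower()
        if diagnosis not in {"lipoma", "liposarcoma"}:  # hypothetical label set
            raise ValueError(f"unexpected diagnosis label: {diagnosis!r}")
        parsed.append({"patient_id": patient_id, "label": diagnosis})
    return parsed
```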

Federated learning

This codebase does not include federated learning scripts. For a federated deployment of the proposed deep learning pipeline, this work relied on the open-source Substra framework, using FedAvg as the federated strategy.
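The FedAvg aggregation step itself is simple to state: after each round of local training, the server replaces the global weights with a data-size-weighted average of the centers' weights. A stdlib sketch of that aggregation (the Substra deployment and orchestration are not reproduced here):

```python
def fedavg(center_weights, center_sizes):
    """FedAvg aggregation: weighted average of per-center model weights.

    center_weights: one flat list of parameter values per center
    center_sizes:   number of local training samples per center,
                    used as the averaging weights
    """
    total = sum(center_sizes)
    n_params = len(center_weights[0])
    return [
        sum(w[i] * size for w, size in zip(center_weights, center_sizes)) / total
        for i in range(n_params)
    ]
```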
