An Integrated Pipeline for Cryo-ET Data Simulation

This repository orchestrates a two-step pipeline for generating synthetic cryo-ET training datasets with ground-truth labels for downstream machine learning tasks (e.g., particle picking, segmentation). All simulation parameters are controlled through TOML configuration files, enabling reproducible and scalable experiments on HPC systems.

Overview

Pipeline dependencies:

polnet-synaptic — Simulates a 3D cellular environment (membranes, actin filaments, microtubules, cytosolic proteins, membrane-bound proteins) and outputs clean 3D density maps with ground-truth labels.
faket-polnet — Takes those density maps, simulates the TEM imaging process (tilt series projection), applies VGG19-based neural style transfer to impose real cryo-ET noise textures, and reconstructs final 3D tomograms via IMOD weighted back-projection.
IMOD — Used for tilt series projection (xyzproj) and 3D reconstruction (tilt). Must be available as a module on your HPC.

This repository (simulation-main) does not contain source code; it only provides TOML configs and SLURM scripts that coordinate the two phases of the pipeline.

Pipeline Variants

The polnet synaptic has two modes, each with a different entry point script:

Mode	Script	Membranes	Cytoskeleton	Proteins
Default	`all_features_default.py`	sphere, ellipsoid, toroid	actin, microtubules	cytosolic proteins, membrane proteins
Synapse	`all_features_synapse.py`	ellipsoid (synaptic vesicles)	actin, microtubules	synaptic membrane proteins

The Default mode is a general-purpose simulator that can enable or disable any combination of features. The Synapse mode handles a special case where all membranes are ellipsoidal to approximate synaptic vesicles, and synaptic membrane proteins are generated on the vesicle surfaces.

Data Flow

TOML Config
    │
    ▼
Phase 1: polnet-synaptic (`all_features_default.py`  OR  `all_features_synapse.py`)
  ├── Generate membranes (sphere, ellipsoid, toroid)
  ├── Generate actin networks
  ├── Generate microtubule networks
  ├── Place cytosolic proteins
  └── Place membrane-bound proteins 
       │
       ▼
  simulation_dir_{simulation_dir_index}/
    ├── tomo_den_N.mrc               ← clean 3D density map
    ├── tomo_lbls_N.mrc              ← label mask
    └── tomos_motif_list_{N}.csv     ← particle types, positions, orientations (per tomogram)
       │
       ▼
Phase 2: faket-polnet (`pipeline_parallel.py`)
  Stage 1: Setup
  ├── Project style tomograms        → style tilt series (IMOD `xyzproj`)
  └── Label transform                → output JSON annotations in CZII challenge format
  Stage 2: Style Transfer
  ├── Project synthetic densities    → clean + noisy tilt series (IMOD `xyzproj`)
  ├── FakET neural style transfer    → style-transferred tilt series
  └── 3D reconstruction              → style-transferred tomograms (IMOD `tilt`)
  Stage 3: Cleanup
  └── Merge JSON metadata and collect reconstructed tomograms
       │
       ▼
  train_dir_{train_dir_index}/
    ├── faket_tomograms/             ← final style-transferred, reconstructed tomograms
    └── overlay/                     ← ground-truth particle annotations (JSON)

Installation

Clone all three repositories into the same parent directory:

git clone https://github.com/computational-cell-analytics/simulation-main.git
git clone https://github.com/computational-cell-analytics/polnet-synaptic.git
git clone https://github.com/computational-cell-analytics/faket-polnet.git

Create and activate the conda environment:

cd simulation-main
conda env create -n simulation-main -f environment-gpu.yaml --channel-priority flexible
conda activate simulation-main

Install polnet-synaptic and faket-polnet into the environment:

cd ../polnet-synaptic && pip install -e .
cd ../faket-polnet && pip install -e .

Setup

1. Configure the TOML file

A single TOML file controls both phases of the pipeline. See configs/czii.toml (default mode) or configs/synapse.toml (synapse mode) for examples.

[tool.polnet] — Phase 1 (polnet-synaptic)
[tool.faket] — Phase 2 (faket-polnet)

See polnet-synaptic and faket-polnet for more detailed parameter documentation.

2. Set up the directory structure

In your base_dir (defined in [tool.faket]), create a style_tomograms_{style_index} directory containing experimental tomograms to use as style references for neural style transfer.

In Phase 1, PolNet will automatically create simulation_dir_{simulation_index}/ inside base_dir. Phase 2 will use the the simulation generated by PolNet and the user-provided style tomograms as input.

3. Download pretrained VGG19 weights

FakET uses a pretrained VGG19 model for neural style transfer. On most HPC systems, compute nodes do not have internet access, so download the weights from the login node before running any jobs:

python - <<'EOF'
from torchvision.models import vgg19, VGG19_Weights
vgg19(weights=VGG19_Weights.DEFAULT)
EOF

The weights will be cached locally, and SLURM will automatically locate the cached file.

Usage

Default mode

Place multiple .toml files in a directory and use the submission wrapper, which runs both phases of the pipeline for each config in parallel.

bash slurm_scripts/default/submit_simulation_default.sh

Synapse mode

For the synapse mode, use the submission wrapper:

bash slurm_scripts/synapse/submit_simulation_synapse.sh

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
configs		configs
slurm_scripts		slurm_scripts
README.md		README.md
environment-gpu.yaml		environment-gpu.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

An Integrated Pipeline for Cryo-ET Data Simulation

Overview

Pipeline Variants

Data Flow

Installation

Setup

1. Configure the TOML file

2. Set up the directory structure

3. Download pretrained VGG19 weights

Usage

Default mode

Synapse mode

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

An Integrated Pipeline for Cryo-ET Data Simulation

Overview

Pipeline Variants

Data Flow

Installation

Setup

1. Configure the TOML file

2. Set up the directory structure

3. Download pretrained VGG19 weights

Usage

Default mode

Synapse mode

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages