This repository orchestrates a two-step pipeline for generating synthetic cryo-ET training datasets with ground-truth labels for downstream machine learning tasks (e.g., particle picking, segmentation). All simulation parameters are controlled through TOML configuration files, enabling reproducible and scalable experiments on HPC systems.
Pipeline dependencies:
- polnet-synaptic: Simulates a 3D cellular environment (membranes, actin filaments, microtubules, cytosolic proteins, membrane-bound proteins) and outputs clean 3D density maps with ground-truth labels.
- faket-polnet: Takes those density maps, simulates the TEM imaging process (tilt series projection), applies VGG19-based neural style transfer to impose real cryo-ET noise textures, and reconstructs final 3D tomograms via IMOD weighted back-projection.
- IMOD: Used for tilt series projection (xyzproj) and 3D reconstruction (tilt). Must be available as a module on your HPC.
This repository (simulation-main) does not contain source code; it only provides TOML configs and SLURM scripts that coordinate the two phases of the pipeline.
polnet-synaptic has two modes, each with its own entry-point script:
| Mode | Script | Membranes | Cytoskeleton | Proteins |
|---|---|---|---|---|
| Default | all_features_default.py | sphere, ellipsoid, toroid | actin, microtubules | cytosolic proteins, membrane proteins |
| Synapse | all_features_synapse.py | ellipsoid (synaptic vesicles) | actin, microtubules | synaptic membrane proteins |
The Default mode is a general-purpose simulator that can enable or disable any combination of features. The Synapse mode handles a special case where all membranes are ellipsoidal to approximate synaptic vesicles, and synaptic membrane proteins are generated on the vesicle surfaces.
TOML Config
│
▼
Phase 1: polnet-synaptic (`all_features_default.py` OR `all_features_synapse.py`)
├── Generate membranes (sphere, ellipsoid, toroid)
├── Generate actin networks
├── Generate microtubule networks
├── Place cytosolic proteins
└── Place membrane-bound proteins
│
▼
simulation_dir_{simulation_dir_index}/
├── tomo_den_N.mrc ← clean 3D density map
├── tomo_lbls_N.mrc ← label mask
└── tomos_motif_list_{N}.csv ← particle types, positions, orientations (per tomogram)
│
▼
Phase 2: faket-polnet (`pipeline_parallel.py`)
Stage 1: Setup
├── Project style tomograms → style tilt series (IMOD `xyzproj`)
└── Label transform → output JSON annotations in CZII challenge format
Stage 2: Style Transfer
├── Project synthetic densities → clean + noisy tilt series (IMOD `xyzproj`)
├── FakET neural style transfer → style-transferred tilt series
└── 3D reconstruction → style-transferred tomograms (IMOD `tilt`)
Stage 3: Cleanup
└── Merge JSON metadata and collect reconstructed tomograms
│
▼
train_dir_{train_dir_index}/
├── faket_tomograms/ ← final style-transferred, reconstructed tomograms
└── overlay/ ← ground-truth particle annotations (JSON)
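Before launching Phase 2, it can help to confirm that Phase 1 wrote every file shown in the diagram above. The following is a minimal sketch, not part of either package: `missing_phase1_outputs` is a hypothetical helper, and it assumes that `N` is a zero-based tomogram index and that you know how many tomograms were requested.

```python
from pathlib import Path


def missing_phase1_outputs(sim_dir, n_tomos):
    """Return the expected Phase 1 file names that are absent from sim_dir.

    Assumption: N in the file names runs from 0 to n_tomos - 1.
    """
    sim_dir = Path(sim_dir)
    missing = []
    for n in range(n_tomos):
        expected = (
            f"tomo_den_{n}.mrc",          # clean 3D density map
            f"tomo_lbls_{n}.mrc",         # label mask
            f"tomos_motif_list_{n}.csv",  # particle types, positions, orientations
        )
        for name in expected:
            if not (sim_dir / name).is_file():
                missing.append(name)
    return missing
```

An empty list means every expected file is present; otherwise the returned names show which tomograms to regenerate.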
Clone all three repositories into the same parent directory:
git clone https://github.com/computational-cell-analytics/simulation-main.git
git clone https://github.com/computational-cell-analytics/polnet-synaptic.git
git clone https://github.com/computational-cell-analytics/faket-polnet.git

Create and activate the conda environment:
cd simulation-main
conda env create -n simulation-main -f environment-gpu.yaml --channel-priority flexible
conda activate simulation-main

Install polnet-synaptic and faket-polnet into the environment:
cd ../polnet-synaptic && pip install -e .
cd ../faket-polnet && pip install -e .

A single TOML file controls both phases of the pipeline. See configs/czii.toml (default mode) or configs/synapse.toml (synapse mode) for examples.
- [tool.polnet]: Phase 1 (polnet-synaptic)
- [tool.faket]: Phase 2 (faket-polnet)
See polnet-synaptic and faket-polnet for more detailed parameter documentation.
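To illustrate how one file can feed both phases, here is a minimal sketch that parses a config and splits it into its two sections. The example string is illustrative only: `base_dir` is the one key named in this README, the Phase 1 keys are deliberately omitted, and `split_config` is a hypothetical helper rather than something either package provides.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # backport for older interpreters

# Illustrative config; real files carry many more keys per section.
EXAMPLE = """
[tool.polnet]
# Phase 1 settings (see polnet-synaptic for the available keys)

[tool.faket]
base_dir = "/path/to/base_dir"
"""


def split_config(text):
    """Parse TOML text and return the (polnet, faket) section dicts."""
    tool = tomllib.loads(text).get("tool", {})
    missing = [name for name in ("polnet", "faket") if name not in tool]
    if missing:
        raise KeyError(f"missing config section(s): {missing}")
    return tool["polnet"], tool["faket"]


polnet_cfg, faket_cfg = split_config(EXAMPLE)
print(faket_cfg["base_dir"])
```

Checking for both sections up front surfaces a malformed config before any SLURM job is queued.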
In your base_dir (defined in [tool.faket]), create a style_tomograms_{style_index} directory containing experimental tomograms to use as style references for neural style transfer.
In Phase 1, PolNet automatically creates simulation_dir_{simulation_index}/ inside base_dir. Phase 2 then takes the simulation generated by PolNet and the user-provided style tomograms as input.
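Since the style directory is the only input you have to create by hand, a quick pre-flight check can catch a missing or empty folder before a job is submitted. This is a sketch under assumptions: `style_dir_ready` is a hypothetical helper, and because the expected file format of the style tomograms is not stated here, it only checks that the directory exists and is not empty.

```python
from pathlib import Path


def style_dir_ready(base_dir, style_index):
    """Return True if style_tomograms_{style_index} exists under base_dir
    and contains at least one entry (file type is not checked)."""
    style_dir = Path(base_dir) / f"style_tomograms_{style_index}"
    return style_dir.is_dir() and any(style_dir.iterdir())
```

For example, `style_dir_ready("/path/to/base_dir", 0)` returns `False` until `style_tomograms_0` has been created and populated.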
FakET uses a pretrained VGG19 model for neural style transfer. On most HPC systems, compute nodes do not have internet access, so download the weights from the login node before running any jobs:
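python - <<'EOF'
# Building the model with pretrained weights triggers the download into the local torch cache.
from torchvision.models import vgg19, VGG19_Weights
vgg19(weights=VGG19_Weights.DEFAULT)
EOF

The weights are cached locally, and jobs launched through SLURM will locate the cached file automatically.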
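To confirm that the download succeeded before submitting jobs, you can look for the checkpoint file in the torch cache. The sketch below uses only the standard library and mirrors the default lookup order used by torch.hub (`TORCH_HOME`, then `XDG_CACHE_HOME`, then `~/.cache`). It assumes the cache location has not been changed in code via `torch.hub.set_dir` and that compute nodes see the same filesystem as the login node; both function names are hypothetical.

```python
import os
from pathlib import Path


def torch_checkpoint_dir():
    """Default directory where torch.hub stores downloaded checkpoints."""
    cache_root = os.getenv("XDG_CACHE_HOME", "~/.cache")
    torch_home = os.getenv("TORCH_HOME", os.path.join(cache_root, "torch"))
    return Path(torch_home).expanduser() / "hub" / "checkpoints"


def cached_vgg19_weights():
    """Return any cached VGG19 checkpoint files (empty list if none)."""
    # The pattern matches vgg19-<hash>.pth but not the vgg19_bn variant.
    return sorted(torch_checkpoint_dir().glob("vgg19-*.pth"))
```

If `cached_vgg19_weights()` returns an empty list on the login node, rerun the download step above before queuing any jobs.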
Place multiple .toml files in a directory and use the submission wrapper, which runs both phases of the pipeline for each config in parallel.
bash slurm_scripts/default/submit_simulation_default.sh

For synapse mode, use the corresponding submission wrapper:
bash slurm_scripts/synapse/submit_simulation_synapse.sh