This repository contains the code for the simulation study presented in "On the construction of molecular signatures of lifestyle exposures", which evaluates feature selection strategies under different latent variable scenarios.
simulation_signatures/
├── run_simulation.R                         # Main simulations (Scenarios 1-3)
├── R/
│   ├── analysis_fxns.R                      # Analysis functions
│   ├── generate_data.R                      # Data generation functions
│   ├── generate_figures.R                   # Main figure generation
│   ├── generate_figures_supplementary.R     # Supplementary figure generation
│   ├── run_scenario_1.R                     # Script to run scenario 1
│   ├── run_scenario_2.R                     # Script to run scenario 2
│   ├── run_scenario_3.R                     # Script to run scenario 3
│   ├── run_scenario_2.sh                    # Bash script to run scenario 2
│   └── run_scenario_3.sh                    # Bash script to run scenario 3
├── config/
│   ├── scenario_1.R                         # Config file for Scenario 1
│   ├── scenario_2.R                         # Config file for Scenario 2
│   └── scenario_3.R                         # Config file for Scenario 3
├── renv/                                    # Reproducible R environment
├── renv.lock                                # Package versions used in the study
└── README.md                                # You're here
If you have Git installed, run:
git clone https://github.com/IARCBiostat/SimulationSignatures/
cd SimulationSignatures
Alternatively, you can download the ZIP file directly from GitHub and unzip it.
To ensure the correct package versions, we recommend using the renv environment:
install.packages("renv") # If not already installed
renv::restore() # Restores package versions from renv.lock
This will install the specific versions of all required packages as used in the original analysis.
Simulations are designed to be run in parallel. The number of cores used is controlled via the NCore parameter defined in the corresponding config file (e.g., config/scenario_1.R).
Rscript run_simulation.R
Scenarios 2 and 3 are parallelized using SLURM job arrays rather than R's internal parallelization.
Each job processes one parameter combination.
Make sure the array size matches the number of parameter combinations:
nrow(AllParams)
Each job uses SLURM_ARRAY_TASK_ID to select its task.
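As an illustration of the job-array pattern described above, here is a minimal sketch of how a job might pick its parameter combination. It assumes AllParams is a data frame with one row per combination; the grid columns (n, p, rep) are hypothetical and not taken from the repository's config files, and the actual scenario scripts may do this differently.

```r
# Hypothetical parameter grid; the real AllParams is built by the repository's code.
AllParams <- expand.grid(n = c(100, 500), p = c(50, 200), rep = 1:3)

# SLURM sets SLURM_ARRAY_TASK_ID in each array job; fall back to 1 for local testing.
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID", unset = "1"))

# Guard against an array range that does not match the grid size.
stopifnot(!is.na(task_id), task_id >= 1, task_id <= nrow(AllParams))

# Select the single row of parameters this job is responsible for.
params <- AllParams[task_id, , drop = FALSE]
print(params)
```

With this hypothetical grid, nrow(AllParams) is 12, so the matching SLURM request would be an array of 12 tasks (for example, --array=1-12).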
Rscript run_supplementary.R config/supplementary_scenario_4.R
You can edit the NCore value in the config file to match the number of CPU cores you wish to allocate.
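To show how a core count such as NCore typically drives parallel work in R, the sketch below uses parallel::mclapply. This is an assumption for illustration only: the repository may use a different parallel backend, and run_one is a hypothetical stand-in for a single simulation replicate.

```r
library(parallel)

# NCore would normally come from the config file; 2 is a placeholder value.
NCore <- 2L

# Hypothetical stand-in for one simulation replicate, seeded for reproducibility.
run_one <- function(seed) {
  set.seed(seed)
  mean(rnorm(100))
}

# mclapply forks worker processes, so mc.cores > 1 requires a Unix-like system.
results <- mclapply(1:8, run_one, mc.cores = NCore)
length(results)
```

Setting NCore higher than the number of cores actually allocated to the job usually slows things down rather than speeding them up, so it is worth matching it to the resources requested.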
Simulation results will be saved in the results/ directory as .rds files.
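The saved .rds files can be read back with readRDS. The sketch below is self-contained: it writes a placeholder object to a temporary directory first, because the actual file names and object structure produced by the simulation scripts are not documented here. The file name example.rds and its contents are hypothetical.

```r
# Hypothetical result object and file name, written to a temporary directory.
out_dir <- file.path(tempdir(), "results")
dir.create(out_dir, showWarnings = FALSE)
saveRDS(data.frame(scenario = 1, value = 0), file.path(out_dir, "example.rds"))

# List every .rds file in the directory and read each one into a named list.
files <- list.files(out_dir, pattern = "\\.rds$", full.names = TRUE)
res <- lapply(files, readRDS)
names(res) <- basename(files)
str(res)
```

To inspect real output, point out_dir at the repository's results/ directory and skip the saveRDS step.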
Once the simulations are complete, you can generate the figures used in the paper and supplementary material:
source("R/generate_figures.R")
source("R/generate_figures_supplementary.R")
Figures will be saved to results/Figures/ or to the output directory specified in the figure generation script.
For issues, bugs, or questions, feel free to open an issue or contact us.