This repository contains the pipeline used to analyze gut microbiome data from the Spondyloarthritis (SpA) cohort. The workflow covers data preprocessing, quality control, taxonomic and functional profiling, statistical modeling, and visualization. The documentation below lists the scripts and inputs needed to reproduce each step of the analysis.
The R session information is recorded in:
/docs/sessionInfo.txt
Important: change the working directory before loading the functions.
working_dir <- "~/github_shared_code_and_publications/SpA_microbiome_paper_code"
path_func <- "~/github_shared_code_and_publications/SpA_microbiome_paper_code/functions"
The count tables and minimal metadata (including Disease, Diagnosis, and Disease activity) are available at:
/data/1_infiles
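Once the working directory and function path are set, the helper functions could be loaded roughly as follows. This is a minimal sketch, not the repository's own loader: `source_all` is a hypothetical helper introduced here for illustration, and it assumes the functions directory holds plain `.R` files.

```r
# Paths as given above; edit them to match where the repository was cloned.
working_dir <- "~/github_shared_code_and_publications/SpA_microbiome_paper_code"
path_func <- file.path(working_dir, "functions")

# Hypothetical helper (not part of the repository): source every .R file
# found directly inside a directory and return the file paths invisibly.
source_all <- function(dir) {
  files <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE)
  for (f in files) source(f)
  invisible(files)
}

if (dir.exists(path.expand(working_dir))) {
  setwd(working_dir)
  loaded <- source_all(path_func)
  message("Sourced ", length(loaded), " function file(s)")
} else {
  # Fail softly so the path can be corrected before any script is run.
  message("Directory not found, edit working_dir first: ", working_dir)
}
```

Checking that the directory exists before calling `setwd` avoids a hard error when the repository was cloned to a different location.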
All scripts used in the analysis are available in the /scripts directory.
2_alpha_diversity
/scripts/2_alpha_diversity/Colon/Diversity_test.R
/scripts/2_alpha_diversity/GMM/Diversity_test.R
/scripts/2_alpha_diversity/Ileum/Diversity_test.R
/scripts/2_alpha_diversity/mOTUs/Diversity_test.R
3_beta_diversity
/scripts/3_beta_diversity/Beta_diversity_colon_biopsies_genus/Beta_diversity.R
/scripts/3_beta_diversity/Beta_diversity_colon_biopsies_sv/Beta_diversity.R
/scripts/3_beta_diversity/Beta_diversity_ileum_biopsies_sv/Beta_diversity.R
/scripts/3_beta_diversity/Beta_diversity_metabolomics/Beta_diversity.R
/scripts/3_beta_diversity/Beta_diversity_shotgun_gmm_Treatment_response/Beta_diversity.R
/scripts/3_beta_diversity/Beta_diversity_shotgun_motus/Beta_diversity.R
/scripts/3_beta_diversity/Beta_diversity_shotgun_motus_Treatment_response/Beta_diversity.R
/scripts/3_beta_diversity/Beta_diversity_shotgun_gmm/Bray/Beta_diversity.R
/scripts/3_beta_diversity/Beta_diversity_shotgun_gmm/Canberra/Beta_diversity.R
/scripts/3_beta_diversity/Beta_diversity_shotgun_kegg/Bray/Beta_diversity.R
/scripts/3_beta_diversity/Beta_diversity_shotgun_kegg/Canberra/Beta_diversity.R
4_biomarkers
Biomarkers
/scripts/4_biomarkers/Biomarkers_response_gogut_prediction/boot632_biomarkers.R
/scripts/4_biomarkers/Biomarkers_response_gogut_stability/glmnet.R
/scripts/4_biomarkers/Cofound_glmnet_biomarkers/glmnet.R
Differential abundance
/scripts/4_biomarkers/Biopsies/Biopsies_DA_pipeline.sh
/scripts/4_biomarkers/GMM/Biopsies_DA_pipeline.sh
/scripts/4_biomarkers/mOTUs/Biopsies_DA_pipeline.sh
/scripts/4_biomarkers/Metabolome/Biopsies_DA_pipeline.sh
/scripts/4_biomarkers/GMM_gogut/1_DAT.sh
/scripts/4_biomarkers/mOTUs_gogut/1_DAT.sh
5_Bayesian_network
Bayesian network inference
/scripts/5_network/SpA_Disease/Bayesian_network_pipeline.sh
Bootstrapping and arc strength
Arc strength estimation and Bayesian network bootstrapping were run on a Sun Grid Engine (SGE) cluster, with jobs submitted through a qsub submission script:
/scripts/5_network/SpA_Disease/Cluster_scripts/run_bn_bootstrap_learning.sh
6_mice_experiments
Beta diversity
/scripts/6_mice_experiments/Beta_diversity/Mice_data_Beta_div.R
Biomarkers
/scripts/6_mice_experiments/Biomarkers/WT_diff/WT_diff.R
/scripts/6_mice_experiments/Biomarkers/Corr_W12_metadata/corr_Meta.R
ML predictions
The machine learning pipeline used for these analyses is publicly available and fully reproducible at:
https://github.com/jorgevazcast/Liver_Disease_Microbiome_ML
FASTQ files for this project are available through the European Genome-phenome Archive (EGA) under the following accession numbers:
- Study: EGAS50000001435
- Dataset: EGAD50000002070
- Data Access Committee (DAC): EGAC00001003263