flanders is an R package designed to seamlessly convert finemapping output files (e.g., from the nf-flanders pipeline) into a unified AnnData object and facilitate colocalization analysis. The package provides functions to:
- Convert multiple
*finemap.rdsfiles into a single AnnData object with credible set metadata. - Generate an input table (
coloc_input) for colocalization testing. - Run pairwise colocalization tests, with minimal runtime overhead (typically 5–10 tests per second on standard hardware).
When processing small to moderate datasets, you can run colocalization tests on your PC or laptop. For large-scale analyses, consider using the flanders_nf_coloc Nextflow pipeline.
- Installation
- Quick Start
- Function Reference
- AnnData Column Specifications
- Additional Resources
- Acknowledgments
To install the required R packages from CRAN, BioConductor and github, you can run the following commands in your R session:
install.packages("data.table")
install.packages("dplyr")
install.packages("Matrix")
install.packages("optparse")
# You need these Bioconductor packages:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("SingleCellExperiment")
BiocManager::install("zellkonverter")
BiocManager::install("scRNAseq")
# Install the flanders package
# You need the devtools package if not installed.
install.packages("devtools")
install.packages("anndata")
devtools::install_github("Biostatistics-Unit-HT/flanders_r")This guide installs all dependencies from CRAN and Bioconductor for a straightforward R-based setup.
For a more reproducible environment or if you need to interface with Python’s AnnData via reticulate, you can set up a Conda environment as follows:
mamba create -p <your_folder_with_conda_envs>/flanders_r \
-c conda-forge -c bioconda -c R \
r-base=4.4 \
bioconductor-singlecellexperiment=1.28.0 \
bioconductor-zellkonverter=1.16.0 \
bioconductor-scrnaseq=2.20.0 \
r-susier=0.12.35 \
r-coloc=5.2.3 \
r-data.table=1.17.0 \
r-dplyr=1.1.4 \
r-anndata=0.7.5.6 \
r-matrix=1.7_3 \
r-optparse=1.7.5
conda activate <your_folder_with_conda_envs>/flanders_r
git clone git@github.com:Biostatistics-Unit-HT/flanders_r.git
install.packages("flanders_r", repos = NULL, type = "source")
If you already have an AnnData:
library(zellkonverter)
library(SingleCellExperiment)
library(scRNAseq)
sce <- readH5AD("/path/to/output/my_anndata.h5ad",reader="R")
coloc_input <- anndata2coloc_input(sce)
coloc_results <- anndata2coloc(sce, coloc_input)
print(coloc_results)If you already have multiple AnnDatas:
library(zellkonverter)
library(SingleCellExperiment)
library(scRNAseq)
library(anndata)
list_of_ads <- list("/path/to/ad1.h5ad","/path/to/output/ad2.h5ad")
list_of_ads <- lapply(list_of_ads,read_h5ad)
merged_ad <- concat(
list_of_ads,
join="outer",
merge="first"
)
merged_ad <- fix_ad_var(merged_ad)
write_h5ad(merged_ad,"/path/to/merged_ad.h5ad")
library(zellkonverter)
library(SingleCellExperiment)
library(scRNAseq)
sce <- readH5AD("/path/to/merged_ad.h5ad",reader="R")
coloc_input <- anndata2coloc_input(sce)
coloc_results <- anndata2coloc(sce, coloc_input)
print(coloc_results)If you do not have an AnnData object yet:
- Convert Finemapping Files to AnnData
library(flanders)
library(data.table)
library(dplyr)
finemap_folder <- "/path/to/finemap/results/"
finemap_files <- list.files(finemap_folder, pattern = "*\\.rds", full.names = TRUE)
# create a vector of phenotype ids. Each element corresponds to each file of finemap_files.
# phenotype_id <-
# create a vector of study ids. Each element corresponds to each file of finemap_files.
# study_id <-
ad <- finemap2anndata(
finemap_files = finemap_files,
phenotype_id = phenotype_id,
study_id = study_id
)
# Optionally write the AnnData object to disk
ad$write_h5ad("/path/to/output/my_anndata.h5ad")- Generate Coloc Input Table. Write it if you want further run the nf-hcoloc pipeline
library(zellkonverter)
library(SingleCellExperiment)
library(scRNAseq)
sce <- readH5AD("/path/to/output/my_anndata.h5ad",reader="R")
coloc_input <- anndata2coloc_input(sce)
fwrite(coloc_input, file = "/path/to/coloc_guide_table.csv")- Perform Colocalization Analysis
coloc_results <- anndata2coloc(sce, coloc_input) print(coloc_results)
-
finemap2anndata
Converts a set of finemapping.rdsfiles into a single AnnData object with credible set metadata. -
anndata2coloc_input
Generates a data frame specifying pairs of credible sets for colocalization tests. -
anndata2coloc
Performs colocalization tests using the provided AnnData object and trait-pair table.
| Column Name | Format / Content | Description |
|---|---|---|
snp |
chr{num}:{pos}:EA:RA |
chr{num}: chromosome identifier (any string) pos: SNP position (in bp) EA: effective allele (linked to beta) RA: reference allele |
chr |
String (e.g. "chr{num}") |
Chromosome where the SNP is located. Usually "chr{num}" format, but can be any string. |
pos |
Numeric (bp) | Position of the SNP in base pairs (bp) on the physical map of the genome. |
Note: The row names of ad$var should be exactly equal to the values in ad$var$snp.
| Column Name | Format / Content | Description |
|---|---|---|
cs_name |
chr{num}::{study_id}::{trait_id}::{snp} |
chr{num}: chromosome (formatted as in var)study_id: identifier for the study trait_id: refers to the trait (equivalent to phenotype_id)snp: SNP with the highest logABF within the credible set (formatted as in ad$var$snp). |
chr |
String (e.g. "chr{num}") |
Chromosome where the credible set is located (same format as the chr field in var). |
start |
Numeric (bp) | Start position (in bp) of the analyzed locus, representing the beginning of the locus used for fine mapping. |
end |
Numeric (bp) | End position (in bp) of the analyzed locus. |
study_id |
String | Identifier for the study. |
phenotype_id |
String | Identifier for the trait/phenotype analyzed within the corresponding study. |
min_res_labf |
Numeric (log-scale) | Minimal value of logABF in the locus. If logABF for all SNPs is not available, approximate using:[logsum(logABF)/coverage] - log(N_snps - N_CS_SNPs) |
| coverage | Numeric | requested coverage (usually 99% or 95%) |
| logsum(logABF) | Numeric | log of the sum of ABF for SNPs in the credible set (using a log-sum to avoid overflow from extremely large values) |
Note: The row names of ad$obs should be exactly equal to the values in ad$obs$cs_name.
| Column Name | Format / Content | Description |
|---|---|---|
min.abs.corr |
Numeric | Purity metric for the credible set: minimal absolute correlation between SNPs within the credible set. |
mean.abs.corr |
Numeric | Mean absolute correlation among SNPs within the credible set. |
median.abs.corr |
Numeric | Median absolute correlation among SNPs within the credible set. |
| Column Name | Format / Content | Description |
|---|---|---|
top_pvalue |
Numeric | Lowest (either nominal or conditional) P-value of genetic association in the locus (useful for filtering). |
panel |
String | Name of the imputation panel used. |
If the runtime for your colocalization tests becomes large, use the flanders_nf_coloc Nextflow pipeline for scalable colocalization analysis.
Contributions, bug reports, and feature requests are welcome. Open an issue in case of issue, bug or feature request.