conduitR is an R package for metaproteomics: the large-scale identification and quantification of proteins from microbial communities (e.g. gut microbiome, soil, bioreactors). It provides a single, consistent toolkit for building search databases, processing DIA-NN output, linking proteins to taxonomy and function, and running differential analysis and visualizations.
The package powers Conduit (a Snakemake workflow for metaproteomics) and Conduit-GUI (a graphical interface to explore Conduit results), but you can use conduitR on its own for custom pipelines and analyses.
- Database building — Get proteome FASTA files from UniProt by organism or proteome ID, concatenate them, and optionally create custom FASTA from a list of UniProt accessions.
- Import & structure — Convert DIA-NN parquet reports into a QFeatures object (precursors → peptides → protein groups) with assay links.
- Annotations — Attach taxonomy, Gene Ontology, KEGG, EggNOG, or CAZy annotations from UniProt and optional conduit annotation tables.
- Analysis — Run limma-style differential expression, over-representation (ORA), or GSEA; train classification/regression models (e.g. random forest, XGBoost).
- Visualization — Volcano plots, heatmaps, PCA biplots, taxonomic heat trees, sunbursts, and KEGG pathway figures, with consistent Conduit themes and palettes.
- Download proteome FASTA files from UniProt (UniProtKB and UniParc) by proteome or organism ID.
- Concatenate FASTA files and extract metadata (protein ID, organism, taxonomy) from UniProt-style headers.
- Create custom FASTA from a list of UniProt accessions.
- Fetch NCBI taxonomy and UniProt proteome metadata (organism ID, proteome type).
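
To illustrate what extracting metadata from UniProt-style headers involves, here is a minimal base R sketch. The helper `parse_uniprot_headers()` is hypothetical and is not conduitR's own function; it only assumes the standard UniProtKB header layout (`>db|accession|entry_name description OS=organism OX=taxonomy_id ...`).

```r
# Two UniProt-style FASTA header lines used as example input.
headers <- c(
  ">sp|P12345|AATM_RABIT Aspartate aminotransferase, mitochondrial OS=Oryctolagus cuniculus OX=9986 GN=GOT2 PE=1 SV=2",
  ">sp|P0A7G6|RECA_ECOLI Protein RecA OS=Escherichia coli (strain K12) OX=83333 GN=recA PE=1 SV=2"
)

# Hypothetical helper (not part of conduitR): extract protein ID, organism
# name, and NCBI taxonomy ID from each header line.
parse_uniprot_headers <- function(x) {
  # Return the first capture group of `pattern` per element, or NA if absent.
  capture <- function(pattern) {
    matches <- regmatches(x, regexec(pattern, x, perl = TRUE))
    vapply(
      matches,
      function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
      character(1)
    )
  }
  data.frame(
    protein_id  = capture("^>[a-z]{2}\\|([^|]+)\\|"),  # accession between the pipes
    organism    = capture(" OS=(.+?) OX="),            # text between OS= and OX=
    taxonomy_id = capture(" OX=([0-9]+)"),             # digits after OX=
    stringsAsFactors = FALSE
  )
}

parsed <- parse_uniprot_headers(headers)
print(parsed)
```

Headers that lack a field yield `NA` in that column rather than an error, which keeps the table aligned with the input when databases are concatenated from mixed sources.
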
- DIA-NN → QFeatures: turn a DIA-NN parquet report into a QFeatures object with precursors, peptides, and protein groups.
- Build QFeatures from sample annotations and multiple count matrices.
- Replace zeros with NA, add log2-imputed assays, and normalize protein abundance to species level.
- Taxonomy matrices: join DIA-NN output with FASTA and taxonomy to produce per-taxon count matrices.
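
The zero-to-NA and log2 steps can be sketched on a plain matrix in base R. This is an illustration only: the toy values are made up, and filling missing values with the global minimum is an assumed stand-in, since the imputation method conduitR applies is not described here.

```r
# Toy protein-group intensity matrix (rows = proteins, columns = samples).
intensities <- matrix(
  c(1024,   0,  256,
       0, 512, 2048),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("protA", "protB"), c("s1", "s2", "s3"))
)

# A zero intensity usually means "not detected", so mark it as missing.
with_na <- intensities
with_na[with_na == 0] <- NA

# log2-transform the observed values; NA entries stay NA.
log2_int <- log2(with_na)

# Assumed imputation for illustration: fill each gap with the smallest
# observed log2 intensity in the whole matrix.
imputed <- log2_int
imputed[is.na(imputed)] <- min(log2_int, na.rm = TRUE)

print(imputed)
```

Converting zeros to NA before the transform avoids `log2(0) = -Inf`, which would otherwise distort means and variances downstream.
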
- Limma: design matrix, contrast testing, and empirical Bayes moderation for differential expression.
- ORA & GSEA: over-representation and gene set enrichment with custom term–gene mappings (e.g. GO, species).
- Classification/regression: LASSO, random forest, XGBoost with optional tuning; confusion matrix, ROC, precision–recall, feature importance.
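
For readers unfamiliar with over-representation analysis, the underlying test is a one-sided hypergeometric test. The sketch below uses only base R; `ora_p_value()` and the toy protein names are hypothetical, and conduitR's ORA functions may add multiple-testing correction and other options on top of this.

```r
# Hypothetical helper (not part of conduitR): p-value for one term being
# over-represented among the significant proteins ("hits").
ora_p_value <- function(hits, term_members, universe) {
  # Restrict both sets to the tested background.
  hits <- intersect(hits, universe)
  term_members <- intersect(term_members, universe)
  overlap <- length(intersect(hits, term_members))

  # P(X >= overlap) when drawing length(hits) proteins without replacement
  # from a universe containing length(term_members) annotated proteins.
  phyper(
    overlap - 1,
    length(term_members),
    length(universe) - length(term_members),
    length(hits),
    lower.tail = FALSE
  )
}

universe     <- paste0("prot", 1:20)             # all quantified proteins
term_members <- paste0("prot", 1:5)              # proteins annotated with the term
hits         <- paste0("prot", c(1, 2, 3, 4, 11)) # differentially abundant proteins

p <- ora_p_value(hits, term_members, universe)
print(p)
```

The choice of universe matters: using only the proteins actually quantified, rather than the whole database, keeps the test from overstating enrichment.
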
- Volcano plots, heatmaps (static and interactive), PCA biplots.
- Taxonomic heat trees and sunbursts, relative abundance barplots.
- KEGG pathway figures, feature-by-sample plots, missing-value heatmaps.
- Conduit color palettes and themes (scale_color_conduit_d, scale_fill_conduit_c, set_plot_theme, etc.).
- Validate UniProt accession IDs; check API reachability (UniProt, NCBI).
- Logging with timestamps; %!in% operator; integration with existing R workflows.
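
As a rough idea of what offline accession validation and a negated `%in%` look like, here is a base R sketch. The regular expression is the accession pattern published in the UniProt help pages; `is_uniprot_accession()` is a hypothetical stand-in, and the local `%!in%` definition is an assumption about the operator's behavior, so the package's own versions may differ.

```r
# Accession format published by UniProt (6 or 10 characters).
uniprot_pattern <- paste0(
  "^([OPQ][0-9][A-Z0-9]{3}[0-9]|",
  "[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

# Hypothetical stand-in for a format check; it needs no network access
# and does not confirm that the entry exists in UniProt.
is_uniprot_accession <- function(ids) grepl(uniprot_pattern, ids)

# Assumed behavior of a negated %in%: TRUE where x is absent from table.
`%!in%` <- function(x, table) !(x %in% table)

ids <- c("P12345", "invalid_id", "A0A023GPI8")
valid <- is_uniprot_accession(ids)

# Keep well-formed accessions that have not been downloaded yet.
already_downloaded <- c("P12345")
to_fetch <- ids[valid & ids %!in% already_downloaded]

print(valid)
print(to_fetch)
```
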
Install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("baynec2/conduitR")

After installation, load the package and try a few entry points:
library(conduitR)
# Check that the UniProt API is reachable (required for downloads)
check_api_service()
# Validate UniProt IDs (no network needed)
validate_uniprot_accession_ids(c("P12345", "invalid_id", "A0A023GPI8"))
# Convert a DIA-NN parquet report to QFeatures (requires a local file)
# qf <- diann_to_qfeatures("path/to/report.parquet")
# plot_features_per_sample(qf, assay = "protein_groups")
# Run differential analysis (after building design/contrast)
# terms <- find_possible_contrast_terms(qf, "protein_groups", ~ group)
# res <- perform_limma_analysis(qf, "protein_groups", ~ group, "treatmentB - treatmentA")
# plot_volcano(res$top_table)

Function help and examples are in the built-in documentation, e.g. ?get_fasta_file, ?diann_to_qfeatures, ?perform_limma_analysis.
Core dependencies include QFeatures (proteomics data structures), limma (differential expression), SummarizedExperiment, Biostrings, httr2, KEGGREST, rentrez, tidyr, dplyr, ggplot2, plotly, metacoder, arrow, and others for specific features. See the DESCRIPTION file for the full list.
- In R: ?function_name for any exported function; many have runnable or \dontrun examples.
- Conduit workflow: conduit.
- Conduit-GUI: conduit-GUI.
MIT License; see LICENSE for details.
