conduitR

What is conduitR?

conduitR is an R package for metaproteomics: the large-scale identification and quantification of proteins from microbial communities (e.g. gut microbiome, soil, bioreactors). It provides a single, consistent toolkit for building search databases, processing DIA-NN output, linking proteins to taxonomy and function, running differential analysis, and producing visualizations.

The package powers Conduit (a Snakemake workflow for metaproteomics) and Conduit-GUI (a graphical interface to explore Conduit results), but you can use conduitR on its own for custom pipelines and analyses.

Typical workflow

Database building — Get proteome FASTA files from UniProt by organism or proteome ID, concatenate them, and optionally create custom FASTA from a list of UniProt accessions.
Import & structure — Convert DIA-NN parquet reports into a QFeatures object (precursors → peptides → protein groups) with assay links.
Annotations — Attach taxonomy, Gene Ontology, KEGG, EggNOG, or CAZy annotations from UniProt and optional conduit annotation tables.
Analysis — Run limma-style differential expression, over-representation analysis (ORA), or gene set enrichment analysis (GSEA); train classification/regression models (e.g. random forest, XGBoost).
Visualization — Volcano plots, heatmaps, PCA biplots, taxonomic heat trees, sunbursts, and KEGG pathway figures, with consistent Conduit themes and palettes.

Features

Data and databases

Download proteome FASTA files from UniProt (UniProtKB and UniParc) by proteome or organism ID.
Concatenate FASTA files and extract metadata (protein ID, organism, taxonomy) from UniProt-style headers.
Create custom FASTA from a list of UniProt accessions.
Fetch NCBI taxonomy and UniProt proteome metadata (organism ID, proteome type).
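
To make the header-metadata step concrete, here is a minimal base R sketch of pulling the accession, organism name, and taxon ID out of UniProt-style FASTA headers. The helper name parse_uniprot_header() is hypothetical and is not conduitR's API; the example header is illustrative and assumes the usual `>db|ACCESSION|ENTRY_NAME Description OS=... OX=...` layout.

```r
# Hypothetical helper (not part of conduitR): parse UniProt-style FASTA headers.
# Assumed layout: >db|ACCESSION|ENTRY_NAME Description OS=Organism OX=TaxID ...
parse_uniprot_header <- function(headers) {
  # Return the first capture group of `pattern` for each header, or NA if absent.
  grab <- function(pattern) {
    m <- regmatches(headers, regexec(pattern, headers, perl = TRUE))
    vapply(m, function(x) if (length(x) >= 2) x[2] else NA_character_,
           character(1))
  }
  data.frame(
    protein_id = grab("^>?[a-z]{2}\\|([^|]+)\\|"),
    # Organism name runs until the next two-letter KEY= field or end of line.
    organism = grab("OS=(.+?)(?= [A-Z]{2}=|$)"),
    taxon_id = grab("OX=([0-9]+)"),
    stringsAsFactors = FALSE
  )
}

# Illustrative header; fields other than OS and OX are ignored.
example_headers <- c(
  ">sp|P12345|AATM_RABIT Example protein OS=Oryctolagus cuniculus OX=9986 GN=GOT2 PE=1 SV=2",
  ">header without UniProt fields"
)
parsed <- parse_uniprot_header(example_headers)
```

Headers that do not follow the assumed layout yield NA in the corresponding columns rather than an error, which makes malformed entries easy to filter out afterwards.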

Data processing and structure

DIA-NN → QFeatures: turn a DIA-NN parquet report into a QFeatures object with precursors, peptides, and protein groups.
Build QFeatures from sample annotations and multiple count matrices.
Replace zeros with NA, add log2-transformed and imputed assays, and normalize protein abundance to species level.
Taxonomy matrices: join DIA-NN output with FASTA and taxonomy to produce per-taxon count matrices.
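
The zero handling and per-taxon aggregation described above can be illustrated on a toy matrix in base R. This is a sketch of the general idea only: the values, protein names, and protein-to-taxon lookup are made up, and conduitR's own functions operate on QFeatures objects rather than bare matrices.

```r
# Toy protein-by-sample intensity matrix (made-up values).
counts <- matrix(
  c(100,   0, 400,
     50, 200,   0,
      0,  80, 160),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("protA", "protB", "protC"), c("s1", "s2", "s3"))
)

# Hypothetical protein-to-taxon lookup.
taxon <- c(protA = "Bacteroides", protB = "Bacteroides", protC = "Escherichia")

# Treat zeros as "not detected" so they become missing values.
counts_na <- counts
counts_na[counts_na == 0] <- NA

# log2 transform; NA entries stay NA and can be imputed later.
log2_counts <- log2(counts_na)

# Per-taxon matrix: sum the rows that belong to the same taxon.
taxon_counts <- rowsum(counts, group = taxon[rownames(counts)])
```

Converting zeros to NA before the log transform avoids -Inf values, and it keeps "not detected" distinct from "detected at low abundance" for any downstream imputation.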

Statistical analysis

Limma: design matrix, contrast testing, and empirical Bayes moderation for differential expression.
ORA & GSEA: over-representation and gene set enrichment with custom term–gene mappings (e.g. GO, species).
Classification/regression: LASSO, random forest, XGBoost with optional tuning; confusion matrix, ROC, precision–recall, feature importance.
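
For readers unfamiliar with over-representation analysis, the sketch below shows the one-sided hypergeometric test that ORA is conventionally based on, for a single term, using only base R. The function name ora_pvalue() and the toy identifiers are hypothetical; how conduitR implements ORA internally is not stated here.

```r
# Hypothetical helper (not conduitR's API): ORA p-value for a single term.
# hits:         features flagged as significant
# term_members: features annotated to the term (e.g. a GO term or a species)
# universe:     all features that were tested
ora_pvalue <- function(hits, term_members, universe) {
  hits <- intersect(hits, universe)
  term_members <- intersect(term_members, universe)
  k <- length(intersect(hits, term_members))  # significant features in the term
  m <- length(term_members)                   # term members in the universe
  n <- length(universe) - m                   # non-members in the universe
  # P(X >= k) when drawing length(hits) features without replacement.
  phyper(k - 1, m, n, length(hits), lower.tail = FALSE)
}

# Toy example: 20 features, a 5-member term, 4 hits that all fall in the term.
universe <- paste0("p", 1:20)
term_members <- paste0("p", 1:5)
hits <- paste0("p", 1:4)
p_value <- ora_pvalue(hits, term_members, universe)
```

In practice the test is repeated for every term, and the resulting p-values are adjusted for multiple testing, for example with p.adjust(method = "BH").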

Visualization

Volcano plots, heatmaps (static and interactive), PCA biplots.
Taxonomic heat trees and sunbursts, relative abundance barplots.
KEGG pathway figures, feature-by-sample plots, missing-value heatmaps.
Conduit color palettes and themes (scale_color_conduit_d, scale_fill_conduit_c, set_plot_theme, etc.).

Utilities

Validate UniProt accession IDs; check API reachability (UniProt, NCBI).
Logging with timestamps; %!in% operator; integration with existing R workflows.
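
As an illustration of the two utilities named above, here is a base R sketch of a negated %in% operator and an offline accession check. Both definitions are assumptions about how such helpers are commonly written, not conduitR's source; the regular expression is the accession pattern published in the UniProt help pages, and is_valid_accession() is a hypothetical name.

```r
# A common way to define a "not in" operator (conduitR's definition may differ).
`%!in%` <- Negate(`%in%`)

# UniProt accession pattern (6 or 10 characters), anchored to the whole string.
uniprot_pattern <- paste0(
  "^([OPQ][0-9][A-Z0-9]{3}[0-9]|",
  "[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

# Hypothetical helper: TRUE where an ID is syntactically valid.
is_valid_accession <- function(ids) grepl(uniprot_pattern, ids)

ids <- c("P12345", "invalid_id", "A0A023GPI8")
valid <- is_valid_accession(ids)   # TRUE FALSE TRUE
rejected <- ids[ids %!in% ids[valid]]
```

A syntactic check like this needs no network access, but it cannot tell whether an accession actually exists in UniProt.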

Installation

Install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("baynec2/conduitR")

Quick start

After installation, load the package and try a few entry points:

library(conduitR)

# Check that the UniProt API is reachable (required for downloads)
check_api_service()

# Validate UniProt IDs (no network needed)
validate_uniprot_accession_ids(c("P12345", "invalid_id", "A0A023GPI8"))

# Convert a DIA-NN parquet report to QFeatures (requires a local file)
# qf <- diann_to_qfeatures("path/to/report.parquet")
# plot_features_per_sample(qf, assay = "protein_groups")

# Run differential analysis (after building design/contrast); the contrast
# string should use the term names reported by find_possible_contrast_terms()
# terms <- find_possible_contrast_terms(qf, "protein_groups", ~ group)
# res <- perform_limma_analysis(qf, "protein_groups", ~ group, "treatmentB - treatmentA")
# plot_volcano(res$top_table)

Function help and examples are in the built-in documentation: e.g. ?get_fasta_file, ?diann_to_qfeatures, ?perform_limma_analysis.
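
The commented limma call in the quick start passes a design formula and a contrast string. How conduitR derives its contrast term names is not described here, so the sketch below only shows how base R's model.matrix() names design columns for a two-level factor. The sample table is made up, and any mapping to conduitR's term names is an assumption to verify with find_possible_contrast_terms().

```r
# Made-up sample annotation with two treatment groups.
samples <- data.frame(
  group = factor(c("treatmentA", "treatmentA", "treatmentB", "treatmentB"))
)

# With an intercept, the first level becomes the reference.
design_intercept <- model.matrix(~ group, data = samples)
colnames(design_intercept)  # "(Intercept)" "grouptreatmentB"

# Without an intercept, each level gets its own column.
design_means <- model.matrix(~ 0 + group, data = samples)
colnames(design_means)      # "grouptreatmentA" "grouptreatmentB"
```

Because the column names depend on the formula and on the factor levels, it is worth inspecting the available terms before writing a contrast string by hand.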

Dependencies

Core dependencies include QFeatures (proteomics data structures), limma (differential expression), SummarizedExperiment, Biostrings, httr2, KEGGREST, rentrez, tidyr, dplyr, ggplot2, plotly, metacoder, arrow, and others for specific features. See the DESCRIPTION file for the full list.

Documentation

In R: run ?function_name for any exported function; many help pages include runnable or \dontrun{} examples.
Conduit workflow: see the conduit project.
Conduit-GUI: see the conduit-GUI project.

License

MIT License; see LICENSE for details.
