cyanea-bio/labs

Cyanea Labs

Modular bioinformatics in Rust.
Sequences, alignments, structures, stats — native, WASM, Python.

Docs | Architecture | Build Guide | Bindings | Usage Guide | Issues | Discussions


Cyanea Labs is a Cargo workspace of 18 crates covering the core primitives of computational biology — sequence analysis, alignment, genomic intervals, statistics, machine learning, cheminformatics, structural biology, phylogenetics, metagenomics, epigenomics, proteomics, and network/pathway biology. Everything compiles to native, WebAssembly, and Python (via PyO3), with an Elixir NIF bridge for the Cyanea platform.

4,150+ tests. Zero unsafe. No heavyweight C/C++ dependencies in the core path.

Quick Start

# Check everything compiles
cargo check --workspace

# Run all tests (exclude cyanea-py unless Python env is configured)
cargo test --workspace --exclude cyanea-py

# Test a single crate
cargo test -p cyanea-seq

# Run benchmarks
cargo bench -p cyanea-align

From Rust

use cyanea_seq::DnaSequence;
use cyanea_align::needleman_wunsch;
use cyanea_stats::describe;

// Sequence analysis
let seq = DnaSequence::new("ATCGATCG").unwrap();
assert_eq!(seq.gc_content(), 0.5);

// Global alignment
let result = needleman_wunsch(b"ACGT", b"ACGTT", 1, -1, -2).unwrap();
println!("{}", result.cigar); // 4=1I

// Descriptive statistics
let stats = describe(&[1.0, 2.0, 3.0, 4.0, 5.0]).unwrap();
println!("mean={}, std={:.2}", stats.mean, stats.std_dev);
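
To make the scoring arguments in the alignment call above concrete, here is a minimal standard-library sketch of the Needleman-Wunsch recurrence with a linear gap penalty. It is not the cyanea-align implementation: the function name `nw_score` is hypothetical, and reading the arguments `1, -1, -2` as match, mismatch, and gap scores is an assumption about the argument order.

```rust
/// Global alignment score (Needleman-Wunsch, linear gap penalty).
/// Hypothetical helper for illustration; not part of cyanea-align.
fn nw_score(a: &[u8], b: &[u8], match_score: i32, mismatch: i32, gap: i32) -> i32 {
    // prev[j] = best score aligning the rows of `a` processed so far with b[..j].
    // Row 0 aligns the empty prefix of `a` against b[..j], which costs j gaps.
    let mut prev: Vec<i32> = (0..=b.len() as i32).map(|j| j * gap).collect();

    for (i, &ca) in a.iter().enumerate() {
        let mut curr = Vec::with_capacity(b.len() + 1);
        // Column 0 aligns a[..=i] against the empty prefix of `b`.
        curr.push((i as i32 + 1) * gap);
        for (j, &cb) in b.iter().enumerate() {
            // Diagonal move: pair ca with cb (match or mismatch).
            let diag = prev[j] + if ca == cb { match_score } else { mismatch };
            // Vertical move: ca is aligned against a gap.
            let up = prev[j + 1] + gap;
            // Horizontal move: cb is aligned against a gap.
            let left = curr[j] + gap;
            curr.push(diag.max(up).max(left));
        }
        prev = curr;
    }
    prev[b.len()]
}

fn main() {
    // Same inputs as the README call, under the assumed argument order.
    let score = nw_score(b"ACGT", b"ACGTT", 1, -1, -2);
    println!("score = {score}");
}
```

Under these assumptions the best alignment pairs the four shared bases and spends one gap on the extra `T`, so the score is 4 × 1 − 2 = 2. The sketch keeps only two rows of the matrix, which is enough for a score but not for recovering a CIGAR string; that requires a traceback.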

From Python

import cyanea

# Align two sequences
result = cyanea.smith_waterman("ACGTACGT", "CGTAC", 2, -1, -2)
print(result["score"], result["cigar"])

# Parse a VCF file
variants = cyanea.read_vcf("samples.vcf")

# PCA on a distance matrix
coords = cyanea.pca([[0,1,2],[1,0,3],[2,3,0]], n_components=2)

From JavaScript / TypeScript

import { align, seq, stats } from "@cyanea/bio";

const result = align.smithWaterman("ACGTACGT", "CGTAC", 2, -1, -2);
const gc = seq.gcContent("ATCGATCG");
const desc = stats.describe([1, 2, 3, 4, 5]);

Crates

| Crate | What it does | Tests |
|---|---|---|
| cyanea-core | Error types, traits, SHA-256, zstd/gzip, mmap, log-space probability, rank/select bitvectors, wavelet matrix, Fenwick tree | 58 |
| cyanea-seq | DNA/RNA/protein sequences, FASTA/FASTQ, k-mers, 2-bit encoding, suffix array, FM-index, BWT, FMD-index, MinHash, pattern matching (KMP, Boyer-Moore, Myers bit-parallel), PSSM/motif scanning, ORF finder, codon tables, sequence masking, RNA secondary structure, protein properties, read simulation, de Bruijn graphs, assembly QC, long-read analysis (PacBio/Nanopore), structural variant calling, nanopore QC/methylation | 515 |
| cyanea-align | Needleman-Wunsch, Smith-Waterman, semi-global, banded, MSA, seed-and-extend, minimizers, WFA, POA, LCSk++, pair HMM, profile HMM, X-drop/Z-drop, spliced alignment, CIGAR utilities, substitution matrices (BLOSUM/PAM), SIMD (NEON/SSE4.1/AVX2), GPU dispatch | 321 |
| cyanea-omics | Genomic coordinates, interval sets/trees, genome arithmetic, expression matrices, sparse matrices, variants, gene annotations, coordinate liftover, AnnData/h5ad/zarr, variant annotation/VEP, CNV/CBS, methylation, spatial transcriptomics (Visium/MERFISH/Slide-seq, cell segmentation, spatial domains, SVG detection, cell-cell communication, deconvolution), single-cell (HVG, normalize, Leiden/Louvain, diffusion map, DPT, PAGA, markers, Harmony/ComBat/MNN, MTX/CSC, RNA velocity, batch correction metrics), clinical genomics (ACMG/AMP classification, ClinVar matching, pharmacogenomics, HLA typing, TMB, MSI), microarray analysis (RMA normalization, limma-style differential expression, Illumina methylation array analysis), Hi-C contact matrices, TAD calling, A/B compartments, loop detection, CRISPR (guide RNA scoring, off-target prediction, MAGeCK-style screen analysis, base editing CBE/ABE) | 556 |
| cyanea-io | CSV, VCF, BED, BEDPE, GFF3, GTF, SAM, BAM, CRAM, BCF, Parquet, BLAST, BLAST XML, MAF, GenBank, bigWig, Stockholm, Clustal, Phylip, EMBL, PIR, ABI, bedGraph, GFA, CEL/GPR/IDAT microarray formats, FCS (Flow Cytometry Standard 2.0/3.0/3.1), indexed BAM/VCF, BAM ops, VCF ops, variant calling, fetch clients | 381 |
| cyanea-stats | Descriptive stats, correlation, hypothesis tests (t, chi-squared, Mann-Whitney, Fisher, KS), distributions, PCA, effect sizes, Bayesian conjugate priors, combinatorics, population genetics (Fst, Tajima's D, LD), differential expression, enrichment (GSEA, ORA), ordination (PCoA, NMDS), multivariate tests (PERMANOVA, ANOSIM), survival analysis, ecological diversity | 384 |
| cyanea-ml | K-means, DBSCAN, hierarchical clustering, pairwise distances, KNN, PCA, t-SNE, UMAP, random forest, GBDT, feature selection, HMM, classification metrics, cross-validation | 269 |
| cyanea-chem | SMILES/SDF V2000/V3000, SMARTS, molecular fingerprints (Morgan, MACCS), substructure search, stereochemistry, canonical SMILES, 200+ descriptors, drug-likeness (Lipinski, QED, PAINS), scaffolds (Murcko, MCS), 3D conformers (ETKDG), force fields (UFF, MMFF94), Gasteiger charges, chemical reactions (SMIRKS), standardization, metabolomics (mass matching, isotope patterns, RT prediction, KEGG pathway enrichment) | 216 |
| cyanea-struct | PDB/mmCIF parsing, 3D geometry, DSSP secondary structure, Kabsch superposition, contact maps, Ramachandran analysis | 76 |
| cyanea-phylo | Newick/NEXUS parsing, distance matrices, UPGMA/NJ, Fitch/Sankoff parsimony, ML likelihood (GTR+G), bootstrap, tree search (NNI/SPR/TBR), model selection (AIC/BIC), protein models (LG/WAG/JTT), Bayesian MCMC, species tree (ASTRAL), UniFrac, simulation, consensus, dating, drawing | 225 |
| cyanea-meta | Metagenomics: taxonomy (k-mer LCA classification), taxonomic profiling, alpha/beta diversity, compositional analysis (CLR/ILR, ALDEx2, ANCOM), functional annotation, metagenomic binning (TNF + coverage), assembly QC (N50/auN) | 117 |
| cyanea-epi | Epigenomics: MACS2-style peak calling (narrow + broad), signal pileup/normalization, motif discovery/PWM scanning/MEME I/O, ChromHMM-like chromatin state learning, differential binding (DESeq2-style), nucleosome positioning, ATAC-seq QC (TSS enrichment, FRiP) | 73 |
| cyanea-proteomics | Proteomics: MGF/mzML parsing, in-silico digestion (trypsin/LysC/chymotrypsin), fragment ions (b/y/a), database search (XCorr/hyperscore), protein inference (parsimony), label-free & TMT quantification, target-decoy FDR, mzTab output | 86 |
| cyanea-network | Network/pathway biology: graph types, centrality, community detection (Louvain/LP), PPI analysis, GRN inference (correlation/MI/CLR), pathway topology scoring, crosstalk, GMT/GraphML/SIF/GEXF I/O | 87 |
| cyanea-gpu | Backend trait with CPU, CUDA, Metal, and WebGPU implementations, GPU buffer management, k-mer counting, Smith-Waterman, MinHash, benchmarks | 62 |
| cyanea-datasets | Bundled sample datasets and structured protocol templates (10 wet lab + 6 dry lab): genomics, alignment, epigenomics, single-cell, chemistry, phylogenetics, metagenomics, structural biology | 57 |
| cyanea-wasm | WebAssembly bindings via wasm-bindgen (seq, io, align, stats, ml, chem, struct_bio, phylo, omics, core) | 223 |
| cyanea-py | Python bindings via PyO3 (seq, align, stats, ml, chem, struct_bio, phylo, io, omics, sc) with optional NumPy support | |

Architecture

See docs/ARCHITECTURE.md for the full architecture guide with ASCII diagrams, feature flag details, data flow pipelines, and platform support matrix.

cyanea-core (foundation)
├── cyanea-seq          Sequences, indexing, k-mers
├── cyanea-align        Pairwise & multiple alignment
├── cyanea-omics        Genomic intervals, matrices, variants, spatial biology, single-cell, clinical genomics, CRISPR
├── cyanea-io           File format I/O
├── cyanea-stats        Statistics & distributions
├── cyanea-ml           Machine learning & clustering
├── cyanea-chem         Chemical structure, fingerprints & metabolomics
├── cyanea-struct       Protein structure & geometry
├── cyanea-phylo        Phylogenetics (optional: cyanea-ml)
├── cyanea-meta         Metagenomics & microbiome analysis
├── cyanea-epi          Epigenomics (ChIP-seq, ATAC-seq)
├── cyanea-proteomics   Proteomics & mass spectrometry
├── cyanea-network      Network/pathway biology
├── cyanea-gpu          GPU compute backends
├── cyanea-datasets     Bundled sample datasets
├── cyanea-wasm         → WebAssembly (@cyanea/bio on npm)
└── cyanea-py           → Python (PyO3 + maturin)

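By default, every domain crate depends only on cyanea-core; cross-crate links between domain crates, such as cyanea-phylo's use of cyanea-ml, are optional. The binding crates (cyanea-wasm, cyanea-py) aggregate the domain crates to expose a unified API. The Elixir NIF bridge lives in a separate repo and depends on these crates via path.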

Feature Flags

All domain crates default to std. Opt into additional capabilities:

| Flag | Scope | What it enables |
|---|---|---|
| parallel | align, ml, stats, chem, struct, phylo, io | Rayon parallelism |
| simd | align | SIMD-accelerated alignment |
| cuda | gpu, align | CUDA GPU backend (requires CUDA toolkit) |
| metal | gpu, align | Metal GPU backend (macOS) |
| wgpu | gpu | WebGPU backend (cross-platform) |
| blas | ml, stats | BLAS-backed PCA |
| wfa | align | Wavefront alignment |
| minhash | seq | MinHash / FracMinHash sketching |
| sam | io | SAM format support |
| bam | io | BAM format (implies sam) |
| cram | io | CRAM format (implies sam) |
| fcs | io | FCS flow cytometry format (2.0/3.0/3.1) |
| parquet | io | Apache Parquet columnar format |
| h5ad | omics | HDF5-backed AnnData I/O |
| zarr | omics | Zarr v3 I/O |
| single-cell | omics | Single-cell analysis pipeline (Leiden, HVG, pseudotime, integration) |
| serde | all | Serialization support |
| wasm | wasm | wasm-bindgen annotations |
| numpy | py | NumPy array interop |
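
As a rough illustration of how an opt-in flag such as `parallel` can gate a code path, the sketch below uses Rust's conditional compilation to pick between a Rayon-backed and a sequential implementation of the same function. It is a generic pattern, not code from this workspace: the function `sum_of_squares` is hypothetical, and the assumption is that the crate declares a Cargo feature named `parallel` that pulls in `rayon` as an optional dependency.

```rust
/// Parallel path: only compiled when the `parallel` feature is enabled.
/// Assumes `rayon` is an optional dependency activated by that feature.
#[cfg(feature = "parallel")]
fn sum_of_squares(xs: &[f64]) -> f64 {
    use rayon::prelude::*;
    xs.par_iter().map(|x| x * x).sum()
}

/// Sequential fallback: compiled when the feature is off, so the
/// default build carries no Rayon dependency.
#[cfg(not(feature = "parallel"))]
fn sum_of_squares(xs: &[f64]) -> f64 {
    xs.iter().map(|x| x * x).sum()
}

fn main() {
    // Callers use one name; the feature flag decides which body is built.
    let total = sum_of_squares(&[1.0, 2.0, 3.0]);
    println!("sum of squares = {total}");
}
```

Keeping both bodies behind a single function name means downstream code does not change when a feature is toggled; only the build configuration does.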

Building Bindings

See docs/BUILDING.md for the complete build guide covering all targets, feature combinations, and troubleshooting.

WebAssembly

cargo check -p cyanea-wasm --features wasm
wasm-pack build cyanea-wasm --features wasm

Published as @cyanea/bio on npm. TypeScript types and a Web Worker API are included in cyanea-wasm/ts/.

Python

PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 cargo check -p cyanea-py
cd cyanea-py && PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 maturin develop --release

Elixir NIF

cargo check -p cyanea-native   # Check only (linking requires BEAM)
cd ../cyanea && mix compile     # Full build via Rustler

Cross-Crate Pipelines

See docs/GUIDE.md for complete cross-crate workflow examples: FASTQ→alignment→variants, single-cell analysis, cheminformatics, phylogenetics, population genetics, and microbiome analysis.

Contributing

# Full CI check
cargo fmt --all -- --check
cargo clippy --workspace --exclude cyanea-py
cargo test --workspace --exclude cyanea-py

Using Claude Code? See CLAUDE.md for project context, architecture, and coding conventions.

License

Apache-2.0
