Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
43 commits
Select commit Hold shift + click to select a range
0a5f395
bump x.y.z version to even y prior to creation of RELEASE_3_21 branch
jwokaty Apr 15, 2025
474602e
bump x.y.z version to odd y following creation of RELEASE_3_21 branch
jwokaty Apr 15, 2025
06b9297
init cirro config
dimalvovs May 9, 2025
5ef6b3f
move config so cirro can find it
dimalvovs May 9, 2025
98e570d
update preprocess.py
dimalvovs May 9, 2025
8edd58c
update and mv main.nf
dimalvovs May 9, 2025
abb66ca
update cirro to newer format
dimalvovs May 9, 2025
c2e020f
update example pipeline
dimalvovs May 9, 2025
83cc13c
update params
dimalvovs May 9, 2025
cafd7a2
update resources
dimalvovs May 9, 2025
a47c121
make nsets < than cpus
dimalvovs May 10, 2025
3acecbb
update to sparse
dimalvovs May 10, 2025
dae6ad4
run on 10k iterations 5,10,15,20 patterns
dimalvovs May 11, 2025
453c447
report every 100 steps
dimalvovs May 11, 2025
b4e10fc
example to run on disco HPC
dimalvovs May 12, 2025
364d823
make it process_long and no distributed
dimalvovs May 13, 2025
859f1e3
filter top 5K genes
dimalvovs May 14, 2025
9ee7261
update params
dimalvovs May 15, 2025
ecee6a4
rm Seurat::NormalizeData step
dimalvovs May 20, 2025
583a868
configure cirro inputs
dimalvovs May 20, 2025
d1caed3
debug cirro inputs
dimalvovs May 20, 2025
852e993
debug cirro inputs
dimalvovs May 20, 2025
40f49d9
debug cirro inputs
dimalvovs May 20, 2025
7b7eca1
allow comma separated string for npatterns
dimalvovs Jul 9, 2025
61fb147
update docs for nextflow mode
dimalvovs Jul 9, 2025
3dacec4
update nextflow instructions
dimalvovs Jul 9, 2025
a8a8060
update npatterns in cirro
dimalvovs Jul 9, 2025
b84c3c5
split preprocess from CoGAPS
dimalvovs Sep 11, 2025
96f767b
refactor, increase RAM
dimalvovs Sep 11, 2025
f629a8c
Merge branch 'master' into nextflow
dimalvovs Sep 11, 2025
51fd4a4
use sparseMatrixStats for var estimation
dimalvovs Sep 11, 2025
5a1fb73
point to nexflow container, improve proprocessing
dimalvovs Sep 11, 2025
39ca1c2
debug channel composition
dimalvovs Sep 11, 2025
30eabcd
increase cpus as filtering reduced RAM footprint
dimalvovs Sep 11, 2025
71f6b15
pin to CoGAPS 3.21.5 due to broken distributed="single-cell"
dimalvovs Sep 13, 2025
3616fce
fix stitchTogether, close #146
dimalvovs Sep 13, 2025
200e0fe
add DistributedCoGAPS tests
dimalvovs Sep 13, 2025
c9cced7
Update nextflow.config
dimalvovs Sep 13, 2025
2d7ee68
add Distributed_CoGAPS tests
dimalvovs Sep 14, 2025
7f3d754
prepare for prod
dimalvovs Sep 14, 2025
f84b1b6
Update main.nf
dimalvovs Sep 15, 2025
575dbef
Update main.nf
dimalvovs Sep 16, 2025
b4e327f
Update nextflow.config
dimalvovs Sep 16, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
Package: CoGAPS
Version: 3.27.5
Version: 3.29.1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this version bump makes sense?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Bioconductor is strict about versions: in x.y.z, y is always odd in devel and even in release. Current release is 28, so current devel must be 29. Although ir may be 29.0, will find out when pushing to Bioconductor.

Date: 2025-03-11
Title: Coordinated Gene Activity in Pattern Sets
Author: Jeanette Johnson, Ashley Tsang, Jacob Mitchell, Thomas Sherman, Wai-shing Lee, Conor Kelton, Ondrej Maxian, Jacob Carey,
Expand Down
2 changes: 1 addition & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ RUN sudo apt-get update -y && \
apt-get install libhdf5-dev build-essential patch -y

#packages below didn't install with devtools::install_deps, needed BiocManager
RUN Rscript -e 'BiocManager::install(c("S4Vectors", "SingleCellExperiment", "SummarizedExperiment", "rhdf5", "fgsea"), ask=FALSE)'
RUN Rscript -e 'BiocManager::install(c("S4Vectors", "SingleCellExperiment", "SummarizedExperiment", "rhdf5", "fgsea", "sparseMatrixStats"), ask=FALSE)'

#install all other dependencies
RUN Rscript -e 'devtools::install_deps(".", dependencies=TRUE)'
Expand Down
7 changes: 3 additions & 4 deletions R/DistributedCogaps.R
Original file line number Diff line number Diff line change
Expand Up @@ -87,7 +87,7 @@ distributedCogaps <- function(data, allParams, uncertainty)
allParams$gaps@fixedPatterns <- matchedPatterns$consensus
allParams$gaps@whichMatrixFixed <- ifelse(allParams$gaps@distributed
== "genome-wide", "P", "A")

# run final phase with fixed matrix
gapsCat(allParams, "Running Final Stage...\n\n")
finalResult <- bplapply(1:length(sets), BPPARAM=allParams$BPPARAM,
Expand Down Expand Up @@ -233,7 +233,7 @@ stitchTogether <- function(result, allParams, sets)
Asd <- do.call(rbind, lapply(result, function(x) x@loadingStdDev))

# copy P matrix - same for all sets
Pmean <- result[[1]]@sampleFactors
Pmean <- result[[1]]@metadata$params@fixedPatterns
Copy link

Copilot AI Sep 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both genome-wide and single-cell modes are using @fixedPatterns for different matrices. In genome-wide mode, line 236 should likely use @sampleFactors instead of @fixedPatterns for the P matrix, and in single-cell mode, line 258 should use @featureLoadings for the A matrix.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so.

Psd <- matrix(0, nrow=nrow(Pmean), ncol=ncol(Pmean))

# if each feature was used once, re-order to match data
Expand All @@ -255,7 +255,7 @@ stitchTogether <- function(result, allParams, sets)
Psd <- do.call(rbind, lapply(result, function(x) x@factorStdDev))

# copy A matrix - same for all sets
Amean <- result[[1]]@featureLoadings
Amean <- result[[1]]@metadata$params@fixedPatterns
Comment thread
dimalvovs marked this conversation as resolved.
Asd <- matrix(0, nrow=nrow(Amean), ncol=ncol(Amean))

# if each sample was used once, re-order to match data
Expand All @@ -276,4 +276,3 @@ stitchTogether <- function(result, allParams, sets)
"sampleNames"=rownames(Pmean),
"meanChiSq"=sum(sapply(result, function(r) r@metadata$meanChiSq))))
}

162 changes: 117 additions & 45 deletions main.nf
Original file line number Diff line number Diff line change
Expand Up @@ -11,34 +11,13 @@ process COGAPS {
tuple val(meta), path("${prefix}/cogapsResult.rds"), emit: cogapsResult
path "versions.yml", emit: versions

stub:
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}/${cparams.niterations}-${cparams.npatterns}-${cparams.sparse}-${cparams.distributed}"
"""
mkdir "${prefix}"
touch "${prefix}/cogapsResult.rds"
cat <<-END_VERSIONS > versions.yml
"${task.process}":
CoGAPS: \$(Rscript -e 'print(packageVersion("CoGAPS"))' | awk '{print \$2}')
R: \$(Rscript -e 'print(packageVersion("base"))' | awk '{print \$2}')
END_VERSIONS
"""

script:
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}/${cparams.niterations}-${cparams.npatterns}-${cparams.sparse}-${cparams.distributed}"
"""
mkdir -p "${prefix}"
Rscript -e 'library("CoGAPS");
sparse <- readRDS("$dgCMatrix");
#select top 5K genes
message("finding top ", ${params.n_top_genes}, " genes");
vars <- apply(sparse, 1, var);
ngenes <- min(length(vars),${params.n_top_genes});
top_genes <- order(vars, decreasing=TRUE)[1:ngenes];
sparse <- sparse[top_genes,];
message("selected top ", length(top_genes), " genes of ", length(vars));

data <- as.matrix(sparse);
#avoid errors with distributed params
dist_param <- NULL;
Expand Down Expand Up @@ -70,6 +49,20 @@ process COGAPS {
R: \$(Rscript -e 'print(packageVersion("base"))' | awk '{print \$2}')
END_VERSIONS
"""

stub:
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}/${cparams.niterations}-${cparams.npatterns}-${cparams.sparse}-${cparams.distributed}"
"""
mkdir "${prefix}"
touch "${prefix}/cogapsResult.rds"
cat <<-END_VERSIONS > versions.yml
"${task.process}":
CoGAPS: \$(Rscript -e 'print(packageVersion("CoGAPS"))' | awk '{print \$2}')
R: \$(Rscript -e 'print(packageVersion("base"))' | awk '{print \$2}')
END_VERSIONS
"""

}

process COGAPS_TENX2DGC {
Expand All @@ -83,28 +76,30 @@ process COGAPS_TENX2DGC {
tuple val(meta), path("${prefix}/dgCMatrix.rds"), emit: dgCMatrix
path "versions.yml" , emit: versions

stub:

script:
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}"

"""
mkdir "${prefix}"
touch "${prefix}/dgCMatrix.rds"

Rscript -e 'res <- Seurat::Read10X("$data/filtered_feature_bc_matrix/");
saveRDS(res, file="${prefix}/dgCMatrix.rds")';

cat <<-END_VERSIONS > versions.yml
"${task.process}":
seurat: \$(Rscript -e 'print(packageVersion("Seurat"))' | awk '{print \$2}')
R: \$(Rscript -e 'print(packageVersion("base"))' | awk '{print \$2}')
END_VERSIONS
"""

script:
stub:
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}"

"""
mkdir "${prefix}"

Rscript -e 'res <- Seurat::Read10X("$data/filtered_feature_bc_matrix/");
saveRDS(res, file="${prefix}/dgCMatrix.rds")';
touch "${prefix}/dgCMatrix.rds"

cat <<-END_VERSIONS > versions.yml
"${task.process}":
Expand All @@ -125,20 +120,6 @@ process COGAPS_ADATA2DGC {
tuple val(meta), path("${prefix}/dgCMatrix.rds"), emit: dgCMatrix
path "versions.yml" , emit: versions

stub:
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}"

"""
mkdir "${prefix}"
touch "${prefix}/dgCMatrix.rds"
cat <<-END_VERSIONS > versions.yml
"${task.process}":
hdf5r: \$(Rscript -e 'print(packageVersion("Seurat"))' | awk '{print \$2}')
R: \$(Rscript -e 'print(packageVersion("base"))' | awk '{print \$2}')
END_VERSIONS
"""

script:
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}"
Expand Down Expand Up @@ -183,6 +164,88 @@ process COGAPS_ADATA2DGC {
R: \$(Rscript -e 'print(packageVersion("base"))' | awk '{print \$2}')
END_VERSIONS
"""

stub:
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}"

"""
mkdir "${prefix}"
touch "${prefix}/dgCMatrix.rds"
cat <<-END_VERSIONS > versions.yml
"${task.process}":
hdf5r: \$(Rscript -e 'print(packageVersion("hdf5r"))' | awk '{print \$2}')
R: \$(Rscript -e 'print(packageVersion("base"))' | awk '{print \$2}')
END_VERSIONS
"""
}

process COGAPS_PREPROCESS {
tag "$prefix"
label 'process_medium'
container 'ghcr.io/fertiglab/cogaps:master'

input:
tuple val(meta), path(dgCMatrix)

output:
tuple val(meta), path("${prefix}/dgCMatrix.rds"), emit: dgCMatrix
path "versions.yml", emit: versions

script:
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}"
"""
mkdir -p "${prefix}"
Rscript -e 'library("Matrix");
library("sparseMatrixStats")
sparse <- readRDS("$dgCMatrix");

#sparsity is
message("sparsity: ", sum(sparse==0)/ (nrow(sparse)*ncol(sparse)));

#drop rows with > 95% zero counts
message("filtering rows with >95% zeros");
nz <- rowSums(sparse != 0);
sparse <- sparse[nz > 0.05 * ncol(sparse),];
message("filtered to ", nrow(sparse), " columns of ", length(nz));

#drop columns with > 95% zero counts
message("filtering columns with >95% zeros");
nz <- colSums(sparse != 0);
sparse <- sparse[,nz > 0.05 * nrow(sparse)];
message("filtered to ", ncol(sparse), " rows of ", length(nz));

#resulting sparsity is
message("sparsity: ", sum(sparse==0)/ (nrow(sparse)*ncol(sparse)));

#select top N genes
message("finding top ", ${params.n_top_genes}, " genes");
vars <- rowVars(sparse);
ngenes <- min(length(vars),${params.n_top_genes});
top_genes <- order(vars, decreasing=TRUE)[1:ngenes];
sparse <- sparse[top_genes,];
message("selected top ", length(top_genes), " genes of ", length(vars));

saveRDS(sparse, file = "${prefix}/dgCMatrix.rds")'

cat <<-END_VERSIONS > versions.yml
"${task.process}":
R: \$(Rscript -e 'print(packageVersion("base"))' | awk '{print \$2}')
END_VERSIONS
"""

stub:
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}"
"""
mkdir "${prefix}"
touch "${prefix}/dgCMatrix.rds"
cat <<-END_VERSIONS > versions.yml
"${task.process}":
R: \$(Rscript -e 'print(packageVersion("base"))' | awk '{print \$2}')
END_VERSIONS
"""
}


Expand All @@ -209,9 +272,17 @@ workflow {
// convert adata to dgCMatrix
COGAPS_ADATA2DGC(ch_adata)

// preprocess dgCMatrix
ch_preprocess = COGAPS_ADATA2DGC.out.dgCMatrix
.map { tuple(it[0], it[1]) }

ch_preprocess = ch_preprocess.mix(ch_rds)

COGAPS_PREPROCESS(ch_preprocess)

// ch_cogaps_input of converted adatas and rdses
ch_input = COGAPS_ADATA2DGC.out.dgCMatrix
ch_input = ch_input.mix(ch_rds)
ch_input = COGAPS_PREPROCESS.out.dgCMatrix
.map { tuple(it[0], it[1]) }

// combine the two channels as input to CoGAPS
ch_input = ch_input.combine(ch_cparams)
Expand All @@ -220,4 +291,5 @@ workflow {
}

//example:
//nextflow run main.nf --input tests/nextflow --outdir out -c nextflow.config -profile docker
//nextflow run main.nf --input tests/nextflow --outdir out -c nextflow.config -profile docker --max_memory 10GB --max_cpus 8

35 changes: 19 additions & 16 deletions nextflow.config
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,8 @@ params {
distributed = "null"
nthreads = 1

max_memory = '128.GB'
max_cpus = 8
max_memory = '200.GB'
max_cpus = 16
max_time = '72.h'

n_top_genes = 5000
Expand Down Expand Up @@ -91,6 +91,8 @@ process {
memory = { check_max( 6.GB * task.attempt, 'memory' ) }
time = { check_max( 4.h * task.attempt, 'time' ) }

resourceLimits = [cpus: params.max_cpus, memory: params.max_memory, time: params.max_time]

errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
maxRetries = 1
maxErrors = '-1'
Expand All @@ -103,30 +105,31 @@ process {
// TODO nf-core: Customise requirements for specific processes.
// See https://www.nextflow.io/docs/latest/config.html#config-process-selectors
withLabel:process_single {
cpus = { check_max( 1 , 'cpus' ) }
memory = { check_max( 6.GB * task.attempt, 'memory' ) }
time = { check_max( 4.h * task.attempt, 'time' ) }
cpus = { 1 }
memory = { 6.GB * task.attempt }
time = { 4.h * task.attempt }
}
withLabel:process_low {
cpus = { check_max( 2 * task.attempt, 'cpus' ) }
memory = { check_max( 12.GB * task.attempt, 'memory' ) }
time = { check_max( 4.h * task.attempt, 'time' ) }
cpus = { 2 * task.attempt }
memory = { 12.GB * task.attempt }
time = { 4.h * task.attempt }
}
withLabel:process_medium {
cpus = { check_max( 6 * task.attempt, 'cpus' ) }
memory = { check_max( 24.GB * task.attempt, 'memory' ) }
time = { check_max( 8.h * task.attempt, 'time' ) }
cpus = { 6 * task.attempt }
memory = { 24.GB * task.attempt }
time = { 8.h * task.attempt }
}
withLabel:process_high {
cpus = { check_max( 12 * task.attempt, 'cpus' ) }
memory = { check_max( 72.GB * task.attempt, 'memory' ) }
time = { check_max( 16.h * task.attempt, 'time' ) }
cpus = { 16 * task.attempt }
memory = { 72.GB * task.attempt }
time = { 16.h * task.attempt }
}
withLabel:process_long {
time = { check_max( 48.h * task.attempt, 'time' ) }
time = { 48.h * task.attempt }
}
withLabel:process_high_memory {
memory = { check_max( 200.GB * task.attempt, 'memory' ) }
memory = { 200.GB * task.attempt }

}
withLabel:error_ignore {
errorStrategy = 'ignore'
Expand Down
47 changes: 47 additions & 0 deletions tests/testthat/test_DistributedCogaps.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
test_that("featureLoadings and sampleFactors are not all 0s in single-cell", {
params <- CogapsParams(seed=42,
nIterations = 100,
nPatterns = 2,
sparseOptimization = as.logical(0),
distributed="single-cell")

params <- setDistributedParams(params, nSets = 2)
data(GIST)
cg <- CoGAPS(GIST.matrix, params=params)

featureLoadings <- cg@featureLoadings
sampleFactors <- cg@sampleFactors

# Check featureLoadings and sampleFactors: smaller dimension is nPatterns
# and larger dimension matches data dimensions
expect_false(all(featureLoadings == 0))
expect_false(all(sampleFactors == 0))
expect_true(sort((dim(sampleFactors)))[1] == params@nPatterns)
expect_true(sort((dim(featureLoadings)))[1] == params@nPatterns)
expect_true(sort((dim(sampleFactors)))[2] == ncol(GIST.matrix))
expect_true(sort((dim(featureLoadings)))[2] == nrow(GIST.matrix))
})

test_that("featureLoadings and sampleFactors are not all 0s in genome-wide", {
params <- CogapsParams(seed=42,
nIterations = 100,
nPatterns = 2,
sparseOptimization = as.logical(0),
distributed="genome-wide")

params <- setDistributedParams(params, nSets = 2)
data(GIST)
cg <- CoGAPS(GIST.matrix, params=params)

featureLoadings <- cg@featureLoadings
sampleFactors <- cg@sampleFactors

# Check featureLoadings and sampleFactors: smaller dimension is nPatterns
# and larger dimension matches data dimensions
expect_false(all(featureLoadings == 0))
expect_false(all(sampleFactors == 0))
expect_true(sort((dim(sampleFactors)))[1] == params@nPatterns)
expect_true(sort((dim(featureLoadings)))[1] == params@nPatterns)
expect_true(sort((dim(sampleFactors)))[2] == ncol(GIST.matrix))
expect_true(sort((dim(featureLoadings)))[2] == nrow(GIST.matrix))
})
Loading