Merged
…x' when testing on GBM data
Pull request overview
This PR adds phenotype-based pseudobulking functionality to the TCR toolkit pipeline, allowing samples to be pseudo-bulked by phenotype annotations from Seurat objects and then analyzed through the existing SAMPLE and COMPARE modules.
Changes:
- Added phenotype pseudobulking workflow that processes samples by phenotype when a Seurat GEX object is provided
- Extended the AIRR_CONVERT subworkflow to generate phenotype-specific pseudobulk files from CellRanger data
- Created parallel analysis paths (SAMPLE_PHENO and COMPARE_PHENO) to process phenotype-segregated data
- Enhanced Python scripts to support phenotype metadata extraction and improved data handling in compare_calc.py
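As a rough illustration of what pseudobulking by phenotype means here, the sketch below groups per-cell TCR rows by a phenotype label looked up from the cell barcode and counts clonotypes within each group. It is not the code in bin/pseudobulk.py: the function name, the `barcode` and `cdr3` keys, and the phenotype labels are all hypothetical, and the real script may match cells and count clonotypes differently.

```python
from collections import defaultdict


def pseudobulk_by_phenotype(contig_rows, barcode_to_phenotype):
    """Count clonotype rows per phenotype.

    contig_rows: iterable of dicts with (hypothetical) keys 'barcode' and 'cdr3'.
    barcode_to_phenotype: dict mapping a cell barcode to its phenotype label.
    Returns {phenotype: {cdr3: row_count}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for row in contig_rows:
        phenotype = barcode_to_phenotype.get(row["barcode"])
        if phenotype is None:
            # Cell has no phenotype annotation, so it joins no pseudobulk.
            continue
        counts[phenotype][row["cdr3"]] += 1
    # Convert nested defaultdicts to plain dicts for easier downstream use.
    return {p: dict(c) for p, c in counts.items()}


# Toy data: barcodes, CDR3 sequences, and labels are made up for illustration.
contigs = [
    {"barcode": "AAAC-1", "cdr3": "CASSLGF"},
    {"barcode": "AAAG-1", "cdr3": "CASSLGF"},
    {"barcode": "AAAT-1", "cdr3": "CASSPDRF"},
    {"barcode": "AACC-1", "cdr3": "CASSPDRF"},  # not annotated
]
phenotypes = {"AAAC-1": "CD8_Tex", "AAAG-1": "CD8_Tex", "AAAT-1": "CD4_Treg"}

bulk = pseudobulk_by_phenotype(contigs, phenotypes)
```

Each key of `bulk` would correspond to one phenotype-specific pseudobulk sample that the SAMPLE_PHENO and COMPARE_PHENO paths could then treat like an ordinary sample.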
Reviewed changes
Copilot reviewed 12 out of 15 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| workflows/tcrtoolkit.nf | Main workflow integration of phenotype analysis with conditional execution based on sobject_gex parameter |
| subworkflows/local/sample_pheno.nf | New subworkflow for sample-level analysis of phenotype-pseudobulked data |
| subworkflows/local/compare_pheno.nf | New subworkflow for comparison analysis of phenotype-pseudobulked data |
| subworkflows/local/resolve_samplesheet_pheno.nf | New subworkflow to collect phenotype sample files for downstream processing |
| subworkflows/local/map_phenotypes.nf | New subworkflow to transform phenotype files and generate phenotype-specific samplesheet |
| subworkflows/local/airr_convert.nf | Extended to support phenotype pseudobulking for CellRanger input format |
| subworkflows/local/sample.nf | Removed commented-out emit statements |
| subworkflows/local/compare.nf | Added collectFile operations for similarity matrices and removed commented code |
| modules/local/samplesheet/generate_pheno_samplesheet.nf | New module to generate phenotype-specific samplesheet from metadata |
| modules/local/airr_convert/pseudobulk_phenotype_cellranger.nf | New module to run phenotype-based pseudobulking on CellRanger data |
| modules/local/airr_convert/pseudobulk_cellranger.nf | Added container directive |
| conf/modules.config | Added publishDir configurations for phenotype analysis outputs |
| bin/pseudobulk.py | Added phenotype processing functions and command-line options |
| bin/create_pheno_samplesheet.py | New script to generate phenotype samplesheet from JSON metadata |
| bin/compare_calc.py | Improved data handling with better validation and numeric type handling |
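To make the samplesheet step in the table more concrete, here is a minimal sketch of turning JSON metadata into a phenotype samplesheet. The JSON layout (a list of records with `sample`, `phenotype`, and `file` keys), the output columns, and the `sample_phenotype` naming scheme are assumptions for illustration only; bin/create_pheno_samplesheet.py may use a different schema.

```python
import csv
import io
import json


def build_pheno_samplesheet(metadata_json):
    """Render a CSV samplesheet from JSON metadata (hypothetical schema)."""
    records = json.loads(metadata_json)
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["sample", "phenotype", "file"], lineterminator="\n"
    )
    writer.writeheader()
    # Sort so the samplesheet is stable across runs.
    for rec in sorted(records, key=lambda r: (r["sample"], r["phenotype"])):
        writer.writerow(
            {
                # One row per (sample, phenotype) pseudobulk file.
                "sample": f'{rec["sample"]}_{rec["phenotype"]}',
                "phenotype": rec["phenotype"],
                "file": rec["file"],
            }
        )
    return buf.getvalue()


# Made-up metadata for illustration.
metadata = json.dumps(
    [
        {"sample": "S1", "phenotype": "CD8", "file": "S1_CD8.tsv"},
        {"sample": "S1", "phenotype": "CD4", "file": "S1_CD4.tsv"},
    ]
)
sheet = build_pheno_samplesheet(metadata)
```

Sorting the records keeps the output deterministic, which helps when a workflow manager caches results based on file content.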
Comments suppressed due to low confidence (1)
subworkflows/local/compare.nf:56
- The collectFile operations are collecting outputs from COMPARE_CALC.out but then COMPARE_PLOT is still using the original COMPARE_CALC.out channels (lines 50-52), not the collected files. This creates redundant file collection. Either use the collected files in COMPARE_PLOT or remove the collectFile operations if they're not needed.
    COMPARE_CALC.out.jaccard_mat
        .collectFile(name: 'jaccard_mat.csv', sort: true,
                     storeDir: "${params.outdir}/compare")
        .set { jaccard_mat }
    COMPARE_CALC.out.sorensen_mat
        .collectFile(name: 'sorensen_mat.csv', sort: true,
                     storeDir: "${params.outdir}/compare")
        .set { sorensen_mat }
    COMPARE_CALC.out.morisita_mat
        .collectFile(name: 'morisita_mat.csv', sort: true,
                     storeDir: "${params.outdir}/compare")
        .set { morisita_mat }
    COMPARE_PLOT( samplesheet_resolved,
        COMPARE_CALC.out.jaccard_mat,
        COMPARE_CALC.out.sorensen_mat,
        COMPARE_CALC.out.morisita_mat,
        file(params.compare_stats_template),
        params.project_name,
        all_sample_files
    )
- reformat convert subworkflow into more granular branches
- temporarily using phenotype .csv from SCRATCH-annotate as GEX input
- code cleanup to minimize needed inputs

To do:
- clean up pseudobulk.py
- remove hardcoding of metadata
- needs more robust sample/cell id/barcode matching
- update GEX input from SCRATCH (make generalizable)
- update readme
- reformat data flow so ps-phenotype files can go through truncated version of main pipeline, rather than its own subworkflow
This section allows the pipeline to run 'SAMPLE' and 'COMPARE' modules on samples pseudo-bulked by phenotype.