Pheno pseudobulk by KevinMLanderos · Pull Request #71 · KarchinLab/TCRtoolkit

KevinMLanderos · 2026-01-11T23:09:36Z

This PR allows the pipeline to run the 'SAMPLE' and 'COMPARE' modules on samples pseudo-bulked by phenotype.
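Phenotype pseudobulking collapses single-cell TCR calls into one repertoire per (sample, phenotype) pair. The bulk-oriented SAMPLE and COMPARE modules can then treat each pair like an ordinary sample. The following is a minimal plain-Python sketch of that aggregation. The record keys (`sample`, `phenotype`, `cdr3`) and the function name are hypothetical and are not taken from `bin/pseudobulk.py`. Skipping unlabelled cells is a choice made for this sketch only.

```python
from collections import Counter, defaultdict


def pseudobulk_by_phenotype(cells):
    """Sum clonotype counts per (sample, phenotype) group.

    `cells` is an iterable of dicts with hypothetical keys 'sample',
    'phenotype', and 'cdr3' (one record per cell).
    Returns {(sample, phenotype): Counter({cdr3: n_cells})}.
    """
    groups = defaultdict(Counter)
    for cell in cells:
        phenotype = cell.get("phenotype")
        if not phenotype:
            # Sketch choice: cells without a phenotype label are not assigned to any pseudobulk.
            continue
        groups[(cell["sample"], phenotype)][cell["cdr3"]] += 1
    return dict(groups)


if __name__ == "__main__":
    cells = [
        {"sample": "S1", "phenotype": "CD8_effector", "cdr3": "CASSLGQ"},
        {"sample": "S1", "phenotype": "CD8_effector", "cdr3": "CASSLGQ"},
        {"sample": "S1", "phenotype": "Treg", "cdr3": "CASRDE"},
        {"sample": "S2", "phenotype": "CD8_effector", "cdr3": "CASSLGQ"},
        {"sample": "S2", "phenotype": None, "cdr3": "CASTTT"},
    ]
    # Print each (sample, phenotype) pseudobulk and its clonotype counts.
    for key, counts in sorted(pseudobulk_by_phenotype(cells).items()):
        print(key, dict(counts))
```

Each resulting key could then be written out as its own pseudobulk file and listed as a separate "sample" downstream.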


github-actions · 2026-01-11T23:13:04Z

Unit Test Results

10 tests ±0 10 ✅ ±0 2m 45s ⏱️ -1s
2 suites ±0 0 💤 ±0
1 files ±0 0 ❌ ±0

Results for commit 9113490. ± Comparison against base commit b862c22.

♻️ This comment has been updated with latest results.

Copilot

Pull request overview

This PR adds phenotype-based pseudobulking functionality to the TCR toolkit pipeline, allowing samples to be pseudo-bulked by phenotype annotations from Seurat objects and then analyzed through the existing SAMPLE and COMPARE modules.

Changes:

- Added phenotype pseudobulking workflow that processes samples by phenotype when a Seurat GEX object is provided
- Extended the AIRR_CONVERT subworkflow to generate phenotype-specific pseudobulk files from CellRanger data
- Created parallel analysis paths (SAMPLE_PHENO and COMPARE_PHENO) to process phenotype-segregated data
- Enhanced Python scripts to support phenotype metadata extraction and improved data handling in compare_calc.py

Reviewed changes

Copilot reviewed 12 out of 15 changed files in this pull request and generated 13 comments.


File	Description
workflows/tcrtoolkit.nf	Main workflow integration of phenotype analysis with conditional execution based on sobject_gex parameter
subworkflows/local/sample_pheno.nf	New subworkflow for sample-level analysis of phenotype-pseudobulked data
subworkflows/local/compare_pheno.nf	New subworkflow for comparison analysis of phenotype-pseudobulked data
subworkflows/local/resolve_samplesheet_pheno.nf	New subworkflow to collect phenotype sample files for downstream processing
subworkflows/local/map_phenotypes.nf	New subworkflow to transform phenotype files and generate phenotype-specific samplesheet
subworkflows/local/airr_convert.nf	Extended to support phenotype pseudobulking for CellRanger input format
subworkflows/local/sample.nf	Removed commented-out emit statements
subworkflows/local/compare.nf	Added collectFile operations for similarity matrices and removed commented code
modules/local/samplesheet/generate_pheno_samplesheet.nf	New module to generate phenotype-specific samplesheet from metadata
modules/local/airr_convert/pseudobulk_phenotype_cellranger.nf	New module to run phenotype-based pseudobulking on CellRanger data
modules/local/airr_convert/pseudobulk_cellranger.nf	Added container directive
conf/modules.config	Added publishDir configurations for phenotype analysis outputs
bin/pseudobulk.py	Added phenotype processing functions and command-line options
bin/create_pheno_samplesheet.py	New script to generate phenotype samplesheet from JSON metadata
bin/compare_calc.py	Improved data handling with better validation and numeric type handling
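`bin/create_pheno_samplesheet.py` is described as generating a phenotype samplesheet from JSON metadata. The following is a minimal sketch of that kind of expansion. It assumes a JSON layout of `{sample: [phenotype, ...]}`, and it uses column names and a file-naming template that are purely hypothetical. The script's actual schema is not shown in this PR page.

```python
import csv
import io
import json


def build_pheno_samplesheet(metadata_json, file_template="{sample}_{phenotype}_pseudobulk.tsv"):
    """Expand hypothetical {sample: [phenotype, ...]} JSON into samplesheet CSV text.

    The JSON layout, column names, and file template are illustrative
    assumptions, not the schema used by the pipeline.
    """
    metadata = json.loads(metadata_json)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["sample", "origin", "phenotype", "file"], lineterminator="\n"
    )
    writer.writeheader()
    for sample in sorted(metadata):
        # Deduplicate and sort phenotypes so the samplesheet is deterministic.
        for phenotype in sorted(set(metadata[sample])):
            writer.writerow({
                "sample": f"{sample}_{phenotype}",  # one pseudobulk "sample" per pair
                "origin": sample,
                "phenotype": phenotype,
                "file": file_template.format(sample=sample, phenotype=phenotype),
            })
    return out.getvalue()


if __name__ == "__main__":
    meta = json.dumps({"S2": ["Treg"], "S1": ["CD8_effector", "Treg"]})
    print(build_pheno_samplesheet(meta), end="")
```

Phenotype labels that contain characters such as `/` or spaces would need sanitizing before they are used in file names.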

Comments suppressed due to low confidence (1)

subworkflows/local/compare.nf:56

The collectFile operations are collecting outputs from COMPARE_CALC.out but then COMPARE_PLOT is still using the original COMPARE_CALC.out channels (lines 50-52), not the collected files. This creates redundant file collection. Either use the collected files in COMPARE_PLOT or remove the collectFile operations if they're not needed.

    COMPARE_CALC.out.jaccard_mat
        .collectFile(name: 'jaccard_mat.csv', sort: true, 
                     storeDir: "${params.outdir}/compare")
        .set { jaccard_mat }

    COMPARE_CALC.out.sorensen_mat
        .collectFile(name: 'sorensen_mat.csv', sort: true, 
                     storeDir: "${params.outdir}/compare")
        .set { sorensen_mat }

    COMPARE_CALC.out.morisita_mat
        .collectFile(name: 'morisita_mat.csv', sort: true, 
                     storeDir: "${params.outdir}/compare")
        .set { morisita_mat }

    COMPARE_PLOT( samplesheet_resolved,
                  COMPARE_CALC.out.jaccard_mat,
                  COMPARE_CALC.out.sorensen_mat,
                  COMPARE_CALC.out.morisita_mat,
                  file(params.compare_stats_template),
                  params.project_name,
                  all_sample_files
                  )
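For background, the quoted code concatenates the Jaccard, Sørensen, and Morisita matrices produced by COMPARE_CALC. The following is a minimal pure-Python sketch of these pairwise overlap indices. It assumes each repertoire is a clonotype-to-count dict, and it uses the Morisita-Horn variant, which may differ from the exact formulation in `bin/compare_calc.py`. All function names are hypothetical. Counts are coerced to `float` so that string values read from a CSV do not break the arithmetic.

```python
def jaccard(a, b):
    """Jaccard index on clonotype sets: |A & B| / |A | B|."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def sorensen(a, b):
    """Sorensen-Dice index on clonotype sets: 2|A & B| / (|A| + |B|)."""
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def morisita_horn(a, b):
    """Morisita-Horn overlap on clonotype counts (dicts of clonotype -> count)."""
    x = {k: float(v) for k, v in a.items()}  # coerce counts, e.g. "3" -> 3.0
    y = {k: float(v) for k, v in b.items()}
    nx, ny = sum(x.values()), sum(y.values())
    if nx == 0 or ny == 0:
        return 0.0
    dx = sum(v * v for v in x.values()) / nx ** 2
    dy = sum(v * v for v in y.values()) / ny ** 2
    cross = sum(x[k] * y[k] for k in x.keys() & y.keys())
    return 2 * cross / ((dx + dy) * nx * ny)


def overlap_matrix(samples, metric):
    """Return (sorted sample names, square matrix of metric values)."""
    names = sorted(samples)
    return names, [[metric(samples[r], samples[c]) for c in names] for r in names]


if __name__ == "__main__":
    repertoires = {
        "S1_Treg": {"CASSA": 3, "CASSB": 1},
        "S2_Treg": {"CASSA": "1", "CASSC": "2"},  # string counts, as if read from CSV
    }
    for metric in (jaccard, sorensen, morisita_horn):
        names, mat = overlap_matrix(repertoires, metric)
        print(metric.__name__, names, [[round(v, 3) for v in row] for row in mat])
```

All three indices are symmetric, so a larger implementation could compute only the upper triangle and mirror it.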


modules/local/airr_convert/pseudobulk_phenotype_cellranger.nf

subworkflows/local/map_phenotypes.nf

subworkflows/local/airr_convert.nf

workflows/tcrtoolkit.nf

subworkflows/local/map_phenotypes.nf

modules/local/samplesheet/generate_pheno_samplesheet.nf

workflows/tcrtoolkit.nf

- reformat convert subworkflow into more granular branches
- temporarily using phenotype .csv from SCRATCH-annotate as GEX input
- code cleanup to minimize needed inputs

To do:
- clean up pseudobulk.py
- remove hardcoding of metadata
- needs more robust sample/cell ID/barcode matching
- update GEX input from SCRATCH (make generalizable)
- update README
- reformat data flow so ps-phenotype files can go through a truncated version of the main pipeline, rather than their own subworkflow
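A normalized join key is one way to approach the to-do item on more robust sample/cell ID/barcode matching. The following is a minimal sketch. It assumes 10x-style barcodes that may carry a `<sample>_` prefix (for example, after cell IDs are prefixed when objects are merged) and a `-N` suffix. The function names, record layout, and prefix convention are all hypothetical and are not taken from `bin/pseudobulk.py`.

```python
import re

# Hypothetical convention: a run of A/C/G/T at the end, optionally followed by "-<digits>".
_BARCODE_RE = re.compile(r"([ACGT]+)(?:-\d+)?$")


def barcode_key(sample, raw_barcode):
    """Return (sample, nucleotide barcode) for matching across tables.

    Strips an assumed "<sample>_" prefix and "-N" suffix by keeping only the
    trailing nucleotide run. Raises ValueError if no barcode core is found.
    """
    match = _BARCODE_RE.search(raw_barcode.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised barcode: {raw_barcode!r}")
    return (sample, match.group(1))


def attach_phenotypes(contigs, phenotypes):
    """Label contig rows with a phenotype via normalized barcode keys.

    contigs: iterable of dicts with 'sample' and 'barcode' keys.
    phenotypes: iterable of (sample, barcode, phenotype) tuples.
    Returns (labelled_rows, unmatched_count).
    """
    lookup = {barcode_key(s, b): p for s, b, p in phenotypes}
    labelled, unmatched = [], 0
    for row in contigs:
        pheno = lookup.get(barcode_key(row["sample"], row["barcode"]))
        if pheno is None:
            unmatched += 1  # report rather than silently drop, so mismatches are visible
            continue
        labelled.append({**row, "phenotype": pheno})
    return labelled, unmatched
```

Dropping the `-N` suffix is only safe when each sample comes from a single GEM well. If one sample spans several wells, the suffix should stay in the key.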

KevinMLanderos added 4 commits January 8, 2026 20:20

Added psedobulk-by-phenotype feature to pipeline

6a4e054

Fixed AttributeError: 'numpy.ndarray' object has no attribute 'reindex' when testing on GBM data

ef0ee0f

Set a different output folder for compare_pheno results

1260480

Set compare_pheno and sample_pheno results in their respective output folder

74e4a48

Add RESOLVE_SAMPLESHEET_PHENO workflow

32c731c

dimalvovs requested a review from Copilot January 20, 2026 16:33

Copilot started reviewing on behalf of dimalvovs January 20, 2026 16:34

Copilot AI reviewed Jan 20, 2026


dltamayo merged commit 55167e4 into main Jan 23, 2026
3 checks passed

dltamayo mentioned this pull request Feb 6, 2026

Phenotype-grouped pseudobulk #48

Closed

dltamayo deleted the pheno-pseudobulk branch March 3, 2026 18:08


dltamayo merged 6 commits into main from pheno-pseudobulk
