Wildtype sequencing or false-double variant based count error corrections by MaximilianStammnitz · Pull Request #44 · nf-core/deepmutscan

MaximilianStammnitz · 2026-03-21T17:58:16Z

PR checklist

MaximilianStammnitz · 2026-03-21T18:35:07Z

@BenjaminWehnert1008 here there'll be quite a few things still to discuss / edit:

1. How should the WT sequencing sample(s) be specified in the sample sheet .csv?
2. Should we run this sequencing error correction function (it only takes a few seconds) as a loop over all samples in a single process, or as one process per sample?
3. Add this option to the nf-core input command (`--wt_count_error_correction`); when it is set, we need an extra check that the corresponding sequencing fastqs are actually specified in the sample sheet .csv.
4. Need to generate the corresponding process / main.nf, potentially update config file(s), etc.
5. Need to adapt this R script so that the input path, output path and threshold parameter are correct.
6. All the downstream tasks (e.g. heatmaps, fitness calculation, etc.) need to use `variantCounts_filtered_by_library_err_corrected.csv` instead of `variantCounts_filtered_by_library.csv`, and `variantCounts_for_heatmaps_err_corrected.csv` instead of `variantCounts_for_heatmaps.csv`.

MaximilianStammnitz · 2026-03-23T19:11:44Z

@BenjaminWehnert1008 here there'll be quite a few things still to discuss / edit:

1. How should the WT sequencing sample(s) be specified in the sample sheet .csv?
2. Should we run this sequencing error correction function (it only takes a few seconds) as a loop over all samples in a single process, or as one process per sample?
3. Add this option to the nf-core input command (`--wt_count_error_correction`); when it is set, we need an extra check that the corresponding sequencing fastqs are actually specified in the sample sheet .csv.
4. Need to generate the corresponding process / main.nf, potentially update config file(s), etc.
5. Need to adapt this R script so that the input path, output path and threshold parameter are correct.
6. All the downstream tasks (e.g. heatmaps, fitness calculation, etc.) need to use `variantCounts_filtered_by_library_err_corrected.csv` instead of `variantCounts_filtered_by_library.csv`, and `variantCounts_for_heatmaps_err_corrected.csv` instead of `variantCounts_for_heatmaps.csv`.
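
Point 6 boils down to a file-name switch in every downstream step. Below is a minimal Python sketch of that switch, for discussion only. The helper `count_files()` and its boolean `corrected` argument are hypothetical and do not exist in the pipeline; the real change would presumably live in the Nextflow channels and the R scripts. Only the four file names are taken from the list above.

```python
from pathlib import Path

# File names as listed in point 6; everything else here is a hypothetical sketch.
UNCORRECTED = {
    "library": "variantCounts_filtered_by_library.csv",
    "heatmaps": "variantCounts_for_heatmaps.csv",
}
CORRECTED = {
    "library": "variantCounts_filtered_by_library_err_corrected.csv",
    "heatmaps": "variantCounts_for_heatmaps_err_corrected.csv",
}


def count_files(sample_dir, corrected):
    """Return the count tables that downstream steps should read for one sample.

    `corrected` is an assumed boolean saying whether the error correction
    step was run for this sample.
    """
    names = CORRECTED if corrected else UNCORRECTED
    return {key: Path(sample_dir) / name for key, name in names.items()}
```

Keeping the mapping in one place would mean that heatmaps, fitness calculation and any later step pick up the corrected tables together, rather than each script hard-coding its own input name.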

I have now also added preliminary scripts for the sample-intrinsic, false-double-variant-based count error normalisation. Many of the pipeline-related points above apply here as well. We definitely need to benchmark all of these additions and edits very carefully.

In the end, users should specify a flag `--count_error_correction` with three options: `none` (default), `false_doubles` or `WT_seq`. For `WT_seq`, we'll need to add a check that the corresponding paired-end (PE) fastq is present in the sample design .csv.
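
The flag check described above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the pipeline's implementation: how WT samples are marked in the sample sheet is still an open question (point 1), so the `type` column with value `wt` is purely hypothetical, as are the `fastq_1` / `fastq_2` column names and the function name `validate_count_error_correction()`. Only the three option values come from the comment.

```python
import csv
import io

# Option values proposed in the comment above.
ALLOWED_MODES = ("none", "false_doubles", "WT_seq")


def validate_count_error_correction(mode, samplesheet_text):
    """Check the flag value and, for WT_seq, that a WT paired-end sample is listed."""
    if mode not in ALLOWED_MODES:
        raise ValueError(
            f"--count_error_correction must be one of {ALLOWED_MODES}, got {mode!r}"
        )
    if mode != "WT_seq":
        return  # 'none' and 'false_doubles' need no extra sample sheet entry

    rows = list(csv.DictReader(io.StringIO(samplesheet_text)))
    # ASSUMPTION: a hypothetical 'type' column marks WT sequencing samples with 'wt'.
    wt_rows = [r for r in rows if (r.get("type") or "").strip().lower() == "wt"]
    if not wt_rows:
        raise ValueError("WT_seq requested, but no WT sample found in the sample sheet")
    for row in wt_rows:
        # ASSUMPTION: paired-end reads are given in 'fastq_1' and 'fastq_2' columns.
        if not (row.get("fastq_1") or "").strip() or not (row.get("fastq_2") or "").strip():
            raise ValueError("WT sample must provide both paired-end fastq files")
```

Failing early here, before any alignment or counting starts, would avoid a run that only breaks once the correction step looks for WT counts that were never generated.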

MaximilianStammnitz added 4 commits March 21, 2026 18:54

Create wt_based_seq_error_correction.R

19461e9

Add files via upload

2a1585d

Add files via upload

496b9f8

Delete modules/local/dmsanalysis/wildtype_based_seq_error_correction/…

eb7d738

…templates/environment.yml

MaximilianStammnitz requested a review from BenjaminWehnert1008 March 21, 2026 17:58

MaximilianStammnitz added 5 commits March 23, 2026 13:56

Update wt_based_seq_error_correction.R

f91eab1

Create readme.md

d159fdc

Create false-doubles_based_seq_error_correction.R

f4a20a7

Delete modules/local/dmsanalysis/false-doubles_based_seq_error_correc…

0cc82cc

…tion/templates/readme.md

Add files via upload

f4d570f

MaximilianStammnitz changed the title ~~Wildtype sequencing-based count error correction~~ Wildtype sequencing or or false-double based count error corrections Mar 23, 2026

Update wt_based_seq_error_correction.R

63651c1

MaximilianStammnitz changed the title ~~Wildtype sequencing or or false-double based count error corrections~~ Wildtype sequencing or false-double based count error corrections Mar 23, 2026

MaximilianStammnitz changed the title ~~Wildtype sequencing or false-double based count error corrections~~ Wildtype sequencing or false-double variant based count error corrections Mar 23, 2026

MaximilianStammnitz added 2 commits March 25, 2026 00:31

Update false-doubles_based_seq_error_correction.R

7506bc0

Update gatk_to_fitness.R

007e57b

MaximilianStammnitz wants to merge 12 commits into nf-core:dev from MaximilianStammnitz:dev
