
Wildtype sequencing or false-double variant based count error corrections #44

Open
MaximilianStammnitz wants to merge 12 commits into nf-core:dev from MaximilianStammnitz:dev


@MaximilianStammnitz
Collaborator

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/deepmutscan branch on the nf-core/test-datasets repository.
  • Make sure your code lints (`nf-core pipelines lint`).
  • Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
  • Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
  • Usage Documentation in `docs/usage.md` is updated.
  • Output Documentation in `docs/output.md` is updated.
  • `CHANGELOG.md` is updated.
  • `README.md` is updated (including new tool citations and authors/contributors).

@MaximilianStammnitz
Collaborator Author

@BenjaminWehnert1008 there are still quite a few things to discuss / edit here:

  1. How should the WT sequencing sample(s) be specified in the sample sheet .csv?
  2. Should we run this seq error correction function (only takes a few seconds) as a loop over all samples in a single process, or rather as one process per sample?
  3. Add this option to the nf-core input command (`--wt_count_error_correction`); when it is set, there needs to be an extra check that the corresponding sequencing FASTQs are actually specified in the sample sheet .csv
  4. Need to generate the corresponding process / main.nf, potentially update config file(s), etc.
  5. Need to adapt this R script so that the input path / output path / threshold parameter are correct
  6. All the downstream tasks (e.g. heatmaps, fitness calculation, etc.) need to use `variantCounts_filtered_by_library_err_corrected.csv` instead of `variantCounts_filtered_by_library.csv` and `variantCounts_for_heatmaps_err_corrected.csv` instead of `variantCounts_for_heatmaps.csv`
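For reference while adapting the R script (points 5 and 6), a minimal Python sketch of one common way to use a WT sequencing run for count error correction. It assumes that, in the WT library, the read fraction carrying a given non-WT variant estimates that variant's sequencing/PCR error rate; the expected number of error reads at each sample's depth is then subtracted and clipped at zero. The dict inputs, the `wt_key` name and both function names are hypothetical, and the R script in this PR may use a different model or threshold.

```python
from pathlib import Path


def wt_error_correct(sample_counts, wt_counts, wt_key="WT"):
    """Subtract WT-sequencing-derived error counts from a sample's variant counts.

    Hypothetical sketch: both arguments map a variant ID to its read count,
    and `wt_key` identifies the unmutated sequence, whose count is passed
    through unchanged.
    """
    sample_total = sum(sample_counts.values())
    wt_total = sum(wt_counts.values())
    corrected = {}
    for variant, count in sample_counts.items():
        if variant == wt_key or wt_total == 0:
            corrected[variant] = count
            continue
        # Fraction of reads in the WT run showing this variant; assumed here
        # to arise from sequencing/PCR errors, so treated as an error rate.
        error_rate = wt_counts.get(variant, 0) / wt_total
        # Expected error reads at this sample's sequencing depth,
        # subtracted and clipped at zero so counts never go negative.
        corrected[variant] = max(0.0, count - error_rate * sample_total)
    return corrected


def err_corrected_name(path):
    """Map e.g. variantCounts_for_heatmaps.csv to variantCounts_for_heatmaps_err_corrected.csv."""
    p = Path(path)
    return str(p.with_name(f"{p.stem}_err_corrected{p.suffix}"))
```

Under this naming helper, `variantCounts_filtered_by_library.csv` maps to `variantCounts_filtered_by_library_err_corrected.csv`, matching the filenames the downstream tasks would switch to.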

MaximilianStammnitz changed the title from "Wildtype sequencing-based count error correction" to "Wildtype sequencing or or false-double based count error corrections" on Mar 23, 2026
@MaximilianStammnitz
Collaborator Author

@BenjaminWehnert1008 here there'll be quite a few things still to discuss / edit:

1. How should the WT sequencing sample(s) be specified in the sample sheet .csv?
2. Should we run this seq error correction function (only takes a few seconds) as a loop over all samples in a single process, or rather as one process per sample?
3. Add this option to the nf-core input command (`--wt_count_error_correction`) and in that case there needs to be an extra check if the corresponding sequencing fastqs are actually specified in the sample sheet .csv
4. Need to generate the corresponding process / main.nf, potentially update config file(s), etc.
5. Need to adapt this R script so that the input path / output path / threshold parameter are correct
6. All the downstream tasks (e.g. heatmaps, fitness calculation, etc.) need to use `variantCounts_filtered_by_library_err_corrected.csv` instead of `variantCounts_filtered_by_library.csv` and `variantCounts_for_heatmaps_err_corrected.csv` instead of `variantCounts_for_heatmaps.csv`

I've now also added preliminary scripts for sample-intrinsic, false-double-variant-based count error normalisation. Many of the pipeline-related points above apply here, too. We definitely need to benchmark all of these additions/edits very carefully.

In the end, users should specify a flag `--count_error_correction` with three options: `none` (default), `false_doubles` or `WT_seq`; in the latter case we'll need to add a check that the corresponding paired-end (PE) FASTQs are specified in the sample design .csv.
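A minimal Python sketch of the check described above, to make the intended logic concrete. The column names `sample`, `type`, `fastq_1` and `fastq_2`, and marking WT sequencing rows with `type == "wt"`, are hypothetical placeholders, since point 1 (how WT samples appear in the sample sheet) is still open.

```python
import csv
import io

# The three modes proposed for --count_error_correction.
VALID_MODES = ("none", "false_doubles", "WT_seq")


def validate_count_error_correction(mode, samplesheet_csv):
    """Check the --count_error_correction value against the sample sheet.

    Hypothetical sketch: assumes WT sequencing rows are marked with
    type == "wt" and carry paired-end reads in fastq_1 / fastq_2.
    """
    if mode not in VALID_MODES:
        raise ValueError(
            f"--count_error_correction must be one of {', '.join(VALID_MODES)}, got {mode!r}"
        )
    if mode != "WT_seq":
        return  # 'none' and 'false_doubles' need no extra input files
    rows = list(csv.DictReader(io.StringIO(samplesheet_csv)))
    wt_rows = [r for r in rows if (r.get("type") or "").strip().lower() == "wt"]
    if not wt_rows:
        raise ValueError("WT_seq correction requested but no WT sample in the sample sheet")
    for row in wt_rows:
        # Both reads of the pair must be given for each WT sample.
        if not (row.get("fastq_1") or "").strip() or not (row.get("fastq_2") or "").strip():
            raise ValueError(f"WT sample {row.get('sample')!r} is missing fastq_1 or fastq_2")
```

In the pipeline itself, the allowed-values part of this check would probably sit more naturally as an enum in `nextflow_schema.json`, leaving only the sample-sheet cross-check to pipeline code.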

MaximilianStammnitz changed the title from "Wildtype sequencing or or false-double based count error corrections" to "Wildtype sequencing or false-double based count error corrections" on Mar 23, 2026
MaximilianStammnitz changed the title from "Wildtype sequencing or false-double based count error corrections" to "Wildtype sequencing or false-double variant based count error corrections" on Mar 23, 2026