Fix prepDE.py3 KeyError for transcripts missing from first sample by Theob0t · Pull Request #503 · gpertea/stringtie

Theob0t · 2026-02-26T16:38:26Z

Fixes #428, related to #337.

Problem

geneIDs is built from the first sample only (loop 1 breaks after the first successful parse, line 176), whereas t_dict accumulates transcripts across all samples. When loop 2 iterates over t_dict to build the gene count matrix, any transcript present in a later sample but absent from sample 1 causes a KeyError at line 279.

This triggers when some samples have zero coverage for transcripts that are present in other samples. It is common in large, diverse cohorts, especially when upstream filtering reduces read counts (e.g. samtools view -q 255 | stringtie -e).
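The failure mode can be reproduced in isolation. The following is a minimal sketch, not the actual prepDE.py3 code: the toy data, the sample names s1 and s2, and the helper build_gene_counts are hypothetical, and only the roles of geneIDs (here gene_ids) and t_dict follow the description above.

```python
from collections import defaultdict

# Hypothetical toy data: transcript -> gene map built from sample 1 only.
gene_ids = {"tx1": "geneA"}

# Transcript counts accumulated across all samples; tx2 appears only in s2.
t_dict = {
    "tx1": {"s1": 10, "s2": 7},
    "tx2": {"s2": 4},
}


def build_gene_counts(gene_ids, t_dict):
    """Sum transcript counts per gene and sample (simplified stand-in for loop 2)."""
    gene_counts = defaultdict(lambda: defaultdict(int))
    for tx, per_sample in t_dict.items():
        gene = gene_ids[tx]  # plain dict lookup: fails for transcripts unseen in sample 1
        for sample, count in per_sample.items():
            gene_counts[gene][sample] += count
    return gene_counts


try:
    build_gene_counts(gene_ids, t_dict)
    failed_on = None
except KeyError as exc:
    failed_on = exc.args[0]

print("KeyError raised for:", failed_on)
```

In this sketch the lookup fails on tx2, the transcript that only the second sample contains.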

The current defaultdict(lambda: str) on master suppresses the crash, but it returns the str type object for every missing transcript. That object is then used as a gene key, silently corrupting the gene count matrix.
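A short sketch of why defaultdict(lambda: str) hides the problem rather than solving it. The transcript names, counts, and the sample name s2 are hypothetical; only the defaultdict(lambda: str) construction comes from the description above.

```python
from collections import defaultdict

# The factory returns the str class itself, not an empty string.
gene_ids = defaultdict(lambda: str)
gene_ids["tx1"] = "geneA"

missing = gene_ids["tx2"]  # no KeyError; the lookup yields the str type object

# Hypothetical aggregation: every unmapped transcript lands under the same bogus key.
gene_counts = defaultdict(lambda: defaultdict(int))
for tx, count in [("tx1", 10), ("tx2", 4), ("tx3", 6)]:
    gene_counts[gene_ids[tx]]["s2"] += count

print(missing is str)
print(gene_counts[str]["s2"])
```

Note that each lookup of a missing key also inserts that key into gene_ids, so later membership checks on the same mapping would no longer see the transcript as missing.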

Fix

Skip transcripts that are not in geneIDs when building the gene count matrix. Their per-sample counts are still written correctly to the transcript count matrix from t_dict.
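The skip can be sketched as follows. This is an illustration under assumptions, not the patch itself: the function name build_gene_counts_skipping, the returned skipped list, and the toy data are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: tx2 is absent from the transcript -> gene map.
gene_ids = {"tx1": "geneA"}
t_dict = {
    "tx1": {"s1": 10, "s2": 7},
    "tx2": {"s2": 4},
}


def build_gene_counts_skipping(gene_ids, t_dict):
    """Aggregate per-gene counts, skipping transcripts with no known gene."""
    gene_counts = defaultdict(lambda: defaultdict(int))
    skipped = []
    for tx, per_sample in t_dict.items():
        if tx not in gene_ids:  # membership test does not insert a key
            skipped.append(tx)
            continue
        for sample, count in per_sample.items():
            gene_counts[gene_ids[tx]][sample] += count
    return gene_counts, skipped


gene_counts, skipped = build_gene_counts_skipping(gene_ids, t_dict)
print(dict(gene_counts["geneA"]), skipped)
```

The transcript-level data in t_dict is left untouched, so tx2 still contributes its count to the transcript matrix even though it is excluded from the gene matrix.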

Testing

Tested on 334 RNA-seq samples processed via TEProf3 (samtools view -q 255 | stringtie -e). Without the fix, prepDE.py3 crashes; with the fix, both count matrices are generated successfully.

When StringTie -e is used with piped input (e.g. samtools view -q 255 | stringtie -), transcripts with zero passing-filter reads may be omitted from the output GTF. Since geneIDs is only populated from the first sample (loop 1 breaks after the first successful parse), transcripts that appear in later samples but not the first cause a KeyError at line 279.

The current defaultdict(lambda: str) suppresses the crash but silently maps missing transcripts to the str type object, corrupting the gene count matrix.

Fix: skip transcripts not present in geneIDs. These are transcripts with zero counts in the first sample; their transcript-level counts are still written correctly to the transcript count matrix from t_dict.

Fixes gpertea#428, related to gpertea#337.

Theob0t wants to merge 1 commit into gpertea:master from Theob0t:fix-prepde-keyerror-large-cohorts