Problem
Relevant to #954, which mentioned that NON_SYN, etc. would be handled in a separate discussion.
In malariagen_data/veff.py, the _get_within_cds_effect() function currently
classifies all missense SNPs that don't match START_LOST, STOP_LOST, or
STOP_GAINED as NON_SYNONYMOUS_CODING, regardless of whether the mutation
is at a start or stop codon position.
A # TODO NON_SYNONYMOUS_START and NON_SYNONYMOUS_STOP comment at line 359
already acknowledges this gap.
This leads to two missing classifications:
- A missense SNP at CDS position 0 (non-canonical start codon) is labeled NON_SYNONYMOUS_CODING but should be NON_SYNONYMOUS_START
- A SNP involving two distinct stop codon triplets is labeled NON_SYNONYMOUS_CODING but should be NON_SYNONYMOUS_STOP
Proposed Changes
- malariagen_data/veff.py: Split the else branch into three cases
- tests/anoph/test_snp_frq.py: Add new effects to the expected_effects allowlist
- tests/test_veff.py: Add unit tests covering the full SNP effect taxonomy
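For reviewers, here is a minimal standalone sketch of what a three-way split of the else branch could look like. It is not the code in veff.py: the function name classify_remaining_missense, its parameters, and the STOP_CODONS set are hypothetical, and the exact conditions used in this PR may differ. It only models SNPs that have already fallen through the START_LOST, STOP_LOST, and STOP_GAINED checks.

```python
# Hypothetical stop-codon set for the standard genetic code (an assumption of this sketch).
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def classify_remaining_missense(ref_codon: str, alt_codon: str, cds_position: int) -> str:
    """Pick an effect label for a SNP that reached the final else branch.

    All names here are illustrative; cds_position is assumed to be the
    0-based position of the SNP within the CDS.
    """
    ref_codon = ref_codon.upper()
    alt_codon = alt_codon.upper()

    # Case 1: the SNP sits at CDS position 0, i.e. in a (non-canonical) start codon.
    if cds_position == 0:
        return "NON_SYNONYMOUS_START"

    # Case 2: both the reference and alternate triplets are stop codons, but differ.
    if (
        ref_codon in STOP_CODONS
        and alt_codon in STOP_CODONS
        and ref_codon != alt_codon
    ):
        return "NON_SYNONYMOUS_STOP"

    # Case 3: everything else keeps the existing label.
    return "NON_SYNONYMOUS_CODING"


if __name__ == "__main__":
    examples = [("CTG", "GTG", 0), ("TAA", "TAG", 9), ("GCT", "GTT", 4)]
    for ref, alt, pos in examples:
        print(ref, alt, pos, classify_remaining_missense(ref, alt, pos))
```

Ordering the start check first means a SNP at position 0 is never reported as a stop effect, which seems the safer default for very short coding sequences.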