It's unclear to me when to use the --no_force_align option to ProGraph. The README describes this as:
> do not force alignment of initial Methionine
What's the scientific motivation for skipping initial M by default?
I ask because of a potential bug in the interaction with the --repeat option, which matches the sequences to a T-REKS output alignment. T-REKS files reference residue positions in the original sequence, so every coordinate is off by one if the initial M was stripped.
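To make the mismatch concrete, here is a small Python sketch. It is not ProGraph's code: the `strip_initial_met` helper, the example sequence, and the repeat coordinates are all made up for illustration, and it assumes the default behavior simply drops a leading M before the repeat coordinates are applied.

```python
def strip_initial_met(seq: str) -> str:
    """Drop a leading methionine (assumed default behavior, for illustration)."""
    return seq[1:] if seq.startswith("M") else seq


full = "MKTAYIAKQR"   # made-up sequence as it appears in the FASTA file
start, end = 3, 5      # made-up repeat range: 1-based, inclusive, on the full sequence

stripped = strip_initial_met(full)

# What the repeat file means: residues 3..5 of the original sequence.
expected = full[start - 1:end]

# Applying the same coordinates to the stripped sequence shifts the window by one.
naive = stripped[start - 1:end]

# Subtracting one from both coordinates recovers the intended residues.
corrected = stripped[start - 2:end - 1]

print(expected, naive, corrected)
```

The naive slice lands one residue downstream of the intended repeat, which is the error described above.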
I can think of several possible solutions:
- Default to --no_force_align when the --repeat option is also specified
- For each sequence, store a flag indicating whether it has been truncated. If so, account for that when reading in the repeats file
- Be more permissive when verifying the FASTA/T-REKS alignment: automatically recover from off-by-one errors in the coordinates. (This would have the side benefit of supporting malformed T-REKS files that use 0-based indexes rather than the correct 1-based positions.)
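The second and third options could look roughly like the Python sketch below. Everything in it is hypothetical (the `Sequence` class, `load_sequence`, `repeat_slice`, and `infer_offset` are names I made up, not ProGraph's API), and it assumes the repeat file gives 1-based inclusive coordinates plus the gap-free residues of each repeat.

```python
from dataclasses import dataclass
from typing import Optional, Sequence as Seq


@dataclass
class Sequence:
    residues: str       # residues as stored, possibly without the initial M
    met_stripped: bool  # True if a leading M was removed on load


def load_sequence(raw: str, force_align: bool = True) -> Sequence:
    """Option 2: remember whether the initial M was removed."""
    if force_align and raw.startswith("M"):
        return Sequence(raw[1:], True)
    return Sequence(raw, False)


def repeat_slice(seq: Sequence, start: int, end: int) -> str:
    """Map a 1-based inclusive range on the original sequence onto the stored one."""
    shift = 1 if seq.met_stripped else 0
    lo, hi = start - 1 - shift, end - shift
    if lo < 0 or hi > len(seq.residues):
        # For example, a repeat that includes the stripped M itself.
        raise ValueError(f"repeat {start}-{end} falls outside the stored sequence")
    return seq.residues[lo:hi]


def infer_offset(residues: str, start: int, end: int, repeat_text: str,
                 candidates: Seq[int] = (0, -1, 1)) -> Optional[int]:
    """Option 3: try small coordinate shifts until the repeat text matches."""
    for off in candidates:
        lo, hi = start - 1 + off, end + off
        if lo >= 0 and hi <= len(residues) and residues[lo:hi] == repeat_text:
            return off
    return None  # no candidate shift reproduces the repeat


seq = load_sequence("MKTAYIAKQR")
print(repeat_slice(seq, 3, 5))                     # uses the stored flag
print(infer_offset(seq.residues, 3, 5, "TAY"))     # stripped M: shift of -1
print(infer_offset("MKTAYIAKQR", 2, 4, "TAY"))     # 0-based file: shift of +1
```

The flag-based version is deterministic. The permissive version also covers 0-based files, but it could pick the wrong shift inside a run of identical residues, so it may be worth warning whenever a non-zero offset is used.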