I have a BAM file with reads that contain an insertion at a specific position. I noticed that the basecall counts in the --output file produced by rrbsSnp (and subsequently, the genotypes in the VCF file created by BS-Snper.pl) were inaccurate around this insertion.
The CIGAR string for these reads has three operations (e.g., 143M1I7M), but it appears that only the last operation is being processed, regardless of the number of CIGAR operations in a read. I confirmed this by checking the value of record->cigar and record->len at the end of sam_funcs.c:parseBuffer() - in the example above, the values are 7M and 7, respectively, meaning that only 7 bases are included in the MapRecord for this read (instead of 150), and the basecalls are recorded in the wrong reference positions.
I have a BAM file with reads that contain an insertion at a specific position. I noticed that the basecall counts in the
--outputfile produced byrrbsSnp(and subsequently, the genotypes in the VCF file created byBS-Snper.pl) were inaccurate around this insertion.The CIGAR string for these reads has three operations (e.g.,
143M1I7M), but it appears that only the last operation is being processed, regardless of the number of CIGAR operations in a read. I confirmed this by checking the value ofrecord->cigarandrecord->lenat the end ofsam_funcs.c:parseBuffer()- in the example above, the values are7Mand7, respectively, meaning that only 7 bases are included in theMapRecordfor this read (instead of 150), and the basecalls are recorded in the wrong reference positions.