Hey,
thanks for the great simulation tool! We simulated reads using this command:
conda activate SWAMPy
python simulate_metagenome.py \
--genomes_file ../example/viralref.fasta \
--temp_folder ../example/${sample}_temp \
--genome_abundances ../example/${sample}.tsv \
--primer_set a4 \
--output_folder ../eebg24_highq \
--output_filename_prefix ${sample} \
--n_reads 1000000 \
--seqSys MSv3 \
--read_length 250 \
--seed 10 \
--amplicon_distribution dirichlet_1 \
--amplicon_pseudocounts 200 \
--unique_insertion_rate 0.00002 \
--unique_deletion_rate 0.000115 \
--unique_substitution_rate 0.002485 \
--recurrent_insertion_rate 0.00002 \
--recurrent_deletion_rate 0 \
--recurrent_substitution_rate 0.003357 \
--max_insertion_length 14 \
--subs_VAF_alpha 0.36,0.27 \
--del_VAF_alpha 0.59,0.42 \
--ins_VAF_alpha 0.33,0.45 \
--r_subs_VAF_alpha 0.36,0.27 \
--r_del_VAF_alpha 0.59,0.42 \
--r_ins_VAF_alpha 0.33,0.45
but when we check the quality pattern with FASTQC we see such profiles for the paired-end reads:
Forward read
Reverse read
Usually, the read quality drops towards the 5' end and not the 3' end (see https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). So it looks flipped? Although the general error profile looks quite good otherwise!
Did we do something wrong in the simulation, or is this a bug?
(we see the same pattern for all reads we simulated)
Thanks!
Hey,
thanks for the great simulation tool! We simulated reads using this command:
conda activate SWAMPy python simulate_metagenome.py \ --genomes_file ../example/viralref.fasta \ --temp_folder ../example/${sample}_temp \ --genome_abundances ../example/${sample}.tsv \ --primer_set a4 \ --output_folder ../eebg24_highq \ --output_filename_prefix ${sample} \ --n_reads 1000000 \ --seqSys MSv3 \ --read_length 250 \ --seed 10 \ --amplicon_distribution dirichlet_1 \ --amplicon_pseudocounts 200 \ --unique_insertion_rate 0.00002 \ --unique_deletion_rate 0.000115 \ --unique_substitution_rate 0.002485 \ --recurrent_insertion_rate 0.00002 \ --recurrent_deletion_rate 0 \ --recurrent_substitution_rate 0.003357 \ --max_insertion_length 14 \ --subs_VAF_alpha 0.36,0.27 \ --del_VAF_alpha 0.59,0.42 \ --ins_VAF_alpha 0.33,0.45 \ --r_subs_VAF_alpha 0.36,0.27 \ --r_del_VAF_alpha 0.59,0.42 \ --r_ins_VAF_alpha 0.33,0.45but when we check the quality pattern with FASTQC we see such profiles for the paired-end reads:
Forward read
Reverse read
Usually, the read quality drops towards the 5' end and not the 3' end (see https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). So it looks flipped? Although the general error profile looks quite good otherwise!
Did we do something wrong in the simulation, or is this a bug?
(we see the same pattern for all reads we simulated)
Thanks!