The Illumina Sequencing Sample Sheet Format Specifications document cited in the sample-sheet code (sample-sheet/sample_sheet/__init__.py, lines 58 to 62 in 06d2566):
    # From the section "Character Encoding" in the Illumina format specification.
    #
    # https://www.illumina.com/content/dam/illumina-marketing/
    # documents/products/technotes/
    # sequencing-sheet-format-specifications-technical-note-970-2017-004.pdf
explicitly mentions additional restrictions on Sample_ID column values:
The field for the Sample_ID column has special character restrictions as only alphanumeric (ASCII codes 48-57, 65-90, and 97-122), dash (ASCII code 45), and underscore (ASCII code 95) are permitted. The Sample_ID length is limited to 100 characters maximum.
The sample_sheet validation code currently allows some invalid Sample_ID values (e.g., containing +) that some tools (like bcl2fastq) reject. Could the sample_sheet validation code be enhanced to detect Sample_IDs that don't conform to the Illumina spec?
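A minimal sketch of what such a check might look like, based only on the restriction quoted above. The names `SAMPLE_ID_PATTERN` and `is_valid_sample_id` are hypothetical and not part of the existing sample_sheet API. Rejecting empty IDs is also an assumption, since the quoted text only states a maximum length.

```python
import re

# Allowed characters per the quoted spec text: ASCII letters, digits,
# dash, and underscore, with at most 100 characters.
# Assumption: an empty Sample_ID is treated as invalid (minimum length 1).
SAMPLE_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,100}")


def is_valid_sample_id(sample_id):
    """Return True if sample_id matches the character and length limits."""
    if not isinstance(sample_id, str):
        return False
    # fullmatch requires the entire string to match, so a trailing
    # newline or any other stray character causes a rejection.
    return SAMPLE_ID_PATTERN.fullmatch(sample_id) is not None


if __name__ == "__main__":
    for candidate in ("Sample_1-A", "Sample+1", "a" * 101):
        print(repr(candidate[:20]), is_valid_sample_id(candidate))
```

A check like this could be run over every Sample_ID when a sheet is parsed or written, either raising an error or emitting a warning for non-conforming values. Which of those behaviors is appropriate is left open here.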