Sample_ID validation

The Illumina Sequencing Sample Sheet Format Specifications document cited in the sample-sheet code:
https://github.com/clintval/sample-sheet/blob/06d2566c0bf9a0f3b14856e97e4e6cea2827ca89/sample_sheet/__init__.py#L58-L62

explicitly mentions additional restrictions on Sample_ID column values:
> The field for the Sample_ID column has special character restrictions as only alphanumeric (ASCII codes 48-57, 65-90, and 97-122), dash (ASCII code 45), and underscore (ASCII code 95) are permitted. The Sample_ID length is limited to 100 characters maximum.

The sample_sheet validation code currently accepts some Sample_ID values that violate this restriction (e.g., values containing `+`), and tools such as bcl2fastq reject them. Could the validation be extended to detect Sample_IDs that don't conform to the Illumina spec?
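For illustration, here is a minimal sketch of the kind of check I have in mind, based only on the spec text quoted above. The names `SAMPLE_ID_PATTERN`, `is_valid_sample_id`, and `find_invalid_sample_ids` are hypothetical and not part of the sample_sheet API. Requiring at least one character is my assumption, since the quoted text only states the 100-character maximum.

```python
import re

# Allowed characters per the quoted spec text: ASCII letters, digits,
# dash, and underscore. The upper bound of 100 comes from the spec;
# the lower bound of 1 is an assumption (the quote states no minimum).
SAMPLE_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,100}")


def is_valid_sample_id(sample_id: str) -> bool:
    """Return True if the whole string matches the allowed pattern."""
    return SAMPLE_ID_PATTERN.fullmatch(sample_id) is not None


def find_invalid_sample_ids(sample_ids):
    """Return the Sample_ID values that fail the check, in input order."""
    return [s for s in sample_ids if not is_valid_sample_id(s)]
```

Something like `find_invalid_sample_ids(["Sample_1", "tumor+normal"])` would then report only `"tumor+normal"`, which could be surfaced as a warning or an error at parse time.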

	# From the section "Character Encoding" in the Illumina format specification.
	#
	# https://www.illumina.com/content/dam/illumina-marketing/
	# documents/products/technotes/
	# sequencing-sheet-format-specifications-technical-note-970-2017-004.pdf

Sample_ID validation #125
