Given full-genome influenza A H5 query sequences, the workflow assigns each query to a predefined major/minor genotype by placing the aligned query segments onto segment reference trees using TIPars.
A web version is available at www.ggflu.org and www.ggflu.com.
- Global H5 genotyping framework with a two-level nomenclature:
- Major genotype: gene constellation across 8 segments.
- Minor genotype: finer reassortment-level sub-lineage within a major genotype.
- HA is used as the lineage anchor (WHO/WOAH/FAO clade context).
- Internal and NA segment clusters are defined using distance/host/geography criteria.
- Query assignment uses phylogenetic placement (TIPars) onto fixed segment reference trees.
- Results are harmonized with external systems through mapping fields (e.g., GenoFLU and GenIn labels).
- A Pango-style (SARS-CoV-2) naming convention is adopted (e.g., B.1) to simplify communication without losing evolutionary history.
The expected input is a FASTA file whose headers include segment tags, e.g.:
>sample_001|HA
ATG...
>sample_001|NA
ATG...
>sample_001|PB2
ATG...
The splitter uses |HA, |NA, |PB2, |PB1, |PA, |NP, |MP, |NS tags in headers.
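The header convention above can be illustrated with a minimal Python sketch. It is not the workflow's own splitter: the function name `split_by_segment` is hypothetical, and it assumes the segment tag is the last `|`-separated field of each header.

```python
from collections import defaultdict

# Segment tags listed in this README.
SEGMENTS = ("HA", "NA", "PB2", "PB1", "PA", "NP", "MP", "NS")


def split_by_segment(fasta_text):
    """Group FASTA records by segment tag: {segment: {sample_id: sequence}}."""
    records = defaultdict(dict)
    sample_id, segment = None, None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the tag is the last "|"-separated field of the header.
            sample_id, _, tag = line[1:].rpartition("|")
            segment = tag if (sample_id and tag in SEGMENTS) else None
            if segment is not None:
                records[segment][sample_id] = ""
        elif segment is not None:
            # Sequence lines are concatenated; records without a known tag are skipped.
            records[segment][sample_id] += line
    return dict(records)
```

Matching the tag as a whole field, rather than as a substring, avoids confusing similar names such as PB1 and PB2.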
Main file:
QueryGenotypeTable.txt
Key fields:
- qid: query sequence ID.
- HAclade: inferred HA clade/subtype context for assignment.
- MajorGenotype_alias: major genotype alias label.
- MajorGenotype_lineage: major genotype lineage label.
- MinorGenotype: minor genotype assignment. An asterisk (*) means no confident mapped minor genotype in the current reference/mapping set.
- MinorGenotype_GenoFlu, MinorGenotype_Genin: cross-system mapping labels when available. "Not assigned" means no mapping entry.
- Segment cluster columns (PB2ClusterName, PB1ClusterName, PAClusterName, NPClusterName, NAClusterName, MPClusterName, NSClusterName): segment-level source cluster assignment. Where available, includes mapped minor cluster IDs in parentheses.
- Notes: quality warnings from alignment diagnostics, e.g., high gap/N percentage in specific segments.
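As a rough illustration of how these fields might be consumed downstream, the sketch below reads the table and condenses each row. It assumes the file is tab-delimited with a header row, which this README does not state. The helper names `read_genotype_rows` and `summarize` are hypothetical.

```python
import csv

# Segment cluster columns as named in the key fields list.
SEGMENT_COLUMNS = [
    "PB2ClusterName", "PB1ClusterName", "PAClusterName", "NPClusterName",
    "NAClusterName", "MPClusterName", "NSClusterName",
]


def read_genotype_rows(lines):
    """Parse table lines into dicts keyed by column name.

    Assumption: tab-delimited text with a header row.
    """
    return list(csv.DictReader(lines, delimiter="\t"))


def summarize(row):
    """Return a compact view of one query row."""
    return {
        "qid": row["qid"],
        "HAclade": row["HAclade"],
        "major_alias": row["MajorGenotype_alias"],
        "major_lineage": row["MajorGenotype_lineage"],
        "minor": row["MinorGenotype"],
        # Columns absent from the file are reported as empty strings.
        "clusters": {c: row.get(c, "") for c in SEGMENT_COLUMNS},
        # A non-empty Notes field signals an alignment quality warning.
        "has_warning": bool((row.get("Notes") or "").strip()),
    }


# Usage (file name taken from this README):
# with open("QueryGenotypeTable.txt", newline="") as handle:
#     for row in read_genotype_rows(handle):
#         print(summarize(row))
```

Rows flagged by `has_warning` are worth checking before relying on the genotype call, since a high gap/N percentage can affect placement.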
Interpretation:
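- Genotype names start at A and extend to A.1, B.1, ... within each H5 clade. Refer to a major genotype by clade plus label, e.g., H5 2.3.4.4b B.4 or H5 2.3.4.4h A.1. The same label can occur in more than one clade (e.g., 2.3.4.4b B.5 and 2.3.2.1a B.5), so the clade prefix is needed to make a name unambiguous.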
- Existing major + existing minor match -> known genotype.
- Existing major + unmatched minor (*) -> candidate new minor genotype under known major backbone.
- Unmatched major pattern (*) -> candidate new major genotype.
- Genotypes are defined for H5 clades 2.3.2.1a, 2.3.2.1e, 2.3.2.1g, 2.3.4.4b, and 2.3.4.4h.
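The three interpretation rules listed above can be expressed as a small decision function. This is a sketch, not part of the workflow: the function name `interpret` and the wording of the returned strings are illustrative. Which table column supplies the major label is left to the caller.

```python
UNMATCHED = "*"  # marker for "no confident match", as described above


def interpret(ha_clade, major, minor):
    """Apply the interpretation rules to one query's genotype fields."""
    if major == UNMATCHED:
        # Unmatched major pattern.
        return "candidate new major genotype"
    if minor == UNMATCHED:
        # Known major backbone, but no matching minor genotype.
        return f"candidate new minor genotype under H5 {ha_clade} {major}"
    # Both levels matched an existing definition.
    return f"known genotype: H5 {ha_clade} {major}, minor {minor}"
```

The clade is included in every message because a label such as B.5 is only meaningful together with its H5 clade.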
