Purpose: This study describes a microarray-based assay to discriminate between pathogenic and non-pathogenic E. coli.
Methods: A high-density custom DNA microarray was designed with informative genetic features extracted from 368 whole genome sequences (WGS) for rapid and high-throughput pathogen identification. The FDA-ECID microarray contains three sets of molecularly informative features that function together to stratify strain identification and relatedness. These comprise molecular serotyping, E. coli pan-genome content, and a recapitulation of E. coli phylogeny, the last of which is based on 9984 SNPs and provides the greatest discriminatory capability. We analyzed 103 diverse E. coli isolates with available WGS data, including those associated with past foodborne illnesses, to determine the robustness and accuracy of the array.
Results: The array accurately identified the molecular O and H serotypes of all 103 isolates tested. In addition, molecular risk assessment was possible through the identification of virulence markers, as exemplified by the targeted stx and eae alleles. Epidemiologically, each strain had a unique comparative genomic fingerprint, an analysis that was extended to an additional 507 strains, with strain-level resolution demonstrated for food and clinical examples. Finally, 99% phylogenetic concordance was established between microarray analysis and WGS using SNP-level data for advanced genome typing.
Significance: The current study confirms the FDA-ECID microarray as a powerful tool for epidemiology and molecular risk assessment with the capacity to profile the global landscape and diversity of E. coli.