Purpose: We describe a novel, high-density DNA microarray that represents informative single nucleotide polymorphisms (SNPs) mined from approximately 300 E. coli whole genome sequences. The custom FDA-ECID microarray was designed and manufactured using next-generation Affymetrix PEG-GeneAtlas technology. The array is a rapid, resequencing-based genomic tool for E. coli characterization and subtyping.
Methods: Approximately 300 whole genome sequences were used to identify ~125,000 conserved 25-mers, each containing a central SNP. Of these, ~10,000 informative SNPs were selected and are represented on the custom FDA-ECID microarray using a SNP-typing probe strategy.
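The selection criterion described in Methods (a 25-mer whose flanks are conserved across genomes and whose central base varies) can be illustrated with a minimal Python sketch. It assumes the genomes are already available as a gap-free alignment of equal-length strings, which the abstract does not state. The function name `conserved_snp_25mers` and the `FLANK` constant are hypothetical, and this is not the authors' pipeline.

```python
from typing import Dict, List, Tuple

FLANK = 12  # assumed layout: 12 flanking bases + 1 central SNP + 12 flanking bases = 25-mer


def conserved_snp_25mers(aligned: Dict[str, str]) -> List[Tuple[int, str, List[str]]]:
    """Return (position, 25-mer with 'N' at the SNP, observed alleles) for each
    site whose central base varies while both 12-base flanks are identical
    in every aligned genome."""
    seqs = list(aligned.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    hits = []
    for pos in range(FLANK, length - FLANK):
        alleles = sorted({s[pos] for s in seqs})
        # Skip invariant sites and sites with ambiguous or gap characters.
        if len(alleles) < 2 or any(a not in "ACGT" for a in alleles):
            continue
        left = {s[pos - FLANK:pos] for s in seqs}
        right = {s[pos + 1:pos + FLANK + 1] for s in seqs}
        # Keep the site only if both flanks are conserved across all genomes.
        if len(left) == 1 and len(right) == 1:
            hits.append((pos, left.pop() + "N" + right.pop(), alleles))
    return hits
```

Choosing the informative subset (~10,000 of ~125,000 in the abstract) would require an additional ranking step that is not described here and is therefore not sketched.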
Results: Using our optimized SNP-calling algorithms, we analyzed data from a large collection of temporally and geographically diverse E. coli isolates. The major phylogenetic lineages within the species were recapitulated using the array SNP data. In addition, the array data were compared with the same SNPs called in silico from whole genome sequence (WGS) data, as well as with a more comprehensive set of chromosomal backbone SNPs mined from the WGS archive. Comparisons of the microarray SNP data with the WGS data show greater than 95% agreement in the classification of the isolates examined.
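One simple way to quantify agreement between array and in silico calls for a single isolate is the fraction of shared SNP positions with identical base calls. The sketch below is illustrative only: the abstract does not specify which metric underlies the reported figure, and the function name `snp_concordance`, the dictionary layout, and the `"N"` no-call symbol are assumptions.

```python
from typing import Dict, Optional


def snp_concordance(array_calls: Dict[str, str],
                    wgs_calls: Dict[str, str],
                    no_call: str = "N") -> Optional[float]:
    """Fraction of SNPs called on both platforms that carry the same base.

    Keys are hypothetical SNP identifiers; values are base calls.
    Returns None when no SNP has a usable call on both platforms."""
    shared = [
        snp for snp in array_calls
        if snp in wgs_calls
        and array_calls[snp] != no_call
        and wgs_calls[snp] != no_call
    ]
    if not shared:
        return None
    matches = sum(array_calls[snp] == wgs_calls[snp] for snp in shared)
    return matches / len(shared)
```

Excluding no-calls from the denominator is a design choice; counting them as disagreements would give a more conservative estimate.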
Significance: The FDA-ECID microarray is a powerful tool for molecular subtyping and phylogenetic analysis of E. coli.