P2-119 FDA-ECID: A Novel Microarray Representing the PanGenome of Escherichia coli: A Tool for Molecular Epidemiology, Molecular Serotyping, and Phylogeny

Tuesday, August 5, 2014
Exhibit Hall D (Indiana Convention Center)
Scott Jackson, NIST, Gaithersburg, MD
Jayanthi Gangiredla, U.S. Food and Drug Administration, Laurel, MD
Mark Mammel, U.S. Food and Drug Administration, Laurel, MD
Isha Patel, U.S. Food and Drug Administration, Laurel, MD
David Lacher, U.S. Food and Drug Administration, Laurel, MD
Christopher Elkins, U.S. Food and Drug Administration, Laurel, MD
Introduction: Illnesses associated with consuming foods contaminated with pathogenic Escherichia coli cause thousands of hospitalizations and hundreds of deaths worldwide each year. Because these pathogens adapt rapidly to novel environmental niches, accurately identifying and discriminating individual strains requires highly parallel analysis methods.

Purpose: Here we describe the development and validation of a novel, high-density DNA microarray representing all known E. coli genes, mined from approximately 300 whole-genome sequences. The FDA-ECID array was designed and manufactured using next-generation Affymetrix PEG-GeneAtlas technology. This custom tool is rapid, affordable, and high-throughput.

Methods: Using the BLASTCLUST and NETCLUST tools, we analyzed the 300 whole-genome sequences and determined that the non-redundant pangenome of E. coli comprises ~40k unique genes. Each of these genes is represented as a probe set on the FDA-ECID microarray. We also represented every allele of the fliC, wzx, and wzy genes, which enables the array to perform molecular serotyping. From the same 300 genome sequences, we identified ~125k conserved 25-mers, each containing a central single nucleotide polymorphism (SNP). Of these, we retained the most informative ~10%, which accurately recapitulated the phylogeny of E. coli. Each of these ~10k informative SNPs is represented on the array using a SNP-typing probe strategy.
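The criterion of a conserved 25-mer carrying a central SNP can be illustrated with a minimal Python sketch. It assumes the genomes have already been aligned into equal-length, gap-free sequences (a simplification; the abstract does not describe the alignment step), and the function name `central_snp_kmers` is hypothetical. The sketch only finds candidate windows whose 24 flanking bases are identical across all genomes while the central base varies. It does not reproduce the subsequent selection of the most phylogenetically informative subset.

```python
from collections import Counter

K = 25           # 25-mer length stated in the abstract
CENTER = K // 2  # index 12: the central base of a 25-mer


def central_snp_kmers(aligned):
    """Hypothetical helper: return (start, left_flank, right_flank, allele_counts)
    for every 25-mer window whose 24 flanking positions are identical across
    all sequences while the central position carries more than one base.

    Assumes `aligned` is a list of equal-length, gap-free, pre-aligned
    sequences, one per genome (an illustrative input format only)."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")

    # True where every genome has the same base at that position.
    conserved = [len({s[i] for s in aligned}) == 1 for i in range(length)]

    hits = []
    for start in range(length - K + 1):
        mid = start + CENTER
        flanks_ok = all(conserved[start + j] for j in range(K) if j != CENTER)
        if flanks_ok and not conserved[mid]:
            ref = aligned[0]  # flanks are identical in all genomes, so any one works
            alleles = Counter(s[mid] for s in aligned)
            hits.append((start, ref[start:mid], ref[mid + 1:start + K], dict(alleles)))
    return hits
```

For example, three aligned genomes that differ at a single position yield one window centered on that position. Any variation inside the flanks disqualifies the window, presumably because a fully conserved flank lets the same probe design interrogate every strain.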

Results: To validate the array, we hybridized four diverse, well-characterized, sequenced reference strains (Sakai, 55989, CFT073, and MG1655) in quadruplicate. These data allowed us to optimize our gene-calling and SNP-calling algorithms. We also present results from our interrogation of a large collection (>900) of temporally and geographically diverse E. coli isolates.

Significance: In summary, the FDA-ECID microarray is a powerful tool for molecular epidemiology, phylogenetic analysis, virulence assessment, molecular serotyping, and exploring the global genomic diversity of Escherichia coli.