P2-121 Design, Development and Utilization of an Escherichia coli Resequencing Microarray:  A Tool for Understanding Phylogeny, Genetic Diversity and Molecular Epidemiology

Tuesday, August 5, 2014
Exhibit Hall D (Indiana Convention Center)
Jayanthi Gangiredla, U.S. Food and Drug Administration, Laurel, MD
Anjan Purkayastha, TessArae LLC, Potomac Falls, VA
Clark Tibbetts, TessArae LLC, Potomac Falls, VA
Mathew Lorence, TessArae LLC, Potomac Falls, VA
Christopher Elkins, U.S. Food and Drug Administration, Laurel, MD
Scott Jackson, NIST, Gaithersburg, MD
Introduction:  Escherichia coli can be a commensal, a pathogen, or an emerging pathogen. Understanding its evolution and phylogeny will clarify how it adapts to new environmental niches and acquires novel virulence and metabolic mechanisms. Over the past three decades, many molecular assays have been developed to examine its genomic diversity and evolution, including, but not limited to, multilocus enzyme electrophoresis (MLEE), multilocus sequence typing (MLST), and whole-genome sequencing (WGS).

Purpose:  The purpose of this study was to describe an MLST-like, microarray-based resequencing assay that evolved from an intelligent, rational design strategy.

Methods:  Our EC-MLST microarray represents approximately 100 E. coli genes as tiles in a standard resequencing probe design. Each tile is approximately 500 bp long and targets the most heterogeneous region of its gene, as determined from gene sequence alignments.  Also included are the 30 “standard” loci used by the Achtman, Whittam, and Pasteur MLST typing schemes and 80 virulence genes, chosen on the basis of literature searches that revealed their association with particular pathogen types and clinical outcomes.
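
The tile-selection idea (pick the most heterogeneous ~500 bp stretch of each gene from an alignment) can be illustrated with a minimal sketch. This is not the authors' procedure, which the abstract does not detail. It assumes that "heterogeneous" means the window containing the most alignment columns with more than one observed base, and the function name `most_variable_window` is hypothetical.

```python
def most_variable_window(aligned_seqs, window=500):
    """Return (start, n_variable) for the window with the most variable columns.

    Assumption (not from the abstract): a column is "variable" when it holds
    more than one distinct base, ignoring gaps ('-') and ambiguous calls ('N').
    """
    if not aligned_seqs:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")

    # Flag each alignment column: 1 if variable, 0 if conserved.
    variable = []
    for column in zip(*aligned_seqs):
        bases = {b.upper() for b in column if b not in "-Nn"}
        variable.append(1 if len(bases) > 1 else 0)

    # Slide a fixed-size window and keep a running count of variable columns.
    window = min(window, length)
    current = sum(variable[:window])
    best_start, best_count = 0, current
    for start in range(1, length - window + 1):
        current += variable[start + window - 1] - variable[start - 1]
        if current > best_count:
            best_start, best_count = start, current
    return best_start, best_count


# Toy alignment; a real tile would use window=500 on full-length gene alignments.
toy_alignment = ["AAAAAAAAAA", "AAAACGTAAA", "AAAATTTAAA"]
start, count = most_variable_window(toy_alignment, window=3)
```

On ties, the sketch keeps the leftmost window. A real design would likely also weigh probe hybridization constraints, which are outside the scope of this example.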

Results: In a validation study, we determined the accuracy (sequencing error rate) of this resequencing-based assay to be equivalent to Q30. At this accuracy, we can reliably assess both phylogeny and horizontal gene transfer (recombination).  In addition to the validation study, we present the results of our examination of a large collection of temporally and geographically diverse isolates of commensal and pathogenic E. coli strains. Finally, we present and discuss a data analysis pipeline that allows for automated base calling, curation of reference genome sequence data from GenBank, and comparative genomic-phylogenetic analysis.
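
For readers unfamiliar with the notation, the sketch below shows what a Q30 accuracy implies, assuming "Q30" refers to the standard Phred scale, Q = -10 log10(P), where P is the per-base error probability. The function names are illustrative, and the 500 bp figure is taken from the approximate tile length given in Methods.

```python
import math


def phred_to_error_prob(q):
    """Per-base error probability for a Phred quality score q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality score for a per-base error probability p."""
    return -10 * math.log10(p)


def expected_errors(n_bases, q):
    """Expected number of miscalled bases among n_bases at quality q."""
    return n_bases * phred_to_error_prob(q)


p_q30 = phred_to_error_prob(30)        # about 1 error per 1,000 bases
per_tile = expected_errors(500, 30)    # about 0.5 miscalls per ~500 bp tile
```

Under this assumption, Q30 corresponds to roughly 99.9% base-call accuracy, or about one expected miscall for every two 500 bp tiles.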

Significance: The availability of such a molecular detection method and data analysis pipeline allows for routine use of the EC-MLST array in molecular epidemiological or molecular subtyping scenarios.