Purpose: The purpose of this study was to describe an MLST-like, microarray-based resequencing assay that evolved from an intelligent, rational design strategy.
Methods: Our EC-MLST microarray represents approximately 100 E. coli genes as tiles in a standard resequencing probe design. Each tile is approximately 500 bp long and targets the most heterogeneous region of its gene, as determined from gene sequence alignments. Also included are the 30 “standard” loci used by the Achtman, Whittam, and Pasteur MLST typing schemes and 80 virulence genes chosen on the basis of literature searches that revealed their association with particular pathogen types and clinical outcomes.
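The selection of the most heterogeneous region from a gene alignment can be illustrated with a sliding window. The sketch below is only an assumption about how such a choice might be made: it scores each window by its count of variable alignment columns, which is one simple heterogeneity measure and not necessarily the criterion used for the EC-MLST design. The function name `most_variable_window` and its parameters are hypothetical.

```python
from typing import List, Tuple


def most_variable_window(alignment: List[str], window: int = 500) -> Tuple[int, int]:
    """Return (start, variable_site_count) for the window with the most
    variable columns. Hypothetical helper; the scoring rule is an assumption."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    window = min(window, length)

    # Mark a column as variable when it holds more than one distinct base,
    # ignoring gaps and ambiguous calls.
    variable = []
    for column in zip(*alignment):
        bases = {b.upper() for b in column} - {"-", "N"}
        variable.append(1 if len(bases) > 1 else 0)

    # Slide the window one column at a time, updating the count incrementally.
    current = sum(variable[:window])
    best_start, best_count = 0, current
    for start in range(1, length - window + 1):
        current += variable[start + window - 1] - variable[start - 1]
        if current > best_count:
            best_start, best_count = start, current
    return best_start, best_count
```

For a real design, the window would be 500 columns wide and the alignment would contain one row per reference allele; ties are resolved here in favor of the leftmost window.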
Results: In a validation study, we determined the accuracy (sequencing error rate) of this resequencing-based assay to be equivalent to Q30, that is, roughly one erroneous base call per 1,000 bases. At this accuracy, we are able to assess both phylogeny and horizontal gene transfer (recombination). In addition to the validation study, we present the results of our examination of a large collection of temporally and geographically diverse isolates of commensal and pathogenic E. coli strains. Finally, we present and discuss a data analysis pipeline that allows for automated base calling, curation of reference genome sequence data from GenBank, and comparative genomic-phylogenetic analysis.
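The Q30 figure follows the Phred convention, in which a quality score Q corresponds to an error probability of 10^(-Q/10). The sketch below shows the conversion in both directions; the function names are hypothetical and the error and base counts are illustrative inputs, not data from the validation study.

```python
import math


def phred_quality(errors: int, bases: int) -> float:
    """Convert an observed error count over a number of called bases
    into a Phred-scaled quality score."""
    if bases <= 0:
        raise ValueError("bases must be positive")
    if errors < 0 or errors > bases:
        raise ValueError("errors must be between 0 and bases")
    if errors == 0:
        return math.inf  # no observed errors: the score is unbounded
    return -10.0 * math.log10(errors / bases)


def error_probability(quality: float) -> float:
    """Convert a Phred-scaled quality score back to an error probability."""
    return 10.0 ** (-quality / 10.0)
```

Under this convention, one miscalled base in 1,000 corresponds to Q30 and one in 10,000 to Q40.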
Significance: The availability of such a molecular detection method and data analysis pipeline enables routine use of the EC-MLST array in molecular epidemiology and molecular subtyping.