Purpose: Whole genome sequencing (WGS) has become routine for tracing infections and outbreaks. Core genome multilocus sequence typing (cgMLST) is one of the most straightforward ways to explore complex genomic data in an epidemiological context. There is therefore a need for a new, portable, standardized typing system that uses WGS data to provide higher resolution among Vibrio parahaemolyticus strains.
Methods: To establish this cgMLST scheme, we sequenced 92 V. parahaemolyticus genomes and used the genome of strain RIMD 2210633 (4,832 genes in total) as the reference to determine which genes were suitable for inclusion in the scheme.
Results: The initial analysis identified 2,254 core genes suitable for use in the scheme. To evaluate its performance, we performed a cgMLST analysis of the 92 newly sequenced genomes plus an additional 142 strains with genomes available at NCBI. The cgMLST scheme distinguished related from unrelated strains, including outbreak-related from unrelated strains within the same sequence type, clearly showing its enhanced resolution over conventional MLST analysis.
Significance: Applying this cgMLST scheme to V. parahaemolyticus strains from different laboratories around the world will help build a global picture of the epidemiology, spread, and evolution of this pathogen.
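
Illustration: The comparison underlying a cgMLST analysis can be sketched as counting the core loci at which two strains carry different alleles. The Python sketch below is only an illustration under stated assumptions and is not the pipeline used in this study: the strain names, locus names, allele numbers, and the choice to skip loci missing in either strain are all hypothetical.

```python
from itertools import combinations


def allelic_distance(profile_a, profile_b):
    """Count loci where two strains carry different allele numbers.

    Loci that are absent or uncalled (None) in either profile are skipped.
    This handling of missing data is an assumption of the sketch.
    """
    differences = 0
    for locus, allele_a in profile_a.items():
        allele_b = profile_b.get(locus)
        if allele_a is None or allele_b is None:
            continue  # locus not comparable in this pair
        if allele_a != allele_b:
            differences += 1
    return differences


def pairwise_distances(profiles):
    """Return the allelic distance for every pair of strains."""
    return {
        (a, b): allelic_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Hypothetical allele profiles: locus name -> allele number (None = not called).
profiles = {
    "strain_A": {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": 7},
    "strain_B": {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": None},
    "strain_C": {"locus1": 3, "locus2": 5, "locus3": 2, "locus4": 9},
}

distances = pairwise_distances(profiles)
for pair, distance in distances.items():
    print(pair, distance)
```

In this toy example, strain_A and strain_B show no allelic differences at the loci they share, whereas strain_C differs from both at several loci. With thousands of core loci, the same kind of count can separate closely related strains that conventional MLST, which uses only a handful of loci, assigns to the same sequence type.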