Shaokang Zhang
, Center for Food Safety, Department of Food Science and Technology, University of Georgia
, Griffin
, GA
Yanlong Yin
, Department of Computer Science, Illinois Institute of Technology
, Chicago
, IL
Marcus Jones
, Department of Infectious Diseases, J. Craig Venter Institute
, Rockville
, MD
Zhenzhen Zhang
, Department of Biostatistics, School of Public Health, University of Michigan
, Ann Arbor
, MI
Brooke Kaiser
, Biological Sciences Division, Pacific Northwest National Laboratory
, Richland
, WA
Blake Dinsmore
, Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention
, Atlanta
, GA
Collette Fitzgerald
, Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention
, Atlanta
, GA
Patricia Fields
, Centers for Disease Control and Prevention
, Atlanta
, GA
Xiangyu Deng
, Kraft Foods R&D
, Glenview
, IL
Introduction: Salmonella is the most prevalent foodborne pathogen in the United States, causing more than 1 million cases of illness annually and imposing the largest economic burden among all bacterial pathogens. The U.S. National Salmonella Surveillance System has been built upon serotyping, a subtyping method traditionally performed through the agglutination of Salmonella cells with specific antisera that detect lipopolysaccharide O and flagellar H antigens. Specific combinations of O and H antigenic types represent serotypes. More than 2,500 Salmonella serotypes have been described in the White-Kauffmann-Le Minor scheme.
Purpose: To develop a bioinformatics tool for the determination of Salmonella serotypes using high-throughput genome sequencing data.
Methods: Databases for Salmonella serotype determinants and bioinformatics pipelines were built to allow in silico determination of antigenic profiles from raw sequencing reads and genome assemblies. A web application, SeqSero, was developed to allow public access to this tool.
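As an illustration of the final step of such a pipeline, the following minimal Python sketch maps an antigenic profile (O group, H phase 1, H phase 2) to a serotype name. It is not SeqSero's code: the function names and the three-entry table are assumptions made for this example, and the formulas are given in a simplified form that keeps only the O group and the two flagellar phases.

```python
from typing import Dict, Optional, Tuple

# Simplified antigenic formulas (O group, H phase 1, H phase 2) for three
# well-known serotypes. This tiny table is illustrative only; a real scheme
# covers more than 2,500 serotypes.
SEROTYPE_TABLE: Dict[Tuple[str, str, str], str] = {
    ("4", "i", "1,2"): "Typhimurium",
    ("9", "g,m", "-"): "Enteritidis",
    ("8", "e,h", "1,2"): "Newport",
}


def antigenic_formula(o: str, h1: str, h2: Optional[str] = None) -> str:
    """Format a profile as O:H1:H2, writing '-' for an absent second phase."""
    return f"{o}:{h1}:{h2 or '-'}"


def lookup_serotype(o: str, h1: str, h2: Optional[str] = None) -> str:
    """Return the serotype name for a profile, or a placeholder if unknown."""
    key = (o, h1, h2 or "-")
    return SEROTYPE_TABLE.get(key, "not in table")


if __name__ == "__main__":
    print(antigenic_formula("4", "i", "1,2"), lookup_serotype("4", "i", "1,2"))
    print(antigenic_formula("9", "g,m"), lookup_serotype("9", "g,m"))
```

In a real workflow the three antigen calls would come from comparing reads or assemblies against the serotype determinant databases; here they are passed in directly.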
Results: SeqSero was validated by testing: 1) raw reads from genomes of 308 Salmonella isolates of known serotype; 2) raw reads from genomes of 3,306 Salmonella isolates sequenced and made publicly available by GenomeTrakr, a U.S. national surveillance network; and 3) 354 publicly available draft/complete Salmonella genomes. SeqSero achieved accuracy rates of 98.7%, 92.6%, and 91.5%, respectively, for the three datasets. Across these datasets, SeqSero successfully determined a total of 200 serotypes, and it is predicted to be capable of determining nearly the full spectrum (more than 2,300) of Salmonella serotypes. We also demonstrated Salmonella serotype determination from raw sequencing reads of fecal metagenomes from mice orally infected with this pathogen.
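For readers who want to reproduce this kind of validation on their own isolates, an accuracy rate can be computed as the fraction of isolates whose predicted serotype matches the known serotype. The sketch below assumes exact string matching and uses made-up example labels; it does not reproduce the datasets or the scoring rules used in this study.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], known: Sequence[str]) -> float:
    """Fraction of isolates whose predicted serotype equals the known one."""
    if len(predicted) != len(known):
        raise ValueError("predicted and known must have the same length")
    if not known:
        raise ValueError("at least one isolate is required")
    matches = sum(p == k for p, k in zip(predicted, known))
    return matches / len(known)


if __name__ == "__main__":
    # Hypothetical labels for four isolates; the last prediction is wrong.
    known = ["Typhimurium", "Enteritidis", "Newport", "Typhimurium"]
    predicted = ["Typhimurium", "Enteritidis", "Newport", "Heidelberg"]
    print(f"{accuracy(predicted, known):.1%}")
```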
Significance: Public health microbiology is being transformed by whole genome sequencing (WGS), which opens the door to serotype determination using WGS data. SeqSero is a fast and robust serotype prediction tool that helps maintain the well-established utility of Salmonella serotyping by integrating it into the platform of WGS-based pathogen subtyping and characterization.