P2-63 Bioinformatics Analysis of Salmonella fliC Gene Diversity from Next-generation Sequencing Data

Tuesday, August 5, 2014
Exhibit Hall D (Indiana Convention Center)
Wen Zou, U.S. Food and Drug Administration-NCTR, Jefferson, AR
Weizhong Zhao, U.S. Food and Drug Administration-NCTR, Jefferson, AR
James Chen, U.S. Food and Drug Administration-NCTR, Jefferson, AR
Introduction: Standard Salmonella serotyping methods rely on detecting the somatic (O) and flagellar (H) antigens present on the cell surface. The Salmonella fliC gene, which encodes the phase 1 H antigen, is one of the genes that determine Salmonella serotype. Next-generation sequencing (NGS) technology has recently been widely applied in clinical and public health laboratories for pathogen detection and surveillance. Hundreds of Salmonella strains have been collected from food, clinical, and environmental sources, and their whole genome sequences have been obtained by NGS.

Purpose: The purpose of this study was to investigate the genetic diversity of the Salmonella fliC gene and to discover biomarkers for rapid serotype detection.

Methods: A bioinformatics pipeline was developed and implemented for sequence acquisition and genetic diversity analysis of NGS data. It consisted of three steps: (1) retrieval of reference sequences and determination of a template sequence; (2) retrieval of NGS sequence reads from Salmonella outbreak isolates; and (3) multiple sequence alignment and phylogenetic analysis.
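The alignment and phylogenetic step can be illustrated with a minimal Python sketch. It assumes the fliC sequences have already been aligned to a common length (the abstract does not name the alignment tool), counts SNP differences between each pair of sequences while skipping gap (-) and ambiguous (N) columns, and groups the sequences with UPGMA clustering. UPGMA is chosen here only for brevity and is not necessarily the tree-building method used in the pipeline; the function names snp_distance and upgma_newick are hypothetical.

```python
from itertools import combinations

# Characters treated as missing data; skipping them is an assumption of this sketch.
MISSING = {"-", "N"}


def snp_distance(a: str, b: str) -> int:
    """Count aligned columns where both sequences have a called base and the bases differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x not in MISSING and y not in MISSING and x != y
    )


def upgma_newick(names: list[str], seqs: list[str]) -> str:
    """Cluster aligned sequences by UPGMA on SNP distances; return a Newick topology."""
    # Active clusters: id -> (Newick label, number of sequences in the cluster).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    dist = {
        (i, j): float(snp_distance(seqs[i], seqs[j]))
        for i, j in combinations(range(len(seqs)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair; ties are broken by the smaller id pair.
        (i, j), _ = min(dist.items(), key=lambda kv: (kv[1], kv[0]))
        label_i, n_i = clusters.pop(i)
        label_j, n_j = clusters.pop(j)
        # Size-weighted average distance from the new cluster to each remaining one.
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Drop distances that involve the two merged clusters.
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        clusters[next_id] = (f"({label_i},{label_j})", n_i + n_j)
        next_id += 1
    [(label, _)] = list(clusters.values())
    return label + ";"


if __name__ == "__main__":
    # Toy sequences for illustration only (not real fliC data).
    names = ["A", "B", "C", "D"]
    seqs = ["ACGTACGT", "ACGTACGA", "TGCAACGA", "TGCAACGT"]
    print(upgma_newick(names, seqs))  # ((A,B),(C,D));
```

In the toy example, A and B differ at one site, as do C and D, while the two pairs differ from each other at four or five sites, so the pairs are joined first. UPGMA assumes a roughly constant substitution rate across lineages; neighbor-joining or maximum-likelihood methods are common alternatives when that assumption is doubtful.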

Results: The fliC reference sequences of 24 Salmonella strains representing 13 serotypes were retrieved from the National Center for Biotechnology Information (NCBI) database, and the resulting phylogenetic tree revealed the relationships among the 13 serotypes based on SNP variation in the data set. The pipeline was then applied to the NGS reads of 48 S. Newport, 48 S. Montevideo, and 115 S. Enteritidis outbreak isolates to characterize the genetic diversity of the fliC gene within each serotype, and marker sequences for the Salmonella fliC gene were identified.
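Marker discovery can be sketched in the same spirit. The Python example below assumes an equal-length alignment of fliC sequences in which every sequence is labeled with its serotype, and reports, for each serotype, the columns where all of its members share one base that no sequence from another serotype carries. This definition of a marker site and the function name group_specific_sites are assumptions for illustration, not the criteria reported in the study.

```python
# Characters treated as missing data (an assumption of this sketch).
MISSING = {"-", "N"}


def group_specific_sites(
    alignment: dict[str, str], groups: dict[str, str]
) -> dict[str, list[tuple[int, str]]]:
    """For each group (e.g. serotype), list (column, base) pairs where every member
    shares one called base and no sequence outside the group carries that base.
    Every sequence id in `alignment` must have a label in `groups`."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("alignment must contain sequences of equal length")
    (length,) = lengths
    markers = {g: [] for g in sorted(set(groups.values()))}
    for col in range(length):
        # Base observed at this column for every sequence, case-normalized.
        column = {sid: seq[col].upper() for sid, seq in alignment.items()}
        for g in markers:
            inside = {b for sid, b in column.items() if groups[sid] == g}
            outside = {b for sid, b in column.items() if groups[sid] != g}
            # Keep the column if the group is uniform, the base is called,
            # and that base does not occur in any other group.
            if len(inside) == 1 and not inside & MISSING and not inside & outside:
                markers[g].append((col, next(iter(inside))))
    return markers
```

For example, with hypothetical toy sequences "ACGTTA" for two Newport isolates, "ACCTGA" for two Montevideo isolates, and "GCGTTA" for two Enteritidis isolates, the sketch returns columns 2 (C) and 4 (G) for Montevideo, column 0 (G) for Enteritidis, and no sites for Newport. Runs of adjacent marker columns could then be extracted as candidate marker sequences.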

Significance: The developed pipeline provides an effective bioinformatics tool for characterizing genetic diversity and discovering marker sequences, which will enhance NGS data analysis and its applications in pathogen identification, source tracking, and studies of population genome evolution.