Purpose: This study investigated the genetic diversity of the Salmonella fliC gene to discover biomarkers for rapid serotype detection.
Methods: A bioinformatics pipeline was developed and implemented for sequence acquisition and genetic diversity analysis from next-generation sequencing (NGS) data. It consisted of several steps: reference sequence retrieval and template sequence determination; retrieval of NGS sequence reads of Salmonella outbreak isolates; and multiple sequence alignment and phylogenetic analysis.
Results: The fliC reference sequences of 24 Salmonella strains representing 13 serotypes were retrieved from the National Center for Biotechnology Information (NCBI) database, and the phylogenetic tree revealed the relationships among the 13 serotypes based on single-nucleotide polymorphism (SNP) variation in the data set. The genetic diversity of the Salmonella fliC gene was characterized by applying the pipeline to the NGS reads of 48 S. Newport, 48 S. Montevideo, and 115 S. Enteritidis outbreak isolates. Marker sequences for the Salmonella fliC gene were identified.
Significance: The developed pipeline provides an effective bioinformatics tool for clarifying genetic diversity and discovering marker sequences, which will enhance NGS data analysis and its applications in pathogen identification, source tracking, and population genome evolution.
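
Illustration: The abstract does not name the tools or algorithms used in the pipeline, so the following is only a minimal sketch of the kind of analysis that could follow the multiple sequence alignment step, not the authors' implementation. It assumes the sequences have already been aligned to equal length. The function names (snp_positions, pairwise_snp_distance, marker_sites), the isolate labels, and the toy sequences are all hypothetical and are not real fliC data.

```python
from itertools import combinations

IGNORED = {"-", "N"}  # gap and ambiguous base are not counted as alleles


def snp_positions(aligned):
    """Return 0-based alignment columns where at least two sequences differ."""
    length = len(next(iter(aligned.values())))
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for i in range(length):
        bases = {seq[i] for seq in aligned.values()} - IGNORED
        if len(bases) > 1:
            sites.append(i)
    return sites


def pairwise_snp_distance(aligned):
    """Count SNP differences for every pair of sequences (input to a distance tree)."""
    sites = snp_positions(aligned)
    dist = {}
    for a, b in combinations(sorted(aligned), 2):
        dist[(a, b)] = sum(
            1
            for i in sites
            if aligned[a][i] != aligned[b][i]
            and aligned[a][i] not in IGNORED
            and aligned[b][i] not in IGNORED
        )
    return dist


def marker_sites(aligned, serotype_of):
    """Find columns where one serotype carries a single base seen in no other serotype."""
    markers = {}
    for i in snp_positions(aligned):
        by_serotype = {}
        for name, seq in aligned.items():
            by_serotype.setdefault(serotype_of[name], set()).add(seq[i])
        for serotype, bases in by_serotype.items():
            if len(bases) != 1:
                continue  # serotype is not uniform at this column
            base = next(iter(bases))
            if base in IGNORED:
                continue
            others = set()
            for other, other_bases in by_serotype.items():
                if other != serotype:
                    others |= other_bases
            if base not in others:
                markers.setdefault(serotype, []).append((i, base))
    return markers


# Toy aligned sequences (hypothetical, for illustration only).
toy_alignment = {
    "iso_A1": "ATGGCACAAG",
    "iso_A2": "ATGGCACAAG",
    "iso_B1": "ATGGCTCAAG",
    "iso_C1": "ATGACACAAG",
}
toy_serotypes = {"iso_A1": "A", "iso_A2": "A", "iso_B1": "B", "iso_C1": "C"}

if __name__ == "__main__":
    print(snp_positions(toy_alignment))
    print(pairwise_snp_distance(toy_alignment))
    print(marker_sites(toy_alignment, toy_serotypes))
```

In this sketch, the pairwise SNP counts could be passed to any distance-based tree-building method, and the serotype-specific columns are candidate marker positions under the simplifying assumption that a marker is a base shared by all isolates of one serotype and absent from the others.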