Genome wide evolutionary analyses reveal serotype specific patterns of positive selection in selected Salmonella serotypes
BMC Evolutionary Biology volume 9, Article number: 264 (2009)
The bacterium Salmonella enterica includes a diversity of serotypes that cause disease in humans and different animal species. Some Salmonella serotypes show a broad host range, some are host restricted and exclusively associated with one particular host, and some are associated with one particular host species, but able to cause disease in other host species and are thus considered "host adapted". Five Salmonella genome sequences, representing a broad host range serotype (Typhimurium), two host restricted serotypes (Typhi [two genomes] and Paratyphi) and one host adapted serotype (Choleraesuis) were used to identify core genome genes that show evidence for recombination and positive selection.
Overall, 3323 orthologous genes were identified in all 5 Salmonella genomes analyzed. Use of four different methods to assess homologous recombination identified 270 genes that showed evidence for recombination with at least one of these methods (false discovery rate [FDR] <10%). After exclusion of genes with evidence for recombination, site and branch specific models identified 41 genes as showing evidence for positive selection (FDR <20%), including a number of genes with confirmed or likely roles in virulence and ompC, a gene encoding an outer membrane protein, which has also been found to be under positive selection in other bacteria. A total of 8, 16, 7, and 5 genes showed evidence for positive selection in Choleraesuis, Typhi, Typhimurium, and Paratyphi branch analyses, respectively. Sequencing and evolutionary analyses of four genes in an additional 42 isolates representing 23 serotypes confirmed branch specific positive selection and recombination patterns.
Our data show that, among the four serotypes analyzed, (i) less than 10% of Salmonella genes in the core genome show evidence for homologous recombination, (ii) a number of Salmonella genes are under positive selection, including genes that appear to contribute to virulence, and (iii) branch specific positive selection contributes to the evolution of host restricted Salmonella serotypes.
Salmonella is a ubiquitous human and animal pathogen. This genus contains >2,500 recognized serotypes and is divided into two species, Salmonella bongori and Salmonella enterica. S. enterica consists of six subspecies (i.e., enterica, salamae, arizonae, diarizonae, houtenae, and indica) . Salmonella enterica subsp. enterica serotypes can also be divided into subdivisions according to their host adaptation . For example, Uzzau et al.  proposed that Salmonella serotypes can be divided into (i) host-restricted Salmonella serotypes (i.e., serotypes exclusively associated with one particular host, e.g., Salmonella Typhi and Paratyphi A); (ii) host-adapted Salmonella serotypes (i.e., serotypes prevalent in one particular host species, but able to cause disease in other host species, e.g., Salmonella Choleraesuis); and (iii) unrestricted Salmonella serotypes (i.e., serotypes capable of causing self-limiting gastroenteritis and, less commonly, systemic disease in a wide range of host species, e.g., Salmonella Typhimurium).
Multi-locus sequence typing (MLST) data indicate that the last common ancestor of the human host-adapted Salmonella Typhi existed 15,000-150,000 years ago . The evolution of Salmonella Typhi towards a lifestyle characterized by systemic infection and transmission by excretion through the gall bladder rather than luminal gut colonization  involved a combination of acquisition events (e.g., acquisition of Vi capsule related genes), and deletion events (e.g., loss of virulence-associated genes, such as several genes in SPI-1, SPI-2, SPI-3, SPI-4 and SPI-5). Salmonella Paratyphi A also causes typhoid fever, although the symptoms are typically milder than those caused by Salmonella Typhi. While Salmonella Paratyphi A also appears to have evolved recently, Salmonella Typhi and Paratyphi A clearly show distinct differences in their genome evolution, including a number of unique gene inactivation events in these two serotypes . Non-typhoidal Salmonella serotypes are responsible for gastroenteritis in humans and other animals. These serotypes are mainly transmitted by ingestion of food, feed, or water contaminated with infected feces , but can also be transmitted by direct contact [7, 8]. Disease caused by non-typhoidal Salmonella is one of the most common bacterial foodborne diseases worldwide . Salmonella Typhimurium is one of the most common non-typhoidal Salmonella serotypes, is found worldwide, and can cause disease, predominantly self limiting gastroenteritis, in a large number of animal species . The host adapted Salmonella Choleraesuis can cause severe disease, characterized by septicemia and enterocolitis, in swine. While relatively uncommon, this serotype can also infect humans where it typically causes severe invasive infections, e.g., infective aneurysm .
The importance of acquisition of novel (non-homologous) genes by lateral gene transfer has been clearly demonstrated in a number of bacteria, including a number of bacterial pathogens [11–14]. For example, acquisition of pathogenicity islands has played a critical role in the evolution of Salmonella and other Gram-negative and Gram-positive pathogens. Gene degradation and gene deletions have also been shown to play a critical role in bacterial evolution, particularly when organisms with a broad niche specificity adapt to narrow and specific ecological niches [5, 16]. For example, it has been suggested that gene degradation and gene deletion contribute to host adaptation in both Salmonella Typhi and Salmonella Paratyphi A. Microarray technologies have also allowed for rapid and large scale studies on gene presence/absence in a large number of isolates, including in Salmonella. In addition to gene acquisition and deletion, positive selection and homologous recombination play important roles in the evolution of bacteria and bacterial pathogens [18–21].
Genome wide studies on positive selection and homologous recombination in bacterial pathogens, including Streptococcus spp., Listeria monocytogenes, Campylobacter, E. coli [23, 24], and Shigella, have contributed to a better understanding of the evolution of these important pathogens. So far, no genome wide analyses of positive selection in Salmonella have been reported. One study evaluated 410 genes present in both S. enterica and E. coli and reported that 50% of amino acid substitutions in these genes appear to have been fixed by positive selection in one of these species. In order to further improve our understanding of the evolution of Salmonella, we performed full genome analyses for homologous recombination and positive selection using the completed and published genome sequences for five Salmonella strains, including the host restricted Salmonella Typhi (two strains) and Paratyphi A, the host adapted Salmonella Choleraesuis, and the broad host range Salmonella Typhimurium. Our analyses focused on the evolution of core genome genes (i.e., genes found in all 5 genomes) and did not include efforts to detect genes acquired by Salmonella through horizontal gene transfer and subsequent non-homologous recombination (e.g., virulence gene islands), as these types of evolutionary events have already been well characterized [13, 26, 27]. Analysis of the Salmonella serotypes included in our study will, in particular, provide an improved understanding of the roles of positive selection and homologous recombination in the evolution of host-adapted pathogen strains and lineages.
Five available annotated Salmonella enterica subsp. enterica genome sequences were used in this study (Table 1). Genome sequences were downloaded from the Comprehensive Microbial Resource at The Institute for Genomic Research (TIGR; currently the J. Craig Venter Institute, JCVI) on November 25, 2005. Updated role category information for all genes was obtained from JCVI on October 14, 2008; the Salmonella Typhi CT18 genome was used as the reference for role categories. While, as of August 20, 2009, 16 fully sequenced Salmonella genomes, including the 5 genomes used in our study, were available in GenBank (see Additional file 1), the 5 genomes used were the only fully sequenced Salmonella genomes available when our analyses were initiated. These 5 genomes allow for evaluation of evolutionary trends among host-restricted and host-adapted Salmonella strains as they include the serotypes Typhi, Paratyphi A, and Choleraesuis.
Identification of orthologous genes present in all five Salmonella genomes analyzed
OrthoMCL , which has previously been used for prokaryotic genome analyses [20, 22], was used to identify orthologous genes in the five Salmonella genomes. Orthologs present in all five genomes were aligned using ClustalW . Multiple sequence alignments were carried out on amino acid sequences from each orthologous group, followed by conversion to nucleotide sequence alignments using the PAL2NAL software . This strategy was used to allow for correct alignment of diversified regions in which multiple nucleotide substitution events have taken place; since amino acid sequences are more conserved than DNA sequences, they are easier to align and the final alignments are more reliable. Alignments containing variable sequence lengths or having low alignment scores were manually evaluated and edited, using BioEdit software , as previously described . For example, alignments containing sequences with different lengths and alignments that contained multiple indels that caused incorrect alignments were reviewed and edited as detailed in .
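The protein-guided alignment step described above can be illustrated with a minimal Python sketch. It is not the PAL2NAL implementation; it only shows the core idea of threading each unaligned coding sequence onto its gapped protein alignment row, so that every alignment gap becomes a three-nucleotide gap. The function name `thread_codons` and the toy sequences are hypothetical and chosen for illustration only.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map an unaligned coding sequence onto its gapped protein alignment row."""
    # Split the coding sequence into complete codons (a trailing partial codon is dropped).
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(codons) < n_residues:
        raise ValueError("coding sequence is shorter than the protein row")
    out = []
    next_codon = 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")               # one protein gap becomes one codon-sized gap
        else:
            out.append(codons[next_codon])  # residues consume codons in order
            next_codon += 1
    return "".join(out)


# Hypothetical toy data: two aligned protein rows and their unaligned coding sequences.
protein_rows = {"geneA": "MK-LT", "geneB": "MKQLT"}
cds_rows = {"geneA": "ATGAAACTGACC", "geneB": "ATGAAACAGCTGACC"}
codon_alignment = {
    name: thread_codons(protein_rows[name], cds_rows[name]) for name in protein_rows
}
```

Because gaps are placed in whole codons, the reading frame is preserved in every row, which is what codon-based models of substitution require.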
Detection of genes under positive selection
Positive selection can be detected by comparing the rate of non-synonymous substitutions (dN) to the rate of synonymous substitutions (dS). While different methods exist for detection of positive selection, PAML (Phylogenetic Analysis by Maximum Likelihood) was used here as (i) its use for detection of signals of positive selection in bacteria [18, 20, 23, 24, 32, 33], viruses, and eukaryotes [35, 36] has been well documented, (ii) it has been shown to have relatively good power to detect positive selection even with as few as 5 sequences, while keeping the number of false positives low, and (iii) it allows for detection of signals of branch specific positive selection. We used two types of tests implemented in PAML v3.15 to identify genes with evidence for positive selection, as previously detailed. Briefly, an overall test for positive selection (Test Overall; TO) was carried out to identify genes under positive selection in any or all of the branches of a given phylogeny; this test compares the null model M1a (nearly-neutral) to the alternative model M2a (positive selection). To identify genes that are under positive selection in specific branches of the Salmonella phylogeny, the branch-site test 2 was used. The branch-site test was specifically used to identify genes under positive selection in the ancestral branches of (i) the human restricted serotypes Typhi (Ty#) and Paratyphi A (Pty#), (ii) the porcine adapted serotype Choleraesuis (Ch#), and (iii) the unrestricted serotype Typhimurium (Tym#) (Figure 1). Overall, 18 different phylogenetic trees represented the phylogeny of the 3316 Salmonella orthologous genes, including one tree that represented the phylogeny of 1198 genes. Both the overall test and the branch-site tests were performed using the gene specific trees.
For each test, nested models (one null model that does not allow for positive selection and one alternative model that allows for positive selection) were compared using a Likelihood Ratio Test (LRT). For each model, three replicates were generated and the maximum likelihood values for each model were used in the LRT in order to eliminate the runs that could not reach the global maximum likelihood score. Tests that yielded LRT values < -0.1 were re-run 10 times and the maximum values for each model were used to calculate the LRT. Remaining negative LRT values (i.e., values between -0.1 and 0) were rounded to zero (p-value = 1). For all branch-specific tests, one degree of freedom was used to calculate p-values, while for the overall test, two degrees of freedom were used to calculate p-values. Because recombination may generate false positive results with PAML, the final analysis of positive selection was carried out only for those genes that showed no evidence for recombination with any of the four methods used to detect evidence of recombination.
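The LRT procedure described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the pipeline used in the study: it assumes the LRT statistic is twice the difference between the best log-likelihoods of the alternative and null models, and it uses the closed-form chi-square tail probabilities for one and two degrees of freedom. The function name `lrt_p_value` and the log-likelihood values are hypothetical.

```python
import math


def lrt_p_value(lnl_null_runs, lnl_alt_runs, df):
    """Return (statistic, p_value, needs_rerun) for one nested-model comparison."""
    # Keep the best replicate of each model, since single runs can stall at local optima.
    best_null = max(lnl_null_runs)
    best_alt = max(lnl_alt_runs)
    raw = 2.0 * (best_alt - best_null)
    # Clearly negative values suggest failed convergence and would be re-run.
    needs_rerun = raw < -0.1
    # Slightly negative values are rounded to zero, which gives a p-value of 1.
    stat = max(raw, 0.0)
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square tail, 1 df
    elif df == 2:
        p_value = math.exp(-stat / 2.0)             # chi-square tail, 2 df
    else:
        raise ValueError("only df = 1 or df = 2 are handled in this sketch")
    return stat, p_value, needs_rerun


# Hypothetical log-likelihoods from three replicates per model (overall test, 2 df).
stat, p_value, needs_rerun = lrt_p_value(
    [-1234.60, -1234.56, -1234.58],
    [-1230.10, -1230.06, -1231.00],
    df=2,
)
```

Using `df=1` in the same call would correspond to the branch-specific tests.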
Detection of genes with evidence of recombination
Recombination analyses were performed using GENECONV version 1.81 , Maximum χ2 , pairwise homoplasy index (PHI)  and neighbor similarity score (NSS)  to specifically detect evidence of homologous recombination among orthologous genes found in all 5 genomes; the 3316 alignments of orthologous genes used for these analyses thus contained one sequence from each genome and only recombination events between sequences present in the alignment were considered. Except for GENECONV, the approaches used are implemented in PhiPack . GENECONV and Maximum χ2 are substitution distribution methods, while NSS and Phi are compatibility methods [45, 46]. None of these tests require that the true phylogenetic tree is known. GENECONV detects the evidence of recombination by assessing the significance of long tracts of identical sites among pairs of sequences in a multiple alignment of informative sites. Maximum χ2 searches for recombination breakpoints in the alignment by comparing the number of polymorphic and non-polymorphic sites downstream and upstream of each putative break point (in this method, all polymorphic sites are initially considered as putative recombination breakpoints). NSS uses pairs of informative sites to detect evidence for recombination by assessing the tendency of neighboring sites to be more compatible than sites that are farther apart. PHI measures the similarity between closely linked sites to assess whether a fragment shows evidence for recombination. GENECONV, Maximum χ2 and NSS were used here as these methods, in a comparison of several methods (not including PHI), were shown to perform best (high power and low false positive rates) for sequences with divergence around 5% - 20% , representing a level of divergence expected between different Salmonella serotypes. 
These methods nevertheless differ in their relative power and specificity for detecting recombination (e.g., depending on sequence divergence). Multiple methods were thus used to identify genes with evidence of recombination, in particular so that any genes that may have evolved through recombination could be excluded from the subsequent positive selection analyses, which may be affected by recombination.
For the GENECONV analyses, the parameter g-scale was set to 1 and inner p-values were used to identify genes with evidence for recombination . For Maximum χ2, a fixed window size of 2/3 the number of polymorphic sites was used, while for PHI, a window size of 50 nucleotides was used. P-values were estimated using 10,000 permutations of the alignment for GENECONV and 1,000 permutations for NSS, Maximum χ2 and PHI.
Assessment of codon bias, nucleotide diversity and number of informative sites
To assess the codon bias, we identified the effective number of codons used in a gene (NC) using the program "chips" in the EMBOSS package. NC values range from 20, where one codon is used for each amino acid, to 61, where all alternative synonymous codons are used. Lower values of NC indicate higher codon bias in the gene, while higher values of NC indicate lower codon bias. Nucleotide diversity and number of informative sites were obtained from PhiPack outputs.
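As an illustration of how an effective number of codons can be computed, the following Python sketch applies Wright's (1990) formula, Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where Fk is the mean codon homozygosity of amino acids with k synonymous codons. It is a simplified sketch, not the EMBOSS implementation; in particular, the handling of amino acids that are rare or absent in a gene is an assumption made here and may differ from "chips". All names in the block are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def effective_number_of_codons(cds):
    """Wright's Nc for one in-frame coding sequence (simplified sketch)."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    by_amino = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            by_amino[aa].append(counts.get(codon, 0))
    homozygosity = defaultdict(list)  # degeneracy class -> F values
    for codon_counts in by_amino.values():
        k, n = len(codon_counts), sum(codon_counts)
        if k == 1 or n < 2:
            continue  # Met/Trp carry no choice; F is undefined for n < 2
        f = (n * sum((c / n) ** 2 for c in codon_counts) - 1) / (n - 1)
        homozygosity[k].append(f)
    nc = 2.0  # Met and Trp each contribute exactly one codon
    for k, n_amino in ((2, 9), (3, 1), (4, 5), (6, 3)):
        values = [f for f in homozygosity[k] if f > 0]
        if not values:
            raise ValueError(f"no usable amino acids with {k} synonymous codons")
        nc += n_amino / (sum(values) / len(values))
    return min(nc, 61.0)  # estimates above 61 are capped at 61


# Toy input 1: every sense codon used ten times (no codon preference).
uniform_cds = "".join(c for c, aa in CODON_TABLE.items() if aa != "*") * 10
nc_uniform = effective_number_of_codons(uniform_cds)

# Toy input 2: a single codon per amino acid, each used twice (extreme bias).
preferred = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        preferred.setdefault(aa, codon)
biased_cds = "".join(preferred.values()) * 2
nc_biased = effective_number_of_codons(biased_cds)
```

The two toy inputs reproduce the limits quoted in the text: the unbiased sequence reaches the cap of 61 and the fully biased sequence gives 20.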
Correction for multiple testing was performed using the procedure reported by Benjamini and Hochberg as implemented in the program Q-Value. As previously detailed by our group, for each p-value, the q-value was calculated; the q-value represents the false discovery rate (FDR), i.e., the expected proportion of false positives among the significant tests. Corrections were performed separately for each test to account for testing of multiple genes. In a preliminary analysis of positive selection, all 3,316 genes were used for FDR correction. As recombination affects the tests for positive selection, the final positive selection analysis was performed using only those 3,046 genes that showed no evidence for recombination; FDR correction for this final positive selection analysis was thus performed with 3,046 genes. As the tests used for positive selection are already conservative, an FDR cutoff of 20% was used for the positive selection analyses. For recombination analyses, an FDR cutoff of 10% was used to compensate for the fact that no correction for the use of multiple tests (GENECONV, NSS, Maximum χ2 and PHI) was carried out, due to the high correlation among the tests.
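The FDR correction can be illustrated with a short Python sketch of the Benjamini and Hochberg step-up adjustment. This is a generic sketch of the published procedure under the assumption that q-values are taken as BH-adjusted p-values; the Q-Value program may differ in details. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five genes from one test.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(p_values)

# Genes passing the 20% cutoff used for the positive selection tests.
passing = [i for i, q in enumerate(q_values) if q < 0.20]
```

Applying the function separately to each test mirrors the statement that corrections were performed per test across genes.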
Associations between JCVI role categories and number of genes with (i) evidence of positive selection and (ii) evidence of recombination were tested using chi-square tests (or Fisher's exact tests where appropriate). Mann-Whitney U-tests (Wilcoxon tests) were used to determine whether selected continuous variables (i.e., gene length, codon bias, and nucleotide diversity) differed between a given role category and all other role categories. In addition, Mann-Whitney U-tests were used to test whether the p-values of the positive selection tests for genes in a given role category were significantly lower than the p-values among the genes in the other role categories. All Mann-Whitney U-tests were performed as one-sided tests. All tests were performed in the Statistical Analysis System (SAS) 9.1 (SAS Institute Inc., Cary, NC).
Bonferroni corrections for all tests were performed based on the number of tests performed. The cut off value for significance was set at 0.05; Bonferroni corrected p-values are reported unless otherwise stated. Actual p-values are reported unless p-values were < 0.001 or < 0.0001.
Verification of positive selection and recombination patterns in selected genes in a larger Salmonella set
For four genes (Table 2), including two genes that showed evidence for positive selection and recombination (i.e., folK-2, sseC) and two genes that only showed evidence for positive selection (i.e., STM3258, purE) in the initial genome wide analyses, gene sequences were determined for an additional 42 Salmonella isolates to further test positive selection and recombination patterns. The 42 Salmonella isolates were selected to reflect a diversity of human and animal associated serotypes; specifically, the isolates were selected to represent the 15 most common human and animal associated serotypes in the US (as detailed in the 2003 Salmonella Annual Report from the US Centers for Disease Control and Prevention) as well as two additional Salmonella Typhi isolates. Human and cattle isolates representing the common human and animal associated serotypes were selected by convenience from the strain collection available at the Cornell University Food Safety Laboratory, which includes human and animal clinical isolates originally obtained from the New York State Department of Health and the Cornell University Animal Health Diagnostic Center, respectively. For common serotypes (e.g., Typhimurium), more isolates were included in this set than for less common serotypes (e.g., Dublin) (see Additional file 2 for a listing of all isolates used). Multiple isolates with the same serotype were selected to represent the most common distinct Pulsed Field Gel Electrophoresis (PFGE) and multilocus sequence typing (MLST) types within a given serotype.
PCR conditions and primers for folK-2, sseC, purE, and STM3258 amplification are described in Additional file 3. PCR products were purified using Exonuclease I (USB) and shrimp alkaline phosphatase (USB). Purified PCR products were sequenced using the Applied Biosystems Automated 3730 DNA Analyzer at the Cornell University Life Sciences Core Laboratories Center. Big Dye Terminator chemistry and AmpliTaq-FS DNA Polymerase were used for sequencing. Alignments for positive selection and recombination analyses, which were performed as detailed above, were constructed using the gene sequences for the five genomes analyzed and the gene sequences for the additional isolates sequenced.
Initial identification and characterization of orthologous genes present in the five Salmonella genomes representing serotypes Typhi, Typhimurium, Choleraesuis, and Paratyphi A
Using OrthoMCL, a total of 3323 orthologous genes present in all 5 Salmonella genomes were identified. Since seven orthologous genes had low quality alignments, we excluded these genes and used 3316 orthologous genes for the analyses described below. Genes that were not found in all of the five strains were excluded from our analyses. The 3316 core genes represented 69, 81, 73, and 75%, respectively, of the Salmonella Choleraesuis, Paratyphi A, Typhimurium, and Typhi genes annotated in the genomes analyzed.
Interestingly, we identified one 2-gene cluster (i.e., STM0947 and STM0948), which was repeated 12 times in the Salmonella Choleraesuis genome, present once in the Typhimurium genome and absent in the Typhi and Paratyphi A genomes. These two genes encode a putative integrase (STM0947) and a putative cytoplasmic protein (STM0948), which differ by 4 and 1 non-synonymous substitution(s), respectively, between Choleraesuis and Typhimurium LT2. In addition, we identified one other gene (NT03ST2087, encoding a putative Tn10 transposase), which was repeated 7 times in the Salmonella Choleraesuis genome and found once in the Salmonella Typhi CT18 genome, while not present in the other genomes analyzed. Salmonella Choleraesuis thus appears to contain at least two multi-copy mobile genetic elements.
Genes categorized in the JCVI role categories "Hypothetical Proteins", "Protein synthesis", "Unclassified" and "Unknown function" showed a tendency to have shorter alignments (P < 0.001, P = 0.027, P = 0.002, P = 0.017, respectively; one sided U-test) as compared to genes in other role categories, while genes in the JCVI role categories "Amino Acid Biosynthesis", "DNA Metabolism", "Energy Metabolism", and "Transport and Binding Proteins" showed a tendency to have longer alignments (P < 0.001, P = 0.001, P < 0.001, and P < 0.001, respectively; one sided U-test) as compared to genes in other role categories.
Genes in the JCVI role categories "Cellular envelope", "Hypothetical proteins", and "Unclassified" showed a tendency to have more non-synonymous substitutions (P = 0.009, P < 0.001, and P < 0.001, respectively; one sided U-test) as compared to genes in other role categories. Genes in the JCVI role categories "Biosynthesis of cofactors, prosthetic groups, and carriers", "Energy Metabolism", and "Transport and Binding Proteins" showed a tendency to have more synonymous substitutions (P < 0.001, P < 0.001, and P = 0.001, respectively; one sided U-test) as compared to genes in other role categories. Genes in the JCVI role categories "Amino acid biosynthesis", "Energy metabolism", "Protein Synthesis", "Purines, pyrimidines, nucleosides, and nucleotides", "Transcription", and "Transport and binding proteins" showed a tendency to have higher codon bias (P = 0.006, P < 0.001, P < 0.001, P < 0.001, P = 0.033, and P = 0.010, respectively; one sided U-test) as compared to genes in other role categories.
Approximately 8% of core genes show significant evidence for recombination
Among the 3316 orthologous genes, 233 genes showed no substitutions; these genes thus were not analyzed for evidence of homologous recombination (since the methods used cannot detect evidence of recombination if an alignment presents no polymorphisms). While the remaining 3083 genes were analyzed for recombination using GENECONV, only 2849 genes were analyzed using Max χ2, NSS and PHI (467 ortholog alignments had ≤1 informative site and thus could not be analyzed with these programs in PhiPack). Overall, 270 genes (8.14% of all 3,316 core genes) showed evidence for recombination in at least one of the four tests used (FDR < 10%). A total of 192, 155, 69, and 20 orthologs showed evidence of recombination using GENECONV, Max χ2, NSS and PHI, respectively. Only 10 genes showed evidence for recombination with all 4 approaches (Table 3). Substitution methods (i.e., GENECONV and Maximum χ2) thus identified more genes with evidence of recombination as compared to compatibility methods (i.e., NSS and PHI). The differences in the number of genes with evidence of recombination detected with each method are related to (i) the power of the methods to detect recombination in sequences with different divergence and recombination levels, as well as (ii) the number of false positives associated with each method under different scenarios of heterogeneous substitution rates and convergent evolution. For example, GENECONV and Maximum χ2 showed more power to detect recombination as compared to NSS in a study using computer simulations , consistent with the observation that both of these methods identified the largest number of genes with evidence of recombination here. 
Both GENECONV and NSS also have been found, in a study using empirical data, to show higher levels of false positives as compared to Maximum χ2 when the sequences are very divergent , while, in another study  both NSS and Maximum χ2 have been shown to yield more false positives than PHI particularly in sequences with mutational hot spots. This is consistent with our observation that PHI identified the lowest number of genes with evidence for homologous recombination.
When considering all 270 genes identified as having evidence of recombination by at least one method, genes with higher numbers of informative sites (P < 0.0001; one sided U-test), longer alignments (P < 0.0001; one sided U-test), higher codon bias (P < 0.0001; one sided U-test), and higher nucleotide diversity (P < 0.0001; one sided U-test) were more likely to have evidence for recombination. An overall test showed that genes with evidence of recombination were not randomly distributed among the 20 JCVI role categories (P < 0.001; Fisher's exact test with Monte Carlo simulation). Subsequent individual chi-square and Fisher's exact tests, determining whether genes with evidence for recombination were associated with individual role categories, showed that genes with evidence of recombination were overrepresented in the role categories "Biosynthesis of cofactors, prosthetic groups, and carriers", "Energy metabolism", "Hypothetical proteins" and "Purines, pyrimidines, nucleosides, and nucleotides" (uncorrected P = 0.0035, P = 0.0037, P = 0.0034, and P = 0.0493, respectively) (Figure 2). However, after corrections for multiple comparisons, these associations were not significant (Bonferroni corrected P = 0.063, P = 0.066, P = 0.061, and P = 0.887, respectively).
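The relationship between the uncorrected and Bonferroni corrected p-values reported above can be checked with a small Python sketch. The number of tests is not stated in the text; the value of 18 used below is an inference from the reported numbers (each corrected value is close to 18 times the uncorrected value) and should be read as an assumption. The function and variable names are hypothetical.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Uncorrected and corrected p-values as reported in the text.
uncorrected = [0.0035, 0.0037, 0.0034, 0.0493]
reported_corrected = [0.063, 0.066, 0.061, 0.887]

N_TESTS = 18  # assumption: inferred from the reported values, not stated in the text
corrected = [bonferroni(p, N_TESTS) for p in uncorrected]

# Largest gap between recomputed and reported values; small gaps are expected
# because the uncorrected p-values are themselves rounded.
max_gap = max(abs(c - r) for c, r in zip(corrected, reported_corrected))
```

With this factor, none of the four corrected values falls below the 0.05 cutoff, in line with the conclusion stated above.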
Initial analysis revealed a total of 81 Salmonella genes showing evidence for positive selection
When preliminary positive selection analyses were performed on all 3,316 orthologous genes, 21 genes showed evidence for positive selection (FDR <20%) in the overall test (TO) (Additional file 4). A total of 23, 21, 13, and 14 genes, respectively, showed evidence of positive selection (FDR <20%), using the branch-site test, in the Choleraesuis, Typhi, Typhimurium, and Paratyphi A branch (Additional file 4). As the two Typhi isolates formed a single branch in only the phylogenies for 1261 genes, only these genes were used to test for positive selection in the Typhi branch. While 81 genes showed evidence of positive selection in at least one test (including 11 genes with evidence for positive selection in two tests, see Additional file 4), 32 of these genes also showed evidence of recombination with at least one of the four recombination tests used (Table 4; Additional file 4). Genes with evidence of recombination were more likely to be under positive selection (P < 0.0001; Chi-square test). Although this may indicate that positive selection contributes to fixation of new allelic variants that were generated by recombination , it may also reflect that the positive selection tests were affected by intragenic recombination . Thus, FDR corrections for positive selection analyses were repeated after removal of the 270 genes with evidence of recombination; these new FDR corrections used 3,046 genes for the overall (TO) test and the branch tests of Choleraesuis, Typhimurium and Paratyphi, and 1,108 genes for the Typhi branch test. All data in the subsequent sections represent the data for genes with no evidence for homologous recombination, unless otherwise stated.
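The association between recombination and positive selection reported above can be illustrated with a 2x2 chi-square calculation in Python. The contingency table below is reconstructed from counts given in the text (3,316 genes, 270 with evidence of recombination, 81 with evidence of positive selection, 32 with both); whether the original test used exactly this table, and whether a continuity correction was applied, is not stated, so this is a sketch under those assumptions. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 df, no continuity correction)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square tail, 1 df
    return stat, p_value


total_genes = 3316
recombinant = 270
selected = 81
both = 32

a = both                            # recombination and positive selection
b = recombinant - both              # recombination only
c = selected - both                 # positive selection only
d = total_genes - recombinant - c   # neither

stat, p_value = chi_square_2x2(a, b, c, d)

# Expected number of genes with both signals if the two were independent.
expected_both = recombinant * selected / total_genes
```

Under independence, fewer than 7 genes would be expected to show both signals, compared with the 32 observed, which is consistent with the reported P < 0.0001.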
A total of 41 Salmonella genes with no evidence of recombination showed evidence of positive selection
Positive selection tests identified 5 genes with evidence for positive selection (FDR <20%) in the overall test (TO) (Table 5). A total of 8, 16, 7, and 5 genes, respectively, showed evidence of positive selection (FDR <20%), using the branch-site test, in the Choleraesuis, Typhi, Typhimurium, and Paratyphi A branches (Table 5; Additional file 5). None of these genes showed evidence of positive selection in more than one test.
No association between a low effective number of codons used by a gene (Nc) and evidence for positive selection was observed (P > 0.05; one-sided U-test), suggesting that the results of the positive selection analyses were not biased by constraints on codon usage, which could result in a low synonymous substitution rate in these genes. Moreover, no association between low dS (the number of synonymous substitutions divided by the number of synonymous sites) and positively selected genes was observed (P > 0.05; one-sided U-test), further supporting that the results were not biased by a low synonymous substitution rate. A Fisher's exact test did not find any significant overall association between the 20 JCVI role categories and the genes under positive selection (Figure 3), possibly due to the low number of genes under positive selection in each role category. To further test for associations between positive selection and gene role category, we thus assessed, for each of the role categories, whether the distribution of the p-values for each positive selection test deviated from the random distribution, using the non-parametric U-test. The JCVI role category "Hypothetical proteins" showed significant trends of having genes with low p-values in the Choleraesuis, Typhimurium and Paratyphi A branch specific tests for positive selection (Bonferroni corrected P = 0.042, P = 0.034 and P < 0.001, respectively; one sided U-test) as compared to genes in other role categories. In addition, genes in the JCVI role categories "Unclassified" and "Protein synthesis" showed a significant trend of having low p-values in the Choleraesuis and Typhimurium branch tests for positive selection, respectively, as compared to genes in other role categories (Bonferroni corrected P = 0.002 and P = 0.013, respectively; one sided U-test).
Among Salmonella pathogenicity islands 1 through 6, three genes showed evidence for positive selection (i.e., pipB, STM1088 [siiB], and safC; see Table 6). Overall, 102 of the orthologs analyzed were located in the 6 Salmonella pathogenicity islands [53, 54]; genes in the pathogenicity islands were not significantly overrepresented (P > 0.05; Fisher's exact test) among the genes with evidence for positive selection. In addition, three SPI-1 genes (i.e., spaM, iagB, and sipD), and one SPI-2 gene (ssaI) showed uncorrected p-values < 0.05 in the TO positive selection test (P = 0.049, 0.017, 0.003 and 0.047, respectively), but failed to meet the FDR cutoff (q-values = 1, 1, 0.925, and 1, respectively). Similarly, one SPI-2 gene (sseF) showed a low uncorrected p-value (P = 0.001) in the Choleraesuis branch test, but failed to meet the FDR cutoff (q-value = 0.332).
Interestingly, ompC showed evidence for positive selection in our study (Table 5) as well as in a previous study of Shigella and E. coli. Our analyses showed that aa residues 228 and 274 show evidence for positive selection (Additional file 6), while aa 163, 202, and 203 showed evidence for positive selection in E. coli and Shigella. Salmonella OmpC aa site 228, which was found to be under positive selection here, is located in a region that is absent from E. coli OmpC but present in Shigella OmpC, while Salmonella OmpC aa site 274 is located in a region that is absent from OmpC in both E. coli and Shigella.
Verification of positive selection and recombination patterns, identified by genome wide analyses, for four genes among 42 Salmonella isolates
In order to confirm positive selection and recombination patterns identified by the full genome analyses, we used a larger set of 42 Salmonella isolates to sequence and analyze four genes, including two genes that showed evidence for positive selection and recombination (i.e., folK-2, sseC) and two genes that only showed evidence for positive selection (i.e., STM3258, purE). folK-2, which encodes an enzyme involved in the synthesis of folic acid, could not be PCR amplified in 6 Salmonella isolates, representing serotypes Montevideo (n = 2), Oranienburg, Javiana, Urbana, and Muenster. Analyses of 41 folK-2 sequences (5 sequences from the genomes and 36 newly determined sequences) confirmed that this gene shows evidence for recombination (Table 2). sseC, which is located in the Salmonella pathogenicity island 2, could not be PCR amplified in 6 Salmonella isolates, representing serotypes Agona (n = 2), Havana, Kentucky, and Mbandaka (n = 2). Analyses of the sseC sequences also confirmed that this gene shows evidence for recombination (Table 2). The STM3258 gene, which encodes a putative PTS component, could not be PCR amplified in one Salmonella Typhimurium and three serotype 4,5,12:i:- isolates. Results from the analyses of the resulting 43 STM3258 gene sequences were consistent with the genome analyses data and confirmed that this gene shows no evidence for recombination, but is under positive selection in the Salmonella Typhi branch. purE, which encodes an enzyme involved in the synthesis of purine ribonucleotide, was successfully amplified and sequenced in all 42 isolates; analyses of the resulting sequences also found evidence for positive selection in the Salmonella Typhi branch (Table 2); one test (NSS) on all 47 purE gene sequences found evidence for recombination in this gene (P < 0.001).
In this study, we used 5 Salmonella genomes representing host restricted (i.e., Typhi and Paratyphi A), host adapted (i.e., Choleraesuis), and unrestricted (i.e., Typhimurium) serotypes to study the evolution of core genes in different Salmonella serotypes. A total of 3,316 orthologs found in these 5 Salmonella genomes were used to (i) identify genes with evidence of recombination and (ii) identify genes under positive selection. Positive selection and recombination patterns for four genes of interest were confirmed in a larger set of isolates representing 23 different serotypes. Overall, our data show that, among the serotypes evaluated, (i) less than 10% of Salmonella genes in the core genome show evidence for homologous recombination, (ii) a number of core Salmonella genes are under positive selection, including genes that appear to contribute to virulence, and (iii) the cell surface protein ompC, which may contribute to multi drug resistance in Salmonella, is targeted by positive selection in both Salmonella and E. coli.
Less than 10% of Salmonella genes show evidence for intragenic recombination
Since the first bacterial genome was sequenced in 1995, comparative tools have shown that horizontal gene transfer is the major process for the evolution of prokaryotes [12, 14, 55]. Horizontal gene transfer has also been proposed to have played an important role in the evolution of the Salmonella genome. Salmonella Typhimurium LT2 seems to have acquired a number of novel genomic regions after the divergence from E. coli around 100 million years ago, and it has been estimated that 25% of the Salmonella Typhimurium genome may have been introduced by horizontal gene transfer. Groups of genes introduced by horizontal gene transfer include prophages and Salmonella pathogenicity islands (SPIs). While the role of horizontal gene transfer in introducing novel genes into the Salmonella genome has been well established, our analyses show that horizontal transfer (and recombination) of homologous genes also plays an important role in the diversification of Salmonella; 270 of the 3316 genes characterized (8.1%) showed evidence for intragenic homologous recombination. By comparison, analysis of four E. coli and two Shigella genomes found 236 genes with evidence for intragenic recombination, representing approximately 6.3% of genes analyzed. Chen et al. reported that 12.8% of core genome genes, found in seven E. coli genomes, showed evidence for recombination. A study of 410 genes present in six E. coli and six Salmonella enterica genomes reported that 23% of these genes showed evidence of recombination in Salmonella; this estimate may be higher than the one reported here as the 410 genes evaluated do not represent a random sample of the Salmonella core genome.
Interestingly, even novel genes that were initially introduced into the Salmonella genome through horizontal gene transfer and non-homologous recombination showed evidence for subsequent diversification through homologous recombination (e.g., one and two genes in SPI-1 and 2, respectively, showed evidence for intragenic recombination). A recent analysis by Didelot et al. also suggested convergence of Salmonella Typhi and Paratyphi A, two human host-restricted serotypes, through >100 recombination events involving both transfer of novel genes and transfer of homologous genes, further supporting the importance of horizontal transfer of homologous gene sequences in the evolution of Salmonella.
A number of core Salmonella genes are under positive selection, including genes that appear to contribute to virulence and systemic infection
A total of 1.2% of genes found in all five Salmonella genomes (i.e., 41 genes) showed evidence for positive selection and no evidence for recombination. While 5 genes showed evidence for positive selection in the overall analyses, 36 genes showed evidence for positive selection only in specific branches, indicating considerable branch specific positive selection in the Salmonella serotypes evaluated. Previously, Petersen et al. reported that, among 3,505 E. coli and Shigella genes that showed no evidence for recombination, a total of 23 genes (0.66%) showed evidence for positive selection. Among Gram-positive pathogens, Orsi et al. reported that 36 L. monocytogenes and L. innocua genes (1.6%) showed evidence of positive selection (among a total of 2267 genes analyzed), while Lefebure and Stanhope reported that 11 to 34% of the genes in the Streptococcus core genome showed evidence for positive selection, although this study did not control for multiple comparisons and thus may have somewhat overestimated the number of genes under positive selection. Recently, Lefebure and Stanhope showed that 92.5% of non-recombinant core genome loci are under positive selection, in at least one lineage, in 17 Campylobacter genomes, which represented 8 different species. While these different analyses suggest that the proportion of genes with evidence for positive selection varies considerably between different bacterial species or genera, methodological aspects (e.g., approaches used to correct for multiple comparisons, approaches used to identify genes with evidence for recombination) may also affect the number of genes identified as showing evidence for positive selection.
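To illustrate how a multiple comparison correction can change the number of genes reported, the following minimal sketch computes Benjamini-Hochberg adjusted p-values. It is only an illustration: this section does not state which procedure produced the q-values reported in the paper, the function name is hypothetical, and any p-values supplied to it here would be invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def passes_fdr(pvalues, cutoff):
    """Flag which tests meet the chosen FDR cutoff after adjustment."""
    return [q <= cutoff for q in benjamini_hochberg(pvalues)]
```

Under this procedure an adjusted value depends on the rank of the p-value among all tests. As a hypothetical example, an uncorrected p-value of 0.003 among 3,000 tests reaches an adjusted value of 0.20 only if at least 45 tests have p-values that small or smaller (0.003 × 3000 / 45 = 0.20), which shows how genes with low uncorrected p-values can still fail an FDR cutoff.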
Interestingly, three Salmonella genes with evidence for positive selection were located in Salmonella pathogenicity islands (SPIs). SPIs are chromosomal regions that contain genes contributing to a particular virulence phenotype [26, 58, 59]. So far, five common SPIs (i.e., SPI-1 through SPI-5), found among the majority of Salmonella enterica strains, as well as a number of additional less common SPIs have been reported. siiB, which showed evidence for positive selection, is located in SPI-4 and encodes a probable membrane protein (putative methyl-accepting chemotaxis protein). Morgan et al. reported that the SPI-4 genes siiD, siiE, and siiF play a role in Salmonella Typhimurium intestinal colonization of calves. Kiss et al. specifically showed that a Salmonella Typhimurium strain lacking siiB shows reduced secretion of SiiE, as compared to the wildtype, suggesting a possible involvement of siiB in calf virulence (as an siiE mutant showed reduced colonization in a calf model). pipB, located in SPI-5, also showed evidence for positive selection. SPI-5 encodes T3SS-1 and T3SS-2 effector proteins. PipB localizes to the Salmonella Containing Vacuole (SCV) in mammalian host cells. In addition, Wood et al. reported that a pipB null mutant showed reduced intestinal secretory and inflammatory responses in ligated bovine ileal loops, suggesting that this, as well as other genes in SPI-5, may contribute to bovine enteric infections. PipB also appears to be required for colonization of the cecum, by Salmonella Typhimurium, in chickens. safC, a gene located in SPI-6, a region called Salmonella enterica centisome 7 genomic island (SCI) in Salmonella Typhimurium, was also found to be under positive selection. safC encodes an outer membrane usher protein for Salmonella atypical fimbriae.
While a Salmonella Typhimurium strain with a deletion of SPI-6 showed reduced ability to invade Hep2 cells, we are not aware of any studies characterizing virulence of a safC null mutant. While the SPI-2 genes sseC and sseF have previously been reported to (i) show evidence for differential evolution and (ii) contain distinct clusters of polymorphic sites that might be unique to the human adapted serotypes Typhi and Paratyphi, these genes did not show evidence for positive selection in our final analyses. Both sseC and sseF showed evidence for positive selection in the Choleraesuis branch in our initial analysis, but sseC was removed from the final analysis as this gene also showed evidence of recombination and sseF did not meet the 20% cutoff for FDR. In combination with a previous study that reported that a number of genes located in Salmonella pathogenicity islands show evidence for differential evolution in different Salmonella serotypes, our findings do support that positive selection contributes to evolution of pathogenicity island genes in Salmonella, even though further analyses on larger data sets will be needed to clarify the contributions of positive selection and recombination to evolution of these genes.
Overall, three genes in the JCVI role category "Purine, pyrimidine, nucleoside and nucleotide biosynthesis" (i.e., wcaH, purE and nrdI) showed evidence for positive selection (while showing no evidence for recombination). wcaH, which encodes a GDP-mannose mannosyl hydrolase, is under positive selection in the Typhimurium branch, while purE and nrdI were found to be under positive selection in the Typhi branch. purE encodes a phosphoribosylaminoimidazole carboxylase, while nrdI, which is located in an operon with genes that encode a Class 1b ribonucleotide reductase, encodes a small flavoprotein with unknown function in Streptococcus pyogenes. Positive selection for purE in the Salmonella Typhi branch was also confirmed in our analyses of 22 human and 20 animal Salmonella isolates, which included two additional Typhi strains. This is a striking finding since Samant et al. recently reported that de novo nucleotide biosynthesis is essential for bacterial growth in blood. As Salmonella Typhi predominantly causes systemic septicemic infections in humans, these findings suggest that adaptive changes in genes encoding purine, pyrimidine, nucleoside and nucleotide biosynthesis functions may have been critical in the evolution of this host restricted human pathogen. Our findings thus further support that development of novel drugs targeting appropriate purine, pyrimidine, nucleoside and nucleotide biosynthesis pathways may represent an opportunity for therapeutic approaches for bacterial pathogens causing septicemic infections.
Additional genes with evidence for positive selection and possible roles in host infection include katG, which encodes a catalase. While antioxidant defense mechanisms appear to contribute to virulence in a number of pathogens, Salmonella katG null mutations have shown no effect on Salmonella's ability to survive inside phagocytic cells and in a murine model of infection. The importance of adaptive changes in Salmonella katG thus remains to be determined. It seems possible that adaptive changes in genes involved in anaerobic growth may contribute to an improved ability of different strains of this gastrointestinal pathogen to survive under anaerobic conditions encountered in the intestinal tract. We also identified a number of genes with evidence for positive selection that have no apparent link to infection and virulence, including malZ, malT, and mtlA, which encode, respectively, a maltodextrin glucosidase, a transcriptional activator of mal genes, and a mannitol specific PTS system component. While it has been proposed that horizontal transfer of genes encoding proteins involved in acquisition and synthesis of nutrients and genes encoding components of metabolic networks is critical as bacteria adapt to specific environments and ecological niches, our findings suggest that positive selection of genes encoding metabolic capabilities also contributes to adaptation to new environments.
Cell surface proteins are targeted by positive selection in both Salmonella and E. coli
While we identified, in our preliminary analysis, three genes encoding outer membrane proteins (ompC, ompS1 and ompS2) that showed evidence for positive selection, only ompC showed no evidence of recombination. ompC, a highly expressed omp gene, encodes a protein that not only appears to play a role in Salmonella virulence, but also is a receptor for Gifsy-1 and Gifsy-2 phages. An analysis of six E. coli and Shigella genomes also found that three omp genes (i.e., ompF, ompC and ompA) showed evidence of positive selection, while Chen et al. reported that ompC and ompF were under positive selection in uropathogenic E. coli strains. Furthermore, genes encoding the outer membrane proteins OmpA and OmpB showed evidence for positive selection in Rickettsia spp. Overall, these data strongly suggest that adaptive changes in genes encoding outer membrane proteins critically contribute to the evolution of a variety of bacteria, including pathogenic Enterobacteriaceae. In particular, ompC, which encodes one of the most abundant E. coli proteins, appears to be under positive selection in a number of pathogenic Enterobacteriaceae. As proposed by Petersen et al., positive selection in omp genes may be an important mechanism that facilitates adaptation of bacterial pathogens, allowing them to escape recognition by the host immune system and phages. In addition, mutations in porin genes (e.g., those belonging to OmpC and OmpF groups), as well as changes in Omp expression levels, have been linked to increased resistance to β-lactam antibiotics [74–76]. For example, under strong antibiotic pressure, bacteria can reduce the influx of antibiotic through downregulation of porin expression or expression of modified porins. Positive selection in porin genes, particularly ompC, may thus also be associated with selection to increase antibiotic resistance.
These findings provide potentially interesting avenues for future mutagenesis studies to elucidate the role of ompC polymorphisms in various phenotypes, including β-lactam resistance.
Our analyses strongly suggest that both homologous recombination and positive selection (particularly lineage specific positive selection) contribute to the evolution of the Salmonella core genome, at least in the serotypes analyzed here. While genes with evidence of positive selection identified here may provide promising targets for future mutational studies aimed at further identifying mechanisms that contribute to Salmonella diversification, including its adaptation to specific host species, one cannot extrapolate our findings on a few Salmonella serotypes to other serotypes unless additional analyses are performed. The relevance of the lineage specific positive selection patterns identified here is supported, though, by the convergence of the positive selection patterns identified in the Salmonella Typhi lineage (i.e., for genes encoding proteins involved in purine, pyrimidine, nucleoside and nucleotide biosynthesis) and experimental evidence that genes involved in de novo nucleotide biosynthesis are essential for bacterial growth in blood.
In conjunction with previous genome wide studies on positive selection in uropathogenic E. coli, Shigella and E. coli, Listeria spp., Campylobacter and Streptococcus spp., our data clearly indicate that positive selection and homologous recombination among core genome genes play an important role in the evolution of bacterial pathogens, in addition to the well established importance of gene acquisition and deletion. Positive selection and homologous recombination also appear to contribute to further evolution of novel genes initially acquired by lateral gene transfer, such as selected genes in the Salmonella pathogenicity islands. As additional pathogen genomes, including additional Salmonella genomes, have become and continue to become available, positive selection and recombination analyses on larger numbers of genomes will further improve our understanding of bacterial pathogens.
- TO: an overall test for positive selection, which was carried out using the null model M1a (Nearly-neutral) and the alternative model M2a in PAML
- the Choleraesuis branch specific test for positive selection, which was carried out in PAML using the branch-site test2 and the porcine adapted serotype Choleraesuis branch
- the Typhi branch specific test for positive selection, which was carried out in PAML using the branch-site test2 and the human restricted serotype Typhi branch
- the Typhimurium branch specific test for positive selection, which was carried out in PAML using the branch-site test2 and the unrestricted serotype Typhimurium branch
- the Paratyphi A branch specific test for positive selection, which was carried out in PAML using the branch-site test2 and the human restricted serotype Paratyphi A branch
- GENECONV: Statistical Test for Detecting Gene Conversion; this test for evidence of recombination was performed using GENECONV version 1.81
- Max χ2: Maximum χ2; this test for evidence of recombination was performed using Maximum χ2 implemented in the PhiPack software package
- NSS: Neighbor Similarity Score; this test for evidence of recombination was performed using Neighbor Similarity Score implemented in the PhiPack software package
- PHI: Pairwise Homoplasy Index; this test for evidence of recombination was performed using Pairwise Homoplasy Index implemented in the PhiPack software package
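The PAML tests listed above compare nested codon models, and such comparisons are commonly evaluated with a likelihood ratio test. The sketch below shows that calculation under stated assumptions: the function name is hypothetical, the log-likelihood values would come from PAML output and are not taken from the paper, and the degrees of freedom (2 for M1a versus M2a, 1 for the branch-site test 2) follow common practice rather than anything stated in this section.

```python
import math

def lrt_pvalue(lnl_null, lnl_alt, df):
    """Likelihood ratio test p-value for nested models.

    The statistic 2 * (lnL_alt - lnL_null) is compared to a chi-square
    distribution; closed forms are used for 1 and 2 degrees of freedom.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        # Chi-square(1) upper tail: P(Z^2 > stat) for standard normal Z.
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # Chi-square(2) upper tail is a simple exponential.
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or df = 2 are supported in this sketch")

# Hypothetical log-likelihoods for one gene (not values from the paper).
p_site_models = lrt_pvalue(lnl_null=-1000.0, lnl_alt=-997.0, df=2)
p_branch_site = lrt_pvalue(lnl_null=-1000.0, lnl_alt=-997.0, df=1)
```

The resulting per-gene p-values are the quantities that would then be subjected to a multiple comparison correction across all genes tested.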
Brenner FW, Villar RG, Angulo FJ, Tauxe R, Swaminathan B: Salmonella nomenclature - Guest commentary. J Clin Microbiol. 2000, 38 (7): 2465-2467.
Uzzau S, Brown DJ, Wallis T, Rubino S, Leori G, Bernard S, Casadesus J, Platt DJ, Olsen JE: Host adapted serotypes of Salmonella enterica. Epidemiol Infect. 2000, 125 (2): 229-255. 10.1017/S0950268899004379.
Kidgell C, Reichard U, Wain J, Linz B, Torpdahl M, Dougan G, Achtman M: Salmonella typhi, the causative agent of typhoid fever, is approximately 50,000 years old. Infect Genet Evol. 2002, 2 (1): 39-45. 10.1016/S1567-1348(02)00089-8.
Baker S, Dougan G: The Genome of Salmonella enterica Serovar Typhi. Clin Infect Dis. 2007, 45 (Suppl 1): S29-S33. 10.1086/518143.
McClelland M, Sanderson KE, Clifton SW, Latreille P, Porwollik S, Sabo A, Meyer R, Bieri T, Ozersky P, McLellan M, Harkins CR, Wang CY, Nguyen C, Berghoff A, Elliott G, Kohlberg S, Strong C, Du FY, Carter J, Kremizki C, Layman D, Leonard S, Sun H, Fulton L, Nash W, Miner T, Minx P, Delehaunty K, Fronick C, Magrini V, Nhan M, Warren W, Florea L, Spieth J, Wilson RK: Comparison of genome degradation in Paratyphi A and Typhi, human-restricted serovars of Salmonella enterica that cause typhoid. Nat Genet. 2004, 36 (12): 1268-1274. 10.1038/ng1470.
Ross IL, Heuzenroeder MW: Discrimination within phenotypically closely related definitive types of Salmonella enterica serovar typhimurium by the multiple amplification of phage locus typing technique. J Clin Microbiol. 2005, 43 (4): 1604-1611. 10.1128/JCM.43.4.1604-1611.2005.
Langvad B, Skov MN, Rattenborg E, Olsen JE, Baggesen DL: Transmission routes of Salmonella Typhimurium DT 104 between 14 cattle and pig herds in Denmark demonstrated by molecular fingerprinting. J Appl Microbiol. 2006, 101 (4): 883-890. 10.1111/j.1365-2672.2006.02992.x.
Nagano N, Oana S, Nagano Y, Arakawa Y: A severe Salmonella enterica serotype Paratyphi B infection in a child related to a pet turtle, Trachemys scripta elegans. Jpn J Infect Dis. 2006, 59 (2): 132-134.
Swaminathan B, Gerner-Smidt P: Foodborne disease trends and reports. Foodborne Pathog Dis. 2006, 3 (3): 220-221. 10.1089/fpd.2006.3.220.
Chiu CH, Tang P, Chu CS, Hu SN, Bao QY, Yu J, Chou YY, Wang HS, Lee YS: The genome sequence of Salmonella enterica serovar Choleraesuis, a highly invasive and resistant zoonotic pathogen. Nucleic Acids Res. 2005, 33 (5): 1690-1698. 10.1093/nar/gki297.
Gal-Mor O, Valdez Y, Finlay BB: The temperature-sensing protein TlpA is repressed by PhoP and dispensable for virulence of Salmonella enterica serovar Typhimurium in mice. Microb Infect. 2006, 8 (8): 2154-2162. 10.1016/j.micinf.2006.04.015.
Pal C, Papp B, Lercher MJ: Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Nat Genet. 2005, 37 (12): 1372-1375. 10.1038/ng1686.
Porwollik S, McClelland M: Lateral gene transfer in Salmonella. Microbes Infect. 2003, 5 (11): 977-989. 10.1016/S1286-4579(03)00186-2.
Lercher MJ, Pal C: Integration of horizontally transferred genes into regulatory interaction networks takes many million years. Mol Biol Evol. 2008, 25 (3): 559-567. 10.1093/molbev/msm283.
Schmidt H, Hensel M: Pathogenicity islands in bacterial pathogenesis. Clin Microbiol Rev. 2004, 17 (1): 14-56. 10.1128/CMR.17.1.14-56.2004.
Thomson NR, Clayton DJ, Windhorst D, Vernikos G, Davidson S, Churcher C, Quail MA, Stevens M, Jones MA, Watson M, Barron A, Layton A, Pickard D, Kingsley RA, Bignell A, Clark L, Harris B, Ormond D, Abdellah Z, Brooks K, Cherevach I, Chillingworth T, Woodward J, Norberczak H, Lord A, Arrowsmith C, Jagels K, Moule S, Mungall K, Sanders M, Whitehead S, Chabalgoity JA, Maskell D, Humphrey T, Roberts M, Barrow PA, Dougan G, Parkhill J: Comparative genome analysis of Salmonella Enteritidis PT4 and Salmonella Gallinarum 287/91 provides insights into evolutionary and host adaptation pathways. Genome Res. 2008, 18 (10): 1624-1637. 10.1101/gr.077404.108.
Porwollik S, Boyd EF, Choy C, Cheng P, Florea L, Proctor E, McClelland M: Characterization of Salmonella enterica subspecies I genovars by use of microarrays. J Bacteriol. 2004, 186 (17): 5883-5898. 10.1128/JB.186.17.5883-5898.2004.
Orsi RH, Sun Q, Wiedmann M: Genome-wide analyses reveal lineage specific contributions of positive selection and recombination to the evolution of Listeria monocytogenes. BMC Evol Biol. 2008, 8: 233-254. 10.1186/1471-2148-8-233.
Chen ZH, Schneider TD: Comparative analysis of tandem T7-like promoter containing regions in enterobacterial genomes reveals a novel group of genetic islands. Nucleic Acids Res. 2006, 34 (4): 1133-1147. 10.1093/nar/gkj511.
Lefebure T, Stanhope MJ: Evolution of the core and pan-genome of Streptococcus: positive selection, recombination, and genome composition. Genome Biol. 2007, 8 (5): R71. 10.1186/gb-2007-8-5-r71.
Deng W, Liou SR, Plunkett G, Mayhew GF, Rose DJ, Burland V, Kodoyianni V, Schwartz DC, Blattner FR: Comparative genomics of Salmonella enterica serovar typhi strains Ty2 and CT18. J Bacteriol. 2003, 185 (7): 2330-2337. 10.1128/JB.185.7.2330-2337.2003.
Lefebure T, Stanhope MJ: Pervasive, genome-wide positive selection leading to functional divergence in the bacterial genus Campylobacter. Genome Res. 2009, 19 (7): 1224-1232. 10.1101/gr.089250.108.
Chen SL, Hung CS, Xu J, Reigstad CS, Magrini V, Sabo A, Blasiar D, Bieri T, Meyer RR, Ozersky P, Armstrong JR, Fulton RS, Latreille JP, Spieth J, Hooton TM, Mardis ER, Hultgren SJ, Gordon JI: Identification of genes subject to positive selection in uropathogenic strains of Escherichia coli: a comparative genomics approach. Proc Natl Acad Sci USA. 2006, 103 (15): 5977-5982. 10.1073/pnas.0600938103.
Petersen L, Bollback JP, Dimmic M, Hubisz M, Nielsen R: Genes under positive selection in Escherichia coli. Genome Res. 2007, 17 (9): 1336-1343. 10.1101/gr.6254707.
Charlesworth J, Eyre-Walker A: The rate of adaptive evolution in enteric bacteria. Mol Biol Evol. 2006, 23 (7): 1348-1356. 10.1093/molbev/msk025.
Marcus SL, Brumell JH, Pfeifer CG, Finlay BB: Salmonella pathogenicity islands: big virulence in small packages. Microb Infect. 2000, 2 (2): 145-156. 10.1016/S1286-4579(00)00273-2.
Kelly BG, Vespermann A, Bolton DJ: The role of horizontal gene transfer in the evolution of selected foodborne bacterial pathogens. Food Chem Toxicol. 2009, 47 (5): 951-968. 10.1016/j.fct.2008.02.006.
Li L, Stoeckert CJ, Roos DS: OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003, 13 (9): 2178-2189. 10.1101/gr.1224503.
Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22 (22): 4673-4680. 10.1093/nar/22.22.4673.
Suyama M, Torrents D, Bork P: PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006, 34 (Web Server): W609-612. 10.1093/nar/gkl315.
Hall TA: BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acids Symp Ser. 1999, 41: 95-98.
Urwin R, Holmes EC, Fox AJ, Derrick JP, Maiden MC: Phylogenetic evidence for frequent positive selection and recombination in the meningococcal surface antigen PorB. Mol Biol Evol. 2002, 19 (10): 1686-1694.
Andrews TD, Gojobori T: Strong positive selection and recombination drive the antigenic variation of the PilE protein of the human pathogen Neisseria meningitidis. Genetics. 2004, 166 (1): 25-32. 10.1534/genetics.166.1.25.
Twiddy SS, Woelk CH, Holmes EC: Phylogenetic evidence for adaptive evolution of dengue viruses in nature. J Gen Virol. 2002, 83 (Pt 7): 1679-1689.
Nielsen R, Bustamante C, Clark AG, Glanowski S, Sackton TB, Hubisz MJ, Fledel-Alon A, Tanenbaum DM, Civello D, White TJ, Sninsky J, Adams MD, Cargill M: A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol. 2005, 3 (6): e170. 10.1371/journal.pbio.0030170.
Chapman MA, Leebens-Mack JH, Burke JM: Positive selection and expression divergence following gene duplication in the sunflower CYCLOIDEA gene family. Mol Biol Evol. 2008, 25 (7): 1260-1273. 10.1093/molbev/msn001.
Wong WS, Yang Z, Goldman N, Nielsen R: Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics. 2004, 168 (2): 1041-1051. 10.1534/genetics.104.031153.
Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997, 13 (5): 555-556.
Zhang JZ, Nielsen R, Yang ZH: Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol. 2005, 22 (12): 2472-2479. 10.1093/molbev/msi237.
Yang Z, Nielsen R, Goldman N, Pedersen AM: Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 2000, 155 (1): 431-449.
Sawyer S: Statistical tests for detecting gene conversion. Mol Biol Evol. 1989, 6 (5): 526-538.
Smith JM: Analyzing the Mosaic Structure of Genes. J Mol Evol. 1992, 34 (2): 126-129.
Bruen TC, Philippe H, Bryant D: A simple and robust statistical test for detecting the presence of recombination. Genetics. 2006, 172 (4): 2665-2681. 10.1534/genetics.105.048975.
Jakobsen IB, Easteal S: A program for calculating and displaying compatibility matrices as an aid in determining reticulate evolution in molecular sequences. Comput Appl Biosci. 1996, 12 (4): 291-295.
Posada D, Crandall KA, Holmes EC: Recombination in evolutionary genomics. Annu Rev Genet. 2002, 36: 75-97. 10.1146/annurev.genet.36.040202.111115.
Posada D: Evaluation of methods for detecting recombination from DNA sequences: empirical data. Mol Biol Evol. 2002, 19 (5): 708-717.
Posada D, Crandall KA: Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc Natl Acad Sci USA. 2001, 98 (24): 13757-13762. 10.1073/pnas.241370698.
Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000, 16 (6): 276-277. 10.1016/S0168-9525(00)02024-2.
Benjamini Y, Hochberg Y: Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing. J Royal Statis Soc B. 1995, 57 (1): 289-300.
Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci USA. 2003, 100 (16): 9440-9445. 10.1073/pnas.1530509100.
Center for Disease Control and Prevention (CDC): Salmonella surveillance: Annual Summary, 2004. Atlanta, Georgia: US Department of Health and Human Services, CDC, [http://www.cdc.gov/ncidod/dbmd/phlisdata/salmonella.htm]
Anisimova M, Nielsen R, Yang Z: Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics. 2003, 164 (3): 1229-1236.
McClelland M, Sanderson KE, Spieth J, Clifton SW, Latreille P, Courtney L, Porwollik S, Ali J, Dante M, Du FY, Hou SF, Layman D, Leonard S, Nguyen C, Scott K, Holmes A, Grewal N, Mulvaney E, Ryan E, Sun H, Florea L, Miller W, Stoneking T, Nhan M, Waterston R, Wilson RK: Complete genome sequence of Salmonella enterica serovar typhimurium LT2. Nature. 2001, 413 (6858): 852-856. 10.1038/35101614.
Parkhill J, Dougan G, James KD, Thomson NR, Pickard D, Wain J, Churcher C, Mungall KL, Bentley SD, Holden MTG, Sebaihia M, Baker S, Basham D, Brooks K, Chillingworth T, Connerton P, Cronin A, Davis P, Davies RM, Dowd L, White N, Farrar J, Feltwell T, Hamlin N, Haque A, Hien TT, Holroyd S, Jagels K, Krogh A, Larsen TS, Leather S, Moule S, O'Gaora P, Parry C, Quail M, Rutherford K, Simmonds M, Skelton J, Stevens K, Whitehead S, Barrell BG: Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature. 2001, 413 (6858): 848-852. 10.1038/35101607.
Koonin EV, Wolf YI: Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res. 2008, 36: 6688-6719. 10.1093/nar/gkn668.
Koski LB, Morton RA, Golding GB: Codon bias and base composition are poor indicators of horizontally transferred genes. Mol Biol Evol. 2001, 18 (3): 404-412.
Didelot X, Achtman M, Parkhill J, Thomson NR, Falush D: A bimodal pattern of relatedness between the Salmonella Paratyphi A and Typhi genomes: convergence or divergence by homologous recombination?. Genome Res. 2007, 17 (1): 61-68. 10.1101/gr.5512906.
Hacker J, Kaper JB: Pathogenicity islands and the evolution of microbes. Annu Rev Microbiol. 2000, 54: 641-679. 10.1146/annurev.micro.54.1.641.
van Asten AJ, van Dijk JE: Distribution of "classic" virulence factors among Salmonella spp. FEMS Immunol Med Microbiol. 2005, 44 (3): 251-259. 10.1016/j.femsim.2005.02.002.
Morgan E, Campbell JD, Rowe SC, Bispham J, Stevens MP, Bowen AJ, Barrow PA, Maskell DJ, Wallis TS: Identification of host-specific colonization factors of Salmonella enterica serovar Typhimurium. Mol Microbiol. 2004, 54 (4): 994-1010. 10.1111/j.1365-2958.2004.04323.x.
Kiss T, Morgan E, Nagy G: Contribution of SPI-4 genes to the virulence of Salmonella enterica. FEMS Microbiol Lett. 2007, 275 (1): 153-159. 10.1111/j.1574-6968.2007.00871.x.
Wood MW, Jones MA, Watson PR, Hedges S, Wallis TS, Galyov EE: Identification of a pathogenicity island required for Salmonella enteropathogenicity. Mol Microbiol. 1998, 29 (3): 883-891. 10.1046/j.1365-2958.1998.00984.x.
Knodler LA, Celli J, Hardt WD, Vallance BA, Yip C, Finlay BB: Salmonella effectors within a single pathogenicity island are differentially expressed and translocated by separate type III secretion systems. Mol Microbiol. 2002, 43 (5): 1089-1103. 10.1046/j.1365-2958.2002.02820.x.
Morgan E: Salmonella Pathogenicity Islands. Salmonella: Molecular Biology and Pathogenesis. Edited by: Rhen M, Maskell D, Mastroeni P, Threlfall J. 2007, Norfolk: Horizon Bioscience, 67-88.
Folkesson A, Lofdahl S, Normark S: The Salmonella enterica subspecies I specific centisome 7 genomic island encodes novel protein families present in bacteria living in close contact with eukaryotic cells. Res Microbiol. 2002, 153 (8): 537-545. 10.1016/S0923-2508(02)01348-7.
Eswarappa SM, Janice J, Nagarajan AG, Balasundaram SV, Karnam G, Dixit NM, Chakravortty D: Differentially evolved genes of Salmonella pathogenicity islands: insights into the mechanism of host specificity in Salmonella. PLoS ONE. 2008, 3 (12): e3829. 10.1371/journal.pone.0003829.
Tracz DM, Tabor H, Jerome M, Ng LK, Gilmour MW: Genetic determinants and polymorphisms specific for human-adapted serovars of Salmonella enterica that cause enteric fever. J Clin Microbiol. 2006, 44 (6): 2007-2018. 10.1128/JCM.02630-05.
Roca I, Torrents E, Sahlin M, Gibert I, Sjoberg BM: NrdI essentiality for class Ib ribonucleotide reduction in Streptococcus pyogenes. J Bacteriol. 2008, 190 (14): 4849-4858. 10.1128/JB.00185-08.
Samant S, Lee H, Ghassemi M, Chen J, Cook JL, Mankin AS, Neyfakh AA: Nucleotide biosynthesis is critical for growth of bacteria in human blood. PLoS Pathog. 2008, 4 (2): e37. 10.1371/journal.ppat.0040037.
Buchmeier NA, Libby SJ, Xu Y, Loewen PC, Switala J, Guiney DG, Fang FC: DNA repair is more important than catalase for Salmonella virulence in mice. J Clin Invest. 1995, 95 (3): 1047-1053. 10.1172/JCI117750.
Negm RS, Pistole TG: The porin OmpC of Salmonella typhimurium mediates adherence to macrophages. Can J Microbiol. 1999, 45 (8): 658-669. 10.1139/cjm-45-8-658.
Ho TD, Slauch JM: OmpC is the receptor for Gifsy-1 and Gifsy-2 bacteriophages of Salmonella. J Bacteriol. 2001, 183 (4): 1495-1498. 10.1128/JB.183.4.1495-1498.2001.
Jiggins FM: Adaptive evolution and recombination of Rickettsia antigens. J Mol Evol. 2006, 62 (1): 99-110. 10.1007/s00239-005-0080-9.
Pages JM, James CE, Winterhalter M: The porin and the permeating antibiotic: a selective diffusion barrier in Gram-negative bacteria. Nat Rev Microbiol. 2008, 6 (12): 893-903. 10.1038/nrmicro1994.
Alcaine SD, Warnick LD, Wiedmann M: Antimicrobial resistance in nontyphoidal Salmonella. J Food Protect. 2007, 70 (3): 780-790.
Medeiros AA, O'Brien TF, Rosenberg EY, Nikaido H: Loss of OmpC porin in a strain of Salmonella typhimurium causes increased resistance to cephalosporins during therapy. J Infect Dis. 1987, 156 (5): 751-757.
This work was partially supported by USDA Special Research Grants 2005-34459-15625 and 34459-16952-06 (to MW) and by the National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health (NIH), US Department of Health and Human Services, under contract N01-AI-30054 (to Lorin Warnick). The computer cluster used in the data analysis is partially funded by Microsoft. The authors thank Paige Smith for help with DNA sequencing.
YS performed and interpreted the phylogenetic and statistical analyses, performed some sequencing experiments, and drafted the manuscript. RHO outlined the phylogenetic and statistical analyses and helped with their performance and interpretation, as well as with drafting the manuscript. LR performed some sequencing experiments. QS performed orthologous gene clustering and alignment, and implemented the analysis on the parallel computer cluster. MW supervised the project, participated in the design of the study and data interpretation, and finalized the manuscript. All authors read and approved the final manuscript.
Electronic supplementary material
Additional file 1: Additional completed Salmonella genomes available in GenBank as of 08/20/2009, but not used in our study. (XLS 25 KB)
Additional file 2: Salmonella isolates (n = 42) used to verify genome wide positive selection and recombination patterns in four selected genes. (DOC 82 KB)
Additional file 3: PCR conditions and primers for the four genes that were used to verify genome wide positive selection and recombination patterns in an additional 42 Salmonella isolates. (DOC 34 KB)
Additional file 4: Detailed information for 81 genes showing evidence for positive selection from initial results for positive selection analysis (performed using all genes, including those with evidence of recombination). (XLS 98 KB)
Additional file 5: Detailed information for 41 genes showing evidence for positive selection from positive selection analysis for genes without recombination. (XLS 82 KB)
Cite this article
Soyer, Y., Orsi, R.H., Rodriguez-Rivera, L.D. et al. Genome wide evolutionary analyses reveal serotype specific patterns of positive selection in selected Salmonella serotypes. BMC Evol Biol 9, 264 (2009). https://doi.org/10.1186/1471-2148-9-264