Two major clades of Halobacteriales in the LeuRS phylogeny
Aminoacyl-tRNA synthetases (aaRS) are ancient enzymes that catalyze the attachment of each amino acid to its cognate tRNA during translation. This function is essential for maintaining the fidelity of the genetic code, and all 20 aminoacyl-tRNA species are essential for all living organisms. Although aaRS are part of the conserved "information processing and storage" gene set, they are frequently transferred across species boundaries and even between domains [18–20], most likely because of their limited interactions with other biomolecules [18].
Phylogenetic reconstruction using the amino acid sequences of LeuRS from Bacteria, Archaea and Eukarya shows the expected canonical pattern, with the archaeal and bacterial versions forming distinct clusters and the archaeal and eukaryal clades as sister groups (Figure 1). Within the Archaea, the two major phyla, Crenarchaeota and Euryarchaeota, can be distinguished (the other proposed archaeal phyla are not labeled; see Additional file 1: Figure S1 for their phylogenetic position). The clustering of sequences into major phyla in the LeuRS tree suggests an evolutionary history largely dominated by vertical inheritance (Additional file 1: Figure S1).
The existence of two distinct groups of Halobacteriales in this LeuRS phylogeny is noteworthy. A smaller group of haloarchaea clusters within the Euryarchaeota as expected [8, 21, 22], and a larger group is located at the base of the bacterial domain (Figure 1). We refer to the archaeal version of LeuRS in Halobacteriales as LeuRS-A and the bacterial version as LeuRS-B (cf. Figure 2). The extremely deep branch of the larger Halobacteriales clade relative to the rest of the Bacteria suggests an ancient horizontal acquisition of leuS by the Halobacteriales from an unknown source, most likely a relative of the ancestor of the Bacteria. The donor and the recipient may not have lived at the same time, and the transfer might have involved an intermediate carrier.
A single protein can contain parts that differ in phylogeny and substitution rates. We used GARD (Genetic Algorithm for Recombination Detection [23]) to investigate whether different parts of the LeuRSs in haloarchaea have different histories. Using MUSCLE [24] and SATé [25] alignments, GARD determined breakpoints corresponding to position 780 (MUSCLE) and 628 (SATé) in the Halogeometricum LeuRS sequence, respectively. Further inspection of the multiple sequence alignment revealed that most of the phylogenetic information distinguishing the archaeal and bacterial type LeuRSs is contained in the larger amino terminal part of the alignment. This part contains the domain that catalyzes the esterification between leucine and tRNA, and contains many positions universally conserved between the domains. The carboxy terminal part of the alignment encodes the tRNA recognition domain. Although GARD found a significant difference between the tree topologies determined for the two parts of the multiple sequence alignment, in both phylogenies reconstructed separately for the two parts of the SATé alignment, the LeuRS-B sequences group at the base of the bacterial homologs, whereas the LeuRS-A sequences group with the euryarchaeal homologs (see Additional file 2: Figure S2). The roles of the two parts of LeuRS in interacting with tRNALeu are illustrated in Additional file 3: Figure S3. Using the breakpoint from the GARD analysis of the MUSCLE alignment resulted in a carboxy terminal portion that was too short for reliable phylogenetic reconstruction. It is noteworthy that in the maximum likelihood phylogeny for this short fragment all haloarchaea grouped together, albeit with a bootstrap support value of only 47%.
As most of the haloarchaeal fragments failed a chi-square test for compositional homogeneity, this finding may reflect a shared compositional bias in the haloarchaeal sequences, although the possibility that the carboxy terminal part of LeuRS has a different evolutionary history from the rest of the enzyme cannot be excluded.
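The partitioning step described above, in which a breakpoint reported as a residue position in one reference sequence is used to cut the whole alignment into an amino terminal and a carboxy terminal part, can be sketched as follows. This is a minimal illustration, not the procedure used by GARD itself: the function names, the toy alignment and the convention that the breakpoint residue ends the amino terminal part are all assumptions made for the example.

```python
def residue_to_column(aligned_seq, residue_pos, gap="-"):
    """Return the 0-based alignment column that holds the residue_pos-th
    (1-based) ungapped residue of aligned_seq."""
    seen = 0
    for col, ch in enumerate(aligned_seq):
        if ch != gap:
            seen += 1
            if seen == residue_pos:
                return col
    raise ValueError("residue_pos exceeds the ungapped sequence length")


def split_alignment(alignment, ref_name, residue_pos):
    """Split an alignment (dict: name -> aligned sequence) into two parts.

    Assumed convention: the reference residue at residue_pos is the last
    column of the amino terminal part."""
    col = residue_to_column(alignment[ref_name], residue_pos)
    n_term = {name: seq[:col + 1] for name, seq in alignment.items()}
    c_term = {name: seq[col + 1:] for name, seq in alignment.items()}
    return n_term, c_term


# Hypothetical toy alignment; "ref" plays the role of the reference sequence.
toy = {
    "ref":  "MK-LV--AGT",
    "seqB": "MKELVQ-AGS",
    "seqC": "M--LVQWAG-",
}
n_part, c_part = split_alignment(toy, "ref", 4)
print(n_part["ref"], c_part["ref"])
```

Each part can then be passed separately to a tree reconstruction program; a very short carboxy terminal part, as with the MUSCLE breakpoint, leaves few columns for inference.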
To explore the possibility that the placement of the haloarchaeal LeuRS-B reflects an artifact created through long branch attraction, we calculated the pairwise distances between representatives of the bacterial LeuRS (Salinibacter ruber and Halanaerobium praevalens), archaeal LeuRS (Haloferax volcanii, Halogeometricum borinquense, Methanocorpusculum labreanum, Pyrococcus furiosus), haloarchaeal LeuRS-B (the two LeuRS-B copies in Halomicrobium mukohataei and Haloterrigena turkmenica) and the outgroup (isoleucyl-tRNA synthetase from Methanopyrus kandleri and Thermotoga maritima). Mean pairwise distances from the outgroup do not show significant differences (0.5364 ± 0.0511 for the archaeal LeuRS, 0.3915 ± 0.0268 for the bacterial LeuRS, and 0.4038 ± 0.0791 for the haloarchaeal LeuRS-B). Analysis of compositional homogeneity using the chi-square test as implemented in the program TREE-PUZZLE [24] indicated that the LeuRS-B sequences do not have atypical composition (P > 0.05). We do not find evidence that the placement of haloarchaeal LeuRS-B at the base of the bacterial homologs is due to an artifact created by these sequences being more divergent or having a different composition, and we find no indication of a close association of Halobacteriales LeuRS-B sequences with any specific bacterial or archaeal group. Nevertheless, artifacts created in the alignment certainly have the potential to increase apparent support values; thus, a placement of the LeuRS-B sequences within the cluster of bacterial homologs cannot be excluded.
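A compositional homogeneity check of the kind mentioned above can be sketched as a per-sequence chi-square statistic that compares the amino acid counts of each sequence with the counts expected from the pooled composition of the data set. This is an illustrative sketch only; it is not the TREE-PUZZLE implementation, and the function name and toy sequences are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_chi2(sequences, alphabet=AMINO_ACIDS):
    """Return a dict: name -> chi-square statistic of that sequence's
    composition against the pooled composition of all sequences.

    Gaps and characters outside the alphabet are ignored."""
    counts = {
        name: Counter(c for c in seq.upper() if c in alphabet)
        for name, seq in sequences.items()
    }
    pooled = Counter()
    for c in counts.values():
        pooled.update(c)
    total = sum(pooled.values())

    stats = {}
    for name, c in counts.items():
        n = sum(c.values())
        stat = 0.0
        for aa in alphabet:
            expected = n * pooled[aa] / total
            if expected > 0:  # states absent from the data set carry no information
                stat += (c[aa] - expected) ** 2 / expected
        stats[name] = stat
    return stats


# Hypothetical toy input: two sequences with opposite, extreme composition.
skewed = composition_chi2({"a": "AAAA", "b": "CCCC"})
# Two sequences with identical composition give a statistic of zero.
even = composition_chi2({"a": "ACDE", "b": "EDCA"})
print(skewed, even)
```

With all 20 amino acids represented, the statistic has 19 degrees of freedom, and a value above about 30.14 corresponds to P < 0.05; sequences below that threshold would not be flagged as compositionally atypical.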
We performed more detailed phylogenetic analyses of the two haloarchaeal clusters and their closest relatives to determine the phylogenetic relationships among the members of each group (Figure 2). We analyzed 14 haloarchaeal genomes that were available in the NCBI completed microbial genome database. Of these, only three genomes carry the LeuRS-A form: Haloferax volcanii, Halogeometricum borinquense and Haladaptatus paucihalophilus. Their sequences show close affinities to members of the Methanomicrobiales and Methanobacteriales (Figure 2a). The bacterial version LeuRS-B exhibits a more complicated picture (Figure 2b). Two highly supported clusters can be observed, which we refer to as B’ and B”. In five of the genomes included in this study (Natrialba magadii, Haloterrigena turkmenica, Halomicrobium mukohataei, Haloarcula marismortui and Halorhabdus utahensis), both B’ and B” are present. The observation that B’ and B” group together at the base of the Bacteria indicates that their divergence occurred either in the donating lineage or following the transfer. Two scenarios can explain the observed distribution of LeuRS-B: (a) the B form was already present in the haloarchaeal ancestor; or (b) the B form was acquired later, but spread to different haloarchaeal groups through biased gene transfer [14].
Supporting evidence for the second scenario is observed in the genomic region around B’ and B”. The two B forms do not sit in the same genomic neighborhood and do not exhibit synteny in Halobacteriales species that possess the B form (Figure 3). Also, genes flanking the B’ form are not conserved among the different organisms carrying B’, and the same is true for the gene neighborhood of B”. In contrast, genomic neighborhoods of LeuRS-A demonstrate synteny in terms of gene identity and order. Methanogenic archaea also show synteny around their gene coding for LeuRS, suggesting that the A form has undergone vertical transmission and/or gene transfer followed by homologous recombination. The B form of the enzyme, however, appears to have been transferred among the Halobacteriales species through non-homologous recombination into different parts of the recipients’ genomes. If a second LeuRS is integrated into a genome by non-homologous recombination, then, following a period of coexistence, one of the two homologs may eventually be lost. If the distribution of the two LeuRS-B forms had been generated through gene loss alone, we would expect to see syntenic regions around the gene coding for the B’ form and syntenic regions around the gene coding for the B” form, and these two regions would be distinct from each other. While we do not detect any synteny in our sample of LeuRS-B forms, we cannot rule out the alternative explanation that genomic regions encoding the LeuRS-B forms experienced more frequent rearrangements than regions harboring the LeuRS-A forms.
A second line of support for HGT of the two B forms comes from parametric bootstrapping analysis as implemented in LGT3State [26]. In this test, the null model requires that HGT is absent in the evolution of the LeuRS-B genes and that gene loss events alone can explain the distribution patterns. This model implies that the most recent Halobacteriales ancestor carried both types of LeuRS-B. The second model assumes that both gene losses and gains of the alternative forms can occur; that is, a genome carrying LeuRS-B’ can gain LeuRS-B”, and vice versa, resulting in a genome with both types present, from which one type may eventually be lost. Using the LGT3State program [26], we generated 1000 bootstrap datasets under the gene loss only model. These 1000 datasets reflect the outcomes expected under the null model and are compared to the real data; the distribution of their likelihood values gives us a measure of what to expect under the null hypothesis. The log-likelihood values obtained for the bootstrapped samples evaluated under the HGT model ranged from −43.2 to −49.6, and are much lower than the log-likelihood value obtained under the HGT model for the original tree (−6.35). Hence, we can reject the gene loss only model with a significance level of P<0.001.
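The final step of such a parametric bootstrap, turning the simulated null distribution and the observed value into a P value, can be sketched as below. This is a generic illustration and not part of LGT3State: the function name is hypothetical, the "+1" correction is one common convention, and the null values are evenly spaced placeholders spanning the reported range (−49.6 to −43.2), not the actual bootstrap results.

```python
def empirical_p_value(null_values, observed):
    """One-sided empirical P value: the fraction of null replicates whose
    statistic is at least as large as the observed one.

    Adding 1 to numerator and denominator keeps the estimate above zero
    when no replicate reaches the observed value."""
    n = len(null_values)
    if n == 0:
        raise ValueError("at least one null replicate is required")
    as_extreme = sum(1 for v in null_values if v >= observed)
    return (as_extreme + 1) / (n + 1)


# Placeholder null distribution: 1000 evenly spaced values in the reported
# range. These are NOT the values from the analysis, only stand-ins.
null_loglik = [-49.6 + i * (6.4 / 999) for i in range(1000)]
observed_loglik = -6.35

p = empirical_p_value(null_loglik, observed_loglik)
print(p)
```

Because none of the 1000 null replicates reaches the observed value, the estimate is 1/1001, consistent with the reported P<0.001.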
Interestingly, we also observed that no genome possesses only the B” form (Figure 2b), i.e., B” is always found to co-exist with B’. For the genomes that carry the two B copies, maintenance of the two functionally identical enzymes likely confers a selective advantage to the host. In bacteria, differential sensitivity of multiple copies of aaRS with redundant functions may benefit the organism against naturally occurring antibiotics [27]. The antibiotic capabilities of Archaea have only recently been investigated. Peptide antibiotics produced by some members of the Archaea, referred to as archaeocins, have been identified from haloarchaea and Sulfolobus and were reported to exhibit cross-kingdom toxicity [28]. A recent study showed that methanogenic archaea exhibit differences in susceptibility to various antibiotics, such as ampicillin, streptomycin, gentamicin, rifampicin, ofloxacin and tetracycline [29]. It is also possible that there is a difference in the functional efficiency of the two LeuRS-B forms, with B” being less efficient in aminoacylating some of its cognate tRNAs. This may be similar to the intragenomic heterogeneity in the ribosomal operons of Haloarcula marismortui, which exhibit differences in gene expression under different environmental conditions [30]. Alternatively, the functioning enzyme may consist of a B’B” heterodimer, allowing more degrees of freedom to accommodate destabilizing mutations [31], as observed in Aquifex aeolicus [32, 33]; the transition from a homo- to a heterodimer initially might not have been adaptive, but the resulting heterodimer nevertheless may be under strong purifying selection [34]. However, the latter scenario is unlikely as the genes encoding the B’ and B” forms are located in different parts of the genomes (Figure 3).
Haladaptatus paucihalophilus possesses both the A and the B’ form of LeuRS (Figure 2). Both copies are located adjacent to each other and are divergently transcribed. Two of its flanking genes (coding for a thermosome subunit and alanine dehydrogenase) are also found in the genomic neighborhood of leuS in the other two haloarchaea that possess only the A form (Haloferax volcanii and Halogeometricum borinquense; Figure 3). This is compatible with the scenario that Haladaptatus originally had the A form and has subsequently acquired the B’ form through HGT from another haloarchaeon.
The archaeal and bacterial forms of LeuRS are significantly distinct from each other (Additional file 4: Table S1). The identities between the A and B forms range from 21–26%, reflecting the very deep divergence that gave rise to these two forms. In contrast, the two LeuRS-B forms exhibit 46–53% identity with each other, suggesting a more recent divergence event.
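Percent identities such as those quoted above are computed from pairs of aligned sequences. A minimal sketch follows, assuming that identity is taken over columns where both sequences carry a residue; other denominators (alignment length, shorter sequence length) are also in use, and the convention behind Table S1 is not stated here. The function name and toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences, counting only
    columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy pair: 5 matches over 6 comparable columns.
identity = percent_identity("MKLV-AG", "MRLVQAG")
print(round(identity, 2))
```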
Scattered distribution of the different LeuRS in the Halobacteriales
Previous studies have reported the challenge of using the 16S rRNA phylogeny to determine the evolutionary relationships of the Halobacteriales [35]. Two factors have been implicated: the presence of multiple divergent copies of this gene in a single genome in many haloarchaeal species and that recombination of the rRNA gene occurs frequently between species [36]. Paralogous copies of rRNA operons in these organisms have been reported to show more than 5% divergence [35], and identical sequences have been found in strains that are otherwise clearly differentiated, making it difficult to establish accurate Halobacteriales relationships.
In light of the problems posed by using 16S rRNA sequences in haloarchaeal phylogeny, alternative markers have been used to establish relationships within the Halobacteriales. The RNA polymerase subunit B’ (RpoB’) has been put forward as a more useful alternative [37, 38], but it is also subject to HGT. More recently, the multilocus sequence analysis (MLSA) approach has been demonstrated to effectively discriminate among strains and species in the Halobacteriales [39]. Using this method, we concatenated the amino acid sequences of five housekeeping proteins from the 14 Halobacteriales species that we used in the LeuRS phylogeny. Phylogenetic reconstruction revealed two highly supported clades (Figure 4), similar to the results of [39]. In the MLSA tree in our study, Clade I consists of Haloterrigena and Natrialba, while Clade II comprises Halogeometricum, Haloquadratum, Haloferax and Halorubrum (Figure 4). We also obtained another highly supported group, consisting of Haloarcula, Halomicrobium and Halorhabdus (Figure 4). For the purposes of this study, we will refer to this third group as clade III. This phylogeny is also similar to one obtained from concatenated ribosomal proteins (Williams, Gogarten, Papke, personal communication) and the phylogeny inferred from a 3,853 gene supermatrix [40]. In particular, the three major groups of haloarchaea were also identified in these studies.
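The concatenation step of an MLSA analysis can be sketched as follows: each protein is aligned separately, and the per-protein alignments are then joined end to end for every taxon. The sketch below is illustrative only; the function name and the toy data are hypothetical, and padding a taxon that lacks a protein with gaps is an assumption about how missing data might be handled.

```python
def concatenate_alignments(alignments, gap="-"):
    """Join per-gene alignments (each a dict: taxon -> aligned sequence)
    into one supermatrix (dict: taxon -> concatenated sequence).

    A taxon missing from a gene alignment is padded with gap characters
    so that all concatenated sequences keep the same length."""
    taxa = sorted(set().union(*(aln.keys() for aln in alignments)))
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment differ in length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy input: two gene alignments, the second lacking taxon "Hgm".
genes = [
    {"Hfx": "MK", "Hgm": "MR"},
    {"Hfx": "LVA"},
]
supermatrix = concatenate_alignments(genes)
print(supermatrix)
```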
Mapping the presence and absence of the three LeuRS in the MLSA tree shows that all species belonging to clades I and III possess both B forms of the bacterial LeuRS. Given that LeuRS genes were frequently transferred within the haloarchaea, we do not interpret the co-occurrence of the B’ and B” forms as a shared derived character for clades I and III. For the archaeal version (LeuRS-A), we observed a dispersed distribution, mostly in branches that appear to have diverged more recently. If we consider the MLSA tree as a suitable representation of the species phylogeny of this group, and only take into account the distribution of LeuRS types within this group, then the initial assumption would be that the ancestor of the Halobacteriales possessed the bacterial form of LeuRS. However, a more likely scenario is that the presence of the archaeal version of the enzyme (LeuRS-A) is the ancestral state in the Halobacteriales. The clustering of the haloarchaeal LeuRS-A within the euryarchaeal homologs, specifically with those from methanogens, would indicate shared ancestry [21, 22], and the archaeal LeuRS would have been vertically inherited by the Halobacteriales. The single divergence event that gave rise to the B’ and B” forms likely took place early in the evolution of the Halobacteriales, followed by the spread or retention of both forms of LeuRS-B within the order.
Assuming that the Halobacteriales ancestor originally possessed the archaeal form acquired through vertical inheritance from the common ancestor of all Archaea, it later gained the bacterial LeuRS through horizontal transfer from a deep branching bacterial lineage, possibly still unsampled or now extinct. The finding that the haloarchaeal LeuRS-B diverged before the homologs found in bacteria suggests that either the lineage donating LeuRS-B to the haloarchaea or the haloarchaea themselves coexisted with the bacterial most recent common ancestor. More than one lineage could have carried the bacterial version of LeuRS before it was transferred to the haloarchaea; however, provided that the deep branching of the haloarchaeal LeuRS-B is not an artifact, all the scenarios imply that the lineage carrying the bacterial version now residing in the haloarchaea coexisted with the ancestor of the bacterial domain. Following transfer to the haloarchaea, the bacterial form spread among the majority of the Halobacteriales through vertical inheritance and HGT biased toward close relatives [14, 41], with some species possessing one form while in others both forms of the bacterial LeuRS are retained.
We then compared the LeuRS-A (Figure 2a) and LeuRS-B (Figure 2b) trees with the MLSA tree (Figure 4) to see if there are any conflicting topologies between them. For LeuRS-A, we observed similarity regarding the placement of the three species. Haloferax and Halogeometricum group together, and Haladaptatus is found at the base (Figure 2a). The topology of the LeuRS-B” tree was also similar to the MLSA tree, except for the placement of Halorhabdus (Figure 2b). This placement, however, is not highly supported and therefore we cannot draw any conclusion from it. In LeuRS-B”, the groupings of Natrialba and Haloterrigena, and of Haloarcula and Halomicrobium, are similar to what we found in the MLSA tree. In comparing the LeuRS-B’ and the MLSA tree, we also observed the same clustering of the above mentioned two pairs of haloarchaea. An important conflict, however, is the phylogenetic position of Halomicrobium; the MLSA tree places it in clade III, while in the LeuRS-B’ tree, its position is highly supported at the base of clade II (Figure 2b). Within clade III of the MLSA tree, Haloarcula and Halomicrobium are more closely related to each other than either is to Halorhabdus. Hence, the LeuRS-B’ topology indicates a transfer from clade II to Halomicrobium. Another possible conflict is that of Natronomonas, which clusters with the clade II species in the LeuRS tree.
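Topological comparisons of this kind were made here by inspecting the trees, but the underlying idea can also be expressed programmatically: list the clades (sets of taxa below each internal node) in each tree and report those found in only one of them. The sketch below is a generic illustration and was not part of the analysis; it assumes rooted trees written as nested tuples, uses hypothetical function names and placeholder taxon labels A to D, and ignores branch support, so a clade reported as unshared may still be poorly supported.

```python
def clades(tree):
    """Return (leaves, clade_set) for a rooted tree given as nested tuples
    of taxon names, e.g. ((("A", "B"), "C"), "D")."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves = leaves | child_leaves
        found |= child_clades
    found.add(leaves)  # the clade defined by this internal node
    return leaves, found


def unshared_clades(tree_a, tree_b):
    """Clades present in one tree but not the other (both directions)."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same taxa")
    return clades_a - clades_b, clades_b - clades_a


# Placeholder topologies with generic labels, not the trees in Figures 2 and 4.
reference_tree = ((("A", "B"), "C"), "D")
gene_tree = ((("A", "C"), "B"), "D")
only_reference, only_gene = unshared_clades(reference_tree, gene_tree)
print(only_reference, only_gene)
```

The total number of unshared clades is the rooted analogue of the Robinson-Foulds distance; a single well-supported unshared clade, as for Halomicrobium above, is the kind of signal that points to a transfer.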
Topologies of the MLSA tree and each of the LeuRS trees indicate that (1) the Halobacteriales came to possess the archaeal form through common ancestry with the rest of the Archaea that was eventually lost in a majority of the Halobacteriales, and (2) the bacterial LeuRS types were vertically and horizontally inherited within the group. We can be certain that at least one HGT event took place – the transfer from a deep branching, currently unsampled bacterial lineage diverging most likely before the bacterial common ancestor to the Halobacteriales.
Archaeal tRNALeu phylogeny shows two groups of haloarchaea
Transfer RNAs (tRNAs) are considered to be one of the primordial molecules that arose in the RNA world before protein biosynthesis emerged on Earth. They are a critical component of the translation machinery, linking each mRNA codon, through their anticodon triplet, to the corresponding amino acid. To determine if the divergence of LeuRS influenced the evolutionary route of their cognate tRNA, phylogenetic reconstruction of the archaeal tRNALeu sequences was performed (Figure 5). We did not obtain high bootstrap support for the tRNALeu tree because of the short sequences of tRNA molecules. The length of canonical tRNA sequences is only about 76 nucleotides [42], which does not provide sufficient phylogenetic information for a well-resolved phylogeny. However, both maximum likelihood and Bayesian methods revealed similar results.
Superficially similar to the LeuRS tree, two main groups of Halobacteriales are found in the tRNALeu tree (Figure 5). However, the distribution of the haloarchaea into the two groups differs significantly from that found in their corresponding synthetase tree. In the LeuRS tree, the smaller group of Halobacteriales consists of Haloferax, Halogeometricum and Haladaptatus, and the majority is found in a bigger cluster distinct from it (Figure 2). In contrast, the three genera mentioned above do not group together in the tRNALeu tree (Figure 5). One cluster consists of Haloferax and Haladaptatus, together with Haloarcula, Halobacterium, Halorhabdus, Natronomonas, Haloquadratum, Natrialba and Halorubrum. A second cluster is comprised of Halogeometricum, Haloterrigena, Halomicrobium and Halalkalicoccus.
The discovery of the conflicting groupings of haloarchaea in the LeuRS and the tRNALeu phylogenies raises the question of how the LeuRS-tRNALeu interaction evolved in these organisms. Our results suggest that the evolutionary route that the haloarchaeal tRNALeu took was independent of the evolution of the aaRS that aminoacylates it. This implies that LeuRS and tRNALeu can be horizontally acquired independently, and that one does not seem to strongly restrict the evolution of the other. tRNAs are often involved in HGT, with many found in close proximity to mobile elements and genomic islands [43]. The lack of co-evolution we find for tRNALeu and LeuRS is in contrast to the finding that human but not E. coli TyrRS could complement yeast whose TyrRS gene had been disrupted [44]. However, this reported "species specificity" was found to be due to a small peptide element in TyrRS, whose modification allowed the switching of species-specific aminoacylation across taxonomic domains [44].
The horizontal acquisition of aaRS of the same specificity might reflect a stochastic event in the evolution of these ancient enzymes. Numerous HGT events have been reported in many aaRS of different amino acid specificity, and these involved transfers at different taxonomic levels [18–20]. If these enzymes have been undergoing horizontal transfers in many extant lineages without affecting the evolution of their cognate tRNA, we cannot exclude the possibility that these transfers occurred without any impact on their aminoacylation capacities. Hence, the frequent transfers and current distribution of aaRS may instead reflect neutral stochastic transfers [45] and replacements. On the other hand, different aaRS forms in some instances were shown to provide differential sensitivity to naturally occurring antibiotics (see discussion in [46]). The possibility of selection through antibiotic resistance is seen in duplicate forms of same-specificity aaRS in Bacteria [47–49], and was suggested as a possible driving force behind the replacement of aaRS homeoalleles [46]. However, this hypothesis still requires further investigation.