A novel phylogenomic strategy identifies the proteome of the urancestor
A number of recent studies have focused on the complexity of the urancestor, its primordial functions, alternative rootings of the tree of life, and the rampancy of HGT [6–12, 23, 46]. However, results are not congruent, the rooting of the tree of life is still controversial, and the make up of the urancestor remains shrouded in mystery. It is indeed difficult to characterize an entity that existed billions of years ago using information in molecules of life that are modern. When using for example nucleic acid or protein sequences, the pervasive effects of mutation can cloud (saturate) any significant evolutionary signal. Some early studies focused on genomic sequence conservation but suffered for example from the effects of genetic losses and take-overs (e.g., non-orthologous gene displacement [9–11]). The analysis we here report is based on genomic content of protein domain structures at FSF level of the structural hierarchy. FSFs are much more conserved than protein sequence and are highly shared by organismal lineages. About half of FSFs (682 out of 1,420) in the 420 FL proteomes we analyzed in this study are common to the three superkingdoms (Venn diagram; Figure 2A). It is clear that FSFs are more robust against genetic losses and take-overs than corresponding sequences and carry deep evolutionary signatures [36, 42, 47, 48]. Furthermore, the distribution of FSFs in proteomes meets crucial phylogenetic marker criteria since: (i) it rarely changes relative to speciation events (SCOP domains and FSFs are discovered at average rates of once every ~0.1 and ~5 million years, respectively [27]), (ii) is minimally affected by HGT [37, 38], and (iii) is not under active natural selection (structural designs spread at rates of gene duplication and are vastly unaffected by change at sequence level [33]). 
We caution however that secondary adaptations in organism lifestyles such as parasitism could significantly affect FSF abundances in lineages undergoing reductive evolution. Given all these features, FSFs are well suited for inferring the make up of the urancestor.
Current top-down phylogenetic strategies used to build a tree of life generate unrooted trees and use deep taxa as outgroups a posteriori to root phylogenies. One major technical limitation of determining if given genes have an origin in the urancestor using this approach is the need for a universal tree that is accurately rooted. In some traditional studies that were based on sequence conservation, gene genealogies were largely dependent on the position of the root in trees that are used as reference (guide trees) [6, 8, 9, 15]. However, guide trees built from rRNAs or some ancient proteins, such as elongation factors and aminoacyl-tRNA synthetases (aRSs), produced rooting scenarios that were not congruent. Inconsistencies of this kind and the possibility of unrecognized paralogy leading to incorrect gene trees have made urancestral gene assignments unreliable [15]. Although a recent technical advance makes it possible to infer ancestral states of a given gene content using the parsimony method (GeneTRACE [49]), it still requires organismal or genomic trees as guides and does not explore the effects of gene abundance. Due to these technical limitations, the first use of FSFs to make inferences of the urancestor was solely based on the distributions of FSFs in genomes (analogous to the f index we use here) without any phylogenetic consideration [11].
In contrast, our bottom-up phylogenetic strategy uses the Lundberg method [50] to generate rooted phylogenomic trees without the need of outgroups (Figure 1 and Additional file 1, Figures S1 and S2). Evolution's arrow is established directly by the evolutionary model, the rationale and assumptions of which have been recently reviewed [51]. Operationally, the tree reconstruction algorithm finds the shortest unrooted tree(s) without specifying character polarity and then roots the tree(s) by invoking a hypothetical ancestor defined by ancestral character states and selecting the rooted topology that minimizes overall tree length (see Methods). In this study, a phylogenomic tree that describes the evolution of 420 FL proteomes revealed the three superkingdoms as distinct groups and placed Archaea at the root, with a rooting that was internal (paraphyletic) to the superkingdom (Figure 1A). A tree of life describing the evolution of a balanced set of proteomes corresponding to the three superkingdoms revealed the same diversification patterns (Additional file 1, Figure S2), suggesting that biases in taxon sampling do not affect the rooting of trees. The archaeal rooting of the tree of life has been reliably obtained in numerous studies with different proteomic sets [27, 35, 42] and is congruent with phylogenetic analysis of the structure of tRNA [22, 52], 5S rRNA [53] and RNase P [54], and of tRNA paralogs [55–58]. While its significance is not the focus and will not be discussed in this paper, a rooting in Archaea (see discussion in [42]) departs significantly from the 'canonical' bacterial rooting of the tree of life, which is traditionally derived from analyses of the sequence of ancient gene paralogs (e.g., ATPases, aRSs, elongation factors). It thus questions the bacterial-like origin of cellular life inferred from sequence comparisons.
In turn, the phylogenomic tree that describes the evolution of 1,420 FSF domain structures showed that the most ancient FSFs at the base of the tree (the basal_set) were shared by the three superkingdoms and were mostly universal (Figure 1B). Remarkably, the first loss of ancient FSFs occurred exclusively in archaeal lineages, an observation that also supports the ancestrality of Archaea. Again, patterns of distribution of FSFs in the trees were obtained congruently with numerous proteomic sets and releases of SCOP as genomic sequences and structures were acquired with time [27, 35, 42, 47, 59, 60].
Trees of proteomes and trees of FSFs are generated from the same genomic structural census but represent two sides of the same story (Figure 1). They describe the evolution of proteomes or the evolution of the FSF structures that make up the protein complement, respectively. Proteomes at the root of the tree of life are populated by FSFs that are shared by all three superkingdoms and proteomes at its crown are enriched in 'signature' FSFs that are unique to individual lineages. Relatively few signature FSFs exist that are specific to superkingdoms. Remarkably, proteome comparisons reveal these signatures are very unequally divided among superkingdoms [61] and already suggest (following parsimony thinking) that bacterial and archaeal lineages evolved from a primordial eukaryotic-like lineage by reductive loss [62]. Phylogenomic analysis confirms this reductive evolutionary tendency, showing that the first diversified lineage to emerge by loss of FSFs gives rise to Archaea, which in turn has the least number of signature FSFs and expresses the lowest levels of diversity and reuse of FSFs in nature [42]. The results we here report confirm once again these patterns (Figure 1B), indicating that the urancestral proteome is populated by ancestral sets of FSFs at the base of the tree of life that appeared in the tree of FSFs before the reductive evolutionary tendency in Archaea was evident [42].
In order to define the proteomic make up of the urancestor, we first identified a set of 352 primitive (plesiomorphic) FSFs (the 352_set) at the root branch of the tree of FL proteomes. All FSFs of the 352_set exhibited only gains in genomic abundance, most (314 FSFs) were common to the three superkingdoms, and interestingly, all were present in Archaea (see the four cells that are occupied in the Venn diagram; Figure 2A). However, tracing the 352_set FSFs in the tree of domain structures revealed the set was not conservative enough to define the urancestor. A timeline describing the age of each FSF unfolded directly from the tree shows that the FSF groups that are not universally shared by superkingdoms appeared for the first time in evolution in the order: BE (nd = 0.210), AB (nd = 0.415), B (nd = 0.433), E (nd = 0.538), A (nd = 0.538) and AE (nd = 0.589) (Figure 1B). Remarkably, no FSFs of the ancient BE-specific group were present in the 352_set, which contains, besides the universal basal_set, the more derived AB, AE and A groups. Similarly, the single A-specific FSF in the set (related to transcriptional regulation; d.236.1) cannot be part of the urancestor (Figure 2A). The FSF is not only absent in Bacteria and Eukarya but is also absent in nearly 50% of archaeal lineages examined (Additional file 1, Table S1). These results suggest many non-universal FSFs in the 352_set should not be considered urancestral and were the result of the 'modern effect'.
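The grouping of FSFs into the Venn categories used above (A, B, E, AB, AE, BE and the universal ABE) can be illustrated with a minimal sketch. It assumes a simple mapping from each FSF to the set of superkingdoms in which it occurs; the function names and the two identifiers marked as made-up are hypothetical and are not part of our census or pipeline.

```python
from collections import Counter

# Toy presence table: FSF id -> superkingdoms in which it is found.
# A = Archaea, B = Bacteria, E = Eukarya.
# c.37.1 and c.1.2 are universal and d.236.1 is Archaea-specific, as stated
# in the text; the remaining two identifiers are made up for illustration.
presence = {
    "c.37.1": {"A", "B", "E"},
    "c.1.2": {"A", "B", "E"},
    "d.236.1": {"A"},
    "x.ab": {"A", "B"},  # made-up identifier
    "x.be": {"B", "E"},  # made-up identifier
}


def venn_group(superkingdoms):
    """Return the Venn label (e.g. 'ABE', 'AB', 'A') for a set of superkingdoms."""
    return "".join(sorted(superkingdoms))


def venn_counts(table):
    """Count how many FSFs fall in each Venn group."""
    return Counter(venn_group(s) for s in table.values())


counts = venn_counts(presence)
for group, n in sorted(counts.items()):
    print(group, n)
```

Applied to a candidate set such as the 352_set, the same tally would show which Venn cells are occupied and how many FSFs are universal.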
In order to decrease the number of false positives, FSFs in the 352_set were assigned as initial characters in a parallel and iterative exercise of tree building and plesiomorphic character selection, with the goal of selecting for the most parsimonious tree of proteomes and the minimum number of FSFs. In each of 30 chains, 50 cycles of iteration dramatically reduced the space of urancestral FSFs (Figure 2A). The iterative procedure resulted in a more realistic max_set of 152 FSFs, which: (i) were common to the three superkingdoms (Figure 2A), (ii) excluded FSFs with decreased organismal distribution and genomic abundance levels (i.e., with smaller values of f and mean of G; Figure 2C), and (iii) excluded FSFs that had ambiguous character-state changes in the root branch of the tree of proteomes (Additional file 1, Figure S3). Consequently, the iteration strategy works well to selectively filter false positives assigned to the root branch by the modern effect, decreases biases introduced by the archaeal rooting, and mitigates uncertainties in character-state reconstructions.
We note that real urancestral FSFs that have evolved with intensive losses in numerous proteomic lineages are still possible and their origins will be seen as more derived under the parsimony criterion. These false negatives in the urancestral set cannot be dissected from FSFs that diverged more recently. Since the initial 352_set includes a significant number of FSFs in the proteomes examined (~25%), this initial large coverage shields against exclusion of unknown false negatives and makes the max_set a maximum bound for the urancestral proteome. On the other hand, false positives resulting from ancient HGT events [6, 40] can still occur. For example, FSFs that appeared soon after organismal diversification but transferred extensively to different lineages may be regarded as urancestral. The gap that exists between the discovery of urancestral FSFs and FSFs that emerged at the start of organismal diversification can be identified in the tree of protein domains, since this tree unfolds the evolutionary order of appearances of each of the 1,420 FSFs that are present in the modern proteomes we sampled (Figure 1B). Tracing the max_set FSFs in the tree of domain structures revealed that the set was not conservative enough to accurately define the urancestor and that many FSFs had low proteomic distribution (f) and abundance (mean of G) levels. We therefore defined a more conservative min_set by intersecting the max_set derived from phylogenetic iteration and the basal_set derived from the tree of domain structure (Figure 1B). The min_set excluded 82 FSFs with relatively smaller f and mean of G values. This set can be considered a lower bound for the urancestral proteome.
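The relationship between the two bounds can be written as a simple set operation. The sketch below is illustrative only: the function name is hypothetical, the toy sets contain a few FSF identifiers chosen for the example (their set membership here is not a statement about the real sets, except where the text says so), and the final lines merely restate the arithmetic implied by the counts given above (152 max_set FSFs, of which 82 are excluded).

```python
def urancestral_bounds(max_set, basal_set):
    """Return (min_set, excluded) given the iteration-derived max_set and
    the basal_set taken from the tree of domain structures."""
    min_set = max_set & basal_set   # FSFs supported by both lines of evidence
    excluded = max_set - basal_set  # max_set FSFs that are not basal
    return min_set, excluded


# Toy example with a handful of identifiers (illustrative membership only).
toy_max = {"c.37.1", "c.1.2", "c.2.1", "c.7.1"}
toy_basal = {"c.37.1", "c.1.2", "c.2.1", "c.55.3"}
toy_min, toy_excluded = urancestral_bounds(toy_max, toy_basal)

# Counts reported in the text: 152 FSFs in the max_set, 82 excluded.
MAX_SET_SIZE = 152
EXCLUDED = 82
min_set_size = MAX_SET_SIZE - EXCLUDED
print(sorted(toy_min), sorted(toy_excluded), min_set_size)
```

By construction the min_set is a subset of the max_set, which is why the two sets can be read as lower and upper bounds of the same repertoire.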
The proteomic and functional complexity of the urancestor
We here define the urancestor as an entity that accumulated genetic information in a period that spans the emergence of life and the emergence of diversified cellular life. We also consider the urancestor as a primordial isoform of the modern ribonucleoprotein world, regardless of it being a single organism or a communal population [6], especially because it contains fully functional ribosomes (see below). We therefore compare the proteomic and functional sets of the two worlds, the ancient world of the urancestor and the modern world of extant organisms, and make inferences about biological complexity using information in molecules that are modern.
We find that the upper bound urancestral FSF max_set contains almost all essential biological processes, including crucial metabolism and transport activities linked to amino acids, nucleotides, carbohydrates, polysaccharides, and coenzymes, and functions associated with the Information (translation, DNA replication/repair, transcription, RNA processing), Intra-cellular processes (transport, protein modification, proteases), Regulation (e.g., kinases/phosphatases, DNA binding, RNA binding), and General (small molecule binding, protein interaction) categories (Figure 3, Additional file 1, Table S2). As expected, the set lacks the Extra-cellular processes category, which includes molecular functions linked to definition of self and inter-cellular interactions (toxins, cell adhesion, immunity, etc). Although some of the sub-categories (i.e., transcription, RNA processing, and RNA binding) were not present in the urancestral min_set, the functions of the two urancestral sets are similar and suggest a functionally complex entity [8, 10]. However, the numbers of urancestral FSFs participating in individual subcategories were always smaller than those of FSFs in modern proteomes (Figure 4, Additional file 1, Table S2). Consequently, the functional repertoire of the urancestor, while exhibiting almost all essential functions, should be regarded as being simpler than the repertoire of modern proteomes. We suggest FSFs of this limited repertoire acted as a melting pot for new molecular functions when organismal lineages emerged, with founder biological activities being primitive and relatively non-specific [19]. The development of the ribosome illustrates such an origin [18].
The numbers of FSFs in major categories of the min_set, especially in Information, were smaller than those of the max_set (Figure 4). In general, informational genes tend to form multi-component complexes stabilized by protein-protein interactions. For this reason, it has been thought that these genes are refractory to HGTs [63]. The robustness of informational genes against transfer was previously contrasted with the rampant transfer among lineages of ancient metabolic (operational) genes [6, 8]. However, a recent study reveals HGT does not exhibit functional preferences and occurs randomly [64]. In turn, analysis of HGT in trees that describe the evolution of function directly from ontological data is congruent with our analysis and suggests a preferential role of HGT in shaping information-related functions [45]. Similarly, a recent comparative statistical analysis of homoplasy levels in trees of proteomes reveals information-related domains at FF level suffered limited but comparatively significant levels of lateral exchange [19]. These FFs were discovered quite early in protein evolution before and after the start of the diversified world. Thus, phylogenomic analysis of HGT of both biological functions and FSF structures can be used to explain why the min_set preferentially excludes these ancient HGT-susceptible FSFs. To further examine the role of ancient HGT processes on urancestral proteomic make up, we tested which of 49 functional sub-categories were preferentially enriched in the max_set and min_set relative to extant FL proteomes. Remarkably, only two functional categories, translation (Information) and nucleotide m/tr (Metabolism) were enriched in the max_set while metabolism-related transferases (Metabolism), small molecule binding (General), and nucleotide m/tr (Metabolism) functions were enriched in the min_set (Table 1).
We note that the three sub-categories enriched in the min_set include FSFs belonging to primordial metabolic folds (see below). While we do not know how many real urancestral FSFs that evolved without major HGT effects were excluded in the max_set to min_set transition, it is apparent that the proportion of horizontally transferred FSFs in the min_set is smaller than that in the max_set. Consequently, the absence of translation and presence of nucleotide-related metabolic activities in the enriched functions of the min_set provide statistical support to operational genes being more robust against ancient HGTs than translation-related informational genes, a result that is in contrast with previous proposals [6, 8]. We hypothesize that translation was necessarily simple and flexible during its early metabolic urancestral inception [19]. Fewer molecular interactions between components in a simpler translation system left HGT unchecked and free to shape the spread of FSFs that were recruited for the new translation functions. The numbers of translation FSFs are 6 and 34 in the min_set and max_set, respectively, which represent 6.7% and 38.2% of modern translational FSFs. While ancient HGT processes appear to have shaped the evolution of ancient translation-related genes during urancestral history, HGT may not have affected metabolism to such levels, especially because metabolism was already quite developed when translation materialized in evolution [19]. A detailed phylogenomic analysis of protein domain structure in metabolic networks reveals that the nine most ancient folds were responsible for the explosive appearance of most modern enzymatic functions [36]. A succession of recruitment gateways, each mediated by the discovery of a new fold, showed metabolism originated in enzymes of nucleotide metabolism harboring the P-loop-containing NTP hydrolase fold (c.37), probably in pathways linked to the purine metabolic subnetwork [36].
Crucial FSFs of these primordial metabolic folds are part of the urancestral min_set and many are part of nucleotide metabolism subnetworks [e.g., P-loop-containing NTP hydrolases (c.37.1); ribulose-phosphate binding barrel (c.1.2); NAD(P)-binding Rossmann-fold domain (c.2.1); ribonuclease H-like (c.55.3)]. The congruent enrichment of nucleotide m/tr in both urancestral sets suggests operational genes encoding nucleotide metabolism were built in the urancestor and diverged vertically in primary lineages of the superkingdoms. Interestingly, highly conserved protein-encoding sequences related to nucleotide biosynthetic pathways, including putative phosphoribosyl pyrophosphate synthase and thioredoxin enzymes, were previously identified as being an important part of the urancestral set [46]. This is also consistent with a study of physical clustering of genes in bacterial genomes, which also reveals the most ancient group of genes is related to metabolism [65]. Due to statistical limitations of the hypergeometric distribution, significant enrichments could not be resolved for the remaining 5 informational and 14 operational categories. A more comprehensive study will be needed to evaluate the extent of HGT in whole sets of ancient operational and informational genes.
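For readers unfamiliar with enrichment tests based on the hypergeometric distribution, the following minimal sketch computes the upper-tail probability of finding at least k FSFs of a functional sub-category in a set of n FSFs drawn from a background of N FSFs, K of which belong to that sub-category. The function name is hypothetical, and the example inputs are assumptions made only for illustration: the background of 1,420 FSFs is taken as the reference, and the figure of 89 modern translation FSFs is inferred from the percentages quoted above (34 is 38.2% of about 89). The sketch does not reproduce the exact procedure or corrections behind Table 1.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail probability P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background size, K: background members in the category,
    n: size of the tested set, k: tested-set members in the category.
    """
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Illustrative inputs (assumptions, see the paragraph above).
p_max = hypergeom_enrichment_p(N=1420, K=89, n=152, k=34)  # translation, max_set
p_min = hypergeom_enrichment_p(N=1420, K=89, n=70, k=6)    # translation, min_set
print(p_max, p_min)
```

Small sub-categories yield few possible outcomes and therefore little power, which is one way to read the statistical limitation noted above.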
Finally, a comparison of structural and functional components of the urancestor and FL proteomes revealed the complexity of the make up of the urancestor relative to extant organisms. In terms of FSF repertoires, the numbers of distinct FSFs (diversity) of the urancestral sets were significantly smaller than those of each and every one of the FL organisms we analyzed (Figure 5A), even if FSF reuse was considered (Figure 5B). Our estimates therefore indicate that the FSF repertoire of the urancestor (70-152 FSFs) and its reuse in domains (303-507 domains) was at least 5 and 3 times smaller than that of extant FL organisms, respectively. Furthermore, the inclusion of artificial urancestral proteomes resurrected in silico in tree reconstructions generated trees of proteomes that always placed the urancestor at their base (Figure 6). These results support the distant relationship that exists between the urancestor and the simplest of extant proteomes, confirming the relative simplicity of the reconstructed ancestral entity. Similarly, phylogenetic reconstructions derived from functional data confirm urancestral functions were quantitatively and phylogenetically simpler than functions in any extant FL proteome (Figure 7). Consequently, the actual repertoire of the urancestor inferred from FSFs in modern proteomes, while relatively complex in the number of molecular functions it embodies, is closer to the simple progenote model and distant from the complex cenancestor model. We note however that proteins in the relatively simple FSF repertoire of the urancestor could have been non-specific, harboring a multiplicity of functions. These would have increased the effective complexity of this primordial organism. Furthermore, the complex functional repertoire we reveal suggests the urancestor was a quite advanced version of the progenote, with a multiplicity of metabolic and biosynthetic functions.
Emergence of translation and ribosomal machinery in the urancestor during the Late Archean
In a previous study, we used phylogenies of FSF and FF to study the emergence of the translation apparatus [19]. Dissection of first appearance of fundamental innovations in molecular machinery (evolutionary landmarks) associated with metabolism and translation revealed translation had metabolic origins. It appeared after the discovery of a large number of metabolic functions but before enzymes necessary for the synthesis of DNA. A clear timeline of molecular diversification was apparent, with domains associated with aminoacylation appearing first, immediately followed by molecular switches and regulatory factors important for tRNA shepherding and RNA transport. Additional file 1, Table S1 shows landmark domains that interact with RNA (some of which have metabolic roles) were present in the min_set, including class I (c.26.1, ndFSF = 0.064) and II catalytic (d.104.1, ndFSF = 0.128) and anticodon-binding (a.27.1; ndFSF = 0.141) domains of aRSs, GTP-binding (c.37.1, ndFSF = 0) and elongation factor (b.43.3, ndFSF = 0.128) domains of translation factors, and even ribonuclease P and PH domains (d.14.1, ndFSF = 0.059) crucial for endo- and exoribonucleolytic cleavage of RNA and nucleotidyltransferase activities necessary for damage repair. In contrast, none of the domains present in ribonucleotide reductase enzymes responsible for producing the deoxyribonucleotide components necessary for DNA-linked functions, the ferritin-like domain (a.25.1, ndFSF = 0.242), N-terminal domain of cbl (a.48.1, ndFSF = 0.685), and the PFL-like glycyl radical enzyme domain (c.7.1, ndFSF = 0.279), were present in the min_set. Only one of these domains, c.7.1, was present in the max_set.
We note that the reduction of ribonucleotides to deoxyribonucleotides involves the production of an active site thiyl radical that requires contacts with cysteines in all protein domains of the catalytic subunit of the oligomeric enzymatic complex [66], suggesting modern ribonucleotide reductase function is indeed derived. We also note that the active site domains of class III ribonucleotide reductases share the c.7.1 domain and the associated radical-based chemistry with pyruvate formate-lyase enzymes, a link proposed to have mediated the RNA-to-DNA biological transition [67]. However, phylogenomic analysis at FF level [19] suggests the pyruvate formate-lyase domain (c.7.1.1; ndFF = 0.518) emerged later than its ribonucleotide reductase counterpart (c.7.1.2; ndFF = 0.235). It is therefore likely that the urancestor stored genetic information as RNA and not DNA.
The set of FSFs of ribosomal proteins that are universal establish crucial contacts with substructures of the rRNA subunits in the ribosome and appear much later than aRSs and regulatory factors [19]. A careful phylogenetic analysis of ribosomal history directly from protein and RNA structure established the relative time of appearance of ribosomal proteins and rRNA substructures in the ribosome [18]. The study reveals that proteins and RNA co-evolved from the start and structures supporting protein synthesis appeared in a fundamental major transition once processivity functions involving interactions with transfer and templating RNA were already functional. A set of four FSFs were recruited into ribosomal function during this initial period, including ancient ribosomal proteins with OB-fold and related SH3-like small β-barrel folds. Remarkably, all of these FSFs belong exclusively to the urancestral min_set (Table 2). A set of 6 additional FSFs that associated with the ribosome immediately after the major transition but before the appearance of the L7/L12 protein complex [18] are all exclusively included in the urancestral max_set (Table 2). The L7/L12 complex crucially stimulates the GTPase activity of elongation factor G, a ribosomal factor that catalyzes elongation and enhances ribosomal processivity [68, 69]. Primordial protein synthesis was therefore active in the urancestor and the processivity and efficiency of the ribosome was actively improved during urancestral evolution.
In order to place the history of the urancestor and of the ribosome in a timeline, we used a molecular clock of protein domain structure to define evolutionary timescales [20]. Using a clock derived from the tree of FSFs of Figure 1B but using the calibration points of Wang et al. [20], the appearance of the youngest FSFs in the urancestral min_set and max_set suggests organismal diversification was established sometime between 2.9 and 2 Ga ago (Figure 8). Remarkably, the earliest date coincides with the discovery of aerobic metabolism and the start of planet oxygenation [20] that led to the Great Oxidation Event [70], a geological time where oxygen reached 1% of present atmospheric levels. We note that integration of molecular, physiological, paleontological, and geochemical data suggests that a diversified clade of cyanobacteria with marked heterocyst and cell differentiation appeared no later than 2.1 Ga ago [71] and a number of FSFs linked to events of organismal diversification and the fossil record [20] (Figure 8) are only compatible with the existence of an urancestor before that time. This indicates the urancestral max_set is not conservative enough to accurately define the urancestor and that the min_set is clearly more appropriate. Remarkably, the first and second transitions in ribosomal evolution occurred before the youngest age of min_set FSFs, 3.04 and 2.41 Ga ago, respectively, suggesting efficient ribosomal protein synthesis was pre-requisite for organismal diversification and emerged prior to the aerobic metabolism and the start of planet oxygenation (Figure 8).
The diversification of cellular membranes marks the end of the urancestor
The chirality and chemistry of glycerol membrane lipids is different in Archaea (sn2,3 isoprenoid ether lipids) than in Bacteria and Eukarya (sn1,2 fatty acid ester lipids), a feature that is claimed to be important for the rise of a diversified organismal world [72–74]. In fact, a widely popular model for organismal diversification is the existence of heterochiral glycerolipids in the primordial membranes of the urancestor, which were synthesized as racemates but then segregated in sn1,2 and sn2,3 lineages during organismal diversification [75]. These different chiral forms are synthesized in two different metabolic pathways that start with the reduction of a keto group from dihydroxyacetone phosphate (DHAP or glycerone phosphate) and use two different stereochemistry-specific glycerol phosphate backbones. In Bacteria and Eukarya, the synthesis of sn1,2 fatty acid ester lipids starts with the conversion of DHAP to sn-glycerol 3-phosphate (G3P) by the activity of glycerol-3-phosphate dehydrogenase (G3PDH) (EC 1.1.1.94). This enzymatic reaction is the first step of pathways needed to produce the ester-fatty acid double layer typical of eukaryotes and mesophilic and psychrophilic bacteria.
In contrast, the first step in the synthesis of sn2,3 isoprenoid ether lipids in Archaea is the reduction of DHAP to sn-glycerol 1-phosphate (G1P), the enantiomer of G3P, by the Zn2+-dependent glycerol-1-phosphate dehydrogenase (G1PDH) metalloenzyme (EC 1.1.1.261). The biosynthesis of polar lipids in Archaea requires the activity of two additional enzymes, the (S)-3-O-geranylgeranylglyceryl phosphate synthase (GGGPS) (EC 2.5.1.41) and the (S)-2,3-di-O-geranylgeranylglyceryl phosphate synthase (DGGGPS) (EC 2.5.1.42), that together alkylate the hydroxy groups of G1P to give sn-3-O-(geranylgeranyl)glycerol 1-phosphate (GGGP) and 2,3-bis-O-(geranylgeranyl)glycerol 1-phosphate (DGGGP) and later produce unsaturated archaetidic acid with geranylgeranyl chains and CDP-unsaturated archaeol through downstream activity of the CDP-archaeol synthase (EC 2.7.7.67) enzyme.
To date there are no known exceptions to a G1P backbone chemistry in Archaea and a G3P backbone chemistry in the other two superkingdoms. However, some extremophilic bacteria also contain membrane ether lipids (the typical archaeal trait), including di-glycerol ether lipids, tetraether non-isoprenoid lipids, and mixed ester-ether lipids, but they are all of the sn1,2 kind (reviewed in [12]).
Since DGGGPS is not stereospecific, the chirality of the isoprenoid ether lipids in Archaea appears entirely determined by GGGPS [76]. Remarkably, the 6-phosphogluconate dehydrogenase C-terminal domain-like (a.100.1; ndFSF = 0.110) FSF of the bacterial and eukaryal G3PDH (a.100.1.6; ndFF = 0.233) and the FMN-linked oxidoreductase domain (c.1.4; ndFSF = 0.114) FSF of the GGGPS (c.1.4.1; ndFF = 0.041) necessary for downstream membrane lipid biosynthesis in Archaea were present in the urancestral min_set and appeared in the timeline of domain architectures quite early in evolution. A min_set G3PDH suggests primordial sn1,2 fatty acid ester lipids of some kind were already present in the urancestor. In turn, a min_set GGGPS enzyme harboring the only TIM β/α-barrel fold-containing enzyme with a prenyltransferase function [76] suggests chirality-enabling enzymatic activities downstream of G1PDH were already present in the urancestor.
We note that CDP-archaeol synthase activity, which is downstream of G1PDH, does not require specificity for ester or ether bonds nor of glycerol phosphate enantiomers [77] and that full conversion of bacterial-like lipids to archaeal-like lipids requires the crucial discovery or recruitment of G1PDH metalloenzyme activities necessary for the production of the G1P backbone [76]. The urancestral GGGPS catalyses the first CDP-archaeol biosynthesis pathway-specific step of isoprenoid ether lipids in Archaea but displays a strong preference for the G1P substrate [78]. G1PDH does not yet have a crystallographic structural entry. However, molecular modeling suggests the enzyme was derived from glycerol dehydrogenase (EC 1.1.1.6; e.22.1.2; ndFSF = 0.288; ndFF = 0.191) [79], which is present in the three superkingdoms and has the dehydroquinate synthase-like domain (e.22.1) FSF. A total of 110 UniProt entries with EC 1.1.1.261 functions were analyzed with HMMs of structural recognition. All sequence entries were indeed assigned the e.22.1 FSF. Two entries also had the HAD-like domain (c.108.1). While this analysis confirms the original structural assignment [79], we find the G1PDH FSF was not part of any of the urancestral FSF sets. This strongly suggests isoprenoid ether lipids derived from G1P were not present in the urancestor. Instead, and as suggested by Glansdorff et al. [12], sn1,2 isoprenoid ether lipids were probably synthesized by the activity of the urancestral GGGPS, which at that time could have exhibited a preference for G3P. Recruitment of a G1PDH in emerging archaeal lineages displaced the possible use of G3P by GGGPS and enabled the synthesis of sn2,3 isoprenoid ether lipids using the G1P backbone. It is significant that the molecular clock of FSFs established that the GGGPS c.1.4 FSF appeared 3.27 Ga ago and that the G1PDH e.22.1 FSF appeared 2.45 Ga ago at a time that coincides with the Great Oxidation Event (arrows, Figure 8).
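The two dated FSFs quoted above can be used to illustrate how a node distance (nd) is converted into an age. The sketch below assumes, purely for illustration, that the clock is linear in nd and fits a line through the two (nd, age) pairs given in the text (c.1.4: ndFSF = 0.114, 3.27 Ga; e.22.1: ndFSF = 0.288, 2.45 Ga). It is not the calibration of Wang et al. [20], which relies on its own calibration points, and the function names are hypothetical.

```python
def linear_clock(point1, point2):
    """Fit age = slope * nd + intercept through two (nd, age_in_Ga) points."""
    (nd1, t1), (nd2, t2) = point1, point2
    slope = (t2 - t1) / (nd2 - nd1)
    intercept = t1 - slope * nd1
    return slope, intercept


def age_ga(nd, slope, intercept):
    """Age in Ga implied by the fitted line for a given node distance."""
    return slope * nd + intercept


# (ndFSF, age in Ga) pairs quoted in the text.
GGGPS_C_1_4 = (0.114, 3.27)
G1PDH_E_22_1 = (0.288, 2.45)

slope, intercept = linear_clock(GGGPS_C_1_4, G1PDH_E_22_1)

# Time elapsed between the two appearances, in millions of years.
gap_my = (GGGPS_C_1_4[1] - G1PDH_E_22_1[1]) * 1000
print(slope, intercept, gap_my)
```

The negative slope reflects that larger nd values correspond to younger FSFs, and the gap between the two dates is about 0.8 billion years.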
In this regard, cyclic and acyclic phytanes and biphytanes, which are present in sediments and petroleum in 2.7-Ga-old metasedimentary rocks in several parts of the world [80, 81] and are biomarkers of methanotrophic pelagic microbes, are thought to derive from archaeol and caldarchaeol molecules of Archaea [80]. The existence of GGGPS prior to this date supports the existence of a primordial CDP-archaeol biosynthetic pathway and of ether and ester membrane lipids in the urancestor, prior to the loss of the first FSF in a superkingdom (Archaea) that marks the start of the tripartite world and the earliest date of organismal diversification (~2.9 Ga ago; Figure 8). The synthesis of an enantiomeric alternative of G3P in primordial archaeal lineages, 800 million years later and during late planet oxygenation, provided the proper molecular chirality and backbone necessary for cell membrane diversification. Our findings support the proposal that the urancestor had sn1,2 ester and ether fatty acid lipids and that discovery and recruitment of new enzymatic activities resulted in the synthesis of sn2,3 isoprenoid ether lipids and the emergence of thermophilic archaeal lineages [12]. Under this scenario, the ester and ether lipids in the urancestor already provided adaptations to adverse conditions (high temperature, pressure, etc), a trend that was later exploited by the emerging archaeal lineages in a primordial quest towards extremophily.