Dating when new metabolisms evolved and when major clades of Bacteria arose, particularly on the order of hundreds of millions of years, remains a key challenge in biology [1]. Despite progress in understanding the molecular record of extant bacterial genomes, the timing of the evolution of major clades of Bacteria is especially problematic to resolve due to complex gene histories and a lack of clear phenotypic traits that can be correlated with a diagnostic fossil record [2]. In the near-absence of physical (geochemical or fossil) records of microbial evolution, it is difficult to determine and date the evolutionary history of bacterial lineages [3].
Leveraging the information contained in horizontal gene transfer (HGT) events can substantially improve estimates of the timing of events within microbial evolution [4,5,6,7,8,9]. Vertical inheritance passes genetic information from parent to offspring, whereas HGT passes genetic information between organismal lineages, across all degrees of evolutionary distance. This can be particularly useful for molecular clock dating, as HGTs establish cross-cutting relationships between lineages and serve as a “temporal scaffold” upon which fossil calibrations or other date information from even distantly related taxa may be placed [5, 8, 10]. While HGT is a major process in microbial evolution [7, 11], HGT events between microbes and eukaryotes with a fossil record are less frequently identified [12]. Furthermore, donor-recipient relationships are difficult to infer for many gene histories due to multiple HGT events and gene losses or the lack of a strong phylogenetic signal [13]. The function of a gene is not necessarily relevant to its utility in propagating time constraints (e.g. [4]); however, in some cases, gene function may be additionally informative and provide independent support for age estimates. This is the case, for example, if the protein encoded by the transferred gene is specific for a substrate that can, itself, be temporally constrained. Given all of these criteria, a very small number of HGT events may be especially valuable for dating microbial lineages. These “index transfers” [9] can be even more valuable if multiple HGT recipients are present, because a shared transfer closely correlates the ages of the recipients in time. Such a transfer acts as a “standard candle”, a term used in astronomy to describe an object of known luminosity that is used to infer the distances to other objects of interest [14].
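The temporal logic behind an HGT constraint can be sketched as simple interval arithmetic. A transfer must leave the donor branch and arrive on the recipient branch at the same moment, so the node at the base of the donor branch cannot be younger than the crown node of the recipient clade. The minimal Python sketch below illustrates only this ordering rule; the class and function names are hypothetical, the recipient interval is an illustrative placeholder, and the donor interval simply reuses the 830–518 Ma range quoted later in this section for the Ascomycota-Basidiomycota divergence. It is not the dating procedure used in this study, which relies on Bayesian molecular clock models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeInterval:
    """Age bounds for one node, in millions of years ago (Ma)."""
    min_ma: float  # youngest allowed age
    max_ma: float  # oldest allowed age


def apply_hgt_constraint(donor_parent, recipient_crown):
    """Tighten both intervals so that donor_parent is at least as old as recipient_crown.

    Assumption (hypothetical setup): the transfer left the branch descending
    from `donor_parent` and arrived on the stem of the clade whose crown node
    is `recipient_crown`.
    """
    # The recipient crown cannot be older than the oldest allowed donor parent age.
    new_recipient = AgeInterval(
        recipient_crown.min_ma,
        min(recipient_crown.max_ma, donor_parent.max_ma),
    )
    # The donor parent cannot be younger than the youngest allowed recipient crown age.
    new_donor = AgeInterval(
        max(donor_parent.min_ma, recipient_crown.min_ma),
        donor_parent.max_ma,
    )
    for interval in (new_donor, new_recipient):
        if interval.min_ma > interval.max_ma:
            raise ValueError("HGT constraint is incompatible with the input bounds")
    return new_donor, new_recipient


if __name__ == "__main__":
    donor = AgeInterval(min_ma=518.0, max_ma=830.0)     # range quoted in the text
    recipient = AgeInterval(min_ma=0.0, max_ma=3000.0)  # illustrative placeholder
    print(apply_hgt_constraint(donor, recipient))
```

In this toy setting, a fossil-derived maximum age on the donor side becomes a maximum age on the recipient side, which is the sense in which a transfer propagates a calibration between distantly related lineages.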
Environmental distribution of chitin
Chitin is one of the most abundant structural polysaccharides in nature [15, 16], and chitin degradation by chitinase enzymes is a critical process in the biogeochemical cycling of carbon and nitrogen in terrestrial and aquatic ecosystems [15]. There are two dominant biogenic sources of chitin: arthropods [16] and fungi [16]. Chitin may therefore have increased in abundance in terrestrial systems following the terrestrialization of arthropods, sometime after the Cambrian [17]. In modern aquatic systems, arthropods are the dominant chitin-producing organisms. While there is a great deal of uncertainty in these estimates, the chitin sourced from arthropods is roughly 2.8 × 10⁷ Mg yr⁻¹ in freshwater ecosystems and 1.3 × 10⁹ Mg yr⁻¹ in marine ecosystems [18]. The majority of chitin in terrestrial ecosystems is produced by fungi [19] largely due to their contribution of biomass to the soil environment [20]. While global estimates for the contribution of arthropod biomass, and thus chitin, to the environment over time are lacking, arthropods nonetheless make up the largest pool of animal biomass today [21].
Chitin production and the evolution of Fungi
The evolution of chitin producers is anchored to the fossil record through diagnostic morphological characters [22,23,24,25,26]. In the case of Fungi, Cryptomycota form the most deeply branching fungal clade, and contain the most deeply branching chitinous Fungi (e.g., Rozella) [24, 25]. Fossil-calibrated molecular clock studies generally agree that early Fungi diverged around 1145–738 million years ago (Ma) [27]. Fossil and molecular clock evidence also indicates that divergence of Ascomycota and Basidiomycota within the major fungal group Dikarya occurred around 830–518 Ma [24] with a fossil minimum around 405 Ma [23, 28, 29]. Posterior age estimates from molecular clock studies suggest that crown Ascomycota diversified 715–408 Ma [30] and crown Basidiomycota diversified 655–400 Ma [28]. Therefore, studies of fungal evolution can inform the timing of chitinase gene evolution.
Based on fossil and molecular clock dating methods, marine crown-group euarthropods appeared around 521–514 Ma, shortly after the start of the Cambrian, and radiated into the lower and middle Cambrian [29, 31]. Molecular clock and fossil evidence suggests that terrestrialization of major arthropod groups occurred from the Cambrian into the Silurian [32]. The oldest terrestrial myriapod body fossil (the oldest undisputedly terrestrial animal) is the 416 Ma Crussolum sp. [29]. However, the radiation of terrestrial arthropods (including insects) likely continued into the Devonian [17, 33, 34].
The evolution of chitinase gene families
Chitinases are proteins that catalyze the breakdown of glycosidic linkages in polymers of chitin [16]. Chitinases are a type of glycoside hydrolase (GH) specific to chitin [16, 35]. There are two main families of chitinases: glycoside hydrolase family 18 (GH18) and glycoside hydrolase family 19 (GH19) [16]. GH18 chitinases are distributed across the three domains of life [16, 36], whereas GH19 chitinases are mostly restricted to plants and are rarely associated with bacteria [36]. In one well-studied bacterial model, Streptomyces, ten genes are associated with the GH18 family of chitinases (homologs chiA–E and chiH–L) and two genes are associated with GH19 (chiF and chiG) [37]. It has been suggested that some of these genes may have evolved under selective pressures related to the host environment or to the presence and proximity of other organisms, which may have even precipitated HGT events [37,38,39]. Myxobacterial chitinases have been hypothesized to have evolved via HGT [40], and other bacterial lineages within Actinobacteria are hypothesized to have co-opted a fungal chitinase for self-defense [37]. Because of the specific associations between substrate and gene, it stands to reason that there may be an evolutionary link between the major producers of environmental chitin (fungi and arthropods) and chitin-degrading genes in bacteria. It has been shown that some bacterial chitin degradation systems are even adapted to the environments (aquatic vs. terrestrial) and most abundant chitin producers (exoskeletons of crustaceans vs. fungal cell walls) that they encounter [15]. Nonetheless, it remains to be tested whether chitinase genes also reflect widespread environmental adaptations over geological time.
It has been shown that chitinases may retain a molecular record of evolutionary events that occurred hundreds of millions of years ago [41]. While some of the phylogenetic distribution of these genes may indicate a pattern of vertical inheritance, other chitinase genes may have evolved via horizontal gene transfer [37]. For these reasons, and the criteria described above, chitinase genes are an attractive potential source of temporal information for microbial evolution. Therefore, we sought to test the hypothesis that specific bacterial chitinases evolved via HGT and, if so, whether these HGT events could be leveraged to propagate known fossil calibrations between donor and recipient lineages. Bacterial chitinases are especially useful because they metabolize chitin, a specific biopolymer only produced in abundance by arthropods and fungi, two groups whose fossil records, and thus likely age estimates, are much more precise than those of most microbial groups. Previous work has also suggested that some chitinases are distributed between the domains of life via HGT, for example, postulating that some chitinase genes were transferred from plants to Actinobacteria and then to arthropods [42]. However, the evolutionary history of the many disparate chitinase gene families in microbes has not been fully investigated.
Bayesian molecular dating
Fossil-calibrated molecular clock models are applied to estimate divergence times of organisms (e.g. [3, 43]). Many divergence time analysis parameters have only recently been developed, and few have been applied to microbes whose divergence times span geologic time or whose genomes have undergone rampant horizontal gene transfer (e.g. [8, 44]). For a more detailed review of these parameters and challenges see, for example, [43, 45,46,47,48,49,50,51,52,53]. The issues inherent to assessing microbial evolution present a challenge for this work, but also an opportunity to explicitly test these model parameters and assumptions in order to determine those that are valid for this specific set of evolutionary conditions.
Molecular clock dating is based on a Bayesian framework, reviewed in greater detail by others [51, 52, 54]. There are a few major components used to determine posterior probabilities or date distributions such as data selection, calibrations, the molecular clock model, the tree process prior, and the rate distribution model. The sequence data assessed in this work are the chitinase genes present in bacterial and eukaryotic lineages. Tree process priors include birth-death and uniform. Rate distribution models include lognormal autocorrelated and uncorrelated gamma.
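As a compact summary of how these components combine, the joint posterior targeted by a Bayesian dating analysis is often written in the following general form. The notation is an assumption introduced here for illustration and is not taken from the methods of this study: $D$ is the sequence alignment, $\mathbf{t}$ the node ages, $\mathbf{r}$ the branch rates, and $\theta$ the remaining substitution-model and hyperparameters.

```latex
f(\mathbf{t}, \mathbf{r}, \theta \mid D) \;\propto\;
  \underbrace{L(D \mid \mathbf{t}, \mathbf{r}, \theta)}_{\text{sequence likelihood}}
  \;\underbrace{f(\mathbf{r} \mid \mathbf{t}, \theta)}_{\text{rate distribution model}}
  \;\underbrace{f(\mathbf{t} \mid \theta)}_{\text{tree process prior and calibrations}}
  \;\underbrace{f(\theta)}_{\text{hyperpriors}}
```

In this reading, choosing between birth-death and uniform priors changes $f(\mathbf{t} \mid \theta)$, while choosing between autocorrelated lognormal and uncorrelated gamma models changes $f(\mathbf{r} \mid \mathbf{t}, \theta)$.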
We tested the uniform and birth-death tree process priors. The uniform prior considers every possible topology to be equally probable and favors divergences that are evenly spaced across the tree from the root to tip [55, 56]. The birth-death model is defined by rates of speciation (“birth”) and extinction (“death”). In contrast to the uniform prior, this tree process ascribes more weight to tree topologies with certain branching patterns [57]. The birth-death process generally biases the model such that deeper branches are longer and shallower branches are shorter, because it is assumed that “older” lineages more often end in extinction [52]. Biases such as this can have large effects on the posterior age estimates, and inappropriate model selection can result in less precise dates.
All models in this study assume a relaxed molecular clock model for a prior on the branch rate. However, two relaxed clock models for the branch rates are assessed: autocorrelated and uncorrelated. Uncorrelated clocks make no assumption that branches next to each other on the tree should share similar rates. In other words, the rate on each branch of the tree is independent. Conversely, autocorrelated clocks assume that more closely related branches on the tree should also have more similar rates [46, 56, 58, 59]. The assumption that neighboring branches should share more similar rates makes sense when we consider that the evolution of genetic information between related lineages is often affected by many of the same processes that affect the rates of evolution (e.g. environment, population) [52]. Biological events such as horizontal gene transfer may invalidate model assumptions, but the mechanisms of rate variation and the relative importance of various biological events are still debated [1]. Choosing between these models is a matter of ongoing debate in the field, and is often dependent on the data [52, 56, 60]. Thus, we detail the effects of model selection in our analyses.
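The contrast between the two relaxed clock assumptions can be made concrete with a small simulation. The Python sketch below draws parent and child branch rates under an uncorrelated gamma prior (the child rate ignores the parent rate) and under an autocorrelated lognormal prior (the child rate is centered on the parent rate), then measures how strongly log rates on neighboring branches covary. All function names and parameter values (the gamma shape, the variance parameter `nu`, the branch duration) are hypothetical choices made for illustration; this is not the software or the parameterization used in this study.

```python
import math
import random


def simulate_rate_pairs(model, n_pairs=2000, mean_rate=1.0, nu=0.1,
                        branch_time=1.0, seed=0):
    """Draw (parent_rate, child_rate) pairs under one relaxed-clock prior."""
    rng = random.Random(seed)
    shape = 2.0                # hypothetical gamma shape parameter
    scale = mean_rate / shape  # gamma mean equals shape * scale
    pairs = []
    for _ in range(n_pairs):
        parent = rng.gammavariate(shape, scale)
        if model == "uncorrelated":
            # Child rate is an independent draw; the parent rate plays no role.
            child = rng.gammavariate(shape, scale)
        elif model == "autocorrelated":
            # Child log rate is normal around the parent log rate; the
            # -var/2 shift keeps the expected child rate equal to the parent rate.
            var = nu * branch_time
            child = math.exp(rng.gauss(math.log(parent) - var / 2.0, math.sqrt(var)))
        else:
            raise ValueError(f"unknown model: {model}")
        pairs.append((parent, child))
    return pairs


def log_rate_correlation(pairs):
    """Pearson correlation between parent and child log rates."""
    xs = [math.log(p) for p, _ in pairs]
    ys = [math.log(c) for _, c in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    for model in ("uncorrelated", "autocorrelated"):
        corr = log_rate_correlation(simulate_rate_pairs(model))
        print(f"{model}: parent-child log-rate correlation = {corr:.2f}")
```

Under these assumed settings, the uncorrelated model yields a correlation near zero, whereas the autocorrelated model yields a strongly positive one. The strength of that correlation depends on `nu` and the branch duration: a larger product lets child rates wander further from the parent rate.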
The primary objective of this work is to test whether fossil-calibrated age estimates within fungi can be propagated to bacterial lineages through the use of HGT events between these lineages under different model assumptions. Secondarily, we seek to understand possible ecological implications of the evolution of chitinases in fungi and bacteria. If bacterial chitinase genes were acquired in response to environmental chitin availability, then arthropod evolutionary history provides a prediction for the timing of these events within bacterial lineages. We hypothesize that terrestrial bacterial chitinases diversified from the Cambrian into the Devonian following the distribution of environmental chitin. We independently date chitinase evolution in microbial lineages by first testing and then applying molecular clock models to chitinase gene trees, constrained by fungal date calibrations tethered via HGT. We show that certain model parameters seem to outperform others. Moreover, our posterior date distributions for bacterial lineages support the utility of HGT-propagated fossil calibrations in accurately estimating the ages of microbial lineages as an avenue for future work.