 Methodology article
 Open Access
 Published:
A Bayesian framework to estimate diversification rates and their variation through time and space
BMC Evolutionary Biology volume 11, Article number: 311 (2011)
Abstract
Background
Patterns of species diversity are the result of speciation and extinction processes, and molecular phylogenetic data can provide valuable information to derive their variability through time and across clades. Bayesian Markov chain Monte Carlo methods offer a promising framework to incorporate phylogenetic uncertainty when estimating rates of diversification.
Results
We introduce a new approach to estimate diversification rates in a Bayesian framework over a distribution of trees under various constant and variable rate birthdeath and purebirth models, and test it on simulated phylogenies. Furthermore, speciation and extinction rates and their posterior credibility intervals can be estimated while accounting for nonrandom taxon sampling. The framework is particularly suitable for hypothesis testing using Bayes factors, as we demonstrate analyzing dated phylogenies of Chondrostoma (Cyprinidae) and Lupinus (Fabaceae). In addition, we develop a model that extends the rate estimation to a metaanalysis framework in which different data sets are combined in a single analysis to detect general temporal and spatial trends in diversification.
Conclusions
Our approach provides a flexible framework for the estimation of diversification parameters and hypothesis testing while simultaneously accounting for uncertainties in the divergence times and incomplete taxon sampling.
Background
Patterns of species diversity have been shaped by both speciation and extinction throughout the history of life, and one of the key questions in evolutionary biology is to understand the temporal and spatial dynamics of these processes [1–6]. In addition to the fossil record, molecular phylogenetic data of extant lineages can provide valuable information on the process of diversification in form of branch length and the distribution of divergence times throughout the evolutionary history of a clade. Despite the omission of extinct lineages, it has been shown that differential patterns of speciation and extinction can leave a discernible signature on phylogenetic trees of extant taxa [7, 8]. Methodological advances [9–14] as well as the growing number of well sampled, dated molecular phylogenies have generated considerable interest in unraveling the temporal dynamics of species diversification. Indeed, diversification rates have been assessed for a wide range of taxa from the tree of life to address questions concerning rapid radiations [15–18], mass extinction events [19], and differences among lineages [20, 21] and geographic regions [11, 22, 23]. In particular, the identification of potential correlates of speciation and/or extinction rates, either extrinsic [e.g. climate or ecology; [24]] and/or intrinsic [e.g. key innovations; [25–28]], has received increased attention by relating rate shifts to external conditions or the evolution of species' traits.
Nee at al. [9] first applied the generalized birthdeath process [29] to molecular phylogenies of extant lineages to extract information on the evolutionary process and proposed a likelihood approach to estimate both speciation and extinction rates (λ and μ, respectively). Given the need to distinguish different modes of diversification (e.g. deviations from constant rates), further approaches have been developed to incorporate rate variation through time [8, 12, 13, 30] and across clades [14, 31]. In addition, the original birthdeath process was modified to correct rate estimates in case of incomplete taxon sampling [32–35].
While Bayesian Markov chain Monte Carlo (MCMC) methods are now commonly employed in phylogenetics to accommodate for uncertainties in model parameters, the temporal uncertainty of node age estimates is usually not taken into account when studying the dynamics of species diversification, resulting in an erroneous impression of precision. Here, we present a novel MCMC approach to estimate rates of speciation and extinction over the posterior distribution of trees generated in Bayesian molecular clock analyses. Several models of diversification have been implemented, including the constant rate birthdeath and purebirth processes modified to account for incomplete taxon sampling [33], a birthdeath process with continuously varying rates [8], and a purebirth process with rate shifts [12], for which the posterior distribution of λ and the temporal position of each rate shift are jointly estimated. Within the Bayesian framework, we describe a metaanalysis approach that aims at evaluating general patterns of species diversification across different taxonomic groups. In addition to the estimation of rate parameters, the approach presented here can also be used to distinguish between different modes of diversification, and test explicit hypotheses of rate variation through time and between clades using Bayes factors. We assess the power of the MCMC and of the Bayes factor test using simulated data sets, and demonstrate the application on empirical data sets: rate variation through time in the diversification of Mediterranean cyprinid genus Chondrostoma, geographic patterns in the radiation of the genus Lupinus, and a metaanalysis of four clades from the Cape flora of South Africa.
Results
Bayesian rate estimation across phylogenies
The birthdeath process was implemented in a Markov chain Monte Carlo framework to estimate the parameters of species diversification (speciation and extinction rate) while accounting for phylogenetic uncertainty. Several modifications of the birthdeath process originally described by Nee et al. [9] were implemented in the MCMC algorithm to describe different patterns of diversification and allow model selection and hypothesis testing. We included a modification of the birthdeath process that accounts for incomplete taxon sampling based on Yang and Rannala [33] and Stadler [34] in which the fraction of the sampled species out of the total diversity (ρ) is used to correct the estimate of the diversification parameters. Although the missing species are assumed to be randomly distributed within the phylogeny, unlike in other models [e.g. [35]], we incorporate an option to assign different sampling fractions to predefined clades (see partitioned models below). In addition, Rabosky and Lovette's [8] SPVAR model was implemented to analyze the commonly observed pattern of "explosiveearly" radiations, in which clades show an initial burst of diversification followed by a gradually declining speciation rate. Another model that accounts for rate variation through time is a purebirth process in which a fixed number of shifts in diversification rate is assumed [12]. In contrast to the continuously varying birthdeath process, the rate is assumed to vary only at specific times and otherwise remain constant. For a given number of rate shifts, the MCMC estimates their temporal position and the respective rates. Finally, our approach can be used to assign independent rates to predefined clades and is especially intended for hypothesis testing, complementing other approaches in which the rate constancy acrossclades is relaxed [31], or in which the number and position of the rate shifts on the tree are estimated [14]. The data set is partitioned a priori by defining clades of interest, based for example on morphology or biogeography, and independent rates, models, and sampling proportions can be assigned to each clade.
Fmodel: A metaanalysis approach
We develop a new method to investigate the strength and significance of general patterns of species diversification across different taxonomic groups through time or between clades in a metaanalysis framework. Within a collection of N data sets d_{ 1 }, d_{ 2 }, ..., d_{ N } (e.g. phylogenies of different taxonomic groups), each data set d_{ i } is partitioned a priori into two time frames or clades d_{ i } ^{(p) }and d_{ i } ^{(q)}. The definition of these partitions can be based on criteria that are applicable to all phylogenies analyzed e.g. geologic events or geographic distribution. Their respective speciation rates λ_{ i }^{(p) }and λ_{ i }^{(q) }are described as a function of a multiplier m_{ i } and a parameter F so that:
where m_{ i } = 1/2(λ_{ i }^{(p) }+ λ_{ i }^{(q)}) represents a taxon specific mean rate that is assumed independent for each data set d_{ i } , and F = λ^{(p)}/λ^{(q) }is constrained to be equal for all data sets and quantifies the overall magnitude of the rate difference between the two partitions. Based on these definitions, the rates are equal when F = 1, whereas λ^{(p) }> λ^{(q) }with F greater than 1, and λ^{(p) }< λ^{(q) }with F smaller than 1. We use MCMC sampling to obtain posterior estimates of the parameters m_{ 1 } , m_{ 2 } , ..., m_{ N } and F from the joint likelihood of all data sets L_{ D }
Proposals for the parameters m are sampled from normal distributions centered on their current values, whereas new values of the F parameter are obtained from a lognormal distribution to achieve a symmetric proposal distribution in log(F). Reflection at the boundary was used to avoid proposals outside of the valid range (e.g. m ≤ 0). Uniform priors are assigned to m in range [0, 10] and to log(F) in range [2.3, 2.3], which corresponds to an F value in range [0.1, 10]. The cladespecific Fmodel can be extended to a birthdeath process by assigning the parameter m to the mean net diversification (r) and introducing a second parameter n for the mean extinction fraction (a). Consequently, two parameters F_{ r } and F_{ a } are defined to measure the overall variation of r and a between clades of each data set. The significance of an overall rate difference across partitions is assessed via Bayes factor between a model in which F is allowed to vary and a model with constrained F = 1 (i.e. equal rates across partitions).
Model selection using Bayes factor
Our analyses on simulated data sets show that the power of the Bayes factor test (BF) in finding the correct model is generally very high and not particularly affected by the model settings. Bayes factors were calculated between the model used to simulate each data set and a range of possible alternative models (Table 1) based on their respective marginal likelihoods (Additional file 1) obtained through thermodynamic integration [36–38]. Positive Bayes factors (Table 1) allow to correctly distinguish between diversification models in the majority of the simulations even in data sets with very low taxon sampling. Only when the extinction fraction is low (10%), the pure birth model obtains a slightly higher marginal likelihood than the birthdeath. With variable rate purebirth models, the number of rate shifts is correctly estimated when the magnitude of rate variation is moderate (fivefold) or higher. The effect of extinction and an increase in speciation rate in absence of extinction, both resulting in a similar pattern of increasing net diversification through time, can be distinguished with intermediate to high extinction fraction (> 50%) or a moderate (> twofold) rate increase. Furthermore, the power of Bayes factors in model selection improves with the size of the phylogeny (Table 1). For instance in case of a small (twofold) increase in the diversification rate, the correct model is found only on larger phylogenies (100 taxa).
Rate estimation on simulated phylogenies
Analyses on simulated data sets indicate that the MCMC has a rather short burnin phase and achieves a good chain mixing (measured as Effective Sample Size) with 110, 000 generations and a sampling frequency of 100. The posterior estimates of the speciation rate under the different models of diversification are found to be accurate with a relative error generally below 10% (Tables 2, 3, and 4). The relative error drops below 5% in data sets with 100 or more taxa, indicating that the size of the phylogeny has an impact on the accuracy of the parameter estimation. In addition the width of the rates' credibility intervals decreases with increasing size of the phylogeny: The HPDs are on average 25% narrower with 100 taxa compared to 50, and further reduce by another 50% with 400 tips (Additional file 2). In addition, about one third of the width of the 95% HPD is due to accounting for phylogenetic uncertainty (Figure 1).
The constant rate birthdeath model yields accurate posterior estimates of the speciation rate λ (Table 2) and efficient measures of the extinction rate are obtained when the extinction fraction is high (a = 0.9). The accuracy of the estimate, however, decreases substantially when the extinction is low (a = 0.1). This is likely due to the MCMC sampling, which is constrained by the fact that μ cannot become negative [32]. This results in a strongly skewed posterior distribution for which the mean is a poor estimator; a more accurate estimate is in this case provided by the mode. The 95% credibility interval of the posterior rates is always wide for μ (0  0.47 with μ = 0.05, a = 0.1, and 0.19  0.71 with μ = 0.45, a = 0.9; Additional file 2). While the sampling proportion does not significantly affect the accuracy in the estimation of λ and μ (Table 2), it has a strong impact on the width of the credibility intervals. The size of the 95% HPD increases by 20, 30, and 50 percent with ρ = 0.75, 0.5, and 0.25, respectively, compared to the complete data set (Figure 2).
The model with continuously decreasing diversification rates (SPVAR) yields accurate estimates of the parameter k (which determines the magnitude of the temporal decrease of the speciation rate; Table 3). On the other hand, the estimated initial speciation rate λ_{0} tends to be overestimated when k is small (with relative errors between 0.2 and 0.3), and underestimated for higher k values.
Estimates of the speciation rates in data sets with rateshifts were found to be accurate with a relative error on average lower than 0.1 (Table 4). The marginal rates estimated within 1 Myr time frames, reflect the rate variation through time (Figure 3). For time frames in which a rate shift occurs, the marginal rate is often represented by a bimodal distribution, which reflects the uncertainty on the temporal placement of the shift and results in an intermediate rate estimate with a wider 95% credibility interval (Figure 3A). This uncertainty is reflected in the frequency distribution of the rate shift in the posterior sample (Figure 3B). A highly accurate estimate of the time of rate shift is provided by the modal value of its sampling frequency, with relative errors lower than 0.05 (Table 4).
Contrasting times of rate shift: Diversification of Mediterranean cyprinids
The use of a purebirth process with rate shift and its implementation in hypothesis testing are illustrated in an analysis of the cyprinid genus Chondrostoma (Teleostei: Cyprinidae). A recently published molecular phylogeny of the genus [39] places the origin of the present lineages in the midMiocene around 15 Mya. Two alternative hypotheses on the diversification of Chondrostoma have been proposed, placing its radiation in the Mediterranean region either during the Messinian salinity crisis [40] or earlier in the Miocene [41]. The comparison of different models of diversification using Bayes factor tests led to the selection of a tworate purebirth process and estimated a fourfold decrease in speciation rates (dropping from an initial 0.441 to 0.108), and indicating a substantial slowdown during the Miocene. We used alternative tworate purebirth models to specifically test the fit of a rate shift during the Messinian [5.33  7.25 Mya; [40]] or earlier in the Miocene [7.25  23.03 Mya; [41]]. The rate shift was constrained in two separate analyses to lie within those periods, and the two models were compared by approximating a Bayes factor. Robalo et al. [42] favored the latter hypothesis based on their molecular clock analysis, although without specifically testing it in a statistical framework. Our analysis suggests that the Messinian Lago Mare phase had no particular effect on the radiation of the genus Chondrostoma (BF = 3.14), as a significant decrease in the speciation rate has to be placed before that period, thus supporting Robalo et al.'s conclusion.
Cladespecific analysis: geographic patterns in the radiation of Lupinus(Fabaceae)
We demonstrate the application of models in which the rates can vary between predefined clades by analyzing the geographic patterns of diversification in Lupinus (Fabaceae) [16]. The phylogeny of the genus Lupinus shows a strong geographic structure which we used to define four partitions: I) an early diverging Old World clade, II) a group of eastern New World taxa, III) a clade occurring mainly in western North America and Central America, and IV) a clade including most of the Andean species (Figure 4). The latter displays a radiation from which around 80 species arose in the past 1.5 million years. While the model with equal rates among clades was strongly rejected in favor of variable rate models (Table 5), the highest marginal likelihood was assigned to a model with three different rates assigned to the clades I, II+III, and IV, respectively. The posterior distributions of the diversification rates are plotted as relative densities (Figure 4), showing a fourfold variation in speciation rate between the Old World Lupinus (λ_{I} = 0.191) and the nonAndean New World lineages (λ_{II+III} = 0.687). The Andean clade, as described by Hughes and Eastwood [16], represents an explosive radiation with a posterior rate estimate of λ_{IV} = 2.510.
A metaanalysis approach: Diversification of the Cape flora, South Africa
The high and unique plant diversity of the Cape Floristic Region (CFR) of South Africa is the result of an extraordinary contribution of lineages, that radiated extensively in the CFR [so called 'Cape floral clades'; [43]]. The high diversity and endemism suggest that Cape clades may have diversified at a faster rate within the CFR than elsewhere. Valente et al. [23] however recently showed that in the genus Protea, diversification rates in the Cape were, if anything, lower than in neighboring regions. To test for a general rate difference between Cape and nonCape clades, we analyzed four data sets [23, 44], all containing clades distributed within and outside the CFR, representing in total 537 plant species. The posterior rates estimated for the Cape/nonCape clades of each individual data set are: Babiana 0.500/0.556, Moraea 0.259/0.288, Podalyrieae 0.150/0.167, and Protea 0.195/0.218 (Figure 5B). The F parameter is estimated as 1.118 (95% HPD 0.86  1.39), indicating that diversification rates are overall slightly higher outside of the Cape (Figure 5A) region. However a Bayes factor of 4.78 between the constrained Fmodel with F = 1 (i.e. no rate difference between clades) and the unconstrained model suggests that this difference is not significant and that equal rates should be preferred. These results indicate an overall rate uniformity between Cape and nonCape clades based on the four data sets analyzed, suggesting that the great diversity in the CFR might not be the result of a faster diversification process. It should be noted however that, as pointed out by Valente et al. [23], species ranges in these regions are vastly different, indicating that the key to understanding the Cape biodiversity hotspot instead lies in understanding why so many lineages have speciated and persisted in such a small area.
Implementation
The method described has been implemented in a computer program called "BayesRate" (available at http://sourceforge.net/projects/bayesrate/ or from the authors) written in Python [45] (based on the Numpy [46] and Scipy [47] libraries) and R [48], integrating codes using the Python module rpy2 [49]. The thermodynamic integration supports multicore computation by simultaneously running individual Markov chains on different processors. The log files can be examined with the program Tracer [50] to check for efficiency of the sampling (ESS), and convergence between independent runs.
Discussion
We presented a new Bayesian approach, which provides a powerful tool to estimate rates of speciation and extinction on dated phylogenies based on the likelihood functions of the purebirth, and birthdeath processes while accounting for phylogenetic uncertainty and incomplete taxon sampling. On empirical data, our rate estimation requires a two steps analysis: 1) sampling a posterior distribution of dated phylogenies using a Bayesian molecular clock approach, and 2) estimating posterior diversification rates on these trees. Available programs such as BEAST [51] and mcmctree [52] apply in their relaxed molecular clock implementation different birthdeath processes as priors on the node ages [53, 54]. When applied to phylogenies obtained under these assumptions, our approach therefore requires that priors on the diversification parameters are specified twice, while ideally divergence times and diversification rates should be estimated jointly. The majority of the diversification processes considered here are however currently not implemented in these programs, and thus need to be estimated independently. Analyses on simulated data show that the default uniform priors on net diversification and extinction fraction in BEAST do not affect the subsequent rate estimates (relative rate variations lower than 3%; Additional file 3). Alternatively, researchers could choose to run molecular clock analyses in which the prior on the node ages is not based on a birthdeath process, but modeled using uniform or Dirichlet distributions [55, 56], as implemented in e.g. Multidivtime [57] and PhyloBayes [58].
The range of models implemented can be used to detect a number of different scenarios of rate variation, including specific events of rate increase or decrease, continuous rate variation through time, and cladespecific diversification rates, while simultaneously accounting for taxon sampling. In addition, the Bayesian framework extends the use of the birthdeath models beyond the simple rate estimation allowing the comparison of alternative scenarios of diversification for hypothesis testing.
The Bayes factor test computed via thermodynamic integration has shown to represent a reliable and powerful approach to choose among different models of diversification. BF can be applied to compare nonnested models and does not require to be explicitly corrected for the number of model parameters. For these properties it is particularly suitable not only to select the best model in a Bayesian framework, but also to compare specific hypotheses. The power of Bayes factors to detect the correct number of rates predominantly depends on the magnitude of the rate shifts. Similarly, different birthdeath and purebirth processes can generate diversification patterns that might be difficult to distinguish [4, 8]. For instance, an increase in the net diversification rate can be the result of an increased speciation rate in the absence of extinction or a high extinction rate in a constant rate birthdeath process. Nevertheless, we found that the Bayes factors test has the power to discern between most of such scenarios.
Analyses on simulated data show that for both speciation and extinction rates the posterior estimates are accurate. However, the width of the 95% HPD intervals also highlights the sometimes considerable uncertainty in the parameter estimates, especially in case of small phylogenies. Our simulations have shown that this uncertainty is most pronounced with a low relative extinction rate, in which case extinction tends to be overestimated [see also [32]], and estimates have a wide 95% credibility interval. This corroborates previous studies that pointed out that the estimation of extinction rates from molecular phylogenies with reasonable degrees of confidence is very problematic [7, 12, 59, 60]. It should be noted, however, that the wide credibility intervals reflect not only the uncertainties of the parameter estimation, but also the uncertainty of the data (i.e. the node ages). Estimates of the speciation rate on the other hand appear to be more robust. In contrast to Paradis [60], we find that the posterior estimate of λ has a small relative error, even under high relative extinction rates, suggesting that the accuracy in the estimates of λ might be decoupled from the relative rate of extinction. The purebirth and birthdeath models with taxon sampling are found to provide accurate rate estimates, although a poor sampling yields substantially wider 95% HPDs. Finally, the purebirth process with rateshifts tends to slightly underestimate the true variation of λ, as a consequence of accounting for the uncertainty of the time of rate shift.
The approach provides a very flexible framework for customized analyses and hypothesis testing such as predefined times of rate shift or constrained parameter values. We have shown with the diversification of Chondrostoma, that the purebirth model with variable rates can be easily adapted to test for specific hypotheses, running the analysis on fixed time frames defined for example on the basis of geological events or climate changes. The implementation of cladespecific rate estimation further extends the range of options for hypothesis testing, and its application on the radiation of Lupinus showed that it can be used to identify differential rates of diversification between clades. In particular, the option to account for cladespecific sampling biases provides an important feature, as complete taxon sampling is often difficult to achieve, especially for speciesrich groups.
Finally, with the Fmodel, we introduce a new approach to test hypotheses in a metaanalysis framework, and extend the focus from the taxonspecific rate of diversification to a parameter that might be linked to the difference between e.g. geologic periods or geographic regions. The current implementation allows to compare hypotheses with two rates, assigned to either fixed time frames or clades, while accounting for cladespecific taxon sampling. Because the F parameter is constant across data sets, we assume that the magnitude of the rate variation in time or between clades is equal among all data sets. While this certainly represents a simplification of the diversification process, the Fmodel allows the analyses of potentially many data sets, limiting the number of parameters, and yielding an estimation of general trends across different taxonomic groups. Moreover, even relatively small rate variations can be detected if supported by a sufficient number of data sets.
Conclusions
In summary, the approach presented here shows that temporal dynamics of species diversification resulting from biologically relevant events such as key innovations or the impact of environmental change should best be studied in a Bayesian framework. The use of MCMC sampling provides an elegant way to estimate speciation and extinction rates while taking into account the often considerable uncertainty on divergence times. Furthermore, the model with taxon sampling represents an important step towards a more realistic estimation of the diversification parameters, where a nonrandom distribution of missing taxa can be incorporated with cladespecific sampling proportions. In addition to the models implemented in this study, recently developed modifications of the birthdeath process [13, 61] could also be integrated in the algorithm. With the possibility to run customized analyses specifically designed for hypothesis testing, this method provides a useful and flexible statistical framework to investigate diversification processes. A promising future development would be to relax the Fmodel to incorporate more than two rates, by assigning a specific multiplier to each time frame or group of clades.
Methods
Bayesian estimation of the diversification parameters
The likelihood of a birthdeath process (BD) describing the speciation and extinction events of a dated phylogeny can be written as a function of the branching times x, the number of extant species s, and the speciation and extinction rates λ and μ, respectively [9]:
The function reduces to a purebirth process (PB) in the absence of extinction (μ = 0).
We implemented this likelihood function in a Markov Chain Monte Carlo framework and applied the MetropolisHastings algorithm [62, 63] to sample the posterior distribution of the birthdeath model parameters. The algorithm is structured as follows:

1.
Assign initial values to the model parameters (e.g. λ, μ)

2.
Sample new r, a values (from which new λ, μ are obtained)

3.
Accept or reject the proposal based on the acceptance probability

4.
Repeat steps 2 and 3 many times

5.
Repeat steps 1 to 4 over different trees sampled from their posterior distribution

6.
Summarize the MCMC over all sampled trees by calculating mean and credibility interval for each parameter of interest
The MCMC iteration starts with random parameter values and successive proposals for λ and μ (step 2) are based on the sampling strategy described by Bokma [32], randomly drawing values of r = λ  μ (net diversification) and a = μ/λ (extinction fraction) from normal distributions centered on their current values. To avoid proposals lying outside of the valid interval (e.g. negative values) we use reflection at the boundary. The acceptance probability is proportional to the likelihood ratio, and uniform distributions are applied as flat priors on the rates. The MCMC is run over a distribution of trees, sampling λ and μ on each tree individually after a burnin phase (step 5) and the parameters of interest are summarized over all trees to account for the uncertainty on the node ages (step 6). The means of the posterior distributions of λ and μ are used as rate estimates and the respective credibility intervals are calculated as the 95% highest posterior density (HPD) intervals.
Assuming that a number of species are missing in a phylogeny, the missing lineages can be modeled as the result of an extinction event that occurs exactly at the present time [33]. Thus, when only a subset s of the total species S is included in the phylogeny, the likelihood of a set of branching times becomes a function of the proportion of sampled species ρ = s/S. This model assumes that taxon sampling is random with respect to the phylogeny. However, in case of a nonrandom sampling bias, individual clades in the phylogeny are represented to different extents. Thus, purebirth and birthdeath models were implemented in the MCMC framework with the possibility to assign a different sampling proportion (ρ) to each clade.
An approach to measure the variation of speciation and extinction rates through time has been introduced by Rabosky and Lovette [8], to model high initial rates of diversification followed by gradually declining net diversification rates. Their maximum likelihood method uses an exponential transformation of λ and μ through time with the introduction of two additional parameters, namely k and z, which specify the magnitude of λ decrease and μ increase, respectively:
where λ_{0} is the initial speciation rate, and μ_{0} the final extinction rate. A constant speciation rate is found with k = 0, whereas the extinction tends to be constant when z is very large. We implement Rabosky and Lovette's [8] SPVAR model (where speciation rates decrease through time while extinction rates remain constant) by applying a uniform prior in range [0, 10] for k and setting z to 10, 000. The parameters sampled by the algorithm are λ_{0}, μ, and k.
The assumption of a purebirth process (μ = 0) simplifies equation (3) as described by Kendall [29] and Nee et al. [9], and a likelihoodbased approach has been described to detect shifts in diversification rates through time [12]. We implement this variable rate purebirth model in which, given a number of rate shifts n, the estimated parameters are the temporal position of the shifts s= s_{ 1 }, s_{ 2 }, ..., s_{ n } , and the corresponding rates λ = λ_{ 1 }, λ_{ 2 }, ..., λ_{ n+1 }. Proposals for λ and s are sampled from normal distributions centered on their current values. The likelihood ratio is based on the product of the likelihoods of the branching times x_{ i } within each time frame delimited by s_{ i1 }, s_{ i } under the rate λ_{ i }:
A uniform prior from 0 to the root age is assumed for the temporal position of the rate shift s. Because the temporal position of the rateshift is not fixed but estimated through the MCMC sampling, we summarize the marginal rates as mean value and 95% credibility interval within predefined time frames (e.g. 1 Myr intervals) from the root age to the tips of the trees and use these estimates to draw ratesthroughtime plots (RTT). Thus, the marginal rates reflect the uncertainty on the time of rate shift. The sampling frequencies of the rate shifts through time are used to infer the temporal placement of the rate variation events. We identify the time of rate shift by finding the modal value of the frequency distribution of each parameter s_{ i } , representing the most frequently sampled value and approximating the maximumaposteriori estimate (MAP).
When testing for rate differences across predefined clades, the joint likelihood of all clades C is used to estimate the posterior distribution of the speciation and extinction rates (λ_{c}, μ_{c}) of each individual clade c. The parameters λ_{c} and μ_{c} can be constrained to be equal among clades (linked model) or estimated independently (unlinked model). A model comparison between linked and unlinked parameters is performed using Bayes factors (see below) to assess the significance of the rate difference between clades.
Model selection: Bayes factors via thermodynamic integration
The fit of different models of diversification was assessed by comparing their respective marginal likelihoods, which are defined as the probability of the data D conditional on the model M, p(DM). Alternative models can be compared using the Bayes factor test, which is defined as the ratio between their respective marginal likelihoods [64, 65]. Calculating the marginal likelihood involves the integration of the probability of the data over the entire parameter space Θ:
Several approaches have been described to approximate L_{ M } . One simple approximation of L_{ M } is obtained as the harmonic mean of the likelihood values sampled via MCMC [65, 66]. Although commonly used for model comparison in phylogenetics [e.g. [67, 68]], the harmonic mean estimator has been found unstable, and thus often unreliable [37]. An alternative approach is thermodynamic integration (TDI) or path sampling [36–38], that has been shown to provide more accurate estimates of the marginal likelihood and has recently been applied in phylogenetics [37, 69–71] and population genetics [72]. This method allows the exploration of regions of the parameters space with low likelihood by altering the acceptance ratio of the MCMC by a scaling factor β. The scaling factor ranges from 0 to 1 and is applied as an exponent to the likelihood function so that with β = 0 the MCMC samples from the prior distribution only, and with β = 1 the distribution of interest is sampled. The marginal likelihood L_{ M } is then obtained by integrating the likelihood expectations E_{ β } over all values of β:
We discretize the integral by using a number C of scaling factors β_{ 0 }, β_{ 1 }, ... β_{ C } evenly spaced from 0 to 1, and estimating the respective loglikelihood expectations U_{ βi } as the mean of the MCMC sample. The discrete integral is then calculated applying Simpson's trapezoidal rule:
The accuracy of this discrete approximation of (9) depends on the number of classes C that can represent a limiting factor since the computational time increases linearly with the number of categories. Beerli and Palczewski [72] showed that a highly accurate estimate of L_{ M } can be obtained with a small number of scaling factors when integrating analytically over the first interval [β_{ 0 }, β_{ 1 } ] using Bézier cubic spline. This approach uses control points P_{ 1 } and P_{ 2 } based on the likelihood expectations at the first three scaling factors U_{ 0 }, U_{ 1 }, U_{ 2 } :
The four control points are used to define the cubic Bézier curve B_{ P0, P1, P2, P3 } , with the first and the last points being
The integral of the marginal likelihood over the interval [β_{ 0 }, β_{ 1 } ] is then calculated as
We found that the shape of the Bézier spline described by Beerli and Palczewski [72] provided a good approximation of the curve obtained by applying many scaling factors under different models of diversification, and therefore adopted it in our computation of the marginal likelihood. After testing various numbers of scaling classes to calculate the discrete thermodynamic estimate of the marginal likelihood (not shown), six classes were found to be a good compromise between accuracy of the result and computational time. Once the log marginal likelihoods L_{ M } were obtained via TDI, the log Bayes factor (BF) between pairs of models M_{ 0 } and M_{ 1 } was computed as BF_{ 01 } = 2(M_{ 1 }  M_{ 0 } ) and its interpretation based on the values suggested by Kass and Raftery [65]. Thus BF_{ 01 } greater than 2 represent positive evidence for model M_{ 1 } , and greater than 6 provide strong evidence. To assess the power of Bayes factor in discerning between different modes of diversification, we analyzed several simulated data set under birthdeath models and purebirth assuming one to three rate shifts with a special focus on processes that generate similar patterns (e.g. birthdeath and purebirth with rate increase).
Statistical evaluation
To test the performance of our method, we analyzed simulated phylogenies generated under a range of models using the Rpackage TreeSim [34, 73]. A total of 38 data sets of 100 phylogenies (with 50, 100, or 400 tips) were simulated under different models of diversification (Additional file 4). We simulated constant rate birthdeath models with extinction fractions ranging from low to very high (0.1, 0.5, and 0.9), and different taxon sampling proportions (25%, 50%, 75%, and 100%). Purebirth processes were simulated with either constant rates, or including shifts in diversification rates (one or two shifts) under small (twofold), moderate (fivefold), and large (eightfold) rate variations, respectively (Additional file 4). Trees (50 and 100 tips) with approximately continuously decreasing speciation rates were obtained by imposing nine equally spaced rate shifts, under two different diversification scenarios where speciation rates follow an exponential decrease (λ_{0} = 1, μ = 0.1, and k = 0.25; λ_{0} = 5, μ = 0, and k = 0.95). Because of the limitations of the SPVAR [8] model under variable or high extinction rates [74, 75], we assumed absent or very low and constant extinction. As these simulations only approximate continuously decreasing rates, we report the parameter estimates under the SPVAR model, but do not perform model comparisons via Bayes factors.
To assess the accuracy of the rate estimates, we calculated the relative errors [cf. [12]] as (r_{est}  r_{true})/r_{ true } , where r_{est} is the estimated rate of speciation or extinction and r_{true} is the true value. A positive relative error indicates overestimation of the parameter, whereas a negative value indicates its underestimation. For the purebirth model with rate variation, the marginal diversification rates through time were calculated for time categories of 1 million years, and their relative errors were calculated in relation to the true values between shift points. The modal values of the posterior distribution of the shift points were compared against the true shift times and their relative error was calculated as (t_{est}  t_{true})/T, where t_{est} is the estimated time of rate shift, t_{true} is its true value, and T is the average root node age of the analyzed trees.
To address the impact of estimating rates on a single tree compared to analyzing a distribution of trees, we used a tree topology simulated in PhyloGen [76] under the birthdeath process (100 tips; r = 1; a = 0.9) to simulate nucleotide sequences (3978 bp, HKY+I+Γ) using the program SeqGen [77]. Phylogenetic trees were then reconstructed in BEAST [v.1.6.1; [51]]. For comparison, we also inferred the maximum likelihood estimates of λ and μ on the consensus tree (Figure 3) through a birthdeath optimization as implemented in LASER [78].
To empirically assess the potential impact of specifying priors on the birthdeath parameters in both the molecular clock analysis and the subsequent rate estimation, additional simulations were performed. We generated trees in PhyloGen (50 tips; extinction fraction a = 0, 0.5, and 0.9) on which nucleotide sequences (5000 bp, HKY+I+Γ) were simulated using the program SeqGen [77]. Dated phylogenies were reconstructed in BEAST using the default uniform priors on the birthdeath parameters and constraining the root node to the age of the initial tree. The posterior rates were then estimated using our MCMC approach on both the initial tree (used to simulate the alignment) and the distribution of trees obtained from BEAST. The rate estimates were compared by calculating their variation in terms of relative error (Additional file 3).
Analyses on the case studies
The phylogenetic relationships of the genus Chondrostoma were reconstructed using mitochondrial cytochrome b and nuclear ßactin gene sequences for all currently recognized taxa [39]. Phylogenetic trees and divergence times were reconstructed using BEAST [v.1.6.1; [51]] and assuming the GTR+I+Γ model of sequence evolution. A speciation model following a Yule process was selected as the tree prior, with an uncorrelated lognormal (UCLN) model for the rate variation among branches. Secondary calibration points were used, following Gante et al. [39], constraining nodes to a normal prior: the crown node of Chondrostoma was constrained with a mean of 15.1 Mya (central 95% range 12.5  17.6 Mya). The split between C. olisiponensis and its sister clade was constrained with a mean of 10.1 Mya (central 95% range 7.7  12.4 Mya). The analysis was run for 15 million generations, sampling states every 2, 000 generations. The adequacy of the sampling was assessed with Tracer [50] using the Effective Sample Size diagnostic (Additional file 2). We evaluated the temporal patterns of diversification using all diversification models implemented in our approach applied on a random sample of 100 trees from the molecular clock analysis.
We reconstructed the phylogeny and divergence times of the genus Lupinus based on a combined alignment of ITS and LEGCYC1. Following Hughes and Eastwood [16], the two markers were partitioned and analyzed under GTR+Γ and GTR+I models, respectively. A relaxed molecular clock analysis was carried out using BEAST, assuming an uncorrelated lognormal clock model, running 30 million MCMC generations. A normal distribution with a mean of 16.01 Mya and a standard deviation of 2.6 for the stem node of Lupinus was set as calibration point. We carried out the diversification analyses on a random sample of 100 trees obtained from a relaxed molecular clock analysis, applying a purebirth process after model selection. The estimation of the diversification rates was performed assuming cladespecific taxon sampling (ρ_{I} = 0.77, ρ_{II} = 0.55, ρ_{III} = 0.18, ρ_{IV} = 0.40) under different models in which the rates were linked or unlinked among clades.
The metaanalysis on the Cape Floristic Region was based on four dated phylogenies of different Cape clades [23, 44]. The analyses using the Fmodel were performed on a random set of 100 trees sampled from the posterior distribution of each data set. All clades have on average only a small proportion of missing taxa, which was accounted for by means of cladespecific sampling fractions. Each data set was split into Cape and nonCape clades based on their main geographic distribution, high degrees of endemism  particularly within the CFR  allowed a simple assignment of clades (Figure 5C). The metaanalysis was performed implementing purebirth models, which were favored over a birthdeath process with a Bayes factor value of 3.46.
References
Nee S, Mooers AØ, Harvey PH: Tempo and mode of evolution revealed from molecular phylogenies. Proc Natl Acad Sci USA. 1992, 89 (17): 83228326. 10.1073/pnas.89.17.8322.
Sanderson MJ, Donoghue MJ: Reconstructing shifts in diversification rates on phylogenetic trees. Trends Ecol Evol. 1996, 11 (1): 1520. 10.1016/01695347(96)810597.
Barraclough TG, Nee S: Phylogenetics and speciation. Trends Ecol Evol. 2001, 16 (7): 391399. 10.1016/S01695347(01)021619.
Ricklefs RE: Estimating diversification rates from phylogenetic information. Trends Ecol Evol. 2007, 22 (11): 601610. 10.1016/j.tree.2007.06.013.
Jablonski D, Roy K, Valentine JW, Price RM, Anderson PS: The impact of the pull of the recent on the history of marine diversity. Science. 2003, 300 (5622): 11331135. 10.1126/science.1083246.
Jaramillo C, Rueda MJ, Mora G: Cenozoic plant diversity in the Neotropics. Science. 2006, 311 (5769): 18931896. 10.1126/science.1121380.
Nee S, Holmes EC, May RM, Harvey PH: Extinction rates can be estimated from molecular phylogenies. Phil Trans R Soc B. 1994, 344 (1307): 7782. 10.1098/rstb.1994.0054.
Rabosky DL, Lovette IJ: Explosive evolutionary radiations: Decreasing speciation or increasing extinction through time?. Evolution. 2008, 62 (8): 18661875. 10.1111/j.15585646.2008.00409.x.
Nee S, May RM, Harvey PH: The reconstructed evolutionary process. Phil Trans R Soc B. 1994, 344: 305311. 10.1098/rstb.1994.0068.
Magallón S, Sanderson MJ: Absolute diversification rates in angiosperm clades. Evolution. 2001, 55 (9): 17621780.
Ricklefs RE: Global variation in the diversification rate of passerine birds. Ecology. 2006, 87 (10): 24682478. 10.1890/00129658(2006)87[2468:GVITDR]2.0.CO;2.
Rabosky DL: Likelihood methods for detecting temporal shifts in diversification rates. Evolution. 2006, 60 (6): 11521164.
Morlon H, Potts MD, Plotkin JB: Inferring the dynamics of diversification: a coalescent approach. PLoS Biol. 2010, 8 (9): e100049310.1371/journal.pbio.1000493.
Alfaro ME, Santini F, Brock C, Alamillo H, Dornburg A, Rabosky DL, Carnevale G, Harmon LJ: Nine exceptional radiations plus high turnover explain species diversity in jawed vertebrates. Proc Natl Acad Sci USA. 2009, 106 (32): 1341013414. 10.1073/pnas.0811087106.
Baldwin BG, Sanderson MJ: Age and rate of diversification of the Hawaiian silversword alliance (Compositae). Proc Natl Acad Sci USA. 1998, 95 (16): 94029406. 10.1073/pnas.95.16.9402.
Hughes C, Eastwood R: Island radiation on a continental scale: Exceptional rates of plant diversification after uplift of the Andes. Proc Natl Acad Sci USA. 2006, 103 (27): 1033410339. 10.1073/pnas.0601928103.
Valente LM, Savolainen V, Vargas P: Unparalleled rates of species diversification in Europe. Proc R Soc Lond B. 2010, 277 (1687): 14891496. 10.1098/rspb.2009.2163.
Day JJ, Cotton JA, Barraclough TG: Tempo and mode of diversification of lake Tanganyika cichlid fishes. PLoS ONE. 2008, 3 (3): e173010.1371/journal.pone.0001730.
BinindaEmonds ORP, Cardillo M, Jones KE, MacPhee RDE, Beck RMD, Grenyer R, Price SA, Vos RA, Gittleman JL, Purvis A: The delayed rise of presentday mammals. Nature. 2007, 446: 507512. 10.1038/nature05634.
Magallón S, Castillo A: Angiosperm diversification through time. Am J Bot. 2009, 96 (1): 349365. 10.3732/ajb.0800060.
Phillimore AB, Freckleton RP, Orme CDL, Owens IPF: Ecology predicts largescale patterns of phylogenetic diversification in birds. Am Nat. 2006, 168 (2): 220229. 10.1086/505763.
Linder HP: Plant species radiations: where, when, why?. Phil Trans R Soc B. 2008, 363: 30973105. 10.1098/rstb.2008.0075.
Valente L, Reeves G, Schnitzler J, Pizer Mazon I, Fay M, Rebelo T, Chase M, Barraclough T: Diversification of the African genus Protea in the Cape biodiversity hotspot and beyond: equal rates but different spatial scales. Evolution. 2010, 64 (3): 745760. 10.1111/j.15585646.2009.00856.x.
Rabosky DL: Ecological limits and diversification rate: alternative paradigms to explain the variation in species richness among clades and regions. Ecol Lett. 2009, 12 (8): 735743. 10.1111/j.14610248.2009.01333.x.
Ree RH: Detecting the historical signature of key innovations using stochastic models of character evolution and cladogenesis. Evolution. 2005, 59 (2): 257265.
Hodges SA, Arnold ML: Spurring plant diversification: Are floral nectar spurs a key innovation?. Proc R Soc Lond B. 1995, 262: 343348. 10.1098/rspb.1995.0215.
Maddison WP, Midford PE, Otto SP: Estimating a binary character's effect on speciation and extinction. Syst Biol. 2007, 56 (5): 701710. 10.1080/10635150701607033.
Moore BR, Donoghue MJ: A Bayesian approach for evaluating the impact of historical events on rates of diversification. Proc Natl Acad Sci USA. 2009, 106 (11): 43074312. 10.1073/pnas.0807230106.
Kendall DG: On the generalized birthanddeath process. Ann Math Stat. 1948, 19 (1): 115. 10.1214/aoms/1177730285.
Stadler T: Mammalian phylogeny reveals recent diversification rate shifts. Proc Natl Acad Sci USA. 2011, 108 (15): 61876192. 10.1073/pnas.1016876108.
Rabosky DL: Extinction rates should not be estimated from molecular phylogenies. Evolution. 2010, 64 (6): 18161824. 10.1111/j.15585646.2009.00926.x.
Bokma F: Bayesian estimation of speciation and extinction probabilities from (in)complete phylogenies. Evolution. 2008, 62 (9): 24412445. 10.1111/j.15585646.2008.00455.x.
Yang Z, Rannala B: Bayesian phylogenetic inference using DNA sequences: A Markov chain Monte Carlo method. Mol Biol Evol. 1997, 14 (7): 717724.
Stadler T: On incomplete sampling under birthdeath models and connections to the samplingbased coalescent. J Theor Biol. 2009, 261 (1): 5866. 10.1016/j.jtbi.2009.07.018.
Höhna S, Stadler T, Ronquist F, Britton T: Inferring speciation and extinction rates under different species sampling schemes. Mol Biol Evol. 2011, 28 (9): 25772589. 10.1093/molbev/msr095.
Gelman A, Meng XL: Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Statistical Science. 1998, 13 (2): 163185.
Lartillot N, Philippe H: Computing Bayes factors using thermodynamic integration. Syst Biol. 2006, 55 (2): 195207. 10.1080/10635150500433722.
Ogata Y: A MonteCarlo method for high dimensional integration. Numer Math. 1989, 55 (2): 137157. 10.1007/BF01406511.
Gante HF, Santos CD, Alves MJ: Phylogenetic relationships of the newly described species Chondrostoma olisiponensis (Teleostei: Cyprinidae). J Fish Biol. 2010, 76 (4): 965974. 10.1111/j.10958649.2010.02536.x.
Bianco PG: Potential role of the palaeohistory of the Mediterranean and Paratethis basins on the early dispersal of EuroMediterranean freshwater Wshes. Ichthyol Explor Freshwaters. 1990, 1: 167184.
Bănărescu P: Zoogeography of fresh waters. Volume 2: distribution and dispersal of freshwater animals in North America and Eurasia. 1991, Wiesbaden: AULAVerlag
Robalo JI, Almada VC, Levy A, Doadrio I: Reexamination and phylogeny of the genus Chondrostoma based on mitochondrial and nuclear data and the definition of 5 new genera. Mol Phylogenet Evol. 2007, 42 (2): 362372. 10.1016/j.ympev.2006.07.003.
Linder HP: The radiation of the Cape flora, southern Africa. Biol Rev (Camb). 2003, 78: 597638.
Schnitzler J, Barraclough TG, Boatwright JS, Goldblatt P, Manning JC, Powell MP, Rebelo T, Savolainen V: Causes of plant diversification in the Cape biodiversity hotspot of South Africa. Syst Biol. 2011, 60 (3): 343357. 10.1093/sysbio/syr006.
Python Core Development Team: Python programming language v.2.6.4. 2010, [http://www.python.org/]
Ascher D, Dubois PF, Hinsen K, Hugunin J, Oliphant T: Numerical Python. UCRLMA128569. 2001, Livermore, CA 94566: Lawrence Livermore National Laboratory
Jones E, Oliphant T, Peterson P: SciPy: Open source scientific tools for Python. 2001, [http://www.scipy.org]
R Development Core Team: R: A language and environment for statistical computing v.2.13.1. 2011, R Foundation for Statistical Computing, [http://www.Rproject.org/]
Gautier L: rpy2: A simple and efficient access to R from Python. 2008, [http://rpy.sourceforge.net/]
Rambaut A, Drummond AJ: Tracer v.1.5. 2007, [http://beast.bio.ed.ac.uk/Tracer]
Drummond AJ, Rambaut A: BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007, 7: 21410.1186/147121487214.
Yang ZH: PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24 (8): 15861591. 10.1093/molbev/msm088.
Gernhard T: The conditioned reconstructed process. J Theor Biol. 2008, 253 (4): 769778. 10.1016/j.jtbi.2008.04.005.
Yang ZH, Rannala B: Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Mol Biol Evol. 2006, 23 (1): 212226.
Thorne JL, Kishino H: Divergence time and evolutionary rate estimation with multilocus data. Syst Biol. 2002, 51 (5): 689702. 10.1080/10635150290102456.
Lepage T, Bryant D, Philippe H, Lartillot N: A general comparison of relaxed molecular clock models. Mol Biol Evol. 2007, 24 (12): 26692680. 10.1093/molbev/msm193.
Thorne JL, Kishino H, Painter IS: Estimating the rate of evolution of the rate of molecular evolution. Mol Biol Evol. 1998, 15 (12): 16471657.
Lartillot N, Philippe H: A Bayesian mixture model for acrosssite heterogeneities in the aminoacid replacement process. Mol Biol Evol. 2004, 21 (6): 10951109. 10.1093/molbev/msh112.
Kubo T, Iwasa Y: Inferring the rates of branching and extinction from molecular phylogenies. Evolution. 1995, 49 (4): 694704. 10.2307/2410323.
Paradis E: Can extinction rates be estimated without fossils?. J Theor Biol. 2004, 229 (1): 1930. 10.1016/j.jtbi.2004.02.018.
Rabosky DL: Primary controls on species richness in higher taxa. Syst Biol. 2010, 59 (6): 634645. 10.1093/sysbio/syq060.
Hastings WK: Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970, 57 (1): 97109. 10.1093/biomet/57.1.97.
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AW, Teller E: Equations of state calculations by fast computing machines. J Chem Phys. 1953, 21 (6): 10871091. 10.1063/1.1699114.
Jeffreys H: Some tests of significance, treated by the theory of probability. P Camb Philos Soc. 1935, 31 (2): 203222. 10.1017/S030500410001330X.
Kass RE, Raftery AE: Bayes Factors. J Amer Stat Assoc. 1995, 90 (430): 773795. 10.2307/2291091.
Newton MA, Raftery AE: Approximate Bayesian inference with weighted likelihood bootstrap. J Roy Stat Soc B. 1994, 56 (1): 348.
Nylander JAA, Ronquist F, Huelsenbeck J, NievesAldrey JL: Bayesian phylogenetic analysis of combined data. Syst Biol. 2004, 53 (1): 4767. 10.1080/10635150490264699.
Pagel M, Meade A: Bayesian analysis of correlated evolution of discrete characters by reversiblejump Markov chain Monte Carlo. Am Nat. 2006, 167 (6): 808825. 10.1086/503444.
Rodrigue N, ArisBrosou S: Fast Bayesian choice of phylogenetic models: Prospecting data augmentationbased thermodynamic integration. Syst Biol. 2011
Fan Y, Wu R, Chen M, Kuo L, Lewis P: Choosing among partition models in Bayesian phylogenetics. Mol Biol Evol. 2011, 28 (1): 523532. 10.1093/molbev/msq224.
Xie W, Lewis PO, Fan Y, Kuo L, Chen MH: Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Syst Biol. 2011, 60 (2): 150160. 10.1093/sysbio/syq085.
Beerli P, Palczewski M: Unified framework to evaluate panmixia and migration direction among multiple sampling locations. Genetics. 2010, 185 (1): 313326. 10.1534/genetics.109.112532.
Hartmann K, Wong D, Stadler T: Sampling trees from evolutionary models. Syst Biol. 2010, 59 (4): 465476. 10.1093/sysbio/syq026.
Bokma F: Problems detecting densitydependent diversification on phylogenies. Proc R Soc Lond B. 2009, 276 (1659): 993994. 10.1098/rspb.2008.1249.
Rabosky DL, Lovette IJ: Problems detecting densitydependent diversification on phylogenies: reply to Bokma. Proc R Soc Lond B. 2009, 276 (1659): 995997. 10.1098/rspb.2008.1584.
Rambaut A: PhylOGen. Phylogenetic Tree Simulator Package v.1.1. 2002, University of Oxford, [http://tree.bio.ed.ac.uk]
Rambaut A, Grassly NC: SeqGen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic frees. Comput Appl Biosci. 1997, 13 (3): 235238.
Rabosky DL: LASER: A maximum likelihood toolkit for detecting temporal shifts in diversification rates from molecular phylogenies. Evol Bioinform Online. 2006, 2: 257260.
Acknowledgements
We thank Nicolas Salamin and Bob O'Hara for valuable discussions, Hugo F. Gante and Colin Hughes for providing the alignments for Chondrostoma and Lupinus, respectively, Ziheng Yang for help implementing the birthdeath model with taxon sampling, Fernando FernandezMendoza and Peter Beerli for aid with the thermodynamic integration. We also thank David Posada, Folmer Bokma, and Jeffrey L. Thorne for helpful comments on an earlier version on the manuscript. All simulations were run on the Frankfurt Cloud Server, we thank Kai Bosch and Matthew Forrest for support. This study was funded by Hesse's Ministry of Higher Education, Research, and the Arts (funding program "LOEWE  LandesOffensive zur Entwicklung wissenschaftlichökonomischer Exzellenz") and the German Research Foundation (DFG ZI 557/71, SCHU 2426/11).
Author information
Affiliations
Corresponding author
Additional information
Authors' contributions
DS, JS, and GZ designed the study. DS and JS developed the Bayesian framework and the models. DS wrote the Python code, JS designed and performed the simulations and wrote the R scripts. DS and JS wrote the paper, all authors read and approved the final version of the manuscript.
Daniele Silvestro, Jan Schnitzler contributed equally to this work.
Electronic supplementary material
12862_2011_1906_MOESM1_ESM.XLS
Additional file 1: Marginal Likelihoods for different models of diversification. Marginal Likelihoods for simulated data sets calculated under birthdeath (BD), purebirth (PB), and purebirth with rate shift (PB2PB4) models based on thermodynamic integration. (XLS 41 KB)
12862_2011_1906_MOESM2_ESM.XLS
Additional file 2: Posterior rate estimates. Parameter estimates, 95% credibility intervals and ESS values for all data sets simulated. All simulations settings are provided in Additional file 4. (XLS 44 KB)
12862_2011_1906_MOESM3_ESM.XLS
Additional file 3: Effect of sequential estimation of divergence times and diversification rates. The potential impact of specifying priors on the birthdeath parameters in both the molecular clock analysis and the subsequent rate estimation is assessed through generating a starting tree, simulating a molecular alignment on it, and run BEAST analyses on the alignment. The rate are then estimated on both the starting tree and the BEAST posterior trees, and compared. (XLS 30 KB)
12862_2011_1906_MOESM4_ESM.XLS
Additional file 4: List of the simulation settings. Simulations were obtained from birthdeath (BD), purebirth (PB), purebirth with rate shift (PB2PB4), and (approximately) continuously decreasing speciation rates (SPVAR). The parameters included are speciation rates (λ), extinction rate (μ), time of rate shift (s), sampling fraction (ρ), and the shape parameter of the exponential transformation of λ through time (k). The continuously decreasing rates (SPVAR model) were approximated by imposing nine equally spaced rate shifts where speciation rates follow an exponential decrease. The value of λ reported for the SPVAR model represents the initial speciation rate (λ_{0}). (XLS 38 KB)
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Silvestro, D., Schnitzler, J. & Zizka, G. A Bayesian framework to estimate diversification rates and their variation through time and space. BMC Evol Biol 11, 311 (2011). https://doi.org/10.1186/1471214811311
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/1471214811311
Keywords
 Markov Chain Monte Carlo
 Credibility Interval
 Extinction Rate
 Marginal Likelihood
 Speciation Rate