Skip to main content
  • Methodology Article
  • Open access
  • Published:

Improving the performance of Bayesian phylogenetic inference under relaxed clock models



Bayesian MCMC has become a common approach for phylogenetic inference. But the growing size of molecular sequence data sets has created a pressing need to improve the computational efficiency of Bayesian phylogenetic inference algorithms.


This paper develops a new algorithm to improve the efficiency of Bayesian phylogenetic inference for models that include a per-branch rate parameter. In a Markov chain Monte Carlo algorithm, the presented proposal kernel changes evolutionary rates and divergence times at the same time, under the constraint that the implied genetic distances remain constant. Specifically, the proposal operates on the divergence time of an internal node and the three adjacent branch rates. For the root of a phylogenetic tree, there are three strategies discussed, named Simple Distance, Small Pulley and Big Pulley. Note that Big Pulley is able to change the tree topology, which enables the operator to sample all the possible rooted trees consistent with the implied unrooted tree. To validate its effectiveness, a series of experiments have been performed by implementing the proposed operator in the BEAST2 software.


The results demonstrate that the proposed operator is able to improve the performance by giving better estimates for a given chain length and by using less running time for a given level of accuracy. Measured by effective samples per hour, use of the proposed operator results in overall mixing more efficient than the current operators in BEAST2. Especially for large data sets, the improvement is up to half an order of magnitude.


Bayesian phylogenetics puts an emphasis on estimating a probability distribution over parameters of interest, including the phylogenetic tree topology and divergence times, given the data. The Metropolis-Hastings Markov chain Monte Carlo (MCMC) [1, 2] algorithm has been the primary computational tool used in Bayesian phylogenetics for sampling from the posterior distribution. This paper is aimed at improving the performance of the relaxed clock model in Bayesian phylogenetic analysis.

Early implementations of Bayesian phylogenetic inference assumed a strict molecular clock where the evolutionary rates are the same at every branch [3]. This was the preferred method for estimating divergence times [4, 5]. The introduction of relaxed molecular clocks allowed for the estimation of divergence times [6] and phylogeny [7] in the presence of rate heterogeneity among branches. Since then, the relaxed clock model has been widely applied, such as the study of Nothofagus [8] and flowering plants [9]. Many aspects of the performance and accuracy of relaxed clock models have subsequently been investigated (e.g. [10, 11]).

Bayesian phylogenetic inference via MCMC is computationally intensive for large data sets. Two approaches to improve efficiency are (i) by making faster likelihood calculations, and (ii) by incorporating more effective proposal kernels. Calculating the phylogenetic likelihood is computationally expensive. Hence, researchers have tried many ways to tackle the computation burden in the likelihood calculations, such as detection of repeating sites [12], approximate methods (e.g. [13, 14]) and the use of parallelisation strategies (e.g. BEAGLE [15]).

However the overall efficiency of the sampling process also depends strongly on the construction of the proposal mechanism. An effective proposal mechanism is proficient at exploring the posterior distribution, and can do so with fewer steps in the MCMC chain. Therefore fewer likelihood calculations are required, since each step in the chain that changes the tree or substitution parameters requires a likelihood calculation.

A major limitation in Bayesian MCMC analysis of phylogeny lies in the efficiency with which operators sample the tree space [16, 17]. Fast and reliable estimation is dependent on a good mixture of operators, since the posterior distribution often exhibits correlations between the tree and other random variables.

In this paper, we present a novel operator that works alongside standard operators by proposing moves within a subspace of constant genetic distances. Namely, the proposed operator changes both divergence times of nodes and neighbouring branch rates so that the implied genetic distances are not changed. For time-reversible substitution models the phylogenetic likelihood will also be unchanged under this operation. The proposed operator has been implemented and tested in BEAST2 [18].


Bayesian MCMC

Let D, g and Φ denote the data, phylogenetic time-tree and a set of evolutionary parameters respectively. The time-tree g={E,t} consists of a directed edge graph, E, defining a rooted tree topology on a set of labelled taxa and a set of associated divergence times t (for details see e.g. [19]). The posterior probability density can be calculated using Eq. 1. It consists of prior distributions for the tree and the parameters, a phylogenetic likelihood that conveys information from data, and the posterior distribution to be inferred. These are denoted in the form of probability densities by p(g),p(Φ),p(D|g,Φ),p(g,Φ|D) respectively. From a Bayesian perspective, the phylogenetic trees and the parameters are random variables described by a posterior probability distribution given the observed data D.

$$ p(g,\mathbf{\Phi} |\mathbf{D}) = \frac{{p (\mathbf{D}|g,\mathbf{\Phi}) p (g) p (\mathbf{\Phi})}}{{p (\mathbf{D})}} $$

However, due to the state space being high dimensional and the marginal likelihood being infeasible to calculate, MCMC is adopted to sample the posterior distribution. Specifically, MCMC algorithms construct a Markov chain whose stationary distribution is the posterior distribution p(g,Φ|D), in such a way that the computation of the marginal likelihood p(D) is avoided.

Tree proposals

We use the term “operator" to describe an algorithm that can be used to draw a new state θ given an existing state θ={g,Φ} from a specific proposal kernel q(θ|θ) and also return the Hastings-Green ratio for the proposed state transition [2, 20].

Standard naïve operators such as the random walk operator propose the new state θ by adding a random variate to a component of the current state θ [21]. Similarly, scale operators multiply a subset of the current state by a random scale factor [22]. They are suitable for working on a single random variable, or a single component of the model, for example the population size parameter of the coalescent tree prior. Standard operators for the tree topology and divergence times include the subtree slide operator, Wilson-balding and narrow exchange operators [19, 23].

In this paper, the novelty of the proposed operators lies in maintaining the genetic distance d while changing the rate r and divergence time t. The reason is that the likelihood along one branch is constant if its distance is fixed, i.e. d=r×t, noting that the likelihood is calculated based on transition probability matrix for each branch of \(\phantom {\dot {i}\!}{e^{\mathbf {Q}d_{i}}}\), where di is the branch length in units of substitutions per site for branch i. In this way, the joint distribution on rates and divergence times can be explored without proposing states that would adversely affect the phylogenetic likelihood.

Uncorrelated relaxed clock model

Molecular clocks model how molecular sequences evolve along branches in the phylogenetic tree, so that a time tree can be reconciled with the genetic distances between sequences. In this paper, uncorrelated relaxed clock models are adopted, where the rates are drawn independently and identically from a given prior distribution, such as the log-normal distribution [7]. As a result, the rates can vary markedly between parent and child branches.

Referring to the Bayesian framework in Eq. (1), the joint inference of evolutionary rates r and the time tree g can be obtained by the conditional distribution in Eq. 2:

$$ p(g,\mathbf{r},\boldsymbol{\Phi} |\mathbf{D}) = \frac{{p(\mathbf{D}|g,\mathbf{r},\mathbf{\Phi})p(\mathbf{r})p(g)p(\mathbf{\Phi})}}{{p(\mathbf{D})}} \text{,} $$

where p(r) is the prior for rates specified in uncorrelated relaxed clock model. In the constructed Markov chain, the operator proposes a new state θ=(r,g,Φ), from the original state θ={r,g,Φ}.

While the proposed operator is introduced based on uncorrelated clock models, it could equally be applied to any other relaxed clock that applies a rate parameter to each branch, such as autocorrelated clock models [6].


To validate the correctness and determine the efficiency, we conducted a series of experiments by implementing the Constant Distance operator in BEAST2 [18].

First, we perform a well-calibrated simulation study, which tests our operator alongside existing operators. Correctness was further confirmed by sampling trees from the prior distribution i.e. without data (Refer to Appendix 2 section for more details). By comparing effective sample sizes (ESS) [24] and running times, it is demonstrated that the performance is improved when including our proposed operator. Finally, the posterior correlation of rates and node times are discussed.

Well-calibrated simulation study

A well-calibrated simulation study is a powerful tool for evaluating and validating the implementation of a Bayesian model [25].

Figure 1 shows the Bayesian model used in this study, which includes the evolutionary model and the prior distributions of parameters. As is shown in the figure, the sequence alignment is simulated by a phylogenetic continuous-time Markov chain in BEAST2. It contains a substitution rate matrix given by the HKY85 [26] model and a substitution tree determined by an uncorrelated relaxed clock model and Yule model. More specifically, base frequencies π follow a Dirichlet distribution and the transition-transversion ratio κ follows a log-normal prior distribution. The distribution of node times is described in a Yule tree ψ with hyperparameter birth rate λ following a log-normal distribution. The rates ri follow a log-normal distribution with mean of 1 and standard deviation s1 following a hyperprior distribution.

Fig. 1
figure 1

The models and prior distributions to simulate the sequence data. The sequence alignment (SA) is simulated through a phylogenetic continuous-time Markov Chain (PhyloCTMC) that consists of a substitution model (HKY) and an uncorrelated relaxed clock model (UCRelaxedClockModel). The random variables in HKY model construct the mutation rate matrix (Q), including base frequencies (π={πA,πC,πG,πT}) and kappa (κ). The time trees (ψ) and branch rates (ri for each branch i in ψ) construct the substitution tree (ST). The branch rates have a LogNormal prior with fixed mean 1 and certain standard deviation (denoted by s1). And the time trees have a Yule model prior with birth rate (λ) having a LogNormal prior. The other prior distributions include a Dirichlet distributions on π, a LogNormal distribution on κ, and a LogNormal distribution on s1. For notations in LogNormal distributions, the uppercase letters represent the parameters in real space, and the lowercase letters represent the parameters in log space. In all the simulations, the number of taxa is fixed at 120 (n = 120)

First, we sampled parameters and trees from the full model 100 times. The random parameters included: standard deviation of rates across branches s1, birth rate λ, base frequencies π and transition-transversion bias κ. Second, we simulated nucleotide alignments using the simulated parameters. In total, 100 data sets were simulated, each with 120 taxa. Third, we used BEAST2 with the Constant Distance operator to infer the tree and parameters from each of the 100 simulated data sets in turn. Finally, the posterior estimates of the parameters were compared with the real values that were used to simulate the corresponding sequence alignment. The comparisons are shown in Fig. 2.

Fig. 2
figure 2

Well-calibrated simulation study with 120 taxa. Each point is a separate simulated dataset

These results show that the true values of the parameters are within the 95% highest posterior density (HPD) interval approximately 95% of the time (Table 1). This well-calibrated simulation study formed part of the validation of our implementation of the Constant Distance operator.

Table 1 Percentage of real values lying in the 95% HPD in Fig. 2

Performance comparison

To evaluate the performance of Constant Distance operator in a Bayesian phylogenetic analysis, we explored the time required to adequately sample the posterior distribution. This was achieved by examining (i) the total time taken by BEAST2 to complete the MCMC inference (running time), and (ii) the effective sample size (ESS) of the sampled parameters. The effective sample size of a parameter is the number of effectively independent samples from the posterior distribution. Larger ESS indicates a better approximation of the marginal posterior distribution of the parameter. We used Tracer [24] to compute ESS.

For each dataset, we compared two operator configurations. 1) Using the current operators in BEAST2 to sample discrete rate categories (Category). 2) Using the Constant Distance operator to sample continuous rates specified by an uncorrelated related clock model (Cons). The Category configuration is the default setting in BEAST 2.5. We aim to compare the performance of the Constant Distance operator to that of the existing operator schedule. In each configuration, the data set was analyzed 20 times with the prior distributions and all other model specifications held constant. The details of operator weights used are given in Appendix 3.1. Each setting is benchmarked using an Intel(R) Xeon(R) Gold 6138 CPU (2.00 GHz).

We performed the analysis on two sets of simulated sequence alignment (See Appendix 3.2 for more details). The simulated data sets both have 20 taxa but different sequence lengths, i.e. one data set containing 500 sites, the other containing 1,000 sites. Moreover, we used four real data sets to further evaluate the performance of Constant Distance operator, including a primate data set [27] and three other data sets (Anolis [28], RSV2 [29, 30] and HIV-1 [31]) in BEAST2 [32].

The ESS and running time are summarised in Fig. 3 and Table 2. To be more specific, we measure the efficiency by ESS per hour, which is calculated by the ESS of parameters in one simulation divided by the running time in hours. Then we compare the efficiency of two configurations by calculating the ratios of ESS per hour for simulations in the two configurations. If the ratio is larger than 1, then ESS per hour of Cons configuration is larger than that of Category configuration. As is shown in Fig. 3, the efficiency varies in different data sets and also depends on what model is used in the analysis. For Anolis data set (29 taxa and 1456 sites), Category configuration performs better than Cons configuration, since most ratios of the parameters are slightly below the red line (which means smaller than 1). Moreover, for RSV2 (129 taxa and 629 sites) and HIV-1 data sets (117 taxa and 663 sites), some ratios of the parameters, “posterior” and “prior” in particular, are above the red line (larger than 1), which indicates that Cons configuration provides larger ESS per hour. Although there are several parameters sampled by Cons configuration having smaller ESS per hour, it should be noticed that the ratio is calculated by choosing random simulations in the two configurations (See Appendix 3.3 for more details). Additionally, it is worth noting that the efficiency is improved more obviously in simulated data set having 1000 sites, compared with the data sets having 500 sites. This indicates that the proposed operators behave better when sequence length is long. More specifically, in Primates data set (87 taxa and 19220 sites), the longer molecular sequence provides more accurate genetic distances, which leads to peaked likelihood distributions. In this circumstance, the proposed operators sample rates and node times that fit the constant genetic distances more efficiently.

Fig. 3
figure 3

Comparison of ESS and running time. There are 6 data sets analysed, including 4 real data sets and 2 simulated data sets with different number of sites, as is shown in the legend. The red line represent the position where the ratio of ESS per hour is equal to 1. The horizontal axis represents the names of sampled parameters

Table 2 Summary of ESS and running time

Table 2 lists the average running time of 20 simulations for each data set. It can be seen that Cons configuration finished simulations with a little bit more time in most cases. This is because the continuous rates have to be adjusted for a new clock standard deviation (See Appendix Section 4 for more details). Moreover, Table 2 also shows the parameter that has the smallest ESS in Category configuration, and is compare with the corresponding ESS in Cons configuration. Although the improvement in ESS is not obvious for both simulate data sets, it is noticed that ESS of the parameters are much larger in Cons configuration for all the real data sets. After calculating the ESS per hour, we conclude that Cons configuration improved the efficiency of the worst estimated parameter in Category configuration by a factor of 1.55 to 8.53.

Correlation analysis of rates and branch lengths

In this section, we conduct a pairwise comparison between rates and branch lengths in units of time. We used a data set of ratite mitochondrial genomes [33]. This data set includes 7 species of ratites and an alignment of 10767 sites. After analysing the ratites data set in BEAST2 using the Constant Distance operator, we calculated the Pearson coefficient between the rates and the times across branches to investigate the posterior correlation of these parameters.

The results are summarised in Fig. 4. Figure 4a presents the topology of the maximum clade credibility tree. We utilised the programme TreeStat2 [34] to obtain the filtered trees that have the same topology as the maximum clade credibility tree from the sampled trees in MCMC chain. This means the trees that have different shared common ancestors of each taxon from the reference tree are filtered out.

Fig. 4
figure 4

Correlation analysis in the ratites tree. l represents the length of a branch, that is the time difference between a parent node and a child node, where l1=l2=t1−0,l3=l4=t2−0,l5=t3−0,l6=t4−0,l7=T−0,l8=t5t1,l9=t3t2,l10=t4t3,l11=t5t4 and l12=Tt5. The rates and branch lengths are converted into log space and then Pearson’s coefficients are computed, which range from -1 to 1. Blue indicates positive correlations and red indicates negative correlations. The darker the colour, the stronger the correlation

Afterwards, Fig. 4b shows the pairwise comparison of the 12 branch rates and 12 branch lengths (in time) on these filtered trees. As can be seen from the diagonal, the rate on one branch is negatively correlated with the length of that branch, which indicates that an older divergence time will lead to a smaller rate. This is because the primary signal in the data is genetic distance, so that there will be a range of rates and divergence times that are consistent with the genetic distances, but the products of these quantities will vary less than the individual parameters. The consequence is that there will tend to be a negative relationship between rate ri and branch length li i.e. ri=di/li. At the same time, there will tend to be a positive relationship between rate ri and its parent’s branch length lip, since a larger lip leads to a smaller li. Moreover, for cherries that share the same branch length in the tree, they will tend to have the same correlation pattern. Take ANDI and DIGI as an example. r1 and l1 are negatively correlated, but r1 and l8 are positively correlated, which is also the correlation of r2,l2 and l8.

It is precisely this form of correlation structure in the posterior that our operator anticipates, and these correlations are the reason that our operator performs better than naive alternatives.

Sampling a fixed unrooted tree

A limiting case for the relaxed molecular clock model (and one exploited in some of our validation tests) occurs for long sequences, when the branch lengths of the unrooted tree, in units of expected substitutions per site, become known without error. With full length genomes now available, although inferring trees from genomes involves complexities and assumptions such as a good partition scheme [35], this limiting case might be approached in some data sets. As a simple test in this paper, this gives rise to an alternative approach to analysis, where divergence times, a root position and the branch rates are random variables, and the data are a set of branch lengths in units of substitution on a known unrooted tree topology.

Previous work done by Reis and Yang [14] also tried to approximate the likelihood of such an unrooted tree in Bayesian phylogenetic inference. Similar researches in [6, 13] show that these methods can account for rate changes in a relaxed clock model, but the genetic distances are not fixed, for example Stéphane Guindon used a Gibbs sampling algorithm [13]. Outside of the Bayesian MCMC formalism, least-squares criteria [36] and maximum likelihood [37, 38], can also be applied to estimate substitution rates and divergence times in unrooted trees.

In this section, we investigated this approach on a fixed substitution tree reconstructed from whole mitochondrial genomes from a set of ratite species [33]. Since no uncertainty is admitted in the genetic distances and the proposed operator doesn’t change the genetic distances, the phylogenetic likelihood is no longer needed and the unrooted tree becomes the data, rather than a multiple sequence alignment.

First of all, we used the ratites data set to construct an unrooted tree with PhyML 3.0 [39, 40]. Figure 5a shows the unrooted tree with the genetic distances on the branches which are fixed in the subsequent relaxed clock analysis in BEAST2.

Fig. 5
figure 5

Illustration of sampling a fixed unrooted tree. In subfigure (a), the unrooted tree is obtained from the ratites data set [33] by a maximum likelihood method [39] and the labeled numbers represent genetic distances. The three unique tree topologies in (b) (c) and (d) are obtained from the sampled trees by using program TreeTraceAnalysis [41]. The branch rates and node times are summarised by using program TreeAnnotator [42]. The labeled numbers represent the posterior mean of rates on the corresponding branches. The colour of branches from green to red indicates the rates increasing from small to large, and the blue bars represent the 95% HPD of the corresponding node times

As an initial starting point, the root is assigned using the midpoint method. After that, according to the genetic distances among seven taxa and the position of the root, consistent divergence times are specified and assigned to each ancestral node, so that a valid rooted time tree is obtained. Once divergence times are determined, rates on the branches are also calculated so that the products match the unrooted substitution tree.

Then we used Constant Distance operator to sample a Markov chain initiated by this starting tree. The resulting posterior distribution is shown in Fig. 5b-d. As can be seen, despite that there is some uncertainty in the root position, the most probable tree in Fig. 5b is consistent with previous analyses of this data (see Figure 2 in Ref. [33]). For large data sets of long sequences, the proposed operators may prove useful to provide faster divergence time estimates based on the assumption of known unrooted topology and branch lengths in units of expected substitutions per site.


We have demonstrated that the presented operator is valid and able to improve the efficiency of phylogenetic MCMC for relaxed clock models. The overall performance of a Bayesian phylogenetic analysis will be affected by the proportion of MCMC steps that this operator is chosen to make the proposal. In the BEAST2 software, this can be changed by modifying the relative weights operators in the operator schedule. The ideal proportion is non-trivial to determine for an arbitrary data set. In this study, we assigned equal weights on operations to all internal nodes (including the root). How to assign weights to achieve better performance is not studied in this paper, and users may assign different weights in practice. Hence, an optimal method of assigning weights still needs further investigation.

The key idea of the presented operator (to maintain the genetic distances) shows a novel direction for more efficient proposals in Bayesian phylogenetic MCMC. For example, the operations on the internal nodes, in the current study, involve one random internal node, one node time and three branch rates. If two or more nodes are selected, then more associated rates and node times can be sampled in one proposal, which may achieve even better efficiency. Another possible approach is to make small changes to the genetic distances as well. To minimise the number of changes to genetic distances, a two-dimensional random draw will be used to change four parameters (one divergence time and one rate changed directly, the other two rates derived so as to minimise changes to genetic distances). What’s more, it should be pointed out that Small Pulley and Big Pulley can only be applied to reversible continuous-time Markov chain models where unrooted trees can be used in inference, because these operators require the underlying unrooted tree to be unchanged. Future work could elaborate a larger class of operators along these lines.

As data sets have increased in size the impetus to improve efficiency of Bayesian phylogenetic inference algorithms has steadily increased. Besides more effective proposal mechanisms within Metropolis-Hastings MCMC, completely novel approaches to Bayesian phylogenetics have also begun to get some attention. Variational methods are one alternative for approximating Bayesian posterior distributions [43]. These approaches make inference an optimisation problem and take advantage of tractable variational distributions that approximate the posterior distribution, thus decreasing the computational cost by avoiding high-dimensional integrals in MCMC sampling schemes. Recent work has investigated the potential for applying variational methods to phylogenetics [44, 45]. Our improved MCMC methods provide a performance baseline for these new approaches.


As data sets have increased in size, the need for computational efficiency of Bayesian phylogenetic analyses has also increased. In this paper, we have discussed a new tree proposal that substantially increases the efficiency of Bayesian phylogenetic inference under a popular class of relaxed molecular clock models.

We demonstrate the correctness of this algorithm with a series of tests including a well-calibrated simulation study. Based on both simulated and real data sets, the proposed operator is more efficient than current algorithms implemented in BEAST2 for datasets with long sequences. This is a desirable property because efficiency is most important for larger datasets. The proposed operator is available for use as a package of BEAST2.


In this section, we define the Constant Distance Operator. Figure 6 illustrates the flow chart of the proposed operators. In a phylogenetic tree, the node to operate on is denoted by X and the Constant Distance Operator works differently on internal nodes and the root node. The details of the operations are introduced step by step in the following subsections.

Fig. 6
figure 6

The flow chart of the Constant Distance operator

Operations on internal nodes

Figure 7 represents the tree (or subtree) with the node X that is randomly selected from among the internal nodes. Let g be the tree in the current state. The following steps propose a new divergence time in g and three rates in r.

Fig. 7
figure 7

Illustration of the operation on an internal node. The operator proposes tX,rX,rL and rR, during which dL,dR,dX are kept constant

Step 1 Identify the parent node and two child nodes of X, denoted by P, L and R respectively.

Step 2 Denote the nodes times of X, P, L and R by tX,tP,tL,tR respectively. Denote the rates on the branches above the nodes by rX,rL and rR respectively.

Step 3 Propose a new node time for X by tXtX+a, where a follows a Uniform distribution with a symmetric window size w, i.e. a Uniform[-w, +w], for some window size w. Make sure that the proposed time is valid, i.e. max{tL,tR}<tX<tP holds. Otherwise, we reject the proposal.

Step 4 Propose new rates by using Eq. 3.

$$ \begin{aligned} {r_{X}}' &= \frac{{r_{X}} \times ({{t_{P}} - {t_{X}}})}{{{t_{P}} - {t_{X}}'}} {r_{L}}' = \frac{{{r_{L}} \times ({{t_{X}} - {t_{L}}})}}{{{t_{X}}' - {t_{L}}}}\\ {r_{R}}' &= \frac{{{r_{R}} \times ({{t_{X}} - {t_{R}}})}}{{{t_{X}}' - {t_{R}}}} \end{aligned} $$

Step 5 Return the Green ratio αIN (Refer to Calculating the Green Ratio in the following subsection).

Operations on the root

We present three strategies for proposing the new rates and a new divergence time for the special case when X is the root node. i) The Simple Distance operator only proposes a new root time. ii) Small Pulley adjusts the distances of branches on both sides of the root. iii) Big Pulley proposes a new tree topology by rearranging the root, without perturbing the unrooted tree. As is illustrated in Fig. 8a, all the operations on the root, including Big Pulley that changes the tree topology, do not change the underlying unrooted tree. For instance, no matter where the root X is (either on branch EF or AE), the operators maintain the distances (dAB,dAC,dAD,dBC,dBD,dCD) and preserve the unrooted tree at the same time.

Fig. 8
figure 8

Illustration of operations on root. a An example of a 4-taxa unrooted tree and two possible rooted trees for the operator to sample, during which the unrooted tree can not be changed. Based on the original tree in b, Simple Distance proposes a node time in g and two rates in r and keeps dL,dR constant in c. Small Pulley proposes two rates in r and D=dL+dR remains constant in d

Simple distance

Figure 8b, c and d show the trees that are rooted at the node X. The original tree g in the current state is shown in Fig. 8b. Similar to the operations on internal nodes, we will use the following steps to propose a new root time in g and two rates in r, as is illustrated in Fig. 8c. At the same time, the genetic distances of two branches linked to the root, i.e. dL and dR, are kept constant.

Step 1 Identify the child nodes of the root X, denoted by L and R. Their corresponding node times and branch rates are tX,tL,tR and rL,rR.

Step 2 Propose a new node time for the root X by tXtX+a, where a Uniform[-w, +w]. Make sure that tX> max{tL,tR} holds. Otherwise, we reject the proposal.

Step 3 Propose new rates for branches on both sides of the root by using Eq. 4.

$$ {r_{L}}' = \frac{{{r_{L}} \times ({{t_{X}} - {t_{L}}})}}{{{t_{X}}' - {t_{L}}}}\\ {r_{R}}' = \frac{{{r_{R}} \times ({{t_{X}} - {t_{R}}})}}{{{t_{X}}' - {t_{R}}}} $$

Step 4 Return the Green ratio αSD.

Small pulley

In contrast to Simple Distance, Small Pulley changes genetic distances of branches on both sides of the root. As is illustrated in Fig. 8d, two new rates in r are proposed based on those in the original tree g. In order to maintain the total genetic distance dL+dR of the two branches linked to the root, after dL is proposed, dR will be adjusted simultaneously. In other words, Small Pulley keeps D=dL+dR constant. The detailed process includes the following 4 steps.

Step 1 Identify the child nodes of the root X, denoted by L and R. Their corresponding node times and branch rates are tX,tL,tR and rL,rR. The implied genetic distances of the two branches linked to the root can be calculated by:

$$ {d_{L}} = {r_{L}} \times ({{t_{X}} - {t_{L}}})\\ {d_{R}} = {r_{R}} \times ({{t_{X}} - {t_{R}}}) $$

Step 2 Propose a new genetic distance for dL by adding a random number that follows a Uniform distribution, i.e. dLdL+b, where b Uniform[-v, +v], for some window size v. Make sure that 0<dL<D holds. Otherwise, we reject the proposal.

Step 3 Propose new rates for branches on each side of the root:

$$ {r_{L}}' = \frac{{{d_{L}}'}}{{{t_{X}} - {t_{L}}}}\\ {r_{R}}' = \frac{{D - {d_{L}}'}}{{{t_{X}} - {t_{R}}}} $$

Step 4 Return the Green ratio αSP.

Big pulley

Big Pulley resamples the rates and times while maintaining the implied unrooted tree in units of genetic distance. So the genetic distances between the taxa are held constant, but the location of the root in the time tree is readjusted.

Before describing the detailed steps, we introduce a method Exchange that proposes a new root position. In Fig. 9, let (i) X denote the root of tree g, (ii) C and N denote the two child nodes of X, (iii) S and M denote the two child nodes of C. The Exchange(M, N) method involves the following steps:

  • Swap the two nodes by pruning and regrafting, i.e. cutting M (N) at its original position and attaching it to the original position of N (M).

    Fig. 9
    figure 9

    Illustration of Exchange (M,N) method. This method is applied to tree g and proposes g by swapping M and N, so that the three distances are adjusted to maintain the distances among S, M and N. That is, dC=dC+b,dN=dC+dN and dM=dMdC, where bU[−v,+v]

  • Propose dCdC+b, where b Uniform[-v, +v]. Make sure that 0<dC<D holds, where D=dC+dN. Otherwise, we reject the proposal.

  • The distances on the other three branches, i.e. dS,dM and dN, will be adjusted:

$$ {d_{S}}' = {d_{S}}\\ {d_{M}}' = {d_{M}} - {d_{C}}'\\ {d_{N}}' = {d_{N}} + {d_{C}} $$

As can be seen from the above descriptions, the method Exchange(M, N) swaps two nodes and adjusts distances (dS,dM,dN and dC) on the four branches so as to maintain the implied genetic distances among three taxa S, M and N.

Additionally, operations in Big Pulley vary depending on the shape of phylogenetic tree. In Fig. 10, a symmetric tree is shown on the left, in which both the child nodes of the root have two child nodes, i.e. L having children H1, H2 and R having children H3, H4. But in the asymmetric tree on the right, only one of the child nodes of the root has two child nodes below it, i.e. O having children G1, G2. But the other child node Y doesn’t have any child node, which is a heterochronous tip. The corresponding operations are detailed in the following two parts.

Fig. 10
figure 10

Two different tree shapes. The symmetric tree is on the left and the asymmetric tree is on the right. The dashed triangles represent the potential subtrees rooted at the nodes

Symmetric tree

For the symmetric tree in Fig. 10, the operations are illustrated in Fig. 11, after which one of the four possible trees ( ) will be proposed. The detailed process is described in Algorithm 1.

Fig. 11
figure 11

Illustration of operations on the symmetric tree in Fig. 10. The proposed operator will propose one of the four possible trees, each with 0.25 probability

For example, suppose we are going to propose tree . After the new node times for the root X and L are proposed, we apply the method by Exchange (H1, R), so that four distances are adjusted, as follows:

$$ {d_{H1}}' = {d_{H1}} - {d_{L}}' \\ {d_{H2}}' = {d_{H2}} \\ {d_{L}}' = {d_{L}} + b \\ {d_{R}}' = {d_{L}} + {d_{R}} $$

Finally, in this example the new rates would be updated by:

$$ \begin{aligned} {r_{H1}}' = \frac{{{d_{H1}}'}}{{{t_{X}}' - {t_{H1}}}} {r_{H2}}' = \frac{{{d_{H2}}'}}{{{t_{L}}' - {t_{H2}}}} {r_{L}}' = \frac{{{d_{L}}'}}{{{t_{X}}' - {t_{L}}'}} {r_{R}}' = \frac{{{d_{R}}'}}{{{t_{L}}' - {t_{R}}}} \end{aligned} $$
Asymmetric tree

For an asymmetric tree such as in Fig. 10 we would operate as illustrated in Fig. 12, in which there are three possible trees ( ). The operations are detailed in Algorithm 2.

Fig. 12
figure 12

Illustration of operations on the asymmetric tree in Fig. 10. The proposed operator will propose one of the three possible trees. If to<tG1, has 1 probability, otherwise and have 0.5 probability each

To give an example, assume we are going to propose tree . Firstly, tX and tO are proposed in Step 3 and Step 4. Then, in Step 4, the method Exchange (G1, Y) is applied, after which the four distances are adjusted as follows:

$$ {d_{G1}}' = {d_{G1}} - {d_{O}}' \\ {d_{G2}}' = {d_{G2}} \\ {d_{O}}' = {d_{O}} + b \\ {d_{Y}}' = {d_{Y}} + {d_{O}} $$

And the four rates are updated as follows:

$$ \begin{aligned} {r_{G1}}' = \frac{{{d_{G1}}'}}{{{t_{X}}' - {t_{G1}}}} {r_{G2}}' = \frac{{{d_{G2}}'}}{{{t_{O}}' - {t_{G2}}}} {r_{O}}' = \frac{{{d_{O}}'}}{{{t_{X}}' - {t_{O}}'}} {r_{Y}}' = \frac{{{d_{Y}}'}}{{{t_{O}}' - {t_{Y}}}} \end{aligned} $$

Calculating the green ratio

MCMC operators must use reversible proposal distributions to satisfy the detailed balance requirements of the MCMC algorithm (Refer to Appendix section 1 for more details). Therefore, all four of our operators involve a final step of calculating the Green ratio for the proposal.

According to the third and fourth steps in the operations for internal nodes, three rates on the branches linked to the selected internal node are proposed by one random number a that is used to change the node time. There are four parameters involved in this proposal, comprised of a 3-dimensional rate space and a 1-dimensional time space. The proposed operator utilises one random number in time space and makes changes in both time and rate space, which leads to a dimension-matching problem. To solve this dimension-matching problem, as is mentioned in Green’s paper [20], it is necessary to construct the Jacobian matrix. In Eq. (12), J1 deals with the parametric spaces before the proposal in vector IN=[tX,rX,rL,rR] and after the proposal in vector OUT=[tX,rX,rL,rR].

$$ {\mathbf{J_{1}}} = \left[ {\begin{array}{cccc} {\frac{{\partial {\mathbf{f}}}}{{\partial {t_{X}}}}}&{\frac{{\partial {\mathbf{f}}}}{{\partial {r_{X}}}}}&{\frac{{\partial {\mathbf{f}}}}{{\partial {r_{L}}}}}&{\frac{{\partial {\mathbf{f}}}}{{\partial {r_{R}}}}} \end{array}} \right] = \left[ {\begin{array}{cccc} {\frac{{\partial {f_{1}}}}{{\partial {t_{X}}}}}&{\frac{{\partial {f_{1}}}}{{\partial {r_{X}}}}}&{\frac{{\partial {f_{1}}}}{{\partial {r_{L}}}}}&{\frac{{\partial {f_{1}}}}{{\partial {r_{R}}}}} \\ {\frac{{\partial {f_{2}}}}{{\partial {t_{X}}}}}&{\frac{{\partial {f_{2}}}}{{\partial {r_{X}}}}}&{\frac{{\partial {f_{2}}}}{{\partial {r_{L}}}}}&{\frac{{\partial {f_{2}}}}{{\partial {r_{R}}}}} \\ {\frac{{\partial {f_{3}}}}{{\partial {t_{X}}}}}&{\frac{{\partial {f_{3}}}}{{\partial {r_{X}}}}}&{\frac{{\partial {f_{3}}}}{{\partial {r_{L}}}}}&{\frac{{\partial {f_{3}}}}{{\partial {r_{R}}}}} \\ {\frac{{\partial {f_{4}}}}{{\partial {t_{X}}}}}&{\frac{{\partial {f_{4}}}}{{\partial {r_{X}}}}}&{\frac{{\partial {f_{4}}}}{{\partial {r_{L}}}}}&{\frac{{\partial {f_{4}}}}{{\partial {r_{R}}}}} \end{array}} \right]\text{,} $$

where the functions f1,f2,f3 and f4 represent how the operator makes a proposal. After substituting Eq. (3) in Eq. (12), the Green ratio for the internal nodes can be derived:

$$ {}{\alpha_{IN}} = \frac{{p (- a)}}{{p (a)}}\left| {\mathbf{J_{1}}} \right| = \frac{{{t_{P}} - {t_{X}}}}{{{t_{P}} - {t_{X}}'}} \times \frac{{{t_{X}} - {t_{L}}}}{{{t_{X}}' - {t_{L}}}} \times \frac{{{t_{X}} - {t_{R}}}}{{{t_{X}}' - {t_{R}}}}\text{,} $$

where the proposal density p(−a) is equal to p(a) since the random number a is drawn from Uniform distribution.

Likewise, the Green ratio for Simple Distance, Small Pulley and Big Pulley can be obtained:

$$ {\alpha_{SD}} = \frac{{{t_{X}} - {t_{L}}}}{{{t_{X}}' - {t_{L}}}} \times \frac{{{t_{X}} - {t_{R}}}}{{{t_{X}}' - {t_{R}}}}\text{,} $$
$$ {\alpha_{SP}} = 1\text{,} $$
$$ {}{\alpha_{BP}} = \mu \times \frac{{{t_{X}}' - {t_{C}}}}{{{t_{X}}' - {t_{C}}'}} \times \frac{{{t_{C}} - {t_{S}}}}{{{t_{C}}' - {t_{S}}}} \times \frac{{{t_{C}} - {t_{N1}}}}{{{t_{X}}' - {t_{N1}}}} \times \frac{{{t_{X}} - {t_{N2}}}}{{{t_{C}}' - {t_{N2}}}}\text{,} $$

where μ=p(g,g)/p(g,g) is defined as the proposal ratio of topology change and is obtained by Algorithm 3. More details of how to calculate the determinant of the Jacobian matrix are explained in Appendix1 section.


1. the green ratio

When developing an operator for MCMC, the proposal function must be reversible. In other words, the probability that the operator propose a new state from the current state is required to be equal to the probability that the proposed state goes back to current state. To be specific, let π(x) be the target probability distribution and p(x,x) be the transition kernel in the continuous Markov chain. The reversibility condition requires that π(x)p(x,x)=π(x)p(x,x). And an operator provides a proposal q(x,x) with some probability α(x,x) that the proposal is accepted. Thus, the reversibility condition is rewritten as π(x)q(x,x)α(x,x)=π(x)q(x,x)α(x,x).

Considering the subspace φ1 on x and subspace φ2 on x, it is assumed that there is a symmetric measure on the combined parametric space φ=φ1×φ2, so that π(x)q(x,x) has a density with respect to a single measure on φ. Then, Green suggested that the reversibility condition should be satisfied by detailed balance [20], as represented by Eq. (17). And according to Peskun’ proof, it is optimal to take Eq. (18) as the acceptance probability to retain the detailed balance [46].

$$ \begin{aligned} \int_{A} {\pi (x) d_{x}} {\int_{B} {q(x, x')}{\alpha(x, x')} d_{x}} = \int_{B} {\pi (x') d_{x'}}{\int_{A} {q(x', x)}{\alpha(x', x)} d_{x'}} \text{,} \end{aligned} $$

where Aφ1 and Bφ2 are two Borel sets. q(x,x) denotes the probability that the operator proposes a new state x given the current state x.

$$ {\alpha_{H}}(x, x') = \min \left\{ {1,\frac{{\pi (x'){p}(x',x)}}{{\pi (x){p}(x,x')}}} \right\} \text{,} $$

where p(x,dx)/p(x,dx) is known as the Hastings ratio.

However, for operators that do not have a symmetric measure, it is necessary to include the Jacobian matrix J in order to deal with the dimension matching problem, as is discussed in Green’s paper [20]. In this case, Eq. (18) is extended, as is shown in Eq. (19).

$$ {\alpha_{G}}(x, x') = \min \left\{ {1,\frac{{\pi (x'){p}(x',x)}}{{\pi (x){p}(x,x')}}}\left|{\mathbf{J}}\right| \right\} \text{,} $$

where J=h(x,x) represents a vector differential matrix of deterministic function h. \(\alpha = \frac {{p}(x',x)}{{p}(x,x')}\left |{\mathbf {J}}\right |\) is defined as the Green ratio, and J ensures that the proposal have a symmetric measure on each subspace in state x and x.

1.1 calculating the green ratio for operations on internal nodes

The Constant Distance Operator firstly proposes a new time for the randomly selected internal node (Eq. (20a)), and then proposes three rates by the original distances and new node times(Eqs. (20b) (20d)).

$$ {f_{1}}:{{\mathrm{t}}_{X}}{\text{' = }}{{\mathrm{t}}_{X}}{\text{ + a}} $$
$$ {f_{2}}:{r_{X}}' = \frac{{{r_{X}} \times ({t_{P}} - {t_{X}})}}{{{t_{P}} - {t_{X}}'}} $$
$$ {f_{3}}:{r_{L}}' = \frac{{{r_{L}} \times ({t_{X}} - {t_{L}})}}{{{t_{X}}' - {t_{L}}}} $$
$$ {f_{4}}:{r_{R}}' = \frac{{{r_{R}} \times ({t_{X}} - {t_{2}})}}{{{t_{X}}' - {t_{R}}}} $$

Substituting Eq. (20) in the Jacobian matrix J1 (Eq. (12)), we can get Eq. (21), so that the determinant of J1 can be obtained by Eq. (22).

$$ {{\mathbf{J}}_{1}} = \left[ {\begin{array}{cccc} 1&0&0&0 \\ {\frac{{ - {r_{X}}}}{{{t_{P}} - {t_{X}}'}}}&{\frac{{{t_{P}} - {t_{X}}}}{{{t_{P}} - {t_{X}}'}}}&0&0 \\ {\frac{{{r_{L}}}}{{{t_{X}}' - {t_{L}}}}}&0&{\frac{{{t_{X}} - {t_{L}}}}{{{t_{X}}' - {t_{L}}}}}&0 \\ {\frac{{{r_{R}}}}{{{t_{X}}' - {t_{R}}}}}&0&0&{\frac{{{t_{X}} - {t_{R}}}}{{{t_{X}}' - {t_{R}}}}} \end{array}} \right] $$
$$ \begin{aligned} \left| {{{\mathbf{J}}_{1}}} \right| &= 1 \times \left| {\begin{array}{ccc} {\frac{{{t_{P}} - {t_{X}}}}{{{t_{P}} - {t_{X}}'}}}&0&0 \\ 0&{\frac{{{t_{X}} - {t_{L}}}}{{{t_{X}}' - {t_{L}}}}}&0 \\ 0&0&{\frac{{{t_{X}} - {t_{R}}}}{{{t_{X}}' - {t_{R}}}}} \end{array}} \right| \\&= \frac{{{t_{P}} - {t_{X}}}}{{{t_{P}} - {t_{X}}'}} \times \left| {\begin{array}{cc} {\frac{{{t_{X}} - {t_{L}}}}{{{t_{X}}' - {t_{L}}}}}&0 \\ 0&{\frac{{{t_{X}} - {t_{R}}}}{{{t_{X}}' - {t_{R}}}}} \end{array}} \right| \\&= \frac{{{t_{P}} - {t_{X}}}}{{{t_{P}} - {t_{X}}'}} \times \frac{{{t_{X}} - {t_{L}}}}{{{t_{X}}' - {t_{L}}}} \times \frac{{{t_{X}} - {t_{R}}}}{{{t_{X}}' - {t_{R}}}} \end{aligned} $$

1.2 calculating the green ratio for simple distance

Simple Distance proposes two rates by using Eqs. (23b) and (23c), according the new root time in Eq. (23a). So the Jacobian matrix can be obtained as is shown in Eq. (24).

$$ {t_{X}}' = {t_{X}} + a $$
$$ {r_{L}}' = \frac{{{r_{L}} \times ({t_{X}} - {t_{L}})}}{{{t_{X}}' - {t_{L}}}} $$
$$ {r_{R}}' = \frac{{{r_{R}} \times ({t_{X}} - {t_{R}})}}{{{t_{X}}' - {t_{R}}}} $$
$$ {{\mathbf{J}}_{2}} = \left[ {\begin{array}{ccc} {\frac{{\partial {t_{X}}'}}{{\partial {t_{X}}}}}&{\frac{{\partial {t_{X}}'}}{{\partial {r_{X}}}}}&{\frac{{\partial {t_{X}}'}}{{\partial {r_{R}}}}} \\ {\frac{{\partial {r_{L}}'}}{{\partial {t_{X}}}}}&{\frac{{\partial {r_{L}}'}}{{\partial {r_{X}}}}}&{\frac{{\partial {r_{L}}'}}{{\partial {r_{R}}}}} \\ {\frac{{\partial {r_{X}}'}}{{\partial {t_{X}}}}}&{\frac{{\partial {r_{X}}'}}{{\partial {r_{X}}}}}&{\frac{{\partial {r_{X}}'}}{{\partial {r_{R}}}}} \end{array}} \right] = \left[ {\begin{array}{ccc} 1&0&0 \\ {\frac{{{r_{L}}}}{{{t_{X}}' - {t_{L}}}}}&{\frac{{{t_{X}} - {t_{L}}}}{{{t_{X}}' - {t_{L}}}}}&0 \\ {\frac{{{r_{x}}}}{{{t_{X}}' - {t_{R}}}}}&0&{\frac{{{t_{X}} - {t_{R}}}}{{{t_{X}}' - {t_{R}}}}} \end{array}} \right] $$

So the determinant of J2 is calculated by Eq. (25)

$$ \left| {{{\mathbf{J}}_{2}}} \right| = \frac{{{t_{X}} - {t_{L}}}}{{{t_{X}}' - {t_{L}}}} \times \frac{{{t_{X}} - {t_{R}}}}{{{t_{X}}' - {t_{R}}}} $$

Calculating the green ratio for small pulley

Small Pulley proposes a new genetic distance of a branch on one side of the root by adding a random number b, which is equal to adding a random number b to the original product of rate and time on that branch. As a result, a new rate is proposed by Eq. (26a). Similarly, a new rate on another branch is proposed by Eq. (26b), because the total distance of the two branches linked to the root should remain constant.

$$ {r_{L}}' = \frac{{{r_{L}} \times ({t_{X}} - {t_{L}}) + b}}{{{t_{X}} - {t_{L}}}} $$
$$ \begin{aligned} {r_{R}}' &= \frac{{[{r_{R}} \times ({t_{X}} - {t_{R}}) + {r_{L}} \times ({t_{X}} - {t_{L}})] - [{r_{L}} \times ({t_{X}} - {t_{L}}) + b]}}{{{t_{X}} - {t_{R}}}}\\& = \frac{{{r_{R}} \times ({t_{X}} - {t_{R}}) - b}}{{{t_{X}} - {t_{R}}}} \end{aligned} $$

Then, as is illustrated in Eq. (27), the Jacobian matrix J3 is simply obtained, which makes the determinant |J3|=1.

$$ {{\mathbf{J}}_{3}} = \left[ {\begin{array}{cc} {\frac{{\partial {r_{L}}'}}{{\partial {r_{L}}}}}&{\frac{{\partial {r_{L}}'}}{{\partial {r_{X}}}}} \\ {\frac{{\partial {r_{R}}'}}{{\partial {r_{L}}}}}&{\frac{{\partial {r_{X}}'}}{{\partial {r_{X}}}}} \end{array}} \right] = \left[ {\begin{array}{cc} 1&0 \\ 0&1 \end{array}} \right] $$

1.3 calculating the green ratio for big pulley

Two new node times are proposed in Big Pulley. One is the root time (Eq. (28a)), the other is the node time of the child node of the root. It can be either children of the root, i.e. son and dau. So tC is used to denote the node time proposed, as is seen in Eq. (28b). In addition, the distances are adjusted by the method Exchange (M, N), dependent on which nodes are chosen. As a result, the four rates are proposed, as is shown in Eq. (28c) Eq. (28f)

$$ {t_{X}}' = {t_{X}} + a $$
$$ {t_{C}}' = {t_{C}} + {a_{1,2,3}} $$
$$ {r_{C}}' = \frac{{{r_{C}} \times (t{}_{X} - {t_{C}}) + b}}{{t{}_{X}' - {t_{C}}'}} $$
$$ {r_{S}}' = \frac{{{r_{2}} \times (t{}_{C} - {t_{S}})}}{{t{}_{C}' - {t_{S}}}} $$
$$ {r_{M}}' = \frac{{{r_{M}} \times ({t_{C}} - {t_{M}}) - [{r_{C}} \times ({t_{X}} - {t_{C}}) + b]}}{{{t_{X}}' - {t_{M}}}} $$
$$ {r_{N}}' = \frac{{{r_{C}} \times ({t_{X}} - {t_{C}}) + {r_{N}} \times ({t_{X}} - {t_{N}})}}{{{t_{C}}' - {t_{N}}}} $$

where a1,2,3 is the random number to propose a new node time for the child node of the root. Depending on which child node is selected, the notation is different, i.e. a1,a2,a3. Here, to make it a general case, ax is used.

Fig. 13
figure 13

The illustration of sampling from prior. g1 is set to be the original tree where an MCMC chain starts. When testing Big Pulley, the proposed operator samples the trees among g1,g2 and g3

Therefore, the Jacobian matrix J4 for the six parameters in Eq. (28) is obtained by Eq. (29). And the determinant of J4 is calculated shown in Eq. (30).

$$ {}\begin{aligned} {{\mathbf{J}}_{4}} &= \left[ {\begin{array}{cccccc} {\frac{{\partial {t_{X}}'}}{{\partial {t_{X}}}}}&{\frac{{\partial {t_{X}}'}}{{\partial {t_{C}}}}}&{\frac{{\partial {t_{X}}'}}{{\partial {r_{C}}}}}&{\frac{{\partial {t_{X}}'}}{{\partial {r_{S}}}}}&{\frac{{\partial {t_{X}}'}}{{\partial {r_{M}}}}}&{\frac{{\partial {t_{X}}'}}{{\partial {r_{N2}}}}} \\ {\frac{{\partial {t_{C}}'}}{{\partial {t_{X}}}}}&{\frac{{\partial {t_{C}}'}}{{\partial {t_{C}}}}}&{\frac{{\partial {t_{C}}'}}{{\partial {r_{C}}}}}&{\frac{{\partial {t_{C}}'}}{{\partial {r_{S}}}}}&{\frac{{\partial {t_{C}}'}}{{\partial {r_{M}}}}}&{\frac{{\partial {t_{C}}'}}{{\partial {r_{N2}}}}} \\ {\frac{{\partial {r_{C}}'}}{{\partial {t_{X}}}}}&{\frac{{\partial {r_{C}}'}}{{\partial {t_{C}}}}}&{\frac{{\partial {r_{C}}'}}{{\partial {r_{C}}}}}&{\frac{{\partial {r_{C}}'}}{{\partial {r_{S}}}}}&{\frac{{\partial {r_{C}}'}}{{\partial {r_{M}}}}}&{\frac{{\partial {r_{C}}'}}{{\partial {r_{N2}}}}} \\ {\frac{{\partial {r_{S}}'}}{{\partial {t_{X}}}}}&{\frac{{\partial {r_{S}}'}}{{\partial {t_{C}}}}}&{\frac{{\partial {r_{S}}'}}{{\partial {r_{C}}}}}&{\frac{{\partial {r_{S}}'}}{{\partial {r_{S}}}}}&{\frac{{\partial {r_{S}}'}}{{\partial {r_{M}}}}}&{\frac{{\partial {r_{S}}'}}{{\partial {r_{N2}}}}} \\ {\frac{{\partial {r_{M}}'}}{{\partial {t_{X}}}}}&{\frac{{\partial {r_{M}}'}}{{\partial {t_{C}}}}}&{\frac{{\partial {r_{M}}'}}{{\partial {r_{C}}}}}&{\frac{{\partial {r_{M}}'}}{{\partial {r_{S}}}}}&{\frac{{\partial {r_{M}}'}}{{\partial {r_{M}}}}}&{\frac{{\partial {r_{M}}'}}{{\partial {r_{N}}}}} \\ {\frac{{\partial {t_{N}}'}}{{\partial {t_{X}}}}}&{\frac{{\partial {t_{N}}'}}{{\partial {t_{C}}}}}&{\frac{{\partial {t_{N}}'}}{{\partial {r_{C}}}}}&{\frac{{\partial {t_{N}}'}}{{\partial {r_{S}}}}}&{\frac{{\partial {t_{N}}'}}{{\partial {r_{N}}}}}&{\frac{{\partial {t_{N}}'}}{{\partial {r_{N}}}}} \end{array}} \right] \\&= \left[ {\begin{array}{cccccc} 1&0&0&0&0&0 \\ 0&1&0&0&0&0 \\ {\frac{{{r_{C}}}}{{{t_{X}}' - {t_{C}}'}}}&{\frac{{ - {r_{C}}}}{{{t_{X}}' - {t_{C}}'}}}&{\frac{{{t_{X}}' - {t_{C}}}}{{{t_{X}}' - {t_{C}}'}}}&0&0&0 \\ 0&{\frac{{{r_{S}}}}{{t' - {t_{S}}}}}&0&{\frac{{{t_{C}} - {t_{S}}}}{{{t_{C}}' - {t_{S}}}}}&0&0 \\ {\frac{{ - {r_{C}}}}{{{t_{X}}' - {t_{M}}}}}&{\frac{{{r_{N1}} + {r_{C}}}}{{{t_{X}}' - {t_{M}}}}}&{\frac{{ - ({t_{X}} - {t_{C}})}}{{{t_{X}}' - {t_{M}}}}}&0&{\frac{{{t_{C}} - {t_{M}}}}{{{t_{X}}' - {t_{M}}}}}&0 \\ {\frac{{{r_{C}} + {r_{S}}}}{{{t_{C}}' - {t_{N}}}}}&{\frac{{ - ({r_{C}} + {r_{S}})}}{{{t_{C}}' - {t_{N}}}}}&{\frac{{{t_{X}} - {t_{C}}}}{{{t_{C}}' - {t_{N}}}}}&0&0&{\frac{{{t_{X}} - {t_{N}}}}{{{t_{C}}' - {t_{N}}}}} \end{array}} \right] \end{aligned} $$
$$ \left| {{{\mathbf{J}}_{4}}} \right| = \frac{{{t_{X}}' - {t_{C}}}}{{{t_{X}}' - {t_{C}}'}} \times \frac{{{t_{C}} - {t_{S}}}}{{{t_{C}}' - {t_{S}}}} \times \frac{{{t_{C}} - {t_{M}}}}{{{t_{X}}' - {t_{M}}}} \times \frac{{{t_{X}} - {t_{N}}}}{{{t_{C}}' - {t_{N}}}} $$

Last but not least, due to the change of tree topology in Exchange (M, N), the probability of the proposed tree going back to the original tree p(g|g), as well as the probability of making the proposal p(g|g), should be considered. As the ratio of p(g|g)/p(g|g) is defined as μ, the calculation of μ is detailed in the following algorithm.

2. sampling from the prior

In this section, we aim to validate the correctness of the proposed operators. To be more specific, we firstly run the simulations by sampling from prior distributions in BEAST2. Since the prior distributions are deterministic, we can analytically calculate the theoretical joint-distributions of sampled parameters in MCMC chains. By comparing the sampled distributions with the analytical results, we demonstrate whether the proposed operators are able to sample parameters correctly.

In Fig. 13, a tree with three taxa A, B and C (plus one internal node D, and root E) is used as a small example in the experiments in this section. In the figure, g1 is set as the initial tree. Firstly, a LogNormal distribution is used as the rate prior in the uncorrelated relaxed clock model, given by Eq. (31).

$$ {}r = \{{r_{A}}\quad{r_{B}}\quad{r_{C}}\quad{r_{D}}\} \sim LogNormal(m = -3, s = 0.25) $$

In addition, a Coalescent model [47] with constant population size (N=0.3) is used to describe the tree prior. Hence, for the tree in Fig. 13, the probability of node times is calculated by Eq. (32).

$$ p(t=\{{t_{E}},{t_{D}}\}) = (\frac{1}{N} \times {e^{- \frac{1}{N}({t_{E}} - {t_{D}})}}) \times (\frac{1}{N} \times {e^{- \frac{3}{N}{t_{D}}}}) $$
Table 3 Initial settings for testing operations on internal nodes
Table 4 Results of sampling the internal node

After the priors are specified, the distribution to sample can be exactly known, since the samples are drawn from the prior distributions. In other words, as the rates are functions of its genetic distance and times, the joint distribution to sample can be represented by Eq. (33).

$$ \begin{aligned} p(r,t) &= p({t_{E}},{t_{D}}) \times p({r_{D}}) \times p({r_{A}}) \times p({r_{B}}) \times p({r_{C}}) \\&= p({t_{E}},{t_{D}}) \times p\left(\frac{{{d_{D}}}}{{t_{E}} - {t_{D}}}\right) \times p\left(\frac{{{d_{A}}}}{{t_{D}} - {t_{A}}}\right)\\& \quad\times p\left(\frac{{{d_{B}}}}{{t_{D}} - {t_{B}}}\right) \times p\left(\frac{{{d_{C}}}}{{t_{E}} - {t_{C}}}\right)\text{,} \end{aligned} $$

where p(.) is the probability of certain rate values in the LogNormal distribution. Therefore, the whole probability can be obtained by conducting numerical integration on Eq. (33), which shows the probability distribution over all the possible values of parameters.

2.1 test the operator on internal nodes

The genetic distances, node times and rates for g1 in Fig. 13 are given in Table 3. To test roundly, two scenarios are designed. In each scenario, the genetic distances are fixed, the node time tD starts from the initial value and will be changed by the proposed operator during the sampling process. Essentially, the proposed operator makes node D move between node A and E. Besides, to make sure that the result is robust, two different MCMC chain lengths are performed in each scenario, i.e. 10 million and 20 million.

The mean, mean error and the standard deviation of the MCMC samples are summarised in Table 4. Besides, according to Eq. (33), the actual joint distribution is obtained by using Eq. (34), and is used to evaluate the results, which is also included in Table 4. Moreover, the histograms of MCMC samples that indicate the sampled distributions, as well as the curves of the numerical integration of Eq. (34), are shown in Fig. 14. From Table 4 and Fig. 14, it can be seen that the red curves well fit the black histograms, and the mean values and standard deviations are consistent, which makes it safe to conclude that the proposed operator samples the internal node correctly.

$$ \begin{aligned} p(r,t) &= \int_{{t_{D}} = 0}^{{t_{E}}} {p({t_{E}},{t_{D}})} \times p\left(\frac{d_{A}}{t_{D}}\right) \times p\left(\frac{d_{B}}{t_{D}}\right)\\ & \times p\left(\frac{d_{D}}{{t_{E}} - {t_{D}}}\right) \times p\left(\frac{d_{C}}{t_{E}}\right){d_{t_{D}}} \end{aligned} $$
Fig. 14
figure 14

Sampled parameters in tests of internal nodes. The horizontal axis represents the node time of D in Fig. 13. The two scenarios sample two trees with different distances specified in Table ??

2.2 test the operator on root

Still starting from g1 in Fig. 13, the initial settings for testing the root are given in Table 5. And the three strategies are tested separately in the following parts.

Table 5 Initial settings for operations on the root
2.2.1 Using Simple Distance

The root time tE is sampled by Simple Distance, which ranges from 1 to positive infinity theoretically. Namely, all the genetic distances and the node time tD are fixed. Similar to Eq. (34), the joint distribution of tE and rates to sample can be obtained by Eq. (35).

$$ {}\begin{aligned} p(r,t) &= \int_{{t_{E}} = 1}^{+ \infty} {p({t_{E}},{t_{D}})} \times p\left(\frac{{{d_{A}}}}{{{t_{D}}}}\right) \times p\left(\frac{{{d_{B}}}}{{{t_{D}}}}\right)\\ & \times p\left(\frac{{{d_{D}}}}{{{t_{E}} - {t_{D}}}}\right) \times p\left(\frac{{{d_{C}}}}{{{t_{E}}}}\right){d_{t_{E}}} \end{aligned} $$

The results are given in Table 6 and Fig. 15a. As can be seen, the mean and the standard deviation of MCMC samples and numerical integration are close to each other, which confirms that the two distribution are the same. Thus, Simple Distance samples the root time and two branch rates correctly.

Table 6 Results of sampling the root
Fig. 15
figure 15

Sampled parameters in test of the root. For the trees in Fig. 13, Simple Distance samples the root time tE only, Small Pulley samples the distance dD only, and Big Pulley samples tE,tD,dD. To make it simple, tE and dD are compared

2.2.2 Using Small Pulley

Although both dx and di are changed during the sampling process when using Small Pulley, the sum of dD and dC are kept 0.67 in this test, as the initial setting shown in Table 5. To make it simple, only dD is compared.

Then, based on Eq. (33), the exact distribution of di can be obtained by Eq. (36), which is compared with the sampled distribution in Table 6 and Fig. 15b. Even though there exist some errors, the sampled parameters can be considered to follow the same distribution. So the Small Pulley is also able to provide correct samples.

$$ \begin{aligned} p(r,t) &= \int_{{d_{D}} = 1}^{0.67} {p({t_{E}},{t_{D}})} \times p\left(\frac{{{d_{A}}}}{{{t_{D}}}}\right) \times p\left(\frac{{{d_{B}}}}{{{t_{D}}}}\right) \times p\left(\frac{{{d_{D}}}}{{{t_{E}} - {t_{D}}}}\right)\\ & \times p\left(\frac{{0.67 - {d_{D}}}}{{{t_{E}}}}\right){d_{d_{D}}} \end{aligned} $$
2.2.3 Using Big Pulley

For g1 in Fig. 13, a new tree, together with the root time tE and node time of its older child tD, as well as a genetic distance di, is proposed by Big Pulley. In this case, the initial tree g1 will either go to g2 or g3, as is shown in Fig. 13. So the samples are repeatedly drawn from the 3 trees. Besides, according to the initial settings in Table 5, the genetic distances remain unchanged during the process, i.e. dAB=1,dAC=1 and dBC=1 hold. Hence, the distribution we are about to achieve can be calculated by Eq. (37).

$$ \begin{aligned} p(r,t) &= \int_{{t_{E}} = 0}^{+ \infty} {\int_{{t_{D}} = 0}^{{t_{E}}} {\int_{{d_{D}} = 0}^{0.5} {p({t_{E}},{t_{D}})}} \times p\left(\frac{{0.5}}{{{t_{D}}}}\right)} \\&\times p\left(\frac{{0.5}}{{{t_{D}}}}\right) \times p\left(\frac{{{d_{D}}}}{{{t_{E}} - {t_{D}}}}\right) \times p\left(\frac{{0.5 - {d_{D}}}}{{{t_{E}}}}\right){d_{d_{D}}}{d_{t_{D}}}{d_{t_{E}}} \end{aligned} $$

The statistical measurements, i.e. mean and standard deviation, are compared in Table 6. The histograms of samples and numerical curves of dD and tE are pictured in Fig. 15c and d. It is shown that the two distributions are consistent within the acceptable error range. Therefore, Big Pulley can also give the right combinations of rates and node times, under the condition that the genetic distances among taxa are constant.

3. performance analysis of operators

This section provides the details of the results presented in Performance comparison section.

3.1 Operator weights

The weights on operators for the simulations when comparing efficiency are listed in Table 7. Although how to assign weights to achieve better performance is not studied in this paper, we maintain the percentage of weights on three operator class in Category and Cons configurations. But we modified some weights on the operators inside the same class, and we assigned different weights for different data sets.

Table 7 Operator weights in MCMC chains

3.2 Simulated data sets

We simulated two sets of sequence alignment on the same tree with 20 taxa that is shown in Fig. 16. We used HKY model as substitution model with κ=2.4751, and the base frequencies are π=(0.21930.22680.30070.2531). In the uncorrelated relaxed clock model, the standard deviation of the branch rates (Ucldstdev) is 0.1803. The models and prior distributions are the same as is described in Fig. 1.

Fig. 16
figure 16

The tree used to simulate sequence alignment. The taxa are denoted by t1 to t20. The divergence times are drawn near the node

Fig. 17
figure 17

Running time and ESS using Anolis data

3.3 Efficiency measured by ESS per hour

Since we compare the efficiency based on ESS per hour using two configurations, i.e. Category and Cons, the ratio of ESS per hour is calculated by a random simulation in the two configurations, as is shown in Fig. 3. Then Table 2 lists the average running time and ESS of particular parameters in the simulations using different data sets. Here, we present the detailed running time and ESS of the simulations, which can be seen in Figs. 17, 18, 19, 20, and 21. Overall, we conclude that the proposed operators are able to provide better performance, because the figures suggest that Cons configuration requires less running time and have larger ESS for most parameters in most simulations. Especially, for those poorly estimated parameters in Category configuration, the improvement is more obvious. For data sets such as primates and simulated data with 500 sites, the running time is slightly larger in Cons configuration, but the ESS are much larger, which makes it acceptable to reduce the MCMC chain length and get the same performance.

Fig. 18
figure 18

Running time and ESS using RSV2 data

Fig. 19
figure 19

Running time and ESS using HIV-1 data

Fig. 20
figure 20

Running time and ESS using primates data

Fig. 21
figure 21

Running time and ESS using simulated data

Fig. 22
figure 22

Efficiency comparison of proposals using Anolis data

Fig. 23
figure 23

Efficiency comparison of proposals using RSV2 data

Fig. 24
figure 24

Efficiency comparison of proposals using HIV-1 data

3.4 Efficiency measured by proposals

The operators introduced in the paper utilise a random walk proposal for the new node time, which draws a random number from a uniform distribution and moves the node uniformly on the branch. However, others proposals, such as a Bactrian proposal [48] and a Beta proposal [49], assign a specific distribution on the new node time so that it is more probable to move to a certain height on the branch, either far away from or close to its original position. This section applied Random walk proposal (the operators in this paper), Bactrian proposal and Beta proposal to the three data sets, and the results are compared to those using Category configuration.

The comparisons are shown in Figs. 22, 23, and 24. It is indicated that Beta proposal achieved worst performance in the three analysed data sets. The performance of the Constant Distance operator (Random walk) and Bactrian proposal achieved similar performance in RSV2 data set, while Bactrian proposal provided larger ESS per hour for most parameters in HIV-1 data set. Therefore, it still needs further investigation to demonstrate the effectiveness of different proposals when analysing various data sets. Our current implementation of the operators enables users to specify which proposal style will be used in Beast2 analysis.

4. ucldstdevScaleOperator: a scale operator on standard deviation

It should be noted that the proposed ConstantDistance operator parameterises branch rates as continuous random variables, instead of discrete rate categories as is used in current BEAST2 settings. In uncorrelated relaxed clock model, branch rates are assumed to have a lognormal prior distribution, where the real mean is fixed to 1 and the standard deviation (denoted by Ucldstdev) is usually sampled with a hyper prior such as gamma(α=0.5396,β=0.3819). When a new Ucldstdev is proposed in one state during MCMC sampling by normal operators, the probability of all rates change as well under the new log normal distribution. Therefore, the authors implemented a separate operator working on Ucldstdev, which is able to solve this problem properly.

The first step is to propose a new Ucldstdev by a scale operation, which multiplies current Ucldstdev by a random factor, as is shown in Eq. (38).

$$ Ucldstdev' = Ucldstdev \times {\text{scale}} $$

where \({\text {scale = Factor + }}\left [ \xi \times (\frac {1}{{{\text {Factor}}}}{\text {- Factor}}) \right ]\) and ξ is a random variable from a Uniform(0,1),Factor is a user-defined parameter to specify how bold the proposal is.

Fig. 25
figure 25

Efficiency comparison of clock standard deviation

Secondly, all the branch rates are proposed based on the new Ucldstdev, given the probability of original Ucldstdev, which is calculated using Eq. (39).

$$ r_{i}' = icd{f_{stdev'}}\left[ {cd{f_{stdev}}({r_{i}})} \right] $$

where the notations cdf(·) and icdf(·) represent the cumulative and inverse cumulative density function of log normal distribution. Because of the calculation of cdf(·) and icdf(·) for each branch rates, the "Cons" configuration requires more running time than "category", as is discusses in “Performance comparison” section. However, it is acceptable as ConstantDistance operator gives larger ESS.

Finally, it is important to return the corrected hastings ratio, since the proposal is associated with one random variable, Ucldstdev and (2n−1) branch rates. As is shown in Eq. (40), the ratio includes the scale operation and rates changing under the same probability.

$$ {{\mathbf{J}}_{Ucldstdev}} = \frac{1}{{scale}} \times \prod\limits_{i = 1}^{2n - 1} {\frac{{\partial icd{f_{Ucldstdev'}}[cd{f_{Ucldstdev}}({r_{i}})]}}{{\partial {r_{i}}}}} $$

In the comparison of ESS for the clock standard deviation (denoted by ucld.stdev in Fig. 3), we specified a normal scale operator in “Category" configuration. In “Cons" configuration, the UcldstdevScaleOperator is used to sample the clock standard deviation of continuous rates. To avoid the concern that the difference between “Category" and “Cons" is a result of how rates are parameterised (i.e. discrete or continuous), we set another configuration where continuous rates are sampled without using the ConstantDistance operator (denoted by “NoCons" configuration). The weights of the operators in “NoCons" are the same as those in “Category" which is detailed in Table 7. We ran the analysis using the three real data sets (Anolis, RSV2 and HIV-1) and the comparison of ESS per hour between “Category”, “Cons" and “NoCons” is summarised in Fig. 25. The figure shows ESS per hour in log10 space of ucld.stdev in 20 independent MCMC chains. As can be seen, “Cons” configuration gives similar performance, comparing with “Category”. This indicates UcldstdevScaleOperator works properly on continuous rates. Moreover, ESS per hour is much larger in “Cons" than in “NoCons”, where both continuous rates are sampled. Therefore, the proposed operators contribute to the improved performance. However, we noticed that the rate parameterisation does have some mixing issues in MCMC chains. In the future, we will further investigate how to parameterise branch rates to get better performance when using the proposed operators.

Availability of data and materials

The source code of the proposed operator and the data sets analysed during the current study are available in the Github repository (



Markov chain Monte Carlo


effective sample sizes


highest posterior density


Using the current operators in BEAST2 to sample discrete rate categories


Using the Constant Distance operator to sample continuous rates specified by an uncorrelated related clock model


  1. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. J Chem Phys. 1953; 21(6):1087–92.

    Article  CAS  Google Scholar 

  2. Hastings WK. Monte carlo sampling methods using markov chains and their applications. Biometrika. 1970; 57(1):97–109.

    Article  Google Scholar 

  3. Zuckerkandvl E, Pauling L. Evolutionary divergence and convergence in proteins. 1965:97–166.

  4. Yang Z, Rannala B. Bayesian phylogenetic inference using dna sequences: a markov chain monte carlo method. Mol Biolo Evol. 1997; 14(7):717–24.

    Article  CAS  Google Scholar 

  5. Rannala B, Yang Z. Bayes estimation of species divergence times and ancestral population sizes using dna sequences from multiple loci. Genetics. 2003; 164(4):1645–56.

    CAS  PubMed  PubMed Central  Google Scholar 

  6. Thorne JL, Kishino H, Painter IS. Estimating the rate of evolution of the rate of molecular evolution. Mol Biol Evol. 1998; 15(12):1647–57.

    Article  CAS  PubMed  Google Scholar 

  7. Drummond AJ, Ho SY, Phillips MJ, Rambaut A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 2006; 4(5):88.

    Article  CAS  Google Scholar 

  8. Knapp M, Stöckler K, Havell D, Delsuc F, Sebastiani F, Lockhart PJ. Relaxed molecular clock provides evidence for long-distance dispersal of nothofagus (southern beech). PLoS Biol. 2005; 3(1):14.

    Article  CAS  Google Scholar 

  9. Smith SA, Beaulieu JM, Donoghue MJ. An uncorrelated relaxed-clock analysis suggests an earlier origin for flowering plants. Proc Natl Acad Sci. 2010; 107(13):5897–902.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Ho SY, Phillips MJ, Drummond AJ, Cooper A. Accuracy of rate estimation using relaxed-clock models with a critical focus on the early metazoan radiation. Mol Biol Evol. 2005; 22(5):1355–63.

    Article  CAS  PubMed  Google Scholar 

  11. Lepage T, Bryant D, Philippe H, Lartillot N. A general comparison of relaxed molecular clock models. Mol Biol Evol. 2007; 24(12):2669–80.

    Article  CAS  PubMed  Google Scholar 

  12. Kobert K, Stamatakis A, Flouri T. Efficient detection of repeating sites to accelerate phylogenetic likelihood calculations. Syst Biol. 2017; 66(2):205–17.

    CAS  PubMed  Google Scholar 

  13. Guindon S. Bayesian estimation of divergence times from large sequence alignments. Mol Biol Evol. 2010; 27(8):1768–81.

    Article  CAS  PubMed  Google Scholar 

  14. Reis Md, Yang Z. Approximate likelihood calculation on a phylogeny for bayesian estimation of divergence times. Mol Biol Evol. 2011; 28(7):2161–72.

    Article  PubMed  CAS  Google Scholar 

  15. Ayres DL, Darling A, Zwickl DJ, Beerli P, Holder MT, Lewis PO, Huelsenbeck JP, Ronquist F, Swofford DL, Cummings MP, et al.Beagle: an application programming interface and high-performance computing library for statistical phylogenetics. Syst Biol. 2011; 61(1):170–3.

    Article  PubMed  PubMed Central  Google Scholar 

  16. Lakner C, Van Der Mark P, Huelsenbeck JP, Larget B, Ronquist F. Efficiency of markov chain monte carlo tree proposals in bayesian phylogenetics. Syst Biol. 2008; 57(1):86–103.

    Article  PubMed  Google Scholar 

  17. Höhna S, Drummond AJ. Guided tree topology proposals for bayesian phylogenetic inference. Syst Biol. 2012; 61(1):1–11.

    Article  PubMed  Google Scholar 

  18. Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu C-H, Xie D, Suchard MA, Rambaut A, Drummond AJ. Beast 2: a software platform for bayesian evolutionary analysis. PLoS Comput Biol. 2014; 10(4):1003537.

    Article  CAS  Google Scholar 

  19. Drummond A, Nicholls G, Rodrigo A, Solomon W. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics. 2002; 161:1307–20.

    CAS  PubMed  PubMed Central  Google Scholar 

  20. Green PJ. Reversible jump markov chain monte carlo computation and bayesian model determination. Biometrika. 1995; 82(4):711–32.

    Article  Google Scholar 

  21. Suchard MA. Stochastic models for horizontal gene transfer: taking a random walk through tree space. Genetics. 2005; 170(1):419–31.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Higuchi T. Monte carlo filter using the genetic algorithm operators. J Stat Comput Simul. 1997; 59(1):1–23.

    Article  Google Scholar 

  23. Hohna S, Defoin-Platel M, Drummond AJ. Clock-constrained tree proposal operators in bayesian phylogenetic inference. In: 2008 8th IEEE International Conference on BioInformatics and BioEngineering. IEEE: 2008.

  24. Rambaut A, Drummond AJ, Xie D, Baele G, Suchard MA. Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7. Systematic Biology. 2018; 67(5):901–904.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Dawid AP. The well-calibrated bayesian. J Am Stat Assoc. 1982; 77(379):605–10.

    Article  Google Scholar 

  26. Hasegawa M, Kishino H, Yano T-a. Dating of the human-ape splitting by a molecular clock of mitochondrial dna. J Mol Evol. 1985; 22(2):160–74.

    Article  CAS  PubMed  Google Scholar 

  27. Finstermeier K, Zinner D, Brameier M, Meyer M, Kreuz E, Hofreiter M, Roos C. A mitogenomic phylogeny of living primates. PloS one. 2013; 8(7):69504.

    Article  CAS  Google Scholar 

  28. Jackman TR, Larson A, De Queiroz K, Losos JB. Phylogenetic relationships and tempo of early diversification in anolis lizards. Syst Biol. 1999; 48(2):254–85.

    Article  Google Scholar 

  29. Zlateva KT, Lemey P, Vandamme A-M, Van Ranst M. Molecular evolution and circulation patterns of human respiratory syncytial virus subgroup a: positively selected sites in the attachment g glycoprotein. J Virol. 2004; 78(9):4675–83.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Zlateva KT, Lemey P, Moës E, Vandamme A-M, Van Ranst M. Genetic variability and molecular evolution of the human respiratory syncytial virus subgroup b attachment g protein. J Virol. 2005; 79(14):9157–67.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Shankarappa R, Margolick JB, Gange SJ, Rodrigo AG, Upchurch D, Farzadegan H, Gupta P, Rinaldo CR, Learn GH, He X, et al.Consistent viral evolutionary changes associated with the progression of human immunodeficiency virus type 1 infection. J Virol. 1999; 73(12):10489–502.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. BEAST, 2 Data Sets. Accessed 13 Dec 2019.

  33. Cooper A, Lalueza-Fox C, Anderson S, Rambaut A, Austin J, Ward R. Complete mitochondrial genome sequences of two extinct moas clarify ratite evolution. Nature. 2001; 409(6821):704.

    Article  CAS  PubMed  Google Scholar 

  34. TreeStat, 2. Accessed 13 Dec 2019.

  35. Lanfear R, Calcott B, Ho SY, Guindon S. Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 2012; 29(6):1695–701.

    Article  CAS  PubMed  Google Scholar 

  36. To T-H, Jung M, Lycett S, Gascuel O. Fast dating using least-squares criteria and algorithms. Syst Biol. 2015; 65(1):82–97.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  37. Sagulenko P, Puller V, Neher RA. Treetime: Maximum-likelihood phylodynamic analysis. Virus Evol. 2018; 4(1):042.

    Article  Google Scholar 

  38. Sanderson MJ. r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics. 2003; 19(2):301–2.

    Article  CAS  PubMed  Google Scholar 

  39. PhyML, 3.0: New Algorithms, Methods and Utilities. Accessed 13 Dec 2019.

  40. Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML3.0. Syst Biol. 2010; 59(3):307–21.

    Article  CAS  PubMed  Google Scholar 

  41. TreeTraceAnalysis.

  42. TreeAnnotator. Accessed 13 Dec 2019.

  43. Beal MJ. Variational Algorithms for Approximate Bayesian Inference. England: University of London; 2003, p. 281.

    Google Scholar 

  44. Zhang C, IV FAM. Variational bayesian phylogenetic inference. In: International Conference on Learning Representations: 2019. Accessed 13 Mar 2019.

  45. Dang T, Kishino H. Stochastic variational inference for bayesian phylogenetics: A case of cat model. Mol Biol Evol. 2019; 36(4):825–33.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  46. Peskun PH. Optimum monte-carlo sampling using markov chains. Biometrika. 1973; 60(3):607–12.

    Article  Google Scholar 

  47. Pybus OG, Rambaut A. Genie: estimating demographic history from molecular phylogenies. Bioinformatics. 2002; 18(10):1404–5.

    Article  CAS  PubMed  Google Scholar 

  48. Yang Z, Rodríguez CE. Searching for efficient markov chain monte carlo proposal kernels. Proc Nat Acad Sci. 2013; 110(48):19307–12.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. RateAgeBetaShift. Accessed 18 Nov 2019.

Download references


Thanks are due to Jordan Douglas for assistance with the benchmarking experiments. The author(s) wish to acknowledge the use of New Zealand eScience Infrastructure (NeSI) high performance computing facilities, consulting support and/or training services as part of this research. New Zealand’s national facilities are provided by NeSI and funded jointly by NeSI’s collaborator institutions and through the Ministry of Business, Innovation & Employment’s Research Infrastructure programme. URL


The author(s) would like to acknowledge support from a Royal Society of New Zealand Marsden award (#UOA1611; 16-UOA-277). This work is also partially supported by scholarship from China Scholarship Council (No.201706990021).

Author information

Authors and Affiliations



RZ developed the operator and was a major contributor in writing the manuscript. AJD supervised the implementation of the operator and the writing process of the manuscript. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Rong Zhang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhang, R., Drummond, A. Improving the performance of Bayesian phylogenetic inference under relaxed clock models. BMC Evol Biol 20, 54 (2020).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: