### Iterative addition

Given a set Γ of taxa, let us define a *partial tree* as an *m*-leaf tree whose leaves are taxa of a subset Γ*'* ⊂ Γ, with *m* = |Γ*'*|. Moreover, given a partial tree with node set *V* and edge set *E*, we say that we *add/insert a leaf i* (not yet in Γ*'*) on the edge (*r*, *s*) ∈ *E* (i.e., the edge joining the nodes *r*, *s* ∈ *V*) when we generate a new partial tree with node set \widehat{V} = *V* ∪ {*i*, *t*} and edge set \widehat{E} = (*E* ∪ {(*r*, *t*), (*t*, *s*), (*t*, *i*)})\{(*r*, *s*)}. In other words, we divide the edge (*r*, *s*) with a new node *t*, and join the leaf *i* to *t*. All algorithms described here build complete phylogenetic trees by iteratively adding one leaf at a time on the edges of a partial tree.
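As an illustration, the insertion operation can be sketched as follows (a minimal sketch using a hypothetical edge-list representation of the partial tree, not the authors' implementation):

```python
def add_leaf(edges, r, s, i, t):
    """Insert leaf i on edge (r, s): replace (r, s) with the path
    r - t - s and attach the new leaf i to the new internal node t."""
    if (r, s) not in edges:
        raise ValueError("(r, s) is not an edge of the partial tree")
    return [e for e in edges if e != (r, s)] + [(r, t), (t, s), (t, i)]

# A 3-leaf partial (star) tree on taxa a, b, c with internal node x:
tree = [("x", "a"), ("x", "b"), ("x", "c")]
# Insert leaf d on edge (x, a), creating internal node y:
tree = add_leaf(tree, "x", "a", "d", "y")  # edges: (x,b), (x,c), (x,y), (y,a), (y,d)
```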

### Primal bound

To generate a first upper bound [5] for the ME problem, we adapted the *Sequential Addition* (SA) greedy algorithm [6]. Sequential Addition is less prone than NJ to generating a systematic local optimum at the end of the search (i.e., starting from "too good" a primal bound may lead to inefficient results [29]).

The pseudo-code of our version of the Sequential Addition algorithm is presented in Figure 1. In the initialization step, we arbitrarily choose a subset Γ*'* ⊆ Γ of *m* ≤ *n* leaves, and generate an initial *m*-leaf partial tree, i.e., an optimal solution of problem (1) when only those *m* leaves are considered. At each iteration, we add a leaf *i* ∉ Γ*'* on each edge of the current partial tree, and choose the solution that minimizes tree length (breaking possible ties randomly), hence generating a new partial tree and a new set Γ*'* = Γ*'* ∪ {*i*}. We iterate the procedure until a tree with *n* leaves is obtained. Finally, fixing the topology matrix {\widehat{X}}_{m}, we determine the optimal edge weights by imposing ∇_{**v**} *f*(**D**, {\widehat{X}}_{m}, **v**) = 0, and return the length of the resulting tree, i.e., an upper bound on the optimal solution of the ME problem.

Unfortunately, the computational complexity of our heuristic is *O*((2*m* - 5)!! + *n*(*n - m*)^{2}). At each iteration, given a partial tree, i.e., a *k*-leaf phylogenetic tree on the leaves in Γ*'* ⊂ Γ with *k* = |Γ*'*|, and a leaf *i* not in Γ*'*, the procedure generates all the different (*k* + 1)-leaf partial trees that can be obtained by adding the leaf *i* on each edge of the current partial tree.
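To make the iteration concrete, here is a runnable sketch of the sequential-addition loop. It is not the authors' code: in place of the OLS length of problem (1), we score candidate placements with Pauplin's balanced length (the sum of *d*_{ij} · 2^{1 - t_{ij}}, with *t*_{ij} the number of edges between leaves *i* and *j*) purely as a self-contained stand-in objective:

```python
from collections import deque
from itertools import combinations

def path_edges(adj, a, b):
    """Number of edges on the unique path between a and b in a tree."""
    prev, queue = {a: None}, deque([a])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    n = 0
    while prev[b] is not None:
        b, n = prev[b], n + 1
    return n

def balanced_length(edges, leaves, d):
    """Stand-in objective: sum over leaf pairs of d_ij * 2^(1 - t_ij)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(d[i][j] * 2.0 ** (1 - path_edges(adj, i, j))
               for i, j in combinations(sorted(leaves), 2))

def sequential_addition(taxa, d):
    """Greedily insert each remaining leaf on the edge that minimizes
    the (stand-in) tree length; ties are broken by first occurrence."""
    a, b, c, *rest = taxa
    edges, leaves = [("v0", a), ("v0", b), ("v0", c)], [a, b, c]
    for k, i in enumerate(rest, start=1):
        t = "v%d" % k
        best = None
        for r, s in list(edges):
            cand = [e for e in edges if e != (r, s)] + [(r, t), (t, s), (t, i)]
            score = balanced_length(cand, leaves + [i], d)
            if best is None or score < best[0]:
                best = (score, cand)
        edges = best[1]
        leaves.append(i)
    return edges

# Additive toy distances in which a-b and c-d form cherries:
pairs = {("a", "b"): 2, ("a", "c"): 4, ("a", "d"): 4,
         ("b", "c"): 4, ("b", "d"): 4, ("c", "d"): 2}
D = {t: {} for t in "abcd"}
for (x, y), v in pairs.items():
    D[x][y] = D[y][x] = v
sa_tree = sequential_addition(list("abcd"), D)  # attaches d next to c
```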

### The ant colony optimization algorithm

The specific ACO algorithm for the minimum evolution problem (hereafter ACO-ME) that we introduce here (cf. pseudo-code in Figure 2), is a hybrid between the Max-Min Ant System (MMAS) [30, 31] and the Approximate Nondeterministic Tree Search (ANTS) [32]. Both methods are modifications of the original Ant System approach [33].

The core of the ACO-ME algorithm is the iteration phase, in which the ants generate a set \mathcal{T} of trees. Then, starting from the trees in \mathcal{T}, a local search is performed until a locally optimal tree is found and compared with the current-best tree. If the stopping conditions are met, the procedure ends; otherwise the iteration phase is repeated.

Each ant builds a phylogenetic tree by iteratively adding one leaf at a time to a partial tree. Following a *relation-learning* model [34], the choices performed by an ant about (i) which leaf to insert, and (ii) where to add it on the partial tree, are based on a set of parameters {*τ*_{ij}} called *pheromone trails*. The value of a pheromone trail parameter *τ*_{ij} represents the stochastic desirability that a leaf *i* shares a direct common ancestor with a vertex *j* on a partial tree. The ants generate a new set \mathcal{T} of trees and the pheromone trail parameters are updated at the end of the main iteration phase.

Let us now consider the algorithm in more detail. It uses two identical data structures: *s** and *s*_{k}. The former stores the current-best complete *reconstruction* (solution) known, whereas the latter stores the best complete reconstruction obtained by the ants during iteration *k*. The algorithm also uses a variable *n*_{a}, i.e., the number of artificial ants. How to set the value of *n*_{a} is discussed in the Parameter settings section. In the initialization phase, *s** is first set to the reconstruction obtained by the Sequential Addition algorithm and *s*_{k} is set to null; then the pheromone trail parameters are initialized. We implemented the MMAS [30, 31] method of pheromone update, where *τ*_{min} ≤ *τ*_{ij} ≤ *τ*_{max}. Here, we set *τ*_{min} and *τ*_{max} to 0.0001 and 0.9999, respectively [35]. In the initialization phase, the pheromone trail parameters {*τ*_{ij}} are set to 0.5, i.e., all positions for leaf insertion have the same desirability.

Before describing the iteration phase, let us introduce some definitions. Let {\mathcal{G}}_{k} be a partial tree with *k* leaves, *V*({\mathcal{G}}_{k}) the set of vertices of {\mathcal{G}}_{k}, and {\Gamma}_{{\mathcal{G}}_{k}} the set of leaves of {\mathcal{G}}_{k}. Let us also use the *recursive distance* definition of [23, 36]: if A and B are two non-intersecting subtrees from a tree \mathcal{G}, then the average distance between A and B is:

{\Delta}_{A|B}=\frac{1}{\left|A\right|\left|B\right|}{\displaystyle \sum _{i\in A,j\in B}{d}_{ij}}.

(5)
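Equation (5) is straightforward to compute from a distance matrix; here is a minimal sketch (the nested-dict matrix and its values are hypothetical, for illustration only):

```python
from itertools import product

def average_distance(A, B, d):
    """Equation (5): average pairwise distance between the leaf sets of
    two non-intersecting subtrees A and B."""
    return sum(d[i][j] for i, j in product(A, B)) / (len(A) * len(B))

# Toy distances between the leaves of subtree {a, b} and subtree {c, d}:
d = {"a": {"c": 3.0, "d": 5.0}, "b": {"c": 4.0, "d": 6.0}}
delta = average_distance(["a", "b"], ["c", "d"], d)  # (3 + 5 + 4 + 6) / 4 = 4.5
```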

In the iteration phase, each artificial ant *r* generates a complete phylogenetic tree using the ConstructCompleteReconstruction(*r*) procedure, as illustrated in Figure 3: ant *r* randomly selects four leaves from the set Γ and builds a partial tree {\mathcal{G}}_{k}, *k* = 4; then, ant *r* (i) chooses, among the leaves not yet inserted in the partial topology, the leaf *i* defining the smallest distance *d*_{ij}, *j* ∈ {\Gamma}_{{\mathcal{G}}_{k}}, and (ii) computes the probability that *i* has a common ancestor with the vertex *j* ∈ *V*({\mathcal{G}}_{k}) using the formula suggested by ANTS [32]:

{p}_{ij}=\frac{\alpha {\tau}_{ij}+(1-\alpha ){\eta}_{ij}}{{\displaystyle {\sum}_{q\in \Gamma \backslash {\Gamma}_{{\mathcal{G}}_{k}}}[\alpha {\tau}_{qj}+(1-\alpha ){\eta}_{qj}]}}

(6)

where *η*_{ij} represents the "heuristic desirability" that leaf *i* shares a common ancestor with a vertex *j* of *V*({\mathcal{G}}_{k}) (whereas *τ*_{ij} represents the corresponding "stochastic desirability"). Finally, *α* ∈ [0, 1] sets the relative weighting of the heuristic and stochastic desirabilities. The heuristic desirability *η*_{ij} is computed as:

*η*_{ij} = (Δ_{ij} - *u*_{i} - *u*_{j})^{-1}

(7)

where {u}_{i}={\sum}_{j\notin {\Gamma}_{{\mathcal{G}}_{k}}}{\Delta}_{ij}/|{\Gamma}_{{\mathcal{G}}_{k}}|, i.e., the sum of the distances from *i* to the leaves not yet inserted in the partial tree, divided by the number of leaves already inserted in the partial tree.

Note that *η*_{ij}, Δ_{ij}, and *u*_{i} correspond to the quantities used in the Neighbor-Joining algorithm [20, 21] (see also [23]). Hence, the computation of the vector **p**_{i} = {*p*_{ij}}, for all *i* ∈ Γ, can be interpreted as a stochastic application of the Neighbor-Joining algorithm. A possible problem (not yet observed in practice in our analyses) is that *η*_{ij} can take negative values. Finally, ant *r* randomly chooses a vertex *j* on the basis of the probabilities **p**_{i}, and the leaf *i* is added to the tree.
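The selection step can be sketched as follows (a hedged sketch with hypothetical names and toy values, not the authors' implementation; since a vertex *j* is sampled for a fixed leaf *i*, the sketch normalizes the mixed desirabilities over the candidate attachment vertices):

```python
import random

def heuristic_desirability(delta_ij, u_i, u_j):
    """NJ-style desirability of equation (7); note it can be negative
    (or undefined) when delta_ij <= u_i + u_j."""
    return 1.0 / (delta_ij - u_i - u_j)

def attachment_probabilities(i, vertices, tau, eta, alpha=0.7):
    """Mix pheromone (tau) and heuristic (eta) desirabilities for
    attaching leaf i next to each candidate vertex j, then normalize."""
    w = {j: alpha * tau[i][j] + (1 - alpha) * eta[i][j] for j in vertices}
    total = sum(w.values())
    return {j: wj / total for j, wj in w.items()}

def choose_vertex(i, vertices, tau, eta, alpha=0.7, rng=random):
    """Sample an attachment vertex for leaf i from its probabilities."""
    p = attachment_probabilities(i, vertices, tau, eta, alpha)
    return rng.choices(list(p), weights=list(p.values()), k=1)[0]

# Toy example: leaf "x" may attach next to vertex "u" or "v";
# pheromone strongly favours "u", the heuristic is indifferent:
tau = {"x": {"u": 0.9, "v": 0.1}}
eta = {"x": {"u": 0.5, "v": 0.5}}
p = attachment_probabilities("x", ["u", "v"], tau, eta)  # p["u"] = 0.78
```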

At the end of the construction phase, a set \mathcal{T} of trees is obtained and a 2-OPT local search (with best-improvement and without candidate list [37, 38]) is iteratively performed on each tree: two randomly-chosen leaves are swapped and the tree length is evaluated. Swap *i* is performed on the new tree if swap *i* - 1 generated an improvement; otherwise it is performed on the old tree. To reduce the 2-OPT computational overhead, we perform no more than 10 swaps on each tree in \mathcal{T}. If the best tree generated by the 2-OPT local search is shorter than the tree in *s**, both *s** and *s*_{k} are updated; otherwise only *s*_{k} is updated.
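The accept-if-improving swap loop described above can be sketched like this (a simplified stand-in: `toy_len` below is a hypothetical scoring function used only for the demo, not the OLS tree length):

```python
import random

def swap_leaves(edges, a, b):
    """Exchange the positions of leaves a and b in an edge list."""
    sub = {a: b, b: a}
    return [(sub.get(u, u), sub.get(v, v)) for u, v in edges]

def leaf_swap_search(edges, leaves, length, max_swaps=10, rng=random):
    """Try up to max_swaps random leaf swaps, keeping a swap only when
    it shortens the tree (each swap is applied to the best tree so far)."""
    best, best_len = edges, length(edges)
    for _ in range(max_swaps):
        a, b = rng.sample(leaves, 2)
        cand = swap_leaves(best, a, b)
        if length(cand) < best_len:
            best, best_len = cand, length(cand)
    return best, best_len

# Toy demo: the (hypothetical) scoring below penalizes leaf a sitting
# next to v0, so the single possible swap a <-> b is accepted:
tree = [("v0", "a"), ("v0", "v1"), ("v1", "b"), ("v1", "c")]
toy_len = lambda es: 1.0 if ("v0", "a") in es else 0.5
better, better_len = leaf_swap_search(tree, ["a", "b"], toy_len,
                                      rng=random.Random(0))  # better_len = 0.5
```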

The pheromone update completes the iteration phase: each entry *τ*_{ij} is updated following:

*τ*_{ij} ← (1 - *ρ*)*τ*_{ij} + *ε*_{ij}

(8)

where

{\epsilon}_{ij}=\{\begin{array}{ll}\kappa \rho /{l}_{{\mathcal{G}}_{\left|\Gamma \right|}},& \text{if }{w}_{i}\text{ adjacent to }{w}_{j}\text{ in }{s}^{best};\\ 0,& \text{otherwise}.\end{array}

(9)

where *κ* ∈ ℝ and *ρ*, the *pheromone evaporation rate*, are two tuning constants, *s*^{best} is one of the trees *s** or *s*_{k} (see below), and {l}_{{\mathcal{G}}_{\left|\Gamma \right|}} is the length of *s*^{best}. When applying equation (8), if *τ*_{ij} is greater than *τ*_{max} or smaller than *τ*_{min}, then its value is set to *τ*_{max} or *τ*_{min}, respectively. We set *ρ* to 0.1, *κ* such that *κρ* ∈ [10^{-2}, 10^{-1}], and *α* to 0.7. Fine-tuning of these parameters might have a significant impact on search efficiency, but such a systematic analysis is beyond the scope of a proof-of-concept for the use of ACO-ME. Finally, if the objective function does not decrease after 30 iterations, ACO-ME chooses *s*_{k} as *s*^{best} instead of *s** for the pheromone update; if the objective function does not decrease after 30 additional iterations, then all {*τ*_{ij}} are reset to 0.5 and *s** is used for the pheromone update.
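Equations (8) and (9), together with the MMAS clamping, can be sketched as follows (a minimal sketch; the variable names and toy values are ours, not the authors'):

```python
def update_pheromones(tau, best_edges, best_length, rho=0.1, kappa=0.5,
                      tau_min=0.0001, tau_max=0.9999):
    """Evaporate every trail by (1 - rho), deposit kappa * rho / L on the
    leaf/vertex pairs adjacent in the best tree (equation (9)), then
    clamp each trail to [tau_min, tau_max] as in MMAS."""
    deposit = kappa * rho / best_length
    adjacent = set(best_edges) | {(v, u) for u, v in best_edges}
    for i in tau:
        for j in tau[i]:
            t = (1 - rho) * tau[i][j] + (deposit if (i, j) in adjacent else 0.0)
            tau[i][j] = min(tau_max, max(tau_min, t))
    return tau

# Toy trails for leaf "a" against two vertices; "a" is adjacent to "v0"
# in the best tree, whose length is 10:
tau = {"a": {"v0": 0.5, "v1": 0.5}}
update_pheromones(tau, best_edges=[("a", "v0")], best_length=10.0)
# tau["a"]["v0"] -> 0.9 * 0.5 + 0.005 = 0.455; tau["a"]["v1"] -> 0.45
```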

### Parameter settings

We evaluated the performance of the ACO-ME algorithm under different values of the parameter *κ* (0.1, 0.5, and 1) and different numbers of ants (1 to 10). For each of the 30 possible combinations of these parameter values, we ran ACO-ME for 1000 iterations. As suggested elsewhere (see [29]), we did not consider colony sizes larger than 10.

Relative performances are measured using a normalized index as in [39–41]:

{I}_{j}^{(k)}=\frac{{x}_{j}^{(k)}-{x}_{j}^{min}}{{x}_{j}^{max}-{x}_{j}^{min}}

(10)

where {x}_{j}^{(k)} is the best solution found under parameter value *k* using dataset *j*, whereas {x}_{j}^{min} and {x}_{j}^{max} are, respectively, the best and worst solutions found on instance *j* over all parameter values considered. By definition, performance index values lie in the interval [0, 1], and the optimal parameter value exhibits the smallest relative performance index (see the box-and-whisker plots in Figures 4, 5, and 6). Figures 4, 5, and 6 indicate that, for small, medium, and large datasets, the optimal combinations of number of ants/*κ* are 7/1, 10/0.5, and 8/0.5, respectively. However, the differences in performance among combinations of parameter values are not spectacular (except that performance is generally very low when a single ant is used).
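Computed per instance, the index of equation (10) looks like this (the tree lengths below are hypothetical, for illustration only):

```python
def performance_index(x_k, x_min, x_max):
    """Equation (10): position of the solution found under one parameter
    setting between the best (x_min) and worst (x_max) solutions found
    on the same instance; 0 marks the best setting, 1 the worst."""
    return (x_k - x_min) / (x_max - x_min)

# Best tree lengths found on one instance under three values of kappa:
lengths = {0.1: 1530.0, 0.5: 1512.0, 1.0: 1518.0}
lo, hi = min(lengths.values()), max(lengths.values())
index = {k: performance_index(v, lo, hi) for k, v in lengths.items()}
# index[0.5] = 0.0 (best), index[0.1] = 1.0 (worst), index[1.0] = 1/3
```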

### Experimental evaluation

We first used a set of distance matrices generated from real datasets: the dataset "551314.nex", which includes 55 RBCL sequences of 1314 nucleotides each, and the dataset "Zilla500.nex", which includes 500 RBCL sequences of 1428 nucleotides each. These datasets are available at [42]. Note that the sequences in these datasets were aligned using ClustalX [43], and columns including gaps were excluded before computing pairwise distances. Second, we generated (i) 10 artificial instances of 20 taxa (also called *small instances*); (ii) 10 artificial instances of 50 taxa (also called *medium instances*); and (iii) 10 artificial instances of 100 taxa (also called *large instances*). Each artificial instance was generated by random sampling of taxa and partial character reshuffling of the Zilla500.nex dataset. More explicitly, after random selection of the 20, 50, or 100 taxa, we randomly reshuffled characters among taxa for 50 percent of the aligned columns. As the reshuffling makes the dataset prone to yield undefined pairwise distances [44], we simply used the absolute number of differences between sequence pairs for generating the distance matrix. Edge lengths were computed using the standard OLS because WLS and GLS can potentially lead to inconsistent results [45].

All numerical experiments were performed on an Apple 64-bit Power Mac G5 dual-processor dual-core workstation, with 8 GB of RAM, running OS X. The ACO-ME source code is written in C/C++ and compiled using the IBM XL C/C++ compiler version 6.0. We compared the quality (total length) of the trees generated by the ACO-ME algorithm to those obtained using a classical hill-climbing algorithm (implemented in PAUP* 4.0 [19]) after a fixed run time of 1 minute. The starting tree was generated using the Neighbor-Joining algorithm [20, 21], and the TBR branch-swapping operator [6] was used for exploring the solution space. PAUP* 4.0 was used with and without the "Steepest Descent" (SD) option.
When SD is activated, all possible TBR rearrangements are tried and the rearrangement producing the largest decrease in tree length is selected, inducing a computational overhead similar to that of the 2-OPT local search implemented in our ACO-ME algorithm. Each algorithm was run 30 times on each of the two real datasets. Figures 7a and 7b show that ACO-ME performances are intermediate between those of hill-climbing with SD and hill-climbing without SD. Furthermore, Figures 7a and 7b indicate that the relative performance of ACO-ME, in comparison to hill-climbing, increases with larger datasets. Note that, contrary to our simple implementation of ACO-ME, the implementation of ME in PAUP* 4.0 [19] incorporates procedures [4, 23] that greatly speed up the OLS computation (reaching a complexity of *O*(*n*^{2})). We trust that the implementation of these procedures, in combination with further tuning of the ACO parameters (number of ants, relative weights of the heuristic information and stochastic pheromone parameters, etc.), would lead to better performances of the ACO-ME algorithm. Figures 8a and 8b indicate that the relative performances described above are relatively stable through time, especially for large datasets (at any time during the run, ACO-ME performs similarly to "hill-climbing without SD" and better than "hill-climbing with SD").