A tree T = (V, E) is a connected acyclic graph with vertex set V and edge set E. A vertex of degree one is called a leaf of T and the set of all leaves is called the leaf set of T. A rooted tree T = (V, E, ϱ) is a tree (V, E) that has exactly one distinguished vertex called the root, denoted ϱ. A rooted tree T has a natural ordering where v ≤ v', if v lies on the path from the root to v'. If v ≤ v', we say that v is an ancestor of v' and v' is a descendant of v. For any set of vertices V, a vertex v is called minimal with respect to V if for all v' in V, it holds that v ≤ v'. For any edge e, we use α (e) and β(e) to denote the source and target of e. A rooted phylogenetic X-tree is a pair (T, ν), where T = (V, E, ϱ) is a rooted tree and ν : X → V is a bijection from X to the leaf set of T. See [21] for more details.
Definition 1 Let X be a set of taxa. A rooted reticulate network N = N (V, E, ν) on X is a connected, directed acyclic graph with vertex set V, edge set E and vertex labeling ν : X → V, such that:
1. there exists precisely one distinguished vertex ϱ called the root;
2. every vertex v ∈ V is either a tree vertex, v ∈ V_T, that has exactly one ancestor, or a reticulation vertex r ∈ V_R that has exactly two ancestors;
3. every edge is either a tree edge leading to a vertex of indegree one or a reticulation edge leading to a vertex of indegree two; and
4. the set of leaves L (vertices with no descendants) consists only of tree vertices and is labeled by the set of taxa X, i.e. ν maps X bijectively onto L.
It follows from these definitions that each reticulation vertex (or reticulation, for short) r ∈ V_R is contained in one or more cycles of the form C = (r, p(r), w_1, e_1, ..., e_{k-1}, w_k, q(r), r), with w_i ∈ V and e_i ∈ E\{p(r), q(r)} for all i. (Note that additionally, r can also be contained in one or more cycles that do not contain p(r) and q(r)). We say that two reticulations r, r' ∈ V_R are dependent if a cycle that contains both r and r' exists.
In graph theory, a two-connected component of a graph G is any maximal subgraph G' with the property that any two vertices v and w of G' are connected by two paths p and p' that share no vertices except for v and w. For any reticulation vertex r, let p_r and q_r denote the two associated reticulation edges.
Furthermore, let v_p and v_q denote the two ancestors of r with respect to p_r and q_r. The lowest single ancestor lsa(r) of a reticulation r is the minimum of all nodes in V that are connected to r by two paths p and p' that share no vertices except for lsa(r) and r.
Algorithm
One important approach to drawing trees is the equal angle algorithm, which was developed by Meacham (see [22]). The equal angle algorithm guarantees that no two edges intersect. Our algorithm for visualizing recombination networks is based on a generalization of the equal angle algorithm. The algorithm adds an ordering step at each vertex that chooses an optimal ordering of the descending edges, minimizing the number of crossings between reticulation edges and other edges. It can easily be adapted for use with any drawing algorithm for trees. We start with a description of the equal angle algorithm and then define some basic properties. Finally, we give solutions for minimizing crossing edges in a drawing of a reticulate network and for the optimal placement of reticulation vertices.
The equal angle algorithm is a recursive algorithm that starts at an internal vertex of a tree. For each subtree connected to the starting vertex, we assign an angle proportional to the share of leaves it contains. In the next step, we assign to each subtree a sector of the circle whose size is the angle assigned to it and draw the edge to the subtree in the middle of the sector. We place the sector of the subtree so that it is centered at the end of the branch and the branch points along the bisector of the angle. We then recurse to the starting vertex of the subtree and assign each newly discovered subtree its proportional share of the angle. Each subtree is then placed in the sector of the starting vertex. The recursion is repeated until we have assigned angles to each branch of the tree. The only modifications for rooted trees are the explicit start point (the root of the tree) and the use of a fraction of the circle. For a detailed description of the algorithm, see [22].
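The recursion can be illustrated with a minimal sketch of the rooted variant. It assumes the tree is given as a dictionary mapping each vertex to its list of children and that edges have unit length unless a `lengths` dictionary is supplied. The names `equal_angle_layout`, `children`, `lengths` and `total_angle` are illustrative and are not taken from [22].

```python
import math

def count_leaves(children, v, cache):
    """Store in cache[v] the number of leaves below v and return it."""
    kids = children.get(v, [])
    cache[v] = 1 if not kids else sum(count_leaves(children, c, cache) for c in kids)
    return cache[v]

def equal_angle_layout(children, root, lengths=None, total_angle=2 * math.pi):
    """Return {vertex: (x, y)} for a rooted tree given as {parent: [children]}."""
    leaves = {}
    count_leaves(children, root, leaves)
    pos = {root: (0.0, 0.0)}

    def place(v, lo, hi):
        # The sector [lo, hi] of v is shared among its children
        # in proportion to the number of leaves below each child.
        start = lo
        for c in children.get(v, []):
            width = (hi - lo) * leaves[c] / leaves[v]
            mid = start + width / 2.0          # the edge follows the bisector
            length = 1.0 if lengths is None else lengths.get((v, c), 1.0)
            x, y = pos[v]
            pos[c] = (x + length * math.cos(mid), y + length * math.sin(mid))
            place(c, start, start + width)     # recurse into the child's sector
            start += width

    place(root, 0.0, total_angle)
    return pos
```

Passing `total_angle=math.pi`, for example, restricts the drawing to a half circle, which corresponds to using only a fraction of the circle for a rooted tree.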
The rooted equal angle algorithm is not directly applicable to a reticulate network since, for each reticulation, we have to decide which of the reticulation edges we want to use for the drawing algorithm, and either choice may be suboptimal. The idea behind our approach is to use neither of them. The influence of a reticulation upon the graph structure is bounded by the reticulation and its lowest single ancestor. We therefore define an auxiliary edge between those two vertices and use the auxiliary edges for the layout of the graph. When the algorithm reaches a node, each descending edge is checked for its status (tree edge, auxiliary edge or reticulation edge), and only tree edges and auxiliary edges are incorporated into the process.
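The construction of the auxiliary edges can be sketched as follows. The sketch rests on an assumption about the definition: it reads lsa(r) as the lowest vertex other than r that lies on every path from the root to r. The network is assumed to be a dictionary mapping each vertex to its children, and the function names are hypothetical.

```python
from collections import deque

def lowest_single_ancestors(children, root):
    """Return ({reticulation: lsa}, {vertex: [parents]}).

    Assumption: lsa(r) is the lowest vertex other than r that lies on
    every path from the root to r.
    """
    parents = {}
    for v, kids in children.items():
        for c in kids:
            parents.setdefault(c, []).append(v)
    indegree = {v: len(ps) for v, ps in parents.items()}
    chain = {root: [root]}   # vertices on every root-to-v path, listed top down
    lsa = {}
    queue = deque([root])
    while queue:             # Kahn-style topological traversal
        v = queue.popleft()
        for c in children.get(v, []):
            indegree[c] -= 1
            if indegree[c] > 0:
                continue     # wait until every parent of c has been processed
            common = chain[parents[c][0]]
            for p in parents[c][1:]:
                n = 0
                while (n < min(len(common), len(chain[p]))
                       and common[n] == chain[p][n]):
                    n += 1
                common = common[:n]   # longest common prefix of the chains
            chain[c] = common + [c]
            if len(parents[c]) == 2:  # c is a reticulation vertex
                lsa[c] = common[-1]
            queue.append(c)
    return lsa, parents

def layout_children(children, root):
    """Keep tree edges, drop reticulation edges, add auxiliary edges lsa(r) -> r."""
    lsa, parents = lowest_single_ancestors(children, root)
    layout = {v: [c for c in kids if len(parents[c]) == 1]
              for v, kids in children.items()}
    for r, a in lsa.items():
        layout.setdefault(a, []).append(r)
    return layout
```

The result of `layout_children` is a rooted tree on the same vertices, so any tree drawing routine can be run on it unchanged.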
Through these modifications to the rooted equal angle algorithm, it is possible to visualize reticulate networks, but these visualizations are not very satisfying. To obtain an improved method, we address two key problems. The first problem is the crossing of reticulation edges: even though it cannot always be avoided, the number of such events should be minimized. The second problem is that the auxiliary edges are artificial edges and their optimal edge length must be determined. In the following, we present solutions to these two problems.
Minimizing crossing edges
An edge crossing another one is an undesirable event in drawing a graph. It is well known that minimizing the number of such crossings is, in general, computationally hard [23]. The equal angle algorithm ensures that we only have to deal with reticulation edges crossing other edges. Furthermore, the construction of the auxiliary edges implies that edges that can be crossed by the reticulation edges are descendant edges of the lowest single ancestor of the reticulation. The optimization starts at the root of the network and optimizes the arrangement of the directly descending vertices. It then continues the optimization iteratively at each directly descending vertex in the order given and keeps going until it has optimized all placements.
Let be the set of tree vertices directly below a vertex v and let be the set of reticulation vertices connected to v by auxiliary edges. We say that a tree path p(v, v') from a vertex v to a vertex v' exists if v' is a descendant of v and every edge in p(v, v') is either a tree edge or an auxiliary edge. Furthermore, we say that a reticulation r is easily reachable from a vertex v if a tree path p(v, ) exists. Finally, let R_v be the set of all reticulations that are easily reachable from the vertex v.
The set R_v can be divided into those reticulations r for which v = lsa(r), which we will again denote by ; v is a descendant of lsa(r), denoted by ; and v is an ancestor of lsa(r), denoted by . If v is the root, is empty. The set can be divided further. Since for a reticulation r in , the nodes directly below lsa(r) have been previously sorted, we can denote the set as containing those r in for which r is sorted less than the directly descending node of lsa(r) leading via a tree path to v.
The aim of our optimization is to find a linear arrangement of the vertices in such that the number of reticulation edges, in the subtrees of the vertices in , intersecting with tree edges is minimized. We define the optimal linear arrangement graph OLA_v(V, E) of a vertex v as one that contains a vertex representative for any vertex in . We add a weighted edge between any two vertices (v_i, v_k) in V and set the weight w_ik of the edge to . More formally written:
Problem 1
With
minimize
The optimal linear arrangement problem is known to be hard [24]. Nevertheless, this arrangement problem is in general much smaller than the problem of minimizing all crossing edges at once. Interestingly, a couple of additional restrictions can be applied to the ordering, leading to a "greedy" solution that works well in most cases. One restriction is that, for any reticulation r, the position in the ordering should be between and . Consequently, we should place and before we place r.
Another restriction follows from the dependency of the reticulations upon each other. For any pair of reticulations r, r' in we say that r is less than r' if and only if a tree path p(r, ) exists. To meet the first restriction, we have to place r before we can place r'. The graph that can be constructed from the relations between the reticulations must be cycle free, since the reticulation network is cycle free. Consequently, we can use a standard topological sorting algorithm to obtain a linear ordering Ord_l() for the reticulations in .
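The ordering step can be sketched with the topological sorter from the Python standard library (`graphlib`, available from Python 3.9). The sketch assumes that the "less than" relation between reticulations has already been computed and is passed in as a list of pairs; the function name `order_reticulations` and its parameters are hypothetical.

```python
from graphlib import TopologicalSorter

def order_reticulations(reticulations, less_than):
    """Return a placement order in which r precedes r2 for every pair (r, r2).

    less_than: iterable of pairs (r, r2), meaning r must be placed before r2.
    Raises graphlib.CycleError if the relation contains a cycle.
    """
    predecessors = {r: set() for r in reticulations}
    for r, r2 in less_than:
        predecessors[r2].add(r)
    return list(TopologicalSorter(predecessors).static_order())
```

Because the relation is derived from an acyclic network, the sorter is not expected to raise `CycleError` on valid input; the exception mainly serves as a consistency check.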
The optimization algorithm iterates through the ordering and at each reticulation r it first places and , if necessary, and then r. Once all reticulations are placed, the algorithm places all descending tree edges that have not yet been placed. At each placement, the algorithm positions the vertex at the position that minimizes the score given in Problem 1. After all nodes have been placed in the linear arrangement, the result is returned to the main method. An example of the optimization procedure can be seen in Figure 1.
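The greedy insertion can be sketched as follows. As a stand-in for the score of Problem 1, the sketch assumes the usual optimal linear arrangement cost, the sum of w_ik · |pos(i) − pos(k)| over all weighted pairs. The function names and the `weights` dictionary keyed by vertex pairs are hypothetical.

```python
def arrangement_cost(order, weights):
    """Sum of w * |pos(a) - pos(b)| over all weighted pairs already placed."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(w * abs(pos[a] - pos[b])
               for (a, b), w in weights.items()
               if a in pos and b in pos)

def greedy_arrangement(placement_order, weights):
    """Insert vertices one by one, each at the position of lowest cost."""
    order = []
    for v in placement_order:
        # Try every gap in the current arrangement, including both ends.
        candidates = [order[:i] + [v] + order[i:] for i in range(len(order) + 1)]
        order = min(candidates, key=lambda cand: arrangement_cost(cand, weights))
    return order
```

With `placement_order` taken from the topological ordering of the reticulations, followed by the remaining tree vertices, each vertex is placed exactly once and earlier placements are never revised, which is what makes the procedure greedy rather than exact.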
Optimal placement for reticulation vertices
Having calculated the angle and optimal arrangement for each edge, we have to place the vertices. Tree vertices can be placed in the same way as in the standard equal angle algorithm. However, since auxiliary edges do not come with a given length, we have to calculate an optimal placement for each of the reticulation vertices. Such a placement has to respect the conditions of the equal angle algorithm; otherwise we might face unnecessary crossings between edges. Note that there are two cases for which we have to consider different placement methods. In the first case, we have a reticulation r where the nodes v_p and v_q are both different from lsa(r), and in the second case, one of them is equal to lsa(r).
In both cases, we place the reticulation vertex r on the bisector of the sector assigned to its auxiliary edge. In the first case, the distance between r and lsa(r) should be larger than the minimum distance between lsa(r) and the line l(v_p, v_q), indicating that r is a descendant of v_p and v_q. In other words, we assume the angles v_q v_p r and v_p v_q r are positive. In the second case, we assume that v_q is equal to lsa(r). We first calculate the point r_t on the bisector that has the same distance to lsa(r) as v_p and then ensure that the angle r_t v_p r is positive. We added an option to the algorithm so that the user can specify the (maximum) value of this angle; the standard value is 15°. An example of the drawing algorithm can be seen in Figure 2.
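The geometry of the two cases can be illustrated with a small sketch. It covers only the first case and the construction of r_t in the second case; it does not implement the angle option. Points are assumed to be (x, y) tuples, the bisector is given by its angle in radians, and the factor `margin`, which pushes r beyond the line l(v_p, v_q), is a hypothetical parameter, as are the function names.

```python
import math

def point_line_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    return abs(cross) / math.hypot(bx - ax, by - ay)

def place_reticulation(lsa, v_p, v_q, bisector_angle, margin=1.25):
    """Case 1: put r on the bisector, farther from lsa than the line l(v_p, v_q)."""
    dist = margin * point_line_distance(lsa, v_p, v_q)
    return (lsa[0] + dist * math.cos(bisector_angle),
            lsa[1] + dist * math.sin(bisector_angle))

def bisector_point_r_t(lsa, v_p, bisector_angle):
    """Case 2: the point r_t on the bisector at the same distance from lsa as v_p."""
    dist = math.dist(lsa, v_p)
    return (lsa[0] + dist * math.cos(bisector_angle),
            lsa[1] + dist * math.sin(bisector_angle))
```

In the second case, r would then be moved outward from r_t along the bisector until the angle r_t v_p r reaches the chosen value; how far to move is left open here.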