Trees are the central objects handled by CompPhy. They are grouped into tree collections and investigated within projects, each of which can be accessed by a declared list of users (or by anyone who knows the URL, if the project is declared public). When working on a project, CompPhy’s interface can be divided into four main parts (see Figure 2).
Zone 1 contains the site menu, which lets users navigate between their projects, edit their account details or access the onsite manual. Zone 2 first displays the project menu and, on the right-hand side, the collaborative box that lets users coordinate their actions when jointly visualizing a project. For instance, this box indicates which person is in charge of the current edit. Below, Zone 2 displays short captions of the project trees, organized in two collections (e.g., to separate gene trees from species trees, or host trees from parasite trees). Trees can be reordered within each collection and dragged to Zone 3 to be displayed in full size. Zone 3 consists of two workbenches that display two trees side by side for comparison. Operations can be performed on each tree individually or on both trees jointly: (i) the tools on a workbench allow users to investigate the tree it contains by zooming, resizing, flipping or translating its image, or by swapping chosen subtrees (other tools are available in Zone 4, see the paragraph below on tree editing); (ii) Zone 3 also provides pairwise comparison tools that consider the two trees displayed together on the workbenches: coordinated swap of their tips, computation of their topological distance, or highlighting of their topological agreement and disagreement. At the bottom of the interface, Zone 4 contains tools that may apply to more than two trees, and tools to manage other data associated with the project (see the specific paragraphs below).
System limits require that you upload no more than 10,000 trees covering at most 5,000 different taxa within a single project. Please note that we also limit the number of trees in a single import to 1,000 per collection. However, CompPhy’s main focus is on collections of several dozen to a few hundred trees; with larger collections, the interface becomes less convenient to use. CompPhy can easily handle trees containing up to about 1,000 taxa. Larger trees can still be used, but be aware that pictures will take longer to load and to be rendered by your browser (they are in SVG format, which requires some computation time from the browser).
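The limits above can be pictured as a simple pre-import check. The following is a minimal sketch only: the constant names, the function `check_import` and its argument layout are hypothetical and do not describe how CompPhy itself enforces its limits.

```python
# Limits quoted in the text; the names below are hypothetical.
MAX_TREES_PER_PROJECT = 10_000
MAX_TAXA_PER_PROJECT = 5_000
MAX_TREES_PER_IMPORT = 1_000


def check_import(existing_trees, existing_taxa, new_trees):
    """Return the list of limits that an import into one collection would break.

    existing_trees: number of trees already stored in the project
    existing_taxa:  set of taxon names already present in the project
    new_trees:      list of sets, each holding the taxon names of one new tree
    """
    problems = []
    if len(new_trees) > MAX_TREES_PER_IMPORT:
        problems.append("too many trees in one import")
    if existing_trees + len(new_trees) > MAX_TREES_PER_PROJECT:
        problems.append("too many trees in the project")
    # Count distinct taxa over the whole project after the import.
    all_taxa = set(existing_taxa).union(*new_trees)
    if len(all_taxa) > MAX_TAXA_PER_PROJECT:
        problems.append("too many distinct taxa in the project")
    return problems
```

An empty result means that the import stays within all three limits; each tree is reduced here to its set of taxon names because only counts matter for this check.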
Collaborative work
CompPhy allows a group of users to work jointly on a project. This work is usually spread over a number of working sessions, in which a variable number of people participate. CompPhy thus provides synchronization tools for multi-user sessions, as well as asynchronous tools for communication between users who attend a session and those who miss it. For instance, a FORUM is associated with each project, where project members can exchange questions, agree on an analysis protocol or simply leave a summary of their recent work for members who were absent from the last working session.
To coordinate users during a joint working session, CompPhy ensures that at any moment only one of them can change the project. All members connected to the project are offered a synchronized (shared) view of the trees and tools. The view refreshes itself regularly, reflecting the edits made by the person in control. Insisting that only one person is in control at any moment avoids concurrent edits and ensures that the project stays in a coherent state. A COLLABORATIVE BOX (right-hand part of Zone 2 in Figure 2) indicates which project members are currently online and who is currently in control of the interface, and it allows other members to REQUEST THE CONTROL in turn. Each request can be accepted or declined by either the control holder or the administrator. The latter can also TAKE THE CONTROL over the project at any moment. An option also enables users to detach their browser from the activity performed by the others (SYNCHRONIZE tool). In this case, they can change the trees displayed on the workbenches but they cannot make any actual changes to the project, as this would interfere with the actions of the user in control.
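The single-editor rule described above amounts to passing a control token between project members. The sketch below is an illustrative model only: the class `ControlToken` and its method names are hypothetical and are not taken from CompPhy's code base.

```python
class ControlToken:
    """Single-editor coordination for one project (hypothetical model)."""

    def __init__(self, admin, holder=None):
        self.admin = admin
        # Assumption: the administrator holds control when nobody else does.
        self.holder = holder if holder is not None else admin
        self.pending = []  # users waiting for control, oldest request first

    def request(self, user):
        """A member asks for control (REQUEST THE CONTROL)."""
        if user != self.holder and user not in self.pending:
            self.pending.append(user)

    def answer(self, decider, user, accept):
        """The control holder or the administrator accepts or declines a request."""
        if decider not in (self.holder, self.admin):
            raise PermissionError("only the holder or the administrator can answer")
        if user not in self.pending:
            raise ValueError("no pending request from this user")
        self.pending.remove(user)
        if accept:
            self.holder = user

    def take(self, user):
        """The administrator takes control at any moment (TAKE THE CONTROL)."""
        if user != self.admin:
            raise PermissionError("only the administrator can take control")
        if user in self.pending:
            self.pending.remove(user)
        self.holder = user

    def can_edit(self, user):
        """Only the current control holder may change the project."""
        return user == self.holder
```

Because every project change would be gated on `can_edit`, two members can never modify the project at the same time, which is the property the text relies on to keep the project in a coherent state.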
Data management
Data in CompPhy is organized around the concept of a project, which pools a set of analyzed trees and their associated documents. Each project has an administrator who can invite other people to become members of the project. By default, projects are created with a private status, so only the project members can access the data after being identified by CompPhy. This policy guarantees data privacy while still allowing data sharing. In contrast, a public project can be accessed by any guest to whom the URL is sent. Without opening an account on CompPhy, a guest can see the project trees and examine them on the workbenches. However, no guest can make changes to the trees or data of the project.
A TODO LIST reminds project members of the next tasks to be performed in the dataset analysis. Once performed, each task can be registered as a historical point in the project TIMELINE, thus keeping track of the main analysis steps. BACKUPS of the data can be created at these intermediate moments and later restored if needed.
Additional trees can be added to a project with an UPLOAD facility. Trees of the project, along with their images, can be downloaded one by one from the workbenches, or by collection. Extra files related to the project can also be shared between members (DOCUMENTS section), e.g., documents explaining how the trees were obtained or papers related to the study. It usually helps to have all information related to a project available in one place. Tree pictures with sufficient resolution for publication can be downloaded, as well as trees and other data stored in the project in case a user wants to work offline.
Tree editing tools
Various tree editing facilities ease the tree comparison process. As in most tree visualization programs, it is possible to color taxa or whole subtrees (COLORIZE TREES) but, importantly, this can be done here in a coordinated way for several trees together. It is possible to alter the interleaf scale and font size (DISPLAY OPTIONS) with which trees are displayed. Once again, this can be applied to a set of selected trees. Taxon names at the tips of the trees can be changed in one or several trees, and individual trees can be renamed at any time. Zooming in and out is available for trees on the workbenches to facilitate side-by-side comparison of trees containing different numbers of taxa.
The tree structure itself can be changed in an automated way via several operations. First, trees can be rerooted by defining a new outgroup (REROOT tool [30]). This operation can be done jointly for several trees, with several outgroup taxa being indicated in case the precise outgroup taxon differs between trees. Several outgroup levels can be indicated: when the most exterior group has no representative in some trees, a taxon from the next level is sought, and so on. The second way to alter a tree structure is by swapping branches of trees on the workbenches. This can be done on one tree (MANUAL SWAP) by selecting representative taxa of the two branches to swap, or in an automated way on the two trees in the workbenches (AUTO SWAP [31]) so that taxa appear as much as possible in the same order in both trees. Another tool enables users to RESTRICT TREES to their common taxa. This can help focus on their topological disagreement, which by definition can only derive from shared taxa arranged in different ways. In particular, gene trees obtained by phyloinformatic pipelines often have different sets of taxa, simply because genes can be lost in some species over the course of evolution.
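Two of the operations above lend themselves to a short illustration: choosing an outgroup from successive levels, and restricting trees to their common taxa. The sketch below is hypothetical and is not the code behind the REROOT or RESTRICT TREES tools. It assumes that a rooted tree is written as nested tuples with taxon names as strings, that all taxa of the first represented level are returned, and it leaves out the rerooting step itself.

```python
def leaves(tree):
    """Set of taxon names found at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def pick_outgroup(tree, levels):
    """Return the taxa of the first outgroup level represented in the tree.

    levels: list of sets of taxon names, most exterior group first.
    """
    present = leaves(tree)
    for level in levels:
        found = level & present
        if found:
            return found
    return set()  # no level is represented in this tree


def restrict(tree, keep):
    """Prune a tree to the taxa in `keep`, removing nodes left with one child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [r for r in (restrict(c, keep) for c in tree) if r is not None]
    if not children:
        return None
    if len(children) == 1:
        return children[0]
    return tuple(children)
```

For example, with `t1 = ((("A", "B"), "C"), ("D", "E"))` and `t2 = (("A", "C"), ("B", "F"))`, the common taxa are A, B and C, and restricting both trees to them gives `(("A", "B"), "C")` and `(("A", "C"), "B")`, which makes their disagreement on the placement of these three taxa directly visible.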
Distance, consensus and supertree computation
A commonly used distance to measure the disagreement between two trees is that proposed by [19]. It counts the number of clades (or bipartitions) present in one tree but not in the other. CompPhy reports this value for the two trees on the workbenches by interfacing with the PHYLIP treedist program [24]. This distance is originally defined for two trees with identical taxon sets, so when provided with trees on different taxon sets, CompPhy first restricts the compared trees to their common taxa, as is often done in the field.
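The counting principle behind this distance can be sketched in a few lines. This is an illustration only, not the PHYLIP treedist implementation: it assumes rooted trees written as nested tuples of taxon names, uses the clade formulation (treedist may instead treat the trees as unrooted and count bipartitions), and all function names are hypothetical.

```python
def leaves(tree):
    """Set of taxon names found at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def restrict(tree, keep):
    """Prune a tree to the taxa in `keep`, removing nodes left with one child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [r for r in (restrict(c, keep) for c in tree) if r is not None]
    if not children:
        return None
    return children[0] if len(children) == 1 else tuple(children)


def clades(tree):
    """Non-trivial clades of a rooted tree, each as a frozenset of taxa."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        taxa = frozenset().union(*(visit(child) for child in node))
        found.add(taxa)
        return taxa

    found.discard(visit(tree))  # the root clade is shared by construction
    return found


def clade_distance(t1, t2):
    """Number of clades present in exactly one of the two restricted trees."""
    common = leaves(t1) & leaves(t2)
    if not common:
        raise ValueError("the two trees share no taxa")
    r1, r2 = restrict(t1, common), restrict(t2, common)
    return len(clades(r1) ^ clades(r2))
```

With `t1 = ((("A", "B"), "C"), ("D", "E"))` and `t2 = ((("A", "C"), "B"), ("D", "F"))`, the trees are first restricted to A, B, C and D. The clade {A, B} then appears only in the first tree and {A, C} only in the second, so the distance is 2.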
When comparing two trees with conflicting topologies, it is often useful to highlight the largest common structure they share. Users can thus drag two trees onto the workbenches and ask CompPhy to compute their MAXIMUM AGREEMENT SUBTREE CONSENSUS. This consensus is defined as the subtree linking the largest set of taxa whose relative placement in the two trees is exactly the same (in the rare cases where several such sets exist, one is chosen at random). CompPhy first restricts the two trees to their common taxa, then uses PAUP* [23] to compute the consensus of the two trees, and finally displays the two trees with the taxa not belonging to the consensus shaded in light grey, while taxa in the consensus keep their original color. Subtrees containing taxa present in just one tree are also shown in grey, so that the structure of the consensus tree is clearly apparent. The main advantage of this display is that it highlights, inside each of the two compared trees, the part of the topology on which they agree.
The consensus feature is thus focused on the comparison of two trees. When dealing with more than two trees, supertrees offer advantages over consensus trees. For instance, supertree methods consider taxa not present in all compared trees, whereas consensus methods overlook these taxa. CompPhy gives access to two SUPERTREE COMPUTATION methods: PhySIC_IST [32] and Matrix Representation with Parsimony (MRP, [33],[34]). When the supertree is computed by the PhySIC_IST method with default parameters, the degree of agreement among the input trees is reflected in the resolution level of the resulting supertree: basically, a supertree containing only a few taxa and/or being poorly resolved indicates low agreement among the input trees. Changing the parameters of PhySIC_IST or resorting to the MRP method gives users an idea of the majority signal in case of substantial disagreement among the input trees (though the MRP supertree can sometimes contain artifacts representing topological signal absent from the source trees [25]). The MRP method is implemented via the Spruce library [27], which creates the matrix representation of a set of source trees, and via PAUP*, which analyzes the matrix with parsimony.
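The matrix representation step of MRP can be illustrated with the standard coding scheme: each clade of each source tree becomes one binary character, scored 1 for taxa inside the clade, 0 for the other taxa of that tree, and ? for taxa absent from that tree. The sketch below is not the Spruce implementation, whose details may differ. It assumes rooted source trees written as nested tuples of taxon names, and the function names are hypothetical.

```python
def leaves(tree):
    """Set of taxon names found at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clades(tree):
    """Non-trivial clades of a rooted tree, each as a frozenset of taxa."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        taxa = frozenset().union(*(visit(child) for child in node))
        found.add(taxa)
        return taxa

    found.discard(visit(tree))  # the root clade carries no information
    return found


def mrp_matrix(trees):
    """Return {taxon: character string}, with one character per source clade."""
    all_taxa = sorted(set().union(*(leaves(tree) for tree in trees)))
    rows = {taxon: [] for taxon in all_taxa}
    for tree in trees:
        present = leaves(tree)
        # Sort the clades so that the column order is reproducible.
        for clade in sorted(clades(tree), key=sorted):
            for taxon in all_taxa:
                if taxon not in present:
                    rows[taxon].append("?")  # taxon missing from this tree
                elif taxon in clade:
                    rows[taxon].append("1")
                else:
                    rows[taxon].append("0")
    return {taxon: "".join(chars) for taxon, chars in rows.items()}
```

For the source trees `(("A", "B"), "C")` and `(("B", "D"), "A")`, the matrix has two columns, one per non-trivial clade, and reads A = `10`, B = `11`, C = `0?` and D = `?1`. A parsimony program would then search for the tree that best explains these characters.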