Highly conserved core components and species-specific regulators
To examine Hippo pathway evolution across distant metazoan phyla, we searched and compared orthologs of 16 genes in 24 organisms (Figure 2). Despite that Yorkie, the sole effector of Hippo signaling, was previously identified in the filastereans C. owczarzaki and the choanoflagellates M. brevicollis[21], we did not expect that all, or most, genes can be found in the most basal phyla due to the simplicity of their body plans. Surprisingly, most of the genes were identified in the simplest metazoan A queenslandica and T adhaerens. On the other hand, Yorkie was not found in H. magnipapillata, O. dioica, and I. scapularis in our search. Yorkie’s main transcriptional partner Scalloped was found in all 24 organisms. The absence of yorkie in a given species seems to be accompanied by the absence of multiple other components. For example, in O. dioica, orthologs of salvador, kibra, expanded, fat, dachs, lowfat, and four-jointed were also not identified. While the core components Mats, Hippo, and Warts are present in almost all species examined, certain upstream regulators and signaling mediators are absent in a considerable number of organisms. The co-existence and co-absence of Hippo components may suggest not only species- and clade-specific evolution but also primary and advanced functional modules. An important feature common to all Hippo pathway genes is the varied numbers of exons and highly conserved functional domains. This is especially apparent for fat, a key upstream regulator of Hippo signaling, and should allow genes to produce multiple proteins via alternative splicing to function in different tissues and organs for growth control.
Domains and exons of the Fat/Dachsous/Dachs family of upstream regulators
The genes fat and dachsous encode protocadherins required for controlling growth [17] and PCP in Drosophila (reviewed recently by [22, 23]). Experimental studies of Drosophila wing and eye growth revealed that Dachsous and Fat may act as a pair of ligand and receptor [24]. The interaction between Fat and Dachsous generates a tissue-level directional cue for planar cell polarization [25, 26], and this interaction is modulated by Four-jointed [27–29] and regulates the downstream protein Dachs. Polarized distribution of Dachs in the cell then mediates oriented cell division in Drosophila [13]. The function of Fat and Dachsous in controlling PCP is conserved in mammals [30], suggesting that dach should have mammalian orthologs to mediate polarized cell division.
We first examined the distribution of fat, dachsous, four-jointed, and dachs in species. As a pair of ligand and receptor, Fat and Dachsous co-exist in all examined metazoans except A. californica and O. dioica, whereas their modifier Four-jointed is absent in multiple organisms, including a Porifera and a Placozoa, three Cnidaria, and three Chromadorea (Figure 3; Additional file 1: Figure S1). This may indicate that Four-jointed joined the growth control and/or PCP mechanisms at a later stage and functions only in Bilateria. As reported, dachs encodes an unconventional myosin in Drosophila [12]. Despite being identified in the basal metazoa A. queenslandica and T. adhaerens, it is absent not only in two Cnidaria (H. magnipapillata and A. digitifera) and two Nematodes (A. suum and C. elegans) but also in most Chordates we examined (H. sapiens, X. tropicalis, O. dioica, and C. intestinalis). Dachs was previously not found in humans [12], but its absence in multiple phyla is somewhat unexpected, raising the questions of how polarized cell division is controlled without dachs in epithelial cells and whether dachs is dispensible for growth control. In mammals, the myosins closest to Drosophila Dachs are within the myosin V, VII, and X families, but Drosophila Dachs does not seem orthologous to any of these mammalian homologs [12].
We next examined the domains and exons of fat, dachsous, four-jointed, and dachs. Fat has a highly variable number of exons and the characteristic LamG and EGF domains. As cadherins Fat and Dachsous share several notable characteristics: their N-termini each contains many cadherin domains that facilitate heterophilic binding, they have a highly conserved transmembrane region, and their extracellular cadherin repeats (from 20–34 aa in length) appear to be more conserved than the intracellular cadherin domains (Figure 3). As a cadherin-domain kinase, Four-jointed contains a phosphorylation site (contained in the FAM20_C domain) important for the stimulation of Fat-Dachsous binding [29], but this site was not identified in the Porifera, Placozoa, Cnidaria, and Nematoda orthologs. Similar to Fat, Four-jointed has a variable number of exons, from a single exon in Arthropoda and Tetrapoda to many in other organisms (Additional file 1: Figure S1). All known myosins are comprised of an N-terminal head domain, a neck regulatory domain, and a specific C-terminal tail domain [31]. The N-terminal head domain contains a well-conserved ATP binding domain, an actin-binding domain, and an active thiol region; the neck domain has a single calmodulin-type IQ-like motif; and the tail domain is highly divergent (Additional file 1: Figure S2) [12]. Notably, in dachs these domains are encoded by significantly varied number of exons (Additional file 1: Figure S2). Compared with the 7 exons present in D. melanogaster and T. castaneum, dachs has more than 20 exons in certain other species, including A. queenslandica and T. adhaerens, with the number and order of functional domains highly conserved (note that the dachs orthologs in A. queenslandica, N. vectensis, B. floridae, and L. gigantea lack the long N-terminal extension and the coiled-coil domain). The common feature of conserved domains encoded by highly varied numbers of exons should allow fat, dachsous, four-jointed, and dachs to produce multiple transcripts for flexible tissue- and species-specific growth control.
The finally examined was the origin of fat and dachsous. BLASTP searches with the sequences of Drosophila fat and dachsous against the A. queenslandica genome both yielded the EC-containing protein XP_003386184.1 as the highest hit. To determine whether XP_003386184.1 is orthologous to fat or to dachsous, we built an phylogenetic tree for all fat and dachsous orthologs together with XP_003386184.1 and found XP_003386184.1 was not convincingly clustered with either group. We then performed a BLASTP search with the sequence of T. adhaerens dachsous against the A. queenslandica genome and obtained results suggesting that XP_003386184.1 in A. queenslandica is more likely orthologous to dachsous than to fat in T. adhaerens. To unveil the origin of A. queenslandica’s XP_003386184.1, we used XP_003386184.1 as a query to search (BLASTP) the genome of the choanoflagellate M. brevicollis and determined that XP_003386184.1 is similar to XP_001747521.1 and XP_001749260.1, two EC-rich proteins in M. brevicollis. Sequence comparisons revealed two findings - the many EC-repeats make A. queenslandica’s fat and dachsous orthologs highly similar to each other, and many EC-containing cadherins in A. queenslandica can be identified in M. brevicollis. Specifically, the 4900 ~ 6300 aa in the A. queenslandica’s dachsous is present in the two M. brevicollis proteins XP_001747521.1 and XP_001749260.1, but not in dachsous orthologs in any other organisms, and can be found in other genes in some metazoans (for example, the A, B, and C domains from 5114 to 6215 aa in A. queenslandica dachsous are detected as the main domains in the 2603 aa XP_002612362.1 in B. floridae). These findings suggest that dachsous may be originated in M. brevicollis and possibly underwent a reshuffle from A. queenslandica to eumetazoans. Our examination of the M. brevicollis genome revealed that it contains up to 23 distinct cadherin genes; among them, the 10056 aa MBCDH21 is the only cadherin with a combination of ECs, LamG, EGF, and transmembrane domains [32]. We found these domains are located at the N-terminus of MBCDH21 in M. brevicollis, whereas they fall near the C-terminus in metazoan fat orthologs. This discrepancy leaves the issue of whether fat originated from MBCDH21 unresolved.
Domains and exons of Yorkie and its downstream partners
As the sole effector of Hippo signaling, Yorkie was initially identified in Drosophila and has two homologs in vertebrates, YAP and TAZ [33]. Yorkie contains a TEAD-binding domain (TB domain) at the N-terminus and a transcriptional activation domain at the C-terminus, but lacks a DNA-binding domain (Figure 4). The TB domain enables Yorkie to recognize and bind to Scalloped (TEAD in vertebrates) to activate downstream genes, including vestigial, dE2F1, and scalloped in the Drosophila wing [14]. When the Hippo pathway is activated, Yorkie is phosphorylated by Warts, which promotes the association between Yorkie and 14-3-3 and cytoplasmic retention. Otherwise, Yorkie interacts with Scalloped to form functional heterodimeric transcription factors [34].
A previous analysis reported that yorkie appeared after the emergence of A. queenslandica and seems to be absent in the Nematodes C. elegans and C. briggsae[20]. However, in a recent study, yorkie orthologs were identified not only in the sponge A. queenslandica but also in the unicellular amoeboid C. owczarzaki[21]. As Yorkie is the sole downstream effector of Hippo signaling, we reasoned that it should be present in all metazoan phyla. Using human Yap and Drosophila Yorkie as queries, our BLAST search failed to detect yorkie orthologs in L. gigantea, A. digitifera, and A. californica, yet GeneWise and GenScan predicted its presence not only in the three organisms but also in A. queenslandica and C. elegans (Additional file 1: Table S1). In B. floridae, due to incomplete genome sequencing, the putative Yorkie ortholog we obtained is very short, 86 aa in length and unlikely representing its true sequence. Unexpectedly, BLAST search, GeneWise, and GenScan did not convincingly find Yorkie orthologs in H. magnipapillata, I. scapularis, and O. dioica (Figure 4).
We next compared the 20 Yorkie orthologs with the human YAP and Drosophila Yorkie sequences. As previously reported [20], we found Yorkie orthologs were more readily comparable to human Yap than to the Drosophila Yorkie in many cases (Additional file 1: Table S2 and Additional file 1: Table S3). For example, the N. vectensis Yorkie is more similar to human YAP (similarity = 47%) than to Drosophila Yorkie (similarity = 26.3%), supporting the conjecture that Yorkie sequences in arthropods are much more divergent and harbor many more changes as compared with YAP sequences in Chordates [20]. As reported [20, 21], we found that Yorkie contains one WW domain in A. queenslandica and T. adhaerens and two WW domains in all other phyla. Consisting of a TEF/TEAD-binding motif (TB domain) and a 14-3-3 binding motif, Yorkie’s N-terminus is highly conserved in metazoans, with the exception of absence in B. mori (Figure 4). Notably, probably in consequence, Scalloped’s BYD domain in B. mori is also very short. The C-terminus of Yorkie, which contains the coiled-coil domain and PDZ-binding motif, is less conserved, as the coiled-coil domain is absent in A. queenslandica, D. melanogaster, and C. elegans, and the PDZ-binding motif is absent in even more organisms (Figure 4A).
Orthologs of scalloped were identified in all 24 organisms and significantly varied in the number and length of exons (Figure 4B). We used Jpred, a secondary structure prediction server, with the domains of the human TEAD to analyze Scalloped/TEAD domains in metazoans. All Scalloped orthologs contain a DNA-binding domain (TEA domain, or transcriptional enhancer activator domain) at the N-terminus and a Yorkie-binding domain (YBD domain) at the C-terminus. An exception is the Scalloped ortholog in B. mori, which, probably due to the absence of yorkie in B. mori, contains a very short YBD domain following a GLY domain. Most TEA domains are between 300–450 aa in length, with the shortest comprising 172 aa in A. digitifera and the longest 458 aa in A. queenslandica. The YBD domains are between 100–200 aa, with the shortest containing 50 aa in B. mori and, very remarkably, the longest 252 aa in A. queenslandica. A BLAST search indicated that orthologs of scalloped are varied in length and divergent in sequence, but Jpred analysis revealed that the secondary structure of their YBD domains is highly conserved (Additional file 1: Figure S4).
Yorkie requires tissue-specific transcription factors to activate different downstream genes. In Drosophila wing Yorkie binds to Scalloped to regulate target genes, but in Drosophila eye Yorkie regulates target genes in conjunction with Homeothorax and Teashirt [15]. We searched for the homeothorax ortholog in the 24 organisms and found it exists in all organisms except I. scapularis, D. pulex, and B. mori (Additional file 1: Figure S3), and all these orthologs contain a homeobox domain for DNA binding. Similar to scalloped, homeothorax also varies in the number and length of exons in different organisms (Additional file 1: Figure S3), indicating that this may be a common feature of Yorkie’s partners.
Phylogenetic analysis of Hippo pathway genes
Since our analysis suggests that dachsous could undergo a reshuffle from A. queenslandica to eumetazoans, we examined whether it and other genes contain recombination breakpoints. In all of the 16 genes, no or few recombination breakpoints were detected by > = 4 programs in the RDP package (Figure 5) [35], but considerable ones were detected by > = 3 programs (Additional file 1: Figure S5), leaving the number and sites of recombination breakpoints not reliably determined. We chose seven genes (crumbs, dachsous, fat, hippo, merlin, mats, yorkie) that have fewer recombination breakpoints than others for phylogenetic analysis. According to ProtTest [36], the most appropriate substitution models for the merlin and mats datasets is LG + G, for the crumbs dataset is Blosum62 + G + I, and for the dachsous, fat, hippo, and yorkie’s datasets is VT + G + I, based on the Bayesian information criterion. Since the 24 species cover multiple distinct phyla, the sequences are quite divergent. To make phylogenetic analysis robust, we used two Bayesian packages - PhyloBayes and MrBayes, and two maximum likelihood packages - RAxML and PhyML, to build trees. PhyloBayes produced trees for all the seven genes based on the LG + G model, PhyML produced trees for all the seven genes based on their specific most appropriate substitution models, and MrBayes and RAxML produced trees for crumbs, dachsous, fat, hippo, and yorkie based on their specific most appropriate substitution models [37–40].
PhyloBayes, RAxML, and PhyML produced trees without or with mild polytomies (Figure 6; Additional file 1: Figure S6; Additional file 1: Figure S7), but MrBayes produced the trees of crumbs, dachsous, and fat with a serious polytomy at the root, indicating that basic phyla cannot be resolved. To examine if the serious polytomy could be explained by divergence of sequences across phyla, we used MEGA to compute the overall mean distance (p-distance) of the seven genes [41] and found that the sequences of crumbs, dachsous, and fat, which are upstream regulators, have high overall mean p-distances (0.669, 0.676, 0.672) compared with the sequences of hippo, merlin, and mats (0.186, 0.369, 0.166), which are in the core of the Hippo pathway. We also examined the impact of gaps on tree building by removing the shortest O. dioica and columns with < =6 amino acids in the sequences of crumbs, and found that MrBayes produced a much better tree of crumbs. Thus, divergent species, potential recombination breakpoints, and gaps probably together influenced MrBayes’ Bayesian inference.
The phylogenetic trees of these genes show certain features. First, trees produced by PhyloBayes and by RAxML are highly consistent (Figure 6; Additional file 1: Figure S6), indicating robustness of tree building. Second, in some trees nodes at or near the root have lower posterior probability or bootstrap values compared with nodes near the leaves, indicating relatively unconvinced phyla and subphyla determination. This is in line with the serious polytomy at the root in the MrBayes trees of crumbs, fat, and dachsous. Third, in many trees the Porifera A. queenslandica, the Placozoa T. adhaerens, and the Hydrozoa H. magnipapillata, are misplaced, indicating drastic evolution of the Hippo pathway in early metazoans. The above two features may be due to too few species covering too many phyla in our datasets, or indicate significant phylum- or class-specific evolution of Hippo genes. Fourth, in most trees species within many classes, subphyla, or phyla (such as Anthozoa, Gastropoda, and Secernentea) are correctly determined. Finally, the polytomy in the Bayesian tree of mats (Additional file 1: Figure S7) may be caused by very short (226 aa) and highly conserved (the overall mean p-distance = 0.166) sequences. These features unveil some important aspects of the evolution of the Hippo pathway in metazoans.
Evolution and Coevolution of genes in the Hippo pathway
As Hippo genes contain multiple functionally conserved domains, we examined their evolutionary conservation. When the random-site model in the PAML package was used to detect sites under positive and purifying selection [42], few or no sites in these genes were detected under positive selection at the 95% level (Additional file 1: Table S4) [43], but considerable sites in multiple genes were detected under positive selection at the 60-70% levels (the M2 model with naïve empirical Bayes analysis). We used the “Tests for alignment-wide evidence of selection” in the Datamonkey webserver to cross-check the data and obtained the same results. It appears that these results do not indicate significant false positively selected sites caused by recombination. We next examined whether genes at different positions in the Hippo pathway (upstream regulators, core Hippo complexes, the effector yorkie, and yorkie’s downstream partners) have been subjected to different selective forces. The values of ω = dN/dS indicate that hippo, mats, and warts, the core Hippo components at the center of the pathway, are most conserved (Figure 7). “Tests for alignment-wide evidence of selection” also confirmed that dachs, merlin, hippo, mats, and warts are the most conserved. In comparison, PCP-related upstream regulators and the transcriptional activators Scalloped and Homeothorax are more divergent (Figure 7). These results consist with the finding that sequences of Merlin, Mats, and Hippo have small, but sequences of Dachsous, Fat, and Yorkie have rather high, overall mean p-distances. The upstream regulators Dachsous and Fat and the downstream effector Yorkie, and its transcriptional partners, must have evolved to integrate multiple inputs and to activate different targets.
Physical interaction and functional association make proteins coevolved [44]. Typically, ligands and receptors should coevolve to maintain their physically interacting domains to fit for each other. One way to determine whether two genes or proteins have coevolved is to compute the pairwise distances between them across distant phyla and then the correlation between pairwise distances [45]. Using this distance matrix-based method, yorkie and scalloped were found to have coevolved [20]. A drawback of this method, however, is that it cannot detect which regions in the two proteins physically interact. To reveal coevolution of functional domains in Hippo pathway genes, we used the software CAPS to identify correlated pairwise amino acid variability for every pair of the 16 Hippo components [46]. Relatively high correlation coefficients (>0.6), if densely occurred at considerable sites, indicate potential coevolution of physically interacting or functionally associated regions in two proteins.
Upon identified functional domains, we first drew sites under positive selection (as mentioned above, at the 60-70% levels), neutral evolution and purifying selection. In the three core components Hippo, Warts and Mats, it is clear that all functional domains are under purifying selection and most regions under purifying selection are functional domains (Figure 8ABCD). This correspondence is in anticipation. Then, coevolutionary analysis reveals that many functional domains have coevolved with other functional domains (Figure 8EFGH), which allows one to infer whether a protein interacts with other proteins and how the interaction takes place. As an example, experimental studies indicate that Dachs coprecipitates and interacts with Warts [47], and our analysis reveals that Dachs may use its myosin head domain to bind to Warts’s protein kinase domain, because they share strong coevolution signals (Figure 8H). The combined use of the three analyses significantly helps identifying functional and interacting domains in Hippo genes.
Coevolution analysis also helps unveil why some domains have been lost in some species. For example, Dachs in some metazoans lost the coiled-coil domain and in some metazoans has a very short, or lost, IQ camodulin-binding domain. By evolutionary analysis we found that only half of the IQ camodulin-binding domain is under purifying selection, and by coevolutionary analysis we found that the coiled-coil domain poorly coevolves with any domain in Warts. These two findings provide a sensible explanation for the shortened or lost IQ camodulin-binding domain and the lost coiled-coil domain of Dachs in some species. Generally, applying evolutionary and coevolutionary analysis to identified functional domains revealed that most functional domains in Hippo genes have not only been under purifying selection but also coevolved with interacting functional domains in other genes. These results provide valuable clues for further examination of Hippo signaling.