Skip to main content

Adaptive evolution of Hox-gene homeodomains after cluster duplications



Hox genes code for homeodomain-containing transcription factors that function in cell fate determination and embryonic development. Hox genes are arranged in clusters with up to 14 genes. This archetypical chordate cluster has duplicated several times in vertebrates, once at the origin of vertebrates and once at the origin of gnathostoms, an additional duplication event is associated with the origin of teleosts and the agnanths, suggesting that duplicated Hox cluster genes are involved in the genetic mechanisms behind the diversification of vertebrate body plans, and the origin of morphological novelties. Preservation of duplicate genes is promoted by functional divergence of paralogs, either by subfunction partitioning among paralogs or the acquisition of a novel function by one paralog. But for Hox genes the mechanisms of paralog divergence is unknown, leaving open the role of Hox gene duplication in morphological evolution.


Here, we use several complementary methods, including branch-specific d N /d S ratio tests, branch-site d N /d S ratio tests, clade level amino acid conservation/variation patterns, and relative rate ratio tests, to show that the homeodomain of Hox genes was under positive Darwinian selection after cluster duplications.


Our results suggest that positive selection acted on the homeodomain immediately after Hox clusters duplications. The location of sites under positive selection in the homeodomain suggests that they are involved in protein-protein interactions. These results further suggest that adaptive evolution actively contributed to Hox-gene homeodomain functions.


The homeobox codes for a highly conserved 60 amino acid DNA-binding motif (the homeodomain) found in transcription factors [1]. One class of homeobox-containing transcription factor genes are the Hox genes, which are homologous to the genes in the Drosophila homeotic (HOM) gene cluster, that specify cell fate during embryonic development [1] and have derived functions in other tissues [2]. Multiple Hox genes located in tightly linked clusters have been identified in all animal phyla examined, with the archetypical chordate cluster having 14 genes (Hox1–Hox14) [3]. The number of Hox clusters has increased several times in vertebrate evolution: the cluster duplicated twice in early vertebrates leading to four clusters (HoxA-D) with 42 genes [4, 5] and additional cluster duplications in teleost fish led to 7–8 clusters with 45–47 genes [6, 7]. Independent duplications have also occurred in the jawless vertebrates hagfish [8] and lamprey [9].

Models of duplicate gene preservation predict functional differentiation of paralogs based on protein sequence or regulatory divergence [10, 11]. Although numerous models of duplicate gene divergence have been proposed, four different mechanisms of functional divergence are likely to explain preservation of duplicate Hox genes: acquisition of novel functions by one paralog (neo-functionalization) [12], passive erosion of functional redundancy due to complementary degenerative mutations, (sub-functionalization) [11], models that predict the accumulation of neutral mutations, which later acquire functional constraints because the environment or genetic background changes (the Dykhuizen-Hartl effect) [13] or divergent adaptive selection of both paralogs (adaptive diversification) [14]. This list has recently been expanded by the introduction of the subneofunctionalization [15] and the adaptive radiation [16] models that predict rapid subfunctionalization after duplication followed by a prolonged period of neofunctionalization and adaptive divergence of duplicate genes in a process analogous to species radiations, respectively. Here, we are interested in testing whether positive selection acted immediately after cluster duplications to promote functional divergence and identify which mechanisms discussed above most adequately explain the preservation of Hox duplicates in vertebrates.

How paralogus Hox genes have been retained is not known, although evidence suggestive of positive selection after cluster duplication has been identified in Hox7 [17], Hox5 and Hox6 [18] paralogs. In these studies, however, it is not clear whether directional selection was responsible for the maintenance of the duplicated genes or other mechanisms promoted the maintenance of duplicates [19]. In addition, evidence for positive selection immediately after Hox cluster duplications has recently been identified in teleost fish for HoxA-11 and HoxB-5 [20]. These data suggest that, in the evolution of ray-finned fishes, some duplicate Hox genes have been preserved by functional differentiation through the action of positive Darwinian selection immediately following the gene duplication. This suggests that Hox genes may have also experienced adaptive evolution following the cluster duplications earlier in vertebrate evolution.

Hox cluster duplication and gene diversification has been proposed to be one of the genetic mechanisms behind the diversification of vertebrates and body plans and the origin of morphological novelties [2123]. This association, however, is difficult to reconcile with the perceived degree of sequence conservation between the homeodomains of Hox genes and the numerous examples of functional equivalence of Hox/Hom genes from strikingly divergent organisms [2428]. Mouse HoxA-5, for example, is able to activate the same target genes as its Drosophila homolog, Sex combs reduced (Src), in axis determination indicating strong conservation of function over 500 to 600 million years [29], but counter examples also exist, showing functional non-equivalence of Ubx orthologs from fairy shrimp, velvet worm and Drosophila [30] and non-equivalence of homeodomains from HoxA-4, HoxA-10, HoxA-11 and HoxA-13 paralogs from mouse [31, 32].

In this paper we investigate the sequence divergence in homeoboxes from the four gnathostome Hox clusters, including genes from basal vertebrates and sarcopterygians like shark and coelacanth, respectively. This is the first study of homeodomain divergence with extensive taxon sampling allowing us to identify the relative phylogenetic age of substitution events in vertebrate phylogeny. We use three different, but complementary, approaches to test for functional divergence among paralogs: comparison of patterns of amino acid sequence conservation/variation among paralog clades, d N /d S ratio tests to detect directional selection and identify positive sites, and comparison of clade level polymorphisms/divergence rates. Our results indicate that after cluster duplication positive Darwinian selection acted on the homeodomain of Hox proteins prior to the divergence of the modern gnathostome and bony fish lineages. We find amino acid substitutions at sites that are not involved in structural constraints and are located on the molecular surface where they are available for protein-protein interactions were targets of positive selection. We suggest that the action of positive selection at a subset of sites not constrained by ancestral (plesiomorphic) functions after cluster duplications led to the emergence of novel protein interactions while maintaining ancestral ones. This model can help reconcile the role of Hox genes in morphological diversification and innovation with their extreme sequence conservation.

Results and discussion

Functional divergence of paralog-group homeodomains

We compiled a database of Hox genes with 4–5 species for each gene (155 sequences in total) and compared conserved and variable sites between paralog group members to identify if there are characteristic residues that distinguish which cluster a paralog belongs to (for example, see Figure 1). This analysis identified many sites that are conserved among species but variable between genes in the same paralog group ('cluster-specific' residues; Figure 2). Although the homeodomain is a highly conserved motif, it is not invariant; in fact only 17 residues are absolutely conserved between all vertebrate Hox genes in our alignment, suggesting that variable sites could be functionally divergent. Many of these variable sites have been previously shown to be 'characteristic residues' that distinguish paralog groups from each other and have been suggested to be engaged in protein-protein interactions [33].

Figure 1

An Example of Functional Divergence in Hox6 Paralogs. The phylogeny of the genes is shown on the left with the location of the cluster duplication indicated with an open circle and speciation events indicated with a closed circle. Post cluster-duplication branches (PCD) and post speciation branches (PS) are highlighted blue and gray, respectively. These branch types were used in the calculation of ωPCD and ωPS in two ratio analyses for all paralog groups. The amino acid sequence of Hox6 genes from human (Hsa), chicken (Gga), frog (Xtr), coelacanth (Lme) and shark (Hfr) are shown on the left with divergent sites highlighted in red. Below the alignment sites are identified as type-I (1), type-II (2) or both (3). Amino acid substitutions are classified as conservative (C), moderate (M) or radical (R).

Figure 2

Location of Cluster-Specific Amino Acids on the Molecular Surface of Hox Homeodomains. The homeodomain is shown with the molecular surface in red and DNA in gray. Cluster-specific amino acids are shown in blue and amino acids that were under positive selection after cluster duplications are shown in yellow. Only those sites with a posterior probability larger than 0.90 of having ω > 1 are shown in yellow.

To test if 'cluster-specific' residues are functionally divergent we estimated the coefficient of functional divergence (θ), which measures the difference in evolutionary rate at amino acid sites between gene clusters. Rejection of the null hypothesis (θ = 0) is strong evidence for altered functional constraints after gene duplication (or speciation) [34]. We found significant evidence of type-I functional divergence for comparisons between HoxA, HoxB, and HoxD clusters (θI = 0.24–0.37, p < 0.05; Figure 3) under the ((AD)(BC)) topology and under the (B(A(CD))) topology (θI = 0.271–0.396, p < 0.05; Figure 3) at sites that differentiate paralog groups. We also found evidence of functional divergence of the HoxB cluster from the protoHoxACD cluster after the initial duplication event predicted under the (B(A(CD))) model (θI = 0.221 ± 0.07, p < 0.05; Figure 3). Results were not significant for comparisons between HoxC and any other cluster (θI = 0.001–0.029) under either duplication model. Comparisons between HoxB and HoxA to protoBC under the (B(A(CD))) model and protoAD to protoBC under the ((AD)(BC)) model, θI(B(A(CD))) = 0.138–134 and θI((AD)(BC)) = 0.001, respectively, were not significant. Type-I sites are defined as those with an amino acid that is conserved in one clade but variable in the sister clade, implying that the site is under structural/functional constraints in the first clade that is absent in the variable clade [34]. Type-I sites are located in the amino- and carboxy terminal ends of the homeodomain (outside of the 3 helices and in regions not predicted to be well structured) and in loop connecting helix 2 and 3.

Figure 3

Relative Rate Ratio Tree and Coefficent of Functional Divergence. Numbers of replacement and silent, invariant and variant substitutions are shown above branches (RI/RV, SI/SV) for lineage with significant results indicating adaptive evolution. Coefficients of functional divergence (θ) estimated from DIVERGE are shown on the right; θ is shown on the internal branch separating HoxB from protoHoxACD for the divergence between HoxB and protoHoxACD. Results are shown for both the ((AD)(BC)) (A.) and (B(A(CD))) (B) topologies. *, p < 0.05; **, p < 0.001.

Recently, a method has been developed to test for type-II functional divergence [35]. Type-II sites are those that are highly conserved in both clades but are fixed for amino acids with different biochemical properties between sister clades, implying these residues are responsible for functional differences between these groups. Although all paralog groups had at least one site with evidence of type-II divergence, all of which were radical amino acid substitutions (defined as a change in polarity or charge, but not size) of surface residues, the θII values are extremely small (θII = 0.001–0.062), highlighting the conservation between homeodomains from different clusters. These results are not unexpected given that this method calculates θ across all sites in an alignment and thus effectively averages site-wise θ values. With only ~4% of sites/cluster showing a pattern of type-II divergence in our concatenated alignment, it is not likely that the ~46 possible type-II sites have θII values high enough to compensate for the extremely low θII values of the over 900 sites with θII effectively equal to zero.

Accelerated evolution of homeodomains after cluster duplication

Several models of molecular evolution have been proposed to account for the preservation of duplicate genes. Although the details of each model can vary (see introduction), they differ in their predictions regarding the pattern of sequence evolution following gene duplication. The neo-functionalization and divergent selection models predict that the nonsynonymous substitution rate will be increased following gene duplication because of positive Darwinian selection in the gene acquiring the new function, while the Dykhuizen-Hartl and DDC models predict and increase in the substitution rate because of relaxed purifying selection. It is possible to distinguish between these models by comparing nonsynonymous (d N ) to synonymous (d S ) substitution rate (d N /d S = ω) with ω = 1, <1, and >1 indicating neutral evolution, purifying selection and directional selection.

Unlike the functional divergence methods developed by Gu [36, 37], estimating selection using the d N /d S ratio is, by definition, dependent on the degree of divergence of the sequences under study. Thus, short sequences with a high degree of amino acid conservation but substantial synonymous site divergence may not contain enough signal to reliably obtain estimates of d N and d S . We assessed whether homeodomain sequences contained sufficient information for reliable rate estimates by examining the tree length statistic S, the number of nucleotide substitutions per codon. For individual paralog groups S range from 6.3–13.3 (average = 10.00), with tree length dN averaging 1.2 substitutions per nonsynonymous site and tree length dS averaging 18 substitutions per synonymous site along the tree. Interestingly, simulation studies [38] have shown that at levels of sequence divergence similar to our datasets, use of the χ2 made the likelihood ratio test statistic (LRT) extremely conservative such that the type-I error rate is very small. Similarly, the power of the LRT to reject the null hypothesis even when it is false (type-II error) was found to be conservative even at medium to high levels of sequence divergence [38]. The power of the LRT increases as the number of sequences increases such that at 17 taxa power is nearly 100% [38], suggesting that our inclusion of at least 8–12 sequences (depending on the paralog group) helped alleviate loss of power from short conserved sequences. The simulation study results indicated that the optimal sequence divergence depends on the dataset and appears to be within the medium-to-high range [38]. Our data indicate that results based on estimates of d N and d S from this homeodomain dataset are reliable, if conservative.

To estimate the strength and kind of selection acting on Hox gene homeodomains, we used maximum likelihood methods to estimate the nonsysnonymous (d N ) to sysnonymous (d S ) substitution rate ratio [39, 40]. The one ratio model is the simplest and provides a measure of the average strength and direction of selection acting on the gene throughout its history and can test if there was an increase in the rate of evolution after Hox cluster duplications. As expected, the d N /d S ratio for the homeodomains of all paralog groups is much less than 1 (0.0033–0.0359) highlighting the dominant role purifying selection plays on Hox gene evolution. To test if there was an increase in the nonsynonymous substitution rate following Hox cluster duplication we used a two ratios model that estimated separate ω's for post cluster duplication (ωPCD) and post speciation (ωPS) branches. Post cluster duplication branches evolved significantly faster (3–27x) than post speciation branches for 10 of 13 paralog groups; the remaining 3 paralog groups had ωPCD > ωPS but the results were not significant (Table 1). A more complex model that allowed each post duplication lineage to have separate d N /d S ratios from each other (the paralog 6 group for example: ωPCD-A6, ωPCD-B6 and ωPCD-C6) and post speciation (ωPS) branches was not better than the simple two-ratio model indicating that paralogs experienced similar selective forces after cluster duplication. These results are consistent with previous data from Hox5, Hox6 and Hox7 and indicate there was a period of rapid evolution of the homeodomain after Hox cluster duplication that could have been the result of either positive Darwinian selection or relaxed purifying selection.

Table 1 Likelihood paramater estimates under the lineage-specific models.

Adaptive evolution of homeodomains after cluster duplication: Relative rate ratio tests

Although positive selection at the molecular level is most often tested using the d N /d S ratio, this method has several inherent limitations. The most problematic of which is when positive selection is acting at a limited number of sites while the majority are under strong purifying selection. Under these conditions d N will never become larger than d S and the signal for positive selection will be masked. In addition, when there is a large amount of sequence divergence between two nodes in a tree (site saturation) the accuracy of d S , and to a lesser extent d N , is greatly reduced. These two limitations of the d N /d S ratio to detect positive selection are particularly important for studying selective forces after Hox cluster duplications since very few sites (less than 15%) changed after duplication and the duplication events are relative ancient (about 560 MYA; ref), leading to substantial synonymous site divergence. Thus, even though we found evidence of accelerated rates of sequence evolution post cluster duplication, it is unlikely that the d N /d S ratio tests used above would be able to detect positive selection (ω > 1).

One complementary method that has been developed to compensate for some of limitations of the d N /d S ratio is the relative rate ratio test of Creevey and McInerney [41], which is an extension of the contingency test of neutrality proposed by Templeton [42] and McDonald and Kreitman [43]. Briefly, this method reconstructs ancestral sequences for each node in a phylogenetic tree using parsimony and identifies all substitutions that result in nonsynonymous and synonymous changes for each node. Substitutions are classified as replacement invariable (RI, i.e. nonsynonymous substitutions that are not substituted again in descendent lineages), replacement variable (RV, i.e. nonsynonymous substitutions that are substituted again in descendent lineages), silent invariable (SI, i.e. synonymous substitutions that are not substituted again in descendent lineages) and silent variable (RV, i.e. synonymous substitutions that are substituted again in descendent lineages).

Under neutral evolution the ratio of RI/RV will not be significantly different from SI/SV. Similarly, a period of relaxed purifying selection may increase RI/RV relative to SI/SV, but RI/RV will never be significantly greater than the neutral expectation given by SI/SV since the rate of replacement substitution can only exceed the rate of silent (neutral) substitution under positive selection. During an episode of positive selection, advantageous substitutions will become fixed in a lineage and remain invariant in descendent lineages, elevating the ratio of RI/RV relative to the neutral expectation given by SI/SV. Thus, when lineages are identified with a significantly greater RI/RV than SI/SV positive selection is indicated.

Using the relative rate ratio test to examine selective forces after cluster duplications identified that post-duplication lineages under the ((AD)(BC)) and the (B(A(CD))) models had significantly larger RI/RV than SI/SV (Figure 3 and Tables 2 and 3), indicating these duplication events were followed by adaptive evolution and supporting the results obtained with the d N /d S ratio tests and further suggesting that the increase in rates identified from the d N /d S ratio were due to positive selection.

Table 2 Results of the Creevey-McInerney test unde the ((AD)(BC)) topology.
Table 3 Results of the Creevey-McInerney test unde the ((AD)(BC)) topology.

Adaptive evolution of homeodomains after cluster duplication: dN/dS

The lineage-specific d N /d S model utilized above has been extended to account for variable d N /d S between sites and can detect positive selection at specific sites in specific lineages under appropriate conditions [44, 45]. These branch-site models are ideal for detecting short episodes of positive selection that acted on a few sites while the majority of sites in the protein remained under purifying selection, as is likely to have occurred in the homeodomain after Hox cluster duplication. Applying branch-site models and to post cluster duplication (ωPCD) branches identified sites under positive selection after cluster duplication (Figure 3) in paralog groups 1–6, 9 and 13 (Table 4). Positive Sites were identified with posterior probabilities (PP) greater than 0.90 using the both the liberal Neive Empircal Bayes (NEB) and the more conservative Bayes Empirical Bayes (BEB) methods implemented in PAML3.15, although only the result of the BEB method is shown. In addition, two genes in the Hox3, 5, 6, and 13 paralog groups have evidence of positive selection, but the results are not statistically significant. The sites identified under positive selection are the same as those that show evidence of type-II functional divergence and map onto the molecular surface of the homeodomain, facing away from the DNA and in an orientation that would facilitate protein-protein interactions (Figure 2).

Table 4 Likelihood paramater estimates under the branch-site models.

While no sites under positive selection were identified in paralog groups 7, 8 and 10–12, a class of sites in each was identified with ω = 1 (Table 4). Given that the ability of likelihood models to detect sites with ω > 1 is an extremely difficult computational problem, it is possible that these sites actually experienced positive selection, but that the models are not able to identify ω > 1. An equally likely explanation that does not invoke positive selection is that the ω = 1 is an accurate estimate for the rate at this sites, and is actually indicative of relaxed functional constraints after duplication, that the sites have not been substituted again indicates they under strong purifying selection in post-speciation lineages, supporting a Dykhuizen-Hartl mechanism for their evolution.

The structural basis of homeodomain evolution

To gain a better understanding of how functional constraints on the homeodomain relate to sequence divergence, we generated a sequence logo [46, 47] from the multiple sequence alignment of Hox-gene homeodomains and mapped the location of sites under positive selection and residues with known functions onto the logo and the crystal structure of the homeodomain bound to DNA (Figure 4). Adaptive/functionally divergent sites are grouped into three discrete regions of the homeodomain: the extreme amino and carboxy terminal arms just outside of the homeodomain proper and in the C-terminal end of helix-2 extending into the loop connecting helix-2 and helix-3.

Figure 4

Function and Evolution of the Homeodomain. The structure of the homeodomain bound to DNA is shown as ribbon models. The location of the repressor domain is shown in orange, the nuclear localization signal in red, critical hydrophobic residues of the nuclear export signal in blue and positive sites in yellow. Only side chains of amino acids that make base contacts are shown. (C) Sequence logo of the Hox-gene homeodomain and surrounding amino acids. The overall height of the stacked amino acids indicates the sequence conservation at that position, while the height of symbols within the stack indicates the relative frequency of each amino acid at that position. The location of the homeodmain is shown above the logo. The location of protein-protein interaction regions are shown in blue (note that only some sites, not all sites, in blue actually participate in protein-protein interactions), sequence motifs are shown in light gray and the location of helices in dark gray. Sites identified under directional selection after cluster duplication are shown with an asterik (*). Sites with known functional information are shown: G, characteristic paralog-group residue; S, site that assists in binding site discrimination between paralog groups; B, site that makes base contacts; H, site that is part of the hydrophobic core; P, site that contacts the phosphate backbone; E, location of leucine and isoleucine residues critical for the nuclear export signal.

The repressor domain, where the majority of protein interactions have been found, and helix-3 are free from positive sites, likely reflecting conserved functions shared by all Hox genes. Several proteins have been shown to bind in the repressor domain including the CREB binding protein (CBP) [48], high mobility group protein 1 (HMG1) [49], members of the Maf family of basic-leucine zipper (bZip) activators [50], and geminin [51]. This region also overlaps with the Sp1 transactivation region [52]. In addition to characterized protein-protein interactions, the repressor domain also contains the majority of 'characteristic-residues' that distinguish cognate groups from each other indicating the majority of sites in this region were already under functional constraints after the tandem duplications which created the Hox gene cluster and were not available to be targets of adaptive selection after cluster duplication. Interestingly, the first 3 sites of the geminin-binding region were under directional selection in different paralog groups, including radical amino acid substitutions, suggesting selection to modulate geminin binding between paralog group members.

Mapping functionally divergent amino acid sites and sites under positive selection in and around helix-2 onto the logo and crystal structure shows that purifying selection has acted to preserve hydrophobic/aliphatic residues critical for the nuclear exportation signal [53] and positive selection has acted exclusively on sites that occur at the molecular surface. These sites form a small cluster at the posterior end of helix-2 in a prime location for protein-protein interactions. Beyond the ultra-conserved helix-3, which also contains the nuclear localization signal [54], sites under positive selection have been identified in an unstructured connecting loop leading to additional structures in the carboxy terminus. The amino-terminal arm of the homeodomain, which confers functional specificity on Hox proteins, contains the majority of sites under positive selection suggesting that selection has acted to modify functional specificity between paralogs. This region appears to be unstructured and is a prime target for protein-protein interaction sites. This pattern of purifying and positive selection suggests that after Hox cluster duplications, selection acted on protein-protein interaction sites in such a way that ancestral functions were maintained while the acquisition of novel protein interaction partners driven was driven by selection on non-constrained amino acids. These derived interactions could be those responsible for novel Hox gene functions in vertebrates.


The homeodomain serves multiple functions in addition to DNA-binding, including containing nuclear localization and export signals, transcriptional activation and repression domains and other protein-protein interaction sites [33, 54]. These functions combine to impose severe limitations on the degree of sequence divergence that can be accommodated by the homeodomain of Hox genes. Even with these constraints, however, the relatively small set of amino acids that were free to diverge after cluster duplication were subject to positive selection. Although the Hox cluster duplications are relatively ancient (450 MYA), complicating the detection of positive selection, we find congruence between multiple methods a strong indicating that our results are reliable. These results support an important role for the action of positive Darwinian selection in the divergence of Hox genes after cluster duplications, particularly at sites that distinguish paralog groups ('cluster-specific' residues).

Nearly all 'cluster-specific' residues map onto the molecular surface of the homeodomain, similar to the paralog group specific sites [33], suggesting changes in amino acid properties could influence interaction of the homeodomain with other proteins. Cofactor associations are important for Hox proteins and most other transcription factor functions; these protein-protein interactions occur at the molecular surface through the formation of hydrophobic and ionic bonds and other intermolecular interactions such as salt bridges and van der Vaals forces. Thus, changes in the physicochemical properties of amino acids participating in these bonds could disrupt preexisting interactions and/or lead to new interactions. These changes could provide a selective advantage for maintaining duplicate genes through the origin of novel protein-protein interactions (effectively reducing degeneracy between paralogs) leading to new gene functions.


The homeodomain of Hox genes was identified from BLAST searches of the nr database at NCBI. At least four members of each gene from diverse taxa were included in the dataset. The sequences were aligned based on the translated amino acid sequences with Se-Al v2.0, alignments were simple given the high degree of sequence conservation within paralog groups. Regions of ambiguous alignment just outside of the homeodomain but within exon 2 were excluded. Most alignments ranged from 70–82 amino acids. The alignment is available from V.J.L. and has been deposited in TREEBASE.

We used codon-based maximum likelihood models of coding sequence evolution implemented in CODEML in the PAML package of programs (version 3.15) to test for lineages and amino acid sites under positive selection. Sites were classified as being under positive selection if they were identified from the Bayes Empirical Bayes (BEB) method with a posterior probability of greater than 0.90. The branching order of the Hox cluster duplications is still debated (refs), but our analyses suggest that the most likely topologies are ((AD)(BC)) and (B(A(CD))) (a detailed analysis of Hox cluster duplication history is beyond the scope of this paper and will be presented elsewhere). We used 2 alternate trees to test for selection: ((AD)(BC)) and (B(A(CD))) and found no significant differences between the results of these different topologies. Functional divergence was tested with DIVERGE alpha1.2 (obtained from X. Gu). We also used the relative rate ratio test of Creevey and McInerny [41] implemented in the program CRANN to test for adaptive evolution. Both DIVERGE and CRANN analyses used the 2 alternate topologies discussed above.


  1. 1.

    Duboule D: Guidebook to the Homeobox Genese. 1994, Oxford, Oxford University Press

    Google Scholar 

  2. 2.

    Kobayashi A: Developmental Genetics of the Female Reproductive Tract in Mammals. Nature Reviews Genetics. 2003, 1225-1237.

    Google Scholar 

  3. 3.

    Powers TP: Evidence for a Hox14 paralog group in vertebrates. Current Biology. 2004, 14: R183-R184. 10.1016/j.cub.2004.02.015.

    Article  CAS  PubMed  Google Scholar 

  4. 4.

    Schughart K KC: Mammalian homeobox-containing genes: genome organization, structure, expression and evolution. British Journal of Cancer. 1988, 9: 9-13.

    PubMed Central  PubMed  Google Scholar 

  5. 5.

    Wagner GP: Hox Cluster Duplications and the Opportunity for Evolutionary Novelties. Proceedings of the National Academy of Sciences USA. 2003, 100: 14603-14606. 10.1073/pnas.2536656100.

    Article  CAS  Google Scholar 

  6. 6.

    Amores A, Force A, Yan YL, Joly L, Amemiya C, Fritz A, Ho RK, Langeland J, Prince V, Wang YL, Westerfield M, Ekker M, Postlethwait JH: Zebrafish hox Clusters and Vertebrate Genome Evolution. Science. 1998, 282: 1711-1714. 10.1126/science.282.5394.1711.

    Article  CAS  PubMed  Google Scholar 

  7. 7.

    Amores A, Suzuki T, Yan YL, Pomeroy J, Singer A, Amemiya C, Postlethwait JH: Developmental Roles of Pufferfish Hox Clusters and Genome Evolution in Ray-Fin Fish. Genome Res. 2004, 14: 1-10. 10.1101/gr.1717804.

    PubMed Central  Article  CAS  PubMed  Google Scholar 

  8. 8.

    Stadler PF: Evidence for independent Hox gene duplications in the hagfish lineage: a PCR-based gene inventory of Eptatretus stoutii. Molecular Phylogenetics and Evolution. 2004, 32: 686-694. 10.1016/j.ympev.2004.03.015.

    Article  CAS  PubMed  Google Scholar 

  9. 9.

    Fried C: Independent Hox-cluster duplications in lampreys. J Exp Zoolog B Mol Dev Evol. 2003, 299: 18-24. 10.1002/jez.b.37.

    Article  Google Scholar 

  10. 10.

    Kimura M: The neutral theory of molecular evolution. 1983, Cambridge, Cambridge University Press

    Google Scholar 

  11. 11.

    Force A, Lynch M, Pickett FB, Amores A, Yan Y, Postlethwait J: Preservation of Duplicate Genes by Complementary, Degenerative Mutations. Genetics. 1999, 151: 1531-1545.

    PubMed Central  CAS  PubMed  Google Scholar 

  12. 12.

    Goodman M: Darwinian evolution in the genealogy of haemoglobin. Nature. 1975, 253: 603-608. 10.1038/253603a0.

    Article  CAS  PubMed  Google Scholar 

  13. 13.


    PubMed Central  CAS  PubMed  Google Scholar 

  14. 14.

    Hughes AL: Adaptive Evolution of Genes and Genomes. 2000, New York, Oxford University Press

    Google Scholar 

  15. 15.

    He X: Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics. 2005, 169: 1157-1164. 10.1534/genetics.104.037051.

    PubMed Central  Article  PubMed  Google Scholar 

  16. 16.

    Francino MP: An adaptive radiation model for the origin of new gene functions. Nature Genetics. 2005, 37: 573-577. 10.1038/ng1579.

    Article  CAS  PubMed  Google Scholar 

  17. 17.

    Fares MA, Bezemer D, Moya A, Marin I: Selection on Coding Regions Determined Hox7 Genes Evolution. Mol Biol Evol. 2003, 20: 2104-2112. 10.1093/molbev/msg222.

    Article  CAS  PubMed  Google Scholar 

  18. 18.

    Van de Peer Y: The ghost of selection past: Rates of evolution and functional divergence of anciently duplicated genes. Journal of Molecular Evolution. 2001, 53: 636-446. 10.1007/s002390010233.

    Article  Google Scholar 

  19. 19.

    Lynch VJ: Adaptive Evolution of HoxA-11 and HoxA-13 at the Origin of the Uterus in Mammals. Proceedings of the Royal Soceity of London B. 2004, 271: 2201-2207. 10.1098/rspb.2004.2848.

    Article  CAS  Google Scholar 

  20. 20.

    Crow K: The Fish Specific Hox Cluster Duplication is Coincident With the Origin of Teleosts. Molecular Biology and Evolution.

  21. 21.

    Holland PW: Hox genes and chordate evolution. Developmental Biology. 1996, 173: 382-395. 10.1006/dbio.1996.0034.

    Article  CAS  PubMed  Google Scholar 

  22. 22.

    Wagner GP: Hox clusters duplications and the opportunity for evolutionary novelties. Proceedings of the National Acadmey of Sciences, USA. 2003, 100: 14603-14606. 10.1073/pnas.2536656100.

    Article  CAS  Google Scholar 

  23. 23.

    Malaga-Trillo EAM: Genome duplications and accelerated evolution of Hox genes and cluster architecture in teleost fishes. American Zoologist. 2001, 41: 676-686. 10.1668/0003-1569(2001)041[0676:GDAAEO]2.0.CO;2.

    CAS  Google Scholar 

  24. 24.

    Greer JM: Maintenance of functional equivalence during paralogous Hox gene evolution. Nature. 2000, 403: 661-665. 10.1038/35001077.

    Article  CAS  PubMed  Google Scholar 

  25. 25.

    McGinnis N: Human Hox-4.2 and Drosophila deformed encode similar regulatory specificities in Drosophila embryos and marvae. Cell. 1990, 969-976. 10.1016/0092-8674(90)90500-E.

    Google Scholar 

  26. 26.

    Malicki J: A human Hox4B regulatory selemtns provides head-specific expression in Drosophila embryos. Nature. 1992, 358: 345-347. 10.1038/358345a0.

    Article  CAS  PubMed  Google Scholar 

  27. 27.

    Awgulewitsch A: Deformed autoregulatory element from Drosophila functions in a conserved manner in transgenic mice. Nature. 1992, 341-334. 10.1038/358341a0.

    Google Scholar 

  28. 28.

    Zakany J: Functional equivalence and rescue among group 11 Hox gene products in vertebral patterning. Developmental Biology. 1996, 176: 325-328. 10.1006/dbio.1996.0137.

    Article  CAS  PubMed  Google Scholar 

  29. 29.

    Zhao JJ: The mouse Hox-1.3 is functionally equivalent to the Drosophila sex combs reduced gene. Genes and Development. 1993, 7: 343-354.

    Article  CAS  PubMed  Google Scholar 

  30. 30.

    Grenier JK: Functional evolution of the Ultrabithorax protein. Proceedings of the National Academy of Sciences USA. 2000, 97: 704-709. 10.1073/pnas.97.2.704.

    Article  CAS  Google Scholar 

  31. 31.

    Zhao Y: Functional comparison of the Hoxa 4, Hoxa 10, and Hoxa 11 homeoboxes. Developmental Biology. 2002, 224: 21-36. 10.1006/dbio.2002.0595.

    Article  Google Scholar 

  32. 32.

    Zhao Y: Functional specificity of the Hoxa13 homeobox. Development. 2001, 128: 3197-3207.

    CAS  PubMed  Google Scholar 

  33. 33.

    Sharkey M: Hox genes in evolution: protein surfaces and paralog groups. Trends in Genetics. 1997, 13: 145-151. 10.1016/S0168-9525(97)01096-2.

    Article  CAS  PubMed  Google Scholar 

  34. 34.

    Gu X: Functional divergence in protein (family) sequence evolution. Genetica. 2003, 118: 133-141. 10.1023/A:1024197424306.

    Article  CAS  PubMed  Google Scholar 

  35. 35.

    Gu X: A simple statistical method for estimating type-II (cluster-specific) functional divergence of protein sequences. molecular biology and evolution. 2006, 23: 1937-1945. 10.1093/molbev/msl056.

    Article  CAS  PubMed  Google Scholar 

  36. 36.

    Gu X: DIVERGE: phylogeny based analysis for functional-structural divergence of a protein family. Bioinformatics. 2002, 18: 500-501. 10.1093/bioinformatics/18.3.500.

    Article  CAS  PubMed  Google Scholar 

  37. 37.

    Gu X: Statistical methods for testing functional divergence after gene duplication. molecular biology and evolution. 1999, 16: 1664-1674.

    Article  CAS  PubMed  Google Scholar 

  38. 38.

    Anisimova M: Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Molecular biology and evolution. 2001, 18: 1585-1592.

    Article  CAS  PubMed  Google Scholar 

  39. 39.

    Yang Z: Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol Biol Evol. 1998, 15: 568-573.

    Article  CAS  PubMed  Google Scholar 

  40. 40.

    Yang Z: PAML. a program package for phylogenetic analysis by maximum likelihood. CABIOS. 1997, 13: 555-556.

    CAS  PubMed  Google Scholar 

  41. 41.

    Creevey CJ: An algorithm for detecting directional and non-directional positive selection, neutrality and negative selection in protein coding DNA sequences. Genetica. 2002, 300: 43-51.

    CAS  Google Scholar 

  42. 42.

    Templeton A: Genetic systems and evolutionary rates. Rates of Evolutionq. Edited by: MF CKSWD. 1987, London, Allen & Unwin, 218-234.

    Google Scholar 

  43. 43.

    McDonald JH: Adaptive protein evolution at the Adh locus in Drosophila. Nature. 1991, 351: 652-654. 10.1038/351652a0.

    Article  CAS  PubMed  Google Scholar 

  44. 44.

    Yang Z: Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. molecular biology and evolution. 2002, 19: 908-917.

    Article  CAS  PubMed  Google Scholar 

  45. 45.

    Yang Z, Wong WSW, Nielsen R: Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection. Mol Biol Evol. 2005, 22: 1107-1118. 10.1093/molbev/msi097.

    Article  CAS  PubMed  Google Scholar 

  46. 46.

    Crooks GE: WebLogo: A sequence logo generator. Genome Research. 2004, 14: 1188-1190. 10.1101/gr.849004.

    PubMed Central  Article  CAS  PubMed  Google Scholar 

  47. 47.

    Schneider TDS: Sequence Logos: A new way to display consensus sequences. Nucleic Acids Research. 1990, 18: 6097-

    PubMed Central  Article  CAS  PubMed  Google Scholar 

  48. 48.

    Shen W, Chrobak D, Krishnan K, Lawrence HJ, Largman C: HOXB6 Protein Is Bound to CREB-binding Protein and Represses Globin Expression in a DNA Binding-dependent, PBX Interaction-independent Process. J Biol Chem. 2004, 279: 39895-39904. 10.1074/jbc.M404132200.

    Article  CAS  PubMed  Google Scholar 

  49. 49.

    Zappavigna V: HMG1 interacts with HOX proteins and enhances thier DNA binding and transcriptional activation. EMBO Journal. 1996, 15: 4981-4991.

    PubMed Central  CAS  PubMed  Google Scholar 

  50. 50.

    Kataoka K: A set of Hox proteins interact with the Maf oncoprotein to inhibit its DNA binding, transactivation, and transforming activities. Journal of Biological Chemistry. 2001, 276: 819-826. 10.1074/jbc.M007643200.

    Article  CAS  PubMed  Google Scholar 

  51. 51.

    Luo L: The cell-cycle regulator geminin inhibits hox function through direct and polycomb-mediated interactions. Nature. 2004, 427: 749-753. 10.1038/nature02305.

    Article  CAS  PubMed  Google Scholar 

  52. 52.

    Suzuki M: Hox proteins functionally cooperate with the GC box-binding protein system through distinct domains. Journal of Biological Chemistry. 2003, 278: 30148-30156. 10.1074/jbc.M303932200.

    Article  CAS  PubMed  Google Scholar 

  53. 53.

    Maizel A: A short region of its homeodomain is necessary for engrailed nuclear export. Development. 1999, 126: 3183-3190.

    CAS  PubMed  Google Scholar 

  54. 54.

    Roth JJ: Multiple functions contribute to the evolutionary conservation of the homeodomain of Hoxa-11. Journal of Experimental Zoology part B: Molecular and Developmental Evolution.

Download references


The authors would like that thank the members of GPW lab for reading and commenting on earlier version of this manuscript. Financial support for this study was provided by the National Science Foundation (NSF) grant IBN#0321470 to GPW, and intramural funds from Yale University.

Author information



Corresponding author

Correspondence to Vincent J Lynch.

Additional information

Authors' contributions

VJL designed and carried out the project, and wrote the manuscript with contributions from JJR and GPW. JJR provided information on protein-protein interaction sites and GPW provided biological insights and guidance during the course of this study.

Authors’ original submitted files for images

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and Permissions

About this article

Cite this article

Lynch, V.J., Roth, J.J. & Wagner, G.P. Adaptive evolution of Hox-gene homeodomains after cluster duplications. BMC Evol Biol 6, 86 (2006).

Download citation


  • Amino Acid Site
  • Paralog Group
  • Positive Darwinian Selection
  • Descendent Lineage
  • Follow Gene Duplication