Identification of the Irx and Mkx genes from the fully-sequenced genomes of 32 metazoan species
We used the sequences of the Irx and Mkx genes from Homo sapiens and Drosophila melanogaster as queries in BLAST similarity searches to identify the Irx and Mkx genes encoded in the genomes of several species that together provide significant coverage of the main metazoan evolutionary lineages (Figure 1). In most cases, we were able to retrieve the complete homeodomain as well as the additional conserved regions of the Irx and Mkx proteins [5]. Given the extent of our searches, we believe that the retrieved Irx and Mkx genes represent the full complement of these two families in each analyzed genome. All the identified sequences can be found in Additional file 1. A multiple sequence alignment of the conserved domains of the Irx and Mkx proteins can be found in Additional file 2.
We used phylogenetic analyses and the presence of Irx- and Mkx-specific domains to define the respective complements of Irx and Mkx genes. A representative phylogenetic tree is shown in Figure 2. This tree is based on a multiple sequence alignment of the region conserved between Mkx and Irx proteins, i.e. the homeodomain plus a few flanking amino acids (see Additional file 2). Given the large number of sequences, we excluded some of them from this alignment to simplify the tree, in particular the most divergent ones, such as those from the leech Helobdella. A similar tree topology was obtained using an alignment including all sequences (not shown). We also included the putative Irx gene cloned from the sponge Suberites domuncula [10], because its affiliation to either the Irx or the Mkx family was unclear [5]. We found a well-supported monophyletic group that includes the known Mkx proteins together with several newly identified putative Mkx proteins (Figure 2). This phylogenetic analysis, together with the presence of Mkx-specific conserved domains in these proteins (Additional file 2), allows their clear-cut identification as Mkx proteins. Another monophyletic group includes the known Irx proteins and a large number of newly identified putative Irx proteins (Figure 2). Although this group has poor statistical support, the presence of the Irx-specific conserved domains in most of the newly identified proteins within it allowed their confident identification as Irx proteins (Additional files 1 and 2). The proteins from the sponges Suberites and Amphimedon cluster with the Irx proteins in our phylogenetic analyses, supporting the hypothesis that the corresponding sponge genes are bona fide Irx genes, as previously suggested by Larroux et al. [11]. We note, however, that these proteins lack the Irx-specific domains and are highly divergent from bilaterian Irx proteins (as well as from Mkx proteins; Additional file 2).
When mapped onto the species phylogeny, the numbers of identified Irx and Mkx genes reveal contrasting trends in the evolution of these two families (Figure 1). First, no bona fide Mkx genes were found in non-bilaterian species, whereas Irx genes are present in cnidarians and the placozoan Trichoplax, and probably also in sponges. We are therefore faced with two alternative hypotheses: either Mkx genes are ancestral to metazoans and have been lost in the analysed non-bilaterian species, or they represent a bilaterian innovation and could be considered bilaterian-specific divergent Irx genes. Second, while Mkx genes are found in both protostomes and deuterostomes (and were therefore likely already present in Urbilateria), several unrelated species (13 out of 36) lack an Mkx gene, indicating several independent gene losses. In contrast, Irx genes are found in all studied species, indicating strong evolutionary pressure to conserve these genes. Third, while Mkx genes are usually present as a single copy in each species, we found several Irx genes in most cases (27 out of 36 metazoan species, 26 out of 32 if we only consider bilaterians). This indicates that the evolution of the Irx (but not the Mkx) gene family has been shaped by gene duplications. We further studied these duplications through phylogenetic analyses and by characterizing the genomic organization of the Irx genes.
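The inference that the scattered absences of Mkx reflect several independent losses can be made explicit with Dollo parsimony, in which a gene arises once and can afterwards only be lost. Under that model, the minimum number of losses equals the number of maximal clades in which no species carries the gene. The Python sketch below illustrates this counting rule only: the tree and the presence/absence pattern are hypothetical placeholders, not the data of Figure 1, and the code assumes, as argued above, that the gene was present at the root (Urbilateria).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list[Node] = field(default_factory=list)
    has_gene: bool = False  # only read for terminal species (leaves)


def leaves(node: Node) -> list[Node]:
    """All terminal species in the clade rooted at `node`."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def dollo_losses(node: Node) -> list[str]:
    """Maximal clades below `node` in which no species has the gene.

    Assumes the gene was present in the ancestor at `node`; each returned
    clade marks one loss on the branch leading to it.
    """
    if not any(leaf.has_gene for leaf in leaves(node)):
        return [node.name]
    return [name for child in node.children for name in dollo_losses(child)]


# Hypothetical tree and presence/absence pattern (not the data of Figure 1).
tree = Node("root", [
    Node("clade_1", [Node("species_A", has_gene=True), Node("species_B")]),
    Node("clade_2", [
        Node("clade_3", [
            Node("species_C", has_gene=True),
            Node("clade_4", [Node("species_D"), Node("species_E")]),
        ]),
        Node("clade_5", [Node("species_F", has_gene=True), Node("species_G")]),
    ]),
])

print(dollo_losses(tree))  # ['species_B', 'clade_4', 'species_G']
```

In this toy example four species lack the gene, but only three losses are required, because species_D and species_E can share a single loss on the branch leading to clade_4. Absences in unrelated species, by contrast, each require their own loss.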
Phylogenetic analyses of the Irx genes suggest the occurrence of many independent duplication events in protostomes and deuterostomes
We first analysed a large sample of Irx genes representative of the main metazoan lineages (Figure 3), using a multiple sequence alignment that includes both the homeodomain and the additional conserved domains (Additional file 2). We excluded the most divergent sequences from this alignment, including the sponge sequences, which lack the Irx-specific domains. We found several monophyletic groups, most of which were already present in the phylogenetic tree that includes the Mkx genes (Figure 2). These groups (detailed below) contain either arthropod sequences ("mirror" and "araucan/caupolican" groups), vertebrate sequences ("Irx4/Irx6", "Irx2/Irx5", and "Irx1/Irx3" groups), or lophotrochozoan sequences (one group with an annelid and a mollusc gene). However, we found no statistically supported group uniting arthropod sequences with those of other protostomes, or protostome with deuterostome sequences. For example, although the "araucan/caupolican" group (arthropod sequences) clusters with the "Irx1/Irx3" group (vertebrate sequences) in the ML tree (black arrow in Figure 3), this grouping has very weak statistical support (10%) and is not recovered by the other phylogenetic reconstruction methods (not shown). A similar situation is observed for most of the other groups in Figure 3. This analysis therefore suggests that several independent duplications occurred in protostomes and deuterostomes, and that the presence of several Irx genes in many animals mainly reflects evolutionary convergence. This result agrees with the conclusions of a previous study based on a more limited set of Irx genes and less detailed phylogenetic analyses [12].
We next analyzed protostome and deuterostome Irx genes separately, which allowed us to build phylogenetic trees from alignments that include many more amino acid residues than when all metazoan sequences are considered. To maximize the length of the unambiguously aligned portion of the proteins, we excluded the most divergent Irx sequences (those from Helobdella, Ciona, Oikopleura, and the different nematode species).
Phylogenetic analyses of the protostome Irx genes suggest the presence of two ancestral Irx genes in arthropods and lophotrochozoans
In protostomes, we found four monophyletic groups in the trees constructed with the different phylogenetic methods (Figure 4). Two of these groups include only arthropod sequences: one contains the Drosophila mirror gene and one of the Irx genes found in the other analyzed arthropod species; the other contains the Drosophila araucan and caupolican genes and the remaining Irx gene(s) of the other arthropod species analyzed. To confirm the validity of these two monophyletic groups, we analyzed the arthropod sequences alone (which allowed us to construct trees from an alignment of the full-length proteins) and found strong support for monophyletic "mirror" and "araucan/caupolican" groups (Additional file 3). Since each of these two groups includes one gene from Daphnia pulex (a crustacean) and one or two genes from every studied insect, the presence of two Irx genes represents the ancestral situation for the crustaceans+insects ('pancrustacea') clade. These data therefore show that an ancient duplication event occurred in the arthropod lineage. A second, much more recent duplication occurred in some dipterans, those of the Brachycera lineage ("flies"), and gave rise to the araucan and caupolican genes found in Drosophila. In the other dipteran lineage, the Nematocera ("mosquitoes"), a single "araucan/caupolican" gene has been retained, as seen in the three studied nematoceran species, Aedes, Culex, and Anopheles (Figures 1 and 4). Our data therefore confirm the occurrence of two duplication events in the Irx gene family in arthropods, as previously suggested [5].
The two other protostome monophyletic groups contain lophotrochozoan sequences (Figure 4). One group includes three Irx genes from the limpet Lottia and one gene identified in an EST collection of the mussel Mytilus, and therefore indicates duplications specific to molluscs. The other group includes Irx genes from three distantly related species: the annelid Capitella (1 gene), the mollusc Lottia (1 gene), and the flatworm Schmidtea (2 genes). The other Irx genes from Capitella (2 genes), Lottia (3), and Schmidtea (2) do not cluster together (Figure 4). We interpret this tree as indicating that two Irx genes were present in the last common ancestor of these three lophotrochozoan species, and that in each lineage one of the paralogs underwent highly divergent evolution, so that these paralogs no longer cluster together in the phylogenetic trees.
Since an ancestral two-gene situation is found in both arthropods and lophotrochozoans, it is conceivable that two Irx genes were already present in the last common ancestor of protostomes, but that differential evolutionary rates have obscured the orthology relationships between the arthropod and lophotrochozoan genes. We note, however, that a single Irx gene is found in several different nematode species (Figure 1), which belong, together with arthropods, to the ecdysozoans, one of the two main protostome branches. If our hypothesis of an ancestral two-gene situation in protostomes is correct, one or more Irx genes must therefore have been lost in the nematode lineage. This is not inconceivable: genes that are strongly conserved in bilaterians, such as several Hox genes, are known to have been lost in nematodes [16], and our study points to the loss of the Mkx gene in all the studied nematode species. We clearly cannot exclude, however, that the single Irx gene of nematodes represents the ancestral state in protostomes and that independent gene duplications occurred in arthropods and lophotrochozoans.
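The two scenarios above can be compared by counting the duplications and losses each one requires. The Python sketch below does this on a simplified protostome tree, using the gene numbers given in the text for arthropods (2), lophotrochozoans (2) and nematodes (1). It rests on assumptions that are ours, not the study's: every change of one in gene number counts as one event, duplications and losses have equal cost, and the ancestor preceding protostomes carried a single Irx gene (the hypothetical `stem_count` parameter). It illustrates the reasoning only and is not an analysis performed here.

```python
from __future__ import annotations

# Simplified protostome tree (hypothetical encoding), as child -> parent.
PARENT = {
    "Protostomia": "stem",  # branch from the pre-protostome ancestor
    "Ecdysozoa": "Protostomia",
    "Arthropoda": "Ecdysozoa",
    "Nematoda": "Ecdysozoa",
    "Lophotrochozoa": "Protostomia",
}

# Irx gene numbers given in the text for the ancestor of each lineage.
OBSERVED = {"Arthropoda": 2, "Nematoda": 1, "Lophotrochozoa": 2}


def count_changes(internal: dict[str, int], stem_count: int = 1) -> dict[str, int]:
    """Net change in Irx gene number per branch (+ duplication, - loss)."""
    states = {"stem": stem_count, **internal, **OBSERVED}
    deltas = {child: states[child] - states[parent] for child, parent in PARENT.items()}
    return {branch: delta for branch, delta in deltas.items() if delta != 0}


def cost(internal: dict[str, int], stem_count: int = 1) -> int:
    """Total number of duplications plus losses, each weighted equally."""
    return sum(abs(delta) for delta in count_changes(internal, stem_count).values())


two_ancestral = {"Protostomia": 2, "Ecdysozoa": 2}  # two genes in the protostome ancestor
one_ancestral = {"Protostomia": 1, "Ecdysozoa": 1}  # single ancestral gene

print(count_changes(two_ancestral), cost(two_ancestral))
# {'Protostomia': 1, 'Nematoda': -1} 2
print(count_changes(one_ancestral), cost(one_ancestral))
# {'Arthropoda': 1, 'Lophotrochozoa': 1} 2
```

Both scenarios require two events, which is consistent with the conclusion that neither can be excluded on parsimony grounds alone. Weighting losses differently from duplications, or changing `stem_count`, would break the tie, so the outcome depends on those modelling choices.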
Phylogenetic analyses of the deuterostome Irx genes indicate the presence of a single Irx cluster of at least 2 genes in the last common ancestor of present-day vertebrates and suggest gene losses in non-vertebrate deuterostomes
We next focused on deuterostome sequences (Figure 5). Our phylogenetic trees recovered the six previously described groups of Irx genes (Irx1 to Irx6) from mouse, human, Xenopus, zebrafish and pufferfish, as well as their association into three pairs of paralogs, Irx1/Irx3, Irx2/Irx5 and Irx4/Irx6. This confirms that the last common ancestor of these vertebrate species (which all belong to the osteichthyan lineage of gnathostomes) already possessed six Irx genes, produced earlier in vertebrate evolution by the duplication of three ancestral genes. Including the Irx genes of a non-gnathostome species, the sea lamprey Petromyzon, allowed us to further study the early evolution of Irx genes in vertebrates. Two of the four Petromyzon genes cluster with gnathostome groups, one as the outgroup to the Irx2/Irx5 group and the other as the outgroup to the Irx1/Irx3 group (Figure 5). This indicates that the last common ancestor of the sea lamprey and gnathostomes had one Irx2/Irx5 and one Irx1/Irx3 gene. No Petromyzon gene clusters with the gnathostome Irx4/Irx6 group; the two remaining Petromyzon genes (Irx-b and Irx-c) cluster strongly together but branch outside the gnathostome groups. Note that, despite extensive efforts, we retrieved only a small part of the coding sequence of these two genes (Additional files 1 and 2), so incomplete sequences were used in the phylogenetic analyses. In our opinion, the most likely interpretation is that Irx-b and Irx-c derive from an ancestral Irx4/Irx6 gene that was duplicated independently in the lineage leading to the sea lamprey and in gnathostomes. We think that the position of the Irx-b and Irx-c sequences at the root of the deuterostome Irx tree is due, at least in part, to the use of partial sequences.
Taken together, our data suggest that there were three Irx genes in the last common ancestor of lampreys and gnathostomes, and that the chromosomal duplication that gave rise to the six Irx groups described above occurred in the gnathostome lineage, after its divergence from non-gnathostomes such as the sea lamprey (Figure 5). Identifying the full set of Irx genes in chondrichthyans would help to define the timing of this duplication event more precisely.
We also studied Irx genes from urochordates and cephalochordates. Unfortunately, the Irx genes from urochordates (the two Ciona species and Oikopleura) are very divergent: when included in the phylogenetic analyses, they perturb the overall tree topology and do not cluster with vertebrate sequences (not shown). The phylogenetic tree shown in Figure 5 therefore contains only the Irx genes from the cephalochordate Branchiostoma (amphioxus), together with the only two Irx genes known from non-chordate deuterostomes: the single Irx gene encoded by the genome of the echinoderm Strongylocentrotus and the single Irx gene cloned from the hemichordate Saccoglossus (other Irx genes may exist in this species). Hemichordates and echinoderms form a monophyletic group within deuterostomes, the Ambulacria. The three Branchiostoma Irx genes cluster strongly together (and therefore derive from Branchiostoma-specific duplications) and with the Irx2/Irx5 group (statistical support is not strong, but this clustering is found with all methods). Similarly, the Ambulacria Irx genes cluster strongly together and with the Irx4/Irx6 group. The fact that the Branchiostoma and Ambulacria Irx genes cluster with different vertebrate Irx groups (Irx2/Irx5 and Irx4/Irx6, respectively) suggests that there were at least two Irx genes in the last common ancestor of deuterostomes, as in protostomes. The existence of a third independent group (Irx1/Irx3) in vertebrates may even suggest an ancestral situation in which three Irx genes (Irx1/Irx3, Irx2/Irx5 and Irx4/Irx6) formed a cluster in deuterostomes. Under these hypotheses, the two or three ancestral genes would have been conserved in vertebrates (and subsequently duplicated), whereas one or two of them were independently lost in Ambulacria (where Irx4/Irx6 remained) and in cephalochordates (where Irx2/Irx5 remained and was subsequently duplicated).
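The losses implied by the two- and three-gene hypotheses can be tabulated directly from the cluster assignments described above. The Python sketch below records only which vertebrate ortholog group(s) each lineage's Irx genes cluster with (urochordates are omitted and lineage-specific duplications, such as those in Branchiostoma, are ignored) and lists the ancestral groups each lineage would have had to lose. It is a bookkeeping aid under these simplifying assumptions, not a phylogenetic method.

```python
from __future__ import annotations

# Vertebrate ortholog group(s) with which each lineage's Irx genes cluster,
# as described above; lineage-specific duplications are ignored.
PRESENT = {
    "Vertebrata": {"Irx1/3", "Irx2/5", "Irx4/6"},
    "Cephalochordata": {"Irx2/5"},
    "Ambulacria": {"Irx4/6"},
}


def losses_per_lineage(ancestral: set[str]) -> dict[str, list[str]]:
    """Ancestral groups that each lineage must have lost.

    Each absence counts as a separate loss. This holds for these data because
    vertebrates retain all three groups, so no loss can sit on a branch shared
    by two of the lineages.
    """
    missing = {lineage: sorted(ancestral - groups) for lineage, groups in PRESENT.items()}
    return {lineage: lost for lineage, lost in missing.items() if lost}


two_genes = {"Irx2/5", "Irx4/6"}
three_genes = {"Irx1/3", "Irx2/5", "Irx4/6"}

print(losses_per_lineage(two_genes))
# {'Cephalochordata': ['Irx4/6'], 'Ambulacria': ['Irx2/5']}
print(losses_per_lineage(three_genes))
# {'Cephalochordata': ['Irx1/3', 'Irx4/6'], 'Ambulacria': ['Irx1/3', 'Irx2/5']}
```

The output reproduces the "one or two" losses per lineage stated above. Under the two-gene hypothesis, the Irx1/Irx3 group would additionally require a duplication somewhere in the lineage leading to vertebrates, which this tally does not count.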
Organization of the Irx genes in clusters is a general rule in bilaterians
As Irx genes are clustered in several species [e.g. 3, 8, 9, 12, 13] and more than one Irx gene is found in most bilaterian species, we wondered whether similar clusters might be present in all these species. We found these genes organized into clusters in most species (20 out of 28 bilaterian species; in the others, either there is a single gene or the genomic scaffolds in the current assembly are too small to establish potential clusters; Figures 1 and 6; the data used to construct Figure 6 can be found in Additional file 4). Clustering of Irx genes therefore seems to be an almost general rule in bilaterians, as previously suggested [12]. We also confirmed the observation by Irimia et al. [12] that, in most bilaterian species except vertebrates, the Irx genes are associated with a structurally and functionally unrelated gene, known as CG10632 in Drosophila (Figure 6). CG10632, which encodes a well-conserved protein with Ankyrin repeats (Additional file 5), is located either 5' to the Irx cluster or within it, depending on the species (Figure 6). In vertebrates, as well as in the cnidarian Nematostella and the placozoan Trichoplax, putative orthologs are found (for example, human Ankyrin repeat domain proteins 43 and 56), but they are not physically linked to the Irx genes (not shown). This indicates that (i) a cluster of one (or more) Irx genes and CG10632 was present in the last common ancestor of bilaterians, (ii) there has been strong evolutionary pressure to maintain the association of the Irx and CG10632 genes in bilaterians, and (iii) this pressure has been relaxed in vertebrates. Further characterization of the CG10632 genes in species such as Drosophila and Branchiostoma is needed to define the mechanistic basis of this association (for instance, shared regulatory regions).
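Whether genes can be shown to be clustered depends on the assembly: linkage is only observable for genes lying on the same scaffold. The Python sketch below shows one simple way to group genes into clusters from annotation coordinates. The criterion (same scaffold, gap between consecutive genes no larger than `max_gap`), the 500 kb default, and the gene names and coordinates are all hypothetical illustrations; they are not the criterion or the data used for Figure 6.

```python
from itertools import groupby
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    scaffold: str
    start: int  # coordinates in base pairs
    end: int


def find_clusters(genes: List[Gene], max_gap: int = 500_000) -> List[List[str]]:
    """Group genes on the same scaffold separated by gaps <= max_gap.

    Only groups of two or more genes are reported as clusters.
    """
    clusters = []
    ordered = sorted(genes, key=lambda g: (g.scaffold, g.start))
    for _, on_scaffold in groupby(ordered, key=lambda g: g.scaffold):
        current: List[Gene] = []
        current_end = 0
        for gene in on_scaffold:
            # Start a new group when the gap to the current group is too large.
            if current and gene.start - current_end > max_gap:
                clusters.append(current)
                current = []
            current_end = gene.end if not current else max(current_end, gene.end)
            current.append(gene)
        clusters.append(current)
    return [[g.name for g in group] for group in clusters if len(group) > 1]


# Hypothetical annotation records.
genes = [
    Gene("Irx_a", "scaffold_1", 100_000, 120_000),
    Gene("CG10632_like", "scaffold_1", 300_000, 310_000),
    Gene("Irx_b", "scaffold_1", 450_000, 470_000),
    Gene("Irx_c", "scaffold_7", 10_000, 30_000),
]

print(find_clusters(genes))  # [['Irx_a', 'CG10632_like', 'Irx_b']]
```

Here Irx_c, alone on its scaffold, falls in no cluster. As noted above for species with small scaffolds, such a result is uninformative rather than evidence that the genes are dispersed.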
These data on the genomic organization of the Irx genes can be interpreted in two ways. The simplest and most parsimonious explanation is that a cluster of at least two Irx genes (+CG10632) is ancestral to bilaterians and has been conserved in this evolutionary lineage, as observed for other homeobox gene clusters such as the Hox, ParaHox and NK clusters [1, 17]. This hypothesis is supported by our analyses suggesting that at least two Irx genes were present in the last common ancestor of each investigated lineage (lophotrochozoans, arthropods, and deuterostomes), a situation that might therefore be ancestral to bilaterians. This view is, however, not supported by the phylogenetic analyses of the Irx genes at the scale of the bilaterians, which suggest independent gene duplications in the different bilaterian lineages (see previous sections). A bilaterian ancestral Irx cluster would require postulating that the phylogenetic trees do not reflect the true relationships between the Irx genes of the different bilaterian lineages, possibly as a consequence of differential rates of evolution in these lineages. Nor is this hypothesis supported by the presence of a single Irx gene in several different nematode species: we would have to postulate that one or more Irx genes have been lost in the nematode lineage.
The second possibility is that the Irx genes duplicated independently in each lineage and that, in all cases, there has been pressure to maintain the physical linkage of the duplicated genes. This explanation, already proposed by Irimia et al. [12], agrees with the phylogenetic analyses but faces one major problem: explaining why similar pressures to keep duplicated genes clustered have operated in several independent lineages. It is easy to imagine that, after tandem gene duplication, molecular events could in rare cases lead to phenomena such as shared regulatory regions or global gene regulation that favour cluster maintenance, whereas in most cases the duplicated genes would eventually be dispersed in the genome, as observed for most multigene families. It is much more difficult to understand why, in the case of the Irx genes, such events would have systematically led to cluster maintenance after numerous independent duplications, unless we postulate particular properties of the Irx genomic region that, by themselves, favour the conservation of the physical link between the duplicated genes. The existence of an ancestral cluster of a single Irx gene and CG10632 may represent such a property, constraining the duplicated Irx genes to remain associated with CG10632 and therefore with each other. This nevertheless remains to be demonstrated, and it does not explain everything: for example, why the CG10632 gene has never been duplicated whereas the Irx genes would have duplicated so many times (Figure 6, Additional file 5).