Evolution of hes gene family in vertebrates: hes5 cluster genes were specifically increased in Xenopus


Abstract
Background
hes genes are chordate homologs of the Drosophila genes hairy and enhancer of split, which encode basic helix-loop-helix (bHLH) transcriptional repressors with a WRPW motif. Various developmental functions of hes genes, including roles in early embryogenesis and neurogenesis, have been elucidated in vertebrates. However, their orthologous relationships remain unclear, partly because the relatively short amino acid sequences are poorly conserved, synteny is weakly conserved, and species-specific gene duplications have occurred. As a result, gene names in vertebrates are complicated and inconsistent among orthologs.
In a previous study, we revealed that Xenopus frogs have two clusters of hes5, named "the hes5.1 cluster" and "the hes5.3 cluster." The origin of these clusters has not yet been revealed.

Results
Here, we elucidated the orthologous and paralogous relationships of all hes genes of human, mouse, chicken, gecko, zebrafish, medaka, coelacanth, spotted gar, elephant shark, and Xenopus frogs (X. tropicalis and X. laevis) by phylogenetic and synteny analyses.
No hes5 clusters were found in amniotes, whereas duplicated hes5 clusters were found in teleosts, although with fewer genes than in Xenopus. In addition, a hes5 cluster-like structure was found in the elephant shark genome but not in cyclostomes.

Conclusion
These data suggest that the hes5 cluster existed in the gnathostome ancestor, but was lost in amniotes.

Background
hes genes are chordate homologs of the Drosophila hairy and enhancer of split genes and encode basic helix-loop-helix (bHLH) transcriptional repressors with a WRPW motif at the C terminus [1]. These genes are known to have various developmental functions, including roles as Notch signaling targets, in neurogenesis [2], in somitogenesis, and in early development of the presumptive midbrain-hindbrain boundary (pre-MHB) [3,4].
Mammals, including human and mouse, have seven hes genes, which form subfamilies [5,6]. Most of the hes homologs in zebrafish are called her [7]. As the zebrafish case illustrates, orthologous relationships among vertebrate hes genes remain unclear, partly because the two domains (bHLH and Orange) and the WRPW motif at the C terminus are not well conserved, and the sequences are short for comparison (about 200 aa in total).
Another possible cause was that the genomes of model organisms, including Xenopus laevis (X. laevis) and X. tropicalis, had yet to be sequenced.
Recently, genome analyses of many animals, including the frogs X. laevis and X. tropicalis, have been reported. The genus Xenopus includes diploid to dodecaploid species, although polyploidy is considered to be rare in amniotes. X. tropicalis has a diploid genome, and X. laevis has an allotetraploid genome [8]. The genomic analysis showed that the allotetraploidization resulted from interspecific hybridization between two diploid species.
In a previous study, we annotated all hes genes of X. tropicalis and X. laevis by phylogenetic analysis and synteny analysis [11]. In brief, for X. tropicalis, we revealed the phylogenetic and synteny relationships of the 18 hes genes and renamed them properly. X. laevis has 37 hes genes including 18 homeologs, one laevis-specific gene, hes5.7, and a pseudogene, hes7.4. Although the number of genes doubled after allotetraploidization, hes genes, except for hes2, have been conserved in X. laevis. In addition, Xenopus has more than two paralogs of hes5, hes6, and hes7 subfamily genes, in contrast to human hes genes. In particular, the number of hes5 genes in Xenopus is quite high. Interestingly, they form two clusters, which we call "the hes5.1 cluster" and "the hes5.3 cluster". The hes5.3 cluster is formed with eight genes (hes5.3 to hes5.10).
Clustered genes, such as the Hox gene cluster, the human β-globin gene cluster, and four clustered human growth hormone (hGH)/chorionic somatomammotropin genes, have various functions with unique regulatory mechanisms. The cluster is considered to be formed as a result of gene duplication and divergence [12,13].
Some hes genes are already known to be indispensable in neurogenesis [4], and the genes remain well conserved despite their large number, forming two clusters at least in Xenopus. This implies that the hes5 cluster may also play an important role during embryogenesis.
To understand the evolution and role of hes genes in vertebrates, it is important to reveal their orthologous relationships. In this study, we first elucidated the orthologous and paralogous relationships of the hes gene family using phylogenetic and synteny analyses of human, mouse, chicken, zebrafish, medaka, frogs (X. tropicalis and X. laevis), Gekko japonicus, coelacanth, spotted gar, elephant shark, lamprey, and amphioxus. With this analysis, we reveal that hes genes have specifically increased in number in Xenopus, and we discuss the evolution of the two hes5 clusters.

Classification of hes genes in sarcopterygians
Our previous annotation of hes genes showed that there are ten hes5 paralogs in X. laevis, which belong to what we refer to as "the hes5.1 cluster" or "the hes5.3 cluster" [11].
To determine when the hes5 clusters emerged, we first performed a phylogenetic analysis of sarcopterygian hes5 genes (Fig. 1, Fig. S2; the complete tree is shown in Fig. S4). In this analysis, other hes paralogs were included to clarify the outgroup. Maximum likelihood (ML) phylogenetic tree construction revealed that all of the hes5 genes we examined were assigned to a single clade with high bootstrap support (93%; Fig. 1A).
Next, to examine the presence of hes5 clusters, we analyzed the synteny of the hes5 locus in the chicken, gecko, and coelacanth genomes (synteny of other hes genes is shown in Fig. S1A, C). In the chicken genome, the hes5 genes were located on a single chromosome, chromosome 21 (Fig. 2B). In gecko, synteny around hes5 was observed in scaffolds 135 and 31595 (Fig. 2C). In coelacanth, we found four hes5 genes, in scaffold00199, 00001, 00319, and 00059 (Fig. 2D). The hes5chr21-1~3 genes in chicken, the hes5sc135-1~2 and hes5-like genes in gecko, and the hes5sc00319 gene in coelacanth were all located next to pank4, suggesting that these genes correspond to the hes5.1 cluster (orange background). In chicken and gecko, however, there were no hes5 genes between nol9 and zbtb48, the position that defines the hes5.3 cluster genes in Xenopus (blue background).
In contrast, the coelacanth hes5sc00199 gene was located near nol9, suggesting that it may be homologous to the hes5.3 cluster genes. In coelacanth Sc00001, a hes5 gene was found near chd5 (Fig. 2D). In Xenopus, the chd5 gene (chd5-like) was located next to rnf207 near the hes5 clusters, suggesting that this coelacanth gene is related to the hes5 clusters. Lachhes5sc00059 was found near ppil2, which is located on chromosome 1 in Xenopus, indicating that its syntenic context differs from that of the other hes5 genes. Phylogenetic analysis also indicated that Lachhes5sc00059 diverged first within the hes5 gene family, suggesting a distinct evolutionary history for this gene. Taken together, these results suggest that the hes5 genes of both chicken and gecko correspond to the hes5.1 cluster, whereas coelacanth hes5 genes correspond to both the hes5.1 and hes5.3 clusters.
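The synteny criteria used above (a hes5 gene next to pank4 is treated as hes5.1 cluster-like, and one near nol9 or zbtb48 as hes5.3 cluster-like) can be expressed as a minimal sketch. The function name, the neighbourhood window, and the gene order below are hypothetical and purely illustrative; they are not taken from any of the genome assemblies analyzed here.

```python
from typing import Dict, List

# Flanking markers named in the text for each Xenopus cluster.
HES5_1_ANCHORS = {"pank4"}
HES5_3_ANCHORS = {"nol9", "zbtb48"}


def classify_hes5(gene_order: List[str], index: int, window: int = 2) -> str:
    """Label the gene at gene_order[index] by the anchor genes found
    within `window` positions on either side (window size is an assumption)."""
    lo = max(0, index - window)
    neighbours = set(gene_order[lo:index]) | set(gene_order[index + 1:index + 1 + window])
    near_51 = bool(neighbours & HES5_1_ANCHORS)
    near_53 = bool(neighbours & HES5_3_ANCHORS)
    if near_51 and near_53:
        return "ambiguous"
    if near_51:
        return "hes5.1-like"
    if near_53:
        return "hes5.3-like"
    return "unassigned"


# Toy gene order (hypothetical, NOT an observed scaffold).
toy_order = [
    "geneX", "pank4", "hes5_a", "hes5_b", "geneY", "geneZ",
    "nol9", "hes5_c", "zbtb48", "geneW", "ppil2", "hes5_d",
]

labels: Dict[str, str] = {
    gene: classify_hes5(toy_order, i)
    for i, gene in enumerate(toy_order)
    if gene.startswith("hes5")
}
print(labels)
```

A fixed window is a simplification: the analysis in the text relies on manual inspection of gene order across scaffolds, so a rule of this kind would only be a first-pass filter.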

Comparison of hes genes between teleosts and Xenopus
It is known that whole genome duplication (WGD) occurred about 500 million years ago in the common ancestor of vertebrates. In addition, another WGD occurred in the teleost lineage more than 300 million years ago, after divergence from the common ancestor of gnathostomata [14,15]. Thus, in teleost genomes, two loci with similar gene order, referred to as doubly conserved synteny (DCS), are often found [16]. In zebrafish (Danio rerio) and medaka/Japanese ricefish (Oryzias latipes), hes genes have not been well characterized as orthologs of the mammalian genes. Indeed, many genes that appear to be hes genes are described as "her" genes. Therefore, we attempted to identify the orthologous relationships of teleost hes genes based on their amino acid sequences. In our phylogenetic analysis, we found that many zebrafish and medaka "her" genes formed clades with the Xenopus hes subfamily genes (Fig. 3A; the complete tree is shown in Fig. S5; see also Fig. S1B, D).
In the zebrafish genome, the her4.1-4.4/12 cluster and the her2/15.1-15.2 cluster were present on chromosomes 23 and 11, respectively. The dnajc11 and rnf207 genes were found in the genomic regions around the clusters. In addition, icmt, kcnab2, nol9, and chd5, which are located in the Xenopus hes5 locus, were also found on either chromosome 23 (DRE23) or chromosome 11 (DRE11). These results suggest that the hes5 region of the zebrafish genome shows DCS. Near the her2/15 cluster on DRE11, we found dnajc11, which is located near the Xenopus hes5.3 cluster (Fig. 4A). However, other typical features of the hes5.3 cluster were not observed in this locus. For instance, neither nol9 nor zbtb48 was located near the her2/15 locus.
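The reasoning behind a DCS call can be sketched as a set comparison: marker genes from one reference locus are looked up on each candidate chromosome, and two chromosomes that both carry several of the markers are flagged. This is only an illustrative sketch. The marker list comes from the text, but the chromosome labels, their gene content, and the threshold of two shared markers are hypothetical placeholders, not the observed zebrafish data.

```python
from typing import Dict, List, Set

# Marker genes around the Xenopus hes5 locus, as named in the text.
REFERENCE_MARKERS: Set[str] = {"dnajc11", "rnf207", "icmt", "kcnab2", "nol9", "chd5"}


def shared_markers(chromosomes: Dict[str, Set[str]], markers: Set[str]) -> Dict[str, Set[str]]:
    """Return, for each chromosome, the reference markers it carries."""
    return {name: genes & markers for name, genes in chromosomes.items()}


def dcs_candidates(chromosomes: Dict[str, Set[str]], markers: Set[str],
                   min_shared: int = 2) -> List[str]:
    """Chromosomes sharing at least `min_shared` markers with the reference locus
    (the threshold is an assumption made for this sketch)."""
    hits = shared_markers(chromosomes, markers)
    return sorted(name for name, found in hits.items() if len(found) >= min_shared)


# Hypothetical gene content, for illustration only.
toy_chromosomes: Dict[str, Set[str]] = {
    "chrA": {"dnajc11", "rnf207", "icmt", "geneP"},
    "chrB": {"rnf207", "nol9", "chd5", "geneQ"},
    "chrC": {"kcnab2", "geneR"},
}

candidates = dcs_candidates(toy_chromosomes, REFERENCE_MARKERS)
looks_like_dcs = len(candidates) == 2  # two loci sharing markers with one reference locus
print(candidates, looks_like_dcs)
```

Gene presence alone ignores gene order and distance, so a result of this kind would still need to be confirmed by inspecting the loci, as done in Fig. 4.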
The icmt gene was located near the hes3 gene in Xenopus (Fig. 4B). No her/hes gene was detected between zbtb48 and nol9, as in the chicken genome (Fig. 2B). In medaka, the her7 gene was located near grik5, which lies near the hes5.1 cluster in Xenopus (Fig. 4C). However, phylogenetic analysis showed that OLA her7 was in the Xenopus hes7.1 subclade (Fig. 3A). Conversely, OLA her4.4 and her12 were located near espn, acot7, and hes2.2, which are near the hes5.3 cluster locus in Xenopus, even though no genes were located between nol9 and zbtb48 (Fig. 4D). A gene order similar to that of the Xenopus hes5 region was also observed on chromosome 1 in medaka, but no hes-related genes were found in that locus (Fig. 4D).

Classification of hes genes in gnathostomata
To determine the origin of the hes5 cluster, we carried out phylogenetic analysis with spotted gar (Lepisosteus oculatus), elephant shark (Callorhinchus milii), lamprey (Petromyzon marinus), and amphioxus (Branchiostoma floridae) (Fig. 5A, B, S3; the complete tree is shown in Fig. S6). As a result, the hes7 and hes5 genes were clearly separated from the other genes with high bootstrap values. First, we counted the number of hes genes in these species with the exception of hes7- and hes5-classified genes, although the bootstrap values were low. Spotted gar was considered to have two hes3, two hes7, and three hes6 (Fig. 5A, shown in red letters). Elephant shark had one each of hes1, hes2, hes4, and hes6 (Fig. 5A, shown in blue letters). In lamprey, there were one hes4, three hes2, and one hes3 (Fig. 5A, shown in purple letters). In amphioxus, the hes A-G genes were found, but they did not fall into any of the hes subfamily clades (Fig. 5A, shown in green letters).
In the hes5 clade, both spotted gar and elephant shark possessed four genes (Fig. 5B), but all of these genes were classified separately from the Xenopus genes (Fig. 5B, red and blue letters). This feature differed from that of the other hes-related genes (Fig. 5A). In lamprey and amphioxus, no hes5 gene was found (Fig. 5B). These results indicated that the phylogenetic analysis could not identify the homologous relationships of hes5 genes between Xenopus and gar/elephant shark.
Next, we compared the gene order around the Xenopus hes5 cluster region with that in spotted gar and elephant shark. In spotted gar linkage group (LG) 25, four clustered hes5-like genes were located next to pank4, but no hes5 genes were found near nol9 (Fig. 6B). This suggests that gar had a hes5 cluster and that this cluster was closer to the hes5.1 cluster than to the hes5.3 cluster in Xenopus. In contrast, three of the four genes in elephant shark were clustered near nol9 on KI635912.1 (Fig. 6C). This suggested that the clustered genes might be related to the hes5.3 cluster in Xenopus. In addition, the gene named her3 was located on HMISc93 near pank4, which lies near the hes5.1 cluster in Xenopus. Although this gene may have been misnamed because its sequence lacks the WRPW motif, the synteny analysis suggested that it might be a homolog of hes5 and thus that the hes5.1 cluster might be conserved in elephant shark.
Another hes5 gene in elephant shark was located next to ppil2. The order was conserved in coelacanth (Fig. 2D), but not in the Xenopus hes5 cluster. This suggests the possibility that the common ancestor of teleost and cartilaginous fishes had another hes5 next to ppil2, but later lost the gene.

Discussion
Phylogenetic analysis showed that hes5 genes were absent in lamprey and amphioxus (Fig. 5). However, eight hairy genes have been reported in amphioxus, four of which have expression patterns conserved in vertebrates (in the central nervous system, presomitic mesoderm, somites, notochord, and gut) [18]. Cases in which hes gene names and functions do not match have also been reported [19]. It remains unclear why hes5 was specifically absent in lamprey and amphioxus, but other hes genes might substitute for hes5 function in these species.
We found that elephant shark possessed hes5 (Figs. 5 and 6). Interestingly, synteny analysis indicated that three hes5 genes might be the orthologue of the hes5.3 cluster in Xenopus. Together with the result that the putative hes5 gene (her3) existed near pank4 in the shark, it is thought that a common ancestor of gnathostomata acquired both hes5.1 and hes5.3 genes. In spotted gar, the hes5.3 cluster was not found (Fig. 6B). One of the possibilities is that, after divergence into cartilaginous fishes and neopterygii, hes5 near nol9 was translocated to the locus next to pank4. Another possibility is that her3 (=hes5.1 cluster gene) was duplicated, and three hes5-like genes (=hes5.3 cluster gene) in elephant shark were lost in the spotted gar. Unfortunately, we have not yet obtained direct evidence for these possibilities from phylogenetic analysis (Fig. 5).
The synteny of both the hes5.1 and hes5.3 clusters seemed to be maintained in both teleosts and neopterygians, even though the gene orders of these species in these loci were highly divergent (Figs. 4 and 6). In addition, although no cluster was detected, probably because of insufficient scaffold contiguity, many hes5 genes were found in coelacanth (Fig. 2), suggesting that the prototype of the hes5.1/hes5.3 clusters was retained in the common ancestor of amphibians and sarcopterygians. However, all amniote hes5 genes seemed to be hes5.1 cluster genes, and not hes5.3 cluster genes (Figs. 1 and 2), suggesting that the hes5.3 cluster was lost after branching into amniotes.
We further examined the number of exons in the coding regions of each hes5 gene. In both X. tropicalis and X. laevis, almost all hes5 genes consisted of three exons, except for hes5.8. On the other hand, the hes5 genes of many actinopterygians, including zebrafish, medaka, and spotted gar, possessed two exons in the coding region (Table 2). This might reflect that hes5 genes in actinopterygians and in Xenopus increased independently.
In this study, we showed that the number of hes5 genes is specifically high in Xenopus, especially the number of hes5.3 cluster genes. To estimate the duplication process, comparison of the transcriptional direction among hes5 genes may be important [11]. As we previously reported, the directions of hes5.5, 5.6, 5.7, and 5.9 are the same. Phylogenetic analysis also indicated that these genes clustered closely in the tree (Fig. 1A), suggesting that these genes may share a common origin and may have been tandemly duplicated in Xenopus. Phylogenetic analysis also indicated that hes5.1, hes5.2, and hes5.10 showed high similarity (Fig. 1B, 3B, 5B, and 7B). This result suggests another possibility, that hes5.10 was duplicated from hes5.1/5.2 (or vice versa).
It is known that hes5 functions downstream of Notch signaling and inhibits neuronal differentiation [20,21]. RNA-seq analysis revealed that the expression of almost all hes5 genes is high during the gastrula and neurula stages in Xenopus [11]. These results suggest that the function of hes5 is conserved between mouse and Xenopus. How hes5 works in neurogenesis should be investigated, and this may elucidate the significance of the high number of hes5 genes in Xenopus.

Conclusion
In this study, to understand the evolutionary process of hes genes, we estimated the evolutionary origins of the two hes5 clusters. Although hes5 genes were found in other jawed vertebrates, the number of hes5 genes was highest in Xenopus (Fig. 8). The rudiments of the two clusters were found in elephant shark, suggesting that ancestral species of chondrichthyans might have had these clusters. In addition, we reorganized the orthologous relationships of hes genes in vertebrates using phylogenetic and synteny analyses. These findings advance research on the function of all hes genes in vertebrates as well as the understanding of the evolutionary process of large gene clusters.

Protein sequence comparison
A multiple alignment of the protein sequences of hes genes was visualized with MUSCLE [22].

Phylogenetic analysis
Phylogenetic analysis was performed using RAxML (v8.2.0) [23]. Multiple alignments of protein sequences were carried out using MAFFT (v7.221) [24] with the --auto strategy.
Figure legend (fragment): The broken-lined circle shows a "partial" hes gene.
Figure legend (fragment): The tree shows the phylogenetic relationships of jawed vertebrates. The list shows the number of paralogous hes5 genes whose synteny is conserved with the hes5.1 cluster and hes5.3 cluster genes and the number of paralogous hes5 genes derived from other hes5 genes.
Table 1. Amino acid sequence identities of Xenopus hes5 genes with hes5 genes of zebrafish. Amino acid sequence identities of zebrafish (Dare) hes5 genes are in the left column and those of Xenopus tropicalis hes5 genes are in the right column.
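The alignment and tree-building steps described under Phylogenetic analysis could be scripted roughly as follows. This is a sketch under stated assumptions rather than the commands actually used: the text specifies only MAFFT with the automatic strategy and RAxML v8.2.0, so the executable name `raxmlHPC`, the substitution model (`PROTGAMMAJTT`), the number of bootstrap replicates, the random seeds, and all file names are hypothetical choices. The code only builds the command lines and does not run the programs.

```python
import shlex
from typing import List


def mafft_command(fasta_in: str) -> List[str]:
    """MAFFT with automatic strategy selection; the alignment is written to stdout."""
    return ["mafft", "--auto", fasta_in]


def raxml_command(alignment: str, run_name: str,
                  bootstraps: int = 100, seed: int = 12345) -> List[str]:
    """RAxML v8 rapid bootstrap plus ML search (-f a).
    Model, replicate count, and seeds are assumptions, not values from the text."""
    return [
        "raxmlHPC",
        "-f", "a",                # rapid bootstrapping and best-scoring ML tree
        "-m", "PROTGAMMAJTT",     # protein model (assumed)
        "-p", str(seed),          # parsimony random seed
        "-x", str(seed),          # rapid bootstrap random seed
        "-N", str(bootstraps),    # number of bootstrap replicates (assumed)
        "-s", alignment,          # input alignment
        "-n", run_name,           # suffix for output files
    ]


# Hypothetical file names.
mafft_cmd = mafft_command("hes_proteins.fasta")
raxml_cmd = raxml_command("hes_proteins.aln.fasta", "hes_tree")

print(shlex.join(mafft_cmd))
print(shlex.join(raxml_cmd))
```

To execute the steps, each list could be passed to `subprocess.run(..., check=True)`, redirecting MAFFT's standard output to the alignment file before RAxML is started.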