Comparative genomics highlight the importance of lineage-specific gene families in evolutionary divergence of the coral genus, Montipora

Scleractinian corals of the genus Montipora (Anthozoa, Cnidaria) possess some unusual biological traits, such as vertical transmission of algal symbionts; however, the genetic bases for those traits remain unknown. We performed extensive comparative genomic analyses among members of the family Acroporidae (Montipora, Acropora, and Astreopora) to explore genomic novelties that might explain unique biological traits of Montipora using improved genome assemblies and gene predictions for M. cactus, M. efflorescens and Astreopora myriophthalma. We obtained genomic data for the three species of comparable high quality to other published coral genomes. Comparative genomic analyses revealed that the gene families restricted to Montipora are significantly more numerous than those of Acropora and Astreopora, but their functions are largely unknown. The number of gene families specifically expanded in Montipora was much lower than the number specifically expanded in Acropora. In addition, we found that evolutionary rates of the Montipora-specific gene families were significantly higher than other gene families shared with Acropora and/or Astreopora. Of 40 gene families under positive selection (Ka/Ks ratio > 1) in Montipora, 30 were specifically detected in Montipora-specific gene families. Comparative transcriptomic analysis of early life stages of Montipora, which possesses maternally inherited symbionts, and Acropora, which lacks them, revealed that most gene families continuously expressed in Montipora, but not expressed in Acropora do not have orthologs in Acropora. Among the 30 Montipora-specific gene families under positive selection, 27 are expressed in early life stages. Lineage-specific gene families were important to establish the genus Montipora, particularly genes expressed throughout early life stages, which under positive selection, gave rise to biological traits unique to Montipora. Our findings highlight evolutionarily acquired genomic bases that may support symbiosis in these stony corals and provide novel insights into mechanisms of coral-algal symbiosis, the physiological foundation of coral reefs.

that are fundamental to coral reefs [2][3][4]. However, reefbuilding corals have declined in recent decades due to a variety of anthropogenic stresses, including ocean warming associated with climate change [5][6][7]. These stresses result in coral bleaching (the breakdown of the symbiosis between corals and their algal endosymbionts [8]), which ultimately leads to loss of habitat for numerous marine species and can precipitate the collapse of entire coral reef ecosystems [9].
The genus Montipora (family Acroporidae; Fig. 1) is one of the most widespread genera of reef-building corals in the Indo-Pacific [10]. Colony morphology in the genus varies from submassive to laminar, encrusting, and branching colonies [10,11]. Montipora has some unusual and interesting biological traits among acroporid corals, such as maternal transmission of symbionts and higher stress tolerance. Symbiont transmission maintains symbioses across generations and strongly influences host evolution and adaptation to environments [12][13][14]. Two fundamental symbiont transmission modes predominate in nature (reviewed in [14]): horizontal transmission (symbionts acquired from the environment) and vertical transmission (symbionts acquired maternally). While most coral species (~ 71%), including Acropora, acquire symbionts from the ocean in each generation [15], all known Montipora species acquire algal symbionts vertically [16,17] (Fig. 1). Offspring of horizontal recipients generally associate with a broad range of symbiont types and acquire optimal symbionts from new environments [18,19]; however, there is no guarantee that optimal symbionts will be available. By contrast, offspring of vertical recipients inherit symbionts that are suitable for their physiology [20], but if they encounter an environment that differs significantly from that of their parents, or if the environment changes too drastically, the inherited symbionts may be disadvantageous. Montipora also exhibits low sensitivity to ocean acidification and thermal stressors compared to other coral species [21,22]. These  (g). h A colony of Acropora tenuis. i and j A planula larva of A. tenuis without algal symbionts photographed under visible light (i) and blue light (j). k A colony of Astreopora myriophthalma. Algal symbionts (brown dots) in eggs and planula larvae of Montipora (c and f). Green fluorescence was from fluorescent proteins of Montipora and red fluorescence was from chlorophyll in algal symbionts (d and g). Orange and cyan-green fluorescence were from fluorescent proteins of Acropora (j) distinct differences between Montipora and its close relative, Acropora, may have occurred after their divergence (approx. 125 Mya [23]).
In the family Acroporidae, whole-genome data are becoming more readily available, now including 16 species of Acropora [23][24][25][26], 3 species of Montipora [23,27,28] and 1 Astreopora species [23], the latter being the sister taxon of the remainder of the Acroporidae [29] (Fig. 1). Recently, Shinzato et al. [23] performed a large-scale genomic comparison of acroporids (using genomes of Acropora, Montipora, and Astreopora) and proposed that the evolutionary success of Acropora may have resulted from gene duplications. Although some studies have performed genome-wide analysis using Montipora genomes [27,28], the genomic basis for their unique biological traits remains unknown. Exploiting abundant acroporid genomic resources, we performed comparative genomic analyses using improved genomic data of Montipora and Astreopora. We further identified genes with high evolutionary rates in Montipora that may be associated with adaptive evolution, and we specifically attempted to identify genes related to maintenance of maternally inherited symbionts by comparing gene expression during early life stages of Montipora and Acropora.

Improvement of genome assemblies and gene predictions for Montipora and Astreopora
Assembly error, including retention of allelic contigs in haploid assemblies, is problematic for downstream analyses, mainly due to redundant genome sequences (alleles from the same genetic locus). We curated scaffold sequences of M. cactus and M. efflorescens by removing scaffold sequences with high or low coverage and those that may have originated from one of two allelic copies in heterozygous regions. Numbers of scaffold sequences were significantly reduced from the previous version, from 4925 to 3521 in M. cactus and from 5162 to 3589 in M. efflorescens (Table 1). For Astreopora, possible allelic scaffold sequences were removed from the genome assembly during the previous study [23]. The previous version of gene models for M. cactus, M. efflorescens, and Astreopora were predicted using AUGUS-TUS, based solely on a training set built for Acropora or for protein homology with gene models of other corals [23]. Thus, it was highly possible that lineage-specific genes were missed in the previous version. In this study, we performed gene prediction for M. cactus, M. efflorescens, and Astreopora myriophthalma using a combination of ab initio and RNA-seq evidence-based prediction. We predicted 29,158 protein-coding genes for M. cactus, 29,424 for M. efflorescens and 25,406 for Astreopora myriophthalma (Table 1). Benchmarking universal single-copy orthologs (BUSCO) completeness scores were 93.3% (of which 0.8% were duplicated) for M. cactus, 91.2% (of which 1.4% were duplicated) for M. efflorescens and 94.5% (of which 1.3% were duplicated) for Astreopora myriophthalma, which were considerably better scores than the previous version (Table 1). In comparison to other Montipora gene models, gene models reported by Shumaker et al. [28] may have contained a higher fraction of diploid copies (93.4% complete BUSCO, with 18.3% duplicated; Table1). Completeness of gene models reported by Helmkampf et al. [27] was lower than that reported by Shumaker et al. [28] (64.2%, with 0.5% duplicated; Table1). Thus, the gene models reported by Shumaker et al. [28] contained many duplicates, but those reported by Helmkampf et al. [27] lacked many genes. In contrast, BUSCO completeness scores of M. cactus, M. efflorescens and Astreopora myriophthalma reported in this study were comparable to published gene models of other coral species, including A. millepora, predicted using the NCBI annotation pipeline (Table 1). These improvements to the Montipora and Astreopora genomes enabled more accurate comparative genomics among acroporids.
The two major clades of reef-building corals, known as Robusta and Complexa [30], possess different metabolic pathways [31]. From the six species, we compared 303 functional modules comprising ten categories in the Shumaker et al. [28] Helmkampf et al. [27] This study Shinzato et al. [23] Shinzato et al. [23] Shinzato et al. [23] Ying et al. [ Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways and found that metabolic pathways were basically conserved in the three genera (Additional file 1: Table S1). An enzyme involved in cysteine biosynthesis (KEGG module ID: M00338) and methionine degradation (KEGG module ID: M00035) was not detected among the six species (Additional file 1: Table S1), as reported in Shinzato et al. [23,24]. Although one gene (KEGG entry ID: K04486) involved in the histidine biosynthetic pathway (KEGG module ID: M00026) was detected in acroporid corals used in this study, the remaining genes required to complete the pathway were not detected (Additional file 1: Table S1), as reported in Ying et al. [31]. Taken together, gene families involved in common features, such as amino acid synthesis, are widely conserved in the three genera.
While we identified 696 lineage-specific gene families in Astreopora and 316 in Acropora, we identified 1670 gene families restricted to Montipora (2307 genes in M. cactus and 2303 in M. efflorescens) (Fig. 2). The proportion of lineage-specific gene families in Montipora (13.07%) was significantly larger than that in Acropora (2.87%) and Astreopora (6.15%) (Pairwise proportion test: p < 0.05). Although we performed gene annotation with BLAST searches against the Swiss-Prot database (BLASTP, E-value cutoff: 1e −5 ) and hidden Markov models against the Pfam database (InterProScan, E-value cutoff: 1e −3 ), the proportion of Montipora-specific gene families with gene annotation was significantly lower than in Acropora and Astreopora (Pairwise proportion test: p < 0.05 for all combinations; Fig. 2). This indicates that functions of gene families restricted to Montipora are largely unknown.

Gene expansions in Montipora and comparisons among acroporids
Gene duplication has contributed to acquisition of new gene functions during evolution [32,33]. To explore gene families that underwent expansions, we first compared gene numbers of 9,690 gene families shared by the three genera and 743 gene families shared by Montipora and Acropora (Fig. 2). In these two groups, genes in families . Numbers in pie charts indicate the number of genes with similarity to either Swiss-Prot or Pfam or neither. Proportions of gene annotations were compared among gene families specific to each lineage and asterisks indicate statistical significance (Pairwise proportion test: p < 0.05). Upset plot was produced using the "UpSetR" package [79] that underwent gene expansions in either Montipora or Acropora might have been duplicated after Montipora and Acropora diverged from their last common ancestor. Three gene families, homologous dimethylsulfoniopropionate (DMSP) lyase (ALMA; HOG0000829), Endonuclease-reverse transcriptase (GP1; HOG0000531), and Spondin (SPON1; HOG0001590), and three non-annotated gene families (NA; HOG0000965, HOG0001135, and HOG0001312), were significantly expanded in Acropora (Fisher's exact test: p < 0.05; Fig. 3a and b). Recently, it was reported that DMSP lyase is the most expanded gene family in Acropora [28], and our result is consistent with that report, supporting the accuracy of this analysis. We found that three gene families, transient receptor potential protein (TRPC; HOG0002487), collagen alpha-1 (VII) chain (COL7A1; HOG0003259) and nonannotated gene family (NA; HOG0001797) are significantly expanded in Montipora compared with Acropora (Fisher's exact test: p < 0.05; Fig. 3a and 3b).
Next, we compared gene numbers of 665 gene families shared by Montipora and Astreopora (Fig. 2), in which gene duplication may have occurred after divergence of Montipora or Astreopora. These genes may have been lost in Acropora. Two gene families (HOG0003949 and HOG0004557) lacking Swiss-Prot annotation were significantly expanded in Astreopora (Fisher's exact test: p < 0.05; Fig. 3c), whereas one other gene family, tetratricopeptide repeat protein 28 (TTC28; HOG0000387), which is involved in the cell cycle in humans [34], was significantly expanded in Montipora compared with Astreopora (Fisher's exact test: p < 0.05; Fig. 3c).

Estimation of evolutionary rate in each Montipora gene family group
The ratio of nonsynonymous (Ka) to synonymous substitutions (Ks) reflects the strength of selective pressure on protein sequences [35]. For example, when Ka is less than Ks (Ka/Ks < 1), selection has occurred to eliminate mutations of protein sequences (purifying selection). In contrast, when Ka is larger than Ks (Ka/Ks > 1), selection has occurred to mutate the protein sequences (positive selection). In order to evaluate the strength of selective pressure acting on protein sequences in each Montipora gene family, we calculated pairwise Ka/Ks ratios between Montipora single-copy orthologous gene pairs (M. cactus versus M. efflorescens) for each of the four groups: (1) gene families shared by the three Acroporidae genera, (2) gene families shared by Montipora and Acropora, (3) gene families shared by Montipora and Astreopora, and (4) gene families restricted to Montipora (Fig. 4). When we compared Ka/Ks ratio between groups, gene families restricted to Montipora showed the highest Ka/Ks ratio (Wilcoxon rank sum test: p < 0.05; Fig. 4), indicating that this gene family group has undergone a relaxation of purifying selection, and that functional constraints on this gene family group are relaxed. This could explain why the deduced gene functions of gene families restricted to Montipora are largely unknown.

Positive selection specific to Montipora
To identify genes with fast evolutionary rates that may be associated with adaptive evolution in Montipora, we focused on gene families exhibiting Ka/Ks > 1. We found evidence of positive selection in 40 gene families (rapidly evolving gene families) ( Table 2). Of those, 10 families are shared by the three genera or shared by Montipora and Acropora, while the remaining 30 families are restricted to Montipora ( Table 2), suggesting that these 30 gene families arose specifically in that lineage and likely contribute to biological traits unique to Montipora. Although 28 of the 30 gene families restricted to Montipora were without annotation, their possible subcellular localization ranging from membrane to organelle was predicted by DeepLoc, a deep learning neural networks model ( Table 2).

Gene expression unique to early life stages of Montipora
Presence of maternally inherited algal symbionts at an early life stage is the most obvious difference between vertical and horizontal transmitters (Fig. 1). In order to identify gene families specifically involved in symbiosis in vertical transmitters, we compared the repertoire of expressed genes in early life stages of Montipora with those expressed in Acropora. In this analysis, a gene family was considered expressed even if only one gene in that family was expressed (Transcripts per million (TPM) > 1). We confirmed that 11,930 and 10,838 gene families were expressed during early life stages of Montipora and Acropora, respectively (Fig. 5a). Of these, 10,051 gene families (84% in Montipora and 93% in Acropora) were shared by both during early life stages (Fig. 5a), suggesting that these are essential for early development of acroporid corals; thus, we did not focus on these in the present study. We identified 1,879 gene families that were exclusively expressed in Montipora (Fig. 5b). Among those, 60% (1132 gene families) were expressed in planula larvae, metamorphosed larvae, and recruit stages (Fig. 5a), suggesting that these genes may be related to maintenance of algal symbionts in Montipora. Interestingly, 97% of these gene families ((753 + 344) / 1132, Fig. 5b) that were expressed throughout the three life stages were specific to Montipora or shared by Astreopora (Additional file 2: Table S2). In contrast, the remaining 3% of gene families ((22 + 13) / 1,132, Fig. 5b) have orthologs in Acropora, but were not expressed in Acropora. Nonetheless, they were expressed throughout early life stages of Montipora Additional file 3: Table S3). Within gene families containing gene duplications in the Montipora genomes above, two gene families (HOG0001797 and HOG0000387) were exclusively expressed in at least one early life stage in Montipora, and one of them (HOG0000387) was expressed throughout all three early life stages (Additional file 2: Table S2). Among the identified 30 rapidly evolving gene families restricted to Montipora, we detected gene expression of 90% of these families. Expression of nine families was detected in at least one early life stage of Montipora, and the remaining 18 gene families were continuously expressed throughout all three early life stages (Table 2).  Ka/Ks ratios were compared among gene families and significant differences were observed in all pairwise combinations (Wilcoxon rank sum test: p < 0.05). A raincloud plot was produced using the "raincloudplots" package [81]. Details of 40 gene families exhibiting Ka/Ks > 1 are shown in Table 2 [27]; Table 1), and were comparable to those of other coral species (Table 1). These numbers indicate that we successfully obtained high-quality gene models from Montipora and Astreopora species. Numbers of genes in M. cactus and M. efflorescens genomes were not quite as large as those of M. capitata reported by Shumaker et al. [28]. Previously, it was reported that M. capitata has fewer exons and shorter coding regions per gene than other corals [27,28]; however, this was not the case with M. cactus and M. efflorescens (Table 1). Fewer exons and shorter coding regions per gene could be an unusual feature of the M. capitata genome or could reflect the quality of the genome assembly. Indeed, the N50 size, one of the indices to evaluate the quality of genome assembly, was larger for both M. cactus and M. efflorescens genome assemblies than for M. capitata (Table 1).

Possible genomic evolutionary strategy unique to Montipora
Recent large-scale genome comparisons of acroporid genomes showed that 28 gene families were specifically expanded in Acropora, but none in Montipora [23]. Nonetheless, we identified four expanded gene families in Montipora (Fig. 3). Although the number of gene families in Montipora is not much different from those of Acropora and Astreopora, the proportion of lineage-specific gene families in Montipora was significantly larger (Fig. 2). Montipora does not appear to have duplicated existing gene families, as has Acropora. Lineage-specific gene families contribute to larger gene numbers in Montipora genomes, and emergence of lineage-specific genes may have helped to establish maternal transmission of symbionts in Montipora corals. In particular, Montiporaspecific gene families under positive selection may be major contributors. Three gene families, homologous to TRPC6, TTC28, and COL7A1, and one gene family without annotation were significantly expanded in Montipora compared with Acropora or Astreopora (Fig. 3). Known functions of transient receptor potential (TRP) proteins encoded by TRPC are diverse (reviewed in [36]). For example, TRP proteins respond to hypertonicity in yeasts [37,38], detect and avoid noxious chemicals in nematodes [39], and discriminate warmth, heat, and cold in humans [36]. In each case, TRP proteins mediate sensory transduction in cells [36]. In corals, expression levels of TRP-like genes change when the concentration  RNA-seq data from planula larvae, metamorphosed larvae and recruits were used. For Acropora, RNA-seq data from blastulae, gastrulae, planula larvae and polyps were used. SRA accession numbers for the RNA-seq data are provided in Additional file 7: Table S7 of CO 2 in seawater changes [40]. They also change diurnally [41,42] or when exposed to symbiotic algae [43,44]. The TRPC6-like gene family, specifically expanded in Montipora, may also be involved in sensory transduction during environmental transitions. The TTC28-like gene family has tetratricopeptide repeats (PF12176 and/or PF13424) and caspase HetF associated with Tprs (CHAT) domains (PF12770) (Additional file 4: Table S4). Canonical TTC28 is composed of tetratricopeptide repeats and CHAT domains (Q96AY4: TTC28_HUMAN [34]) and genes in the gene family (HOG0000387) are also composed of tetratricopeptide repeats and CHAT domains, indicating that this gene family may have been duplicated from canonical TTC28, which is conserved in all acroporids examined in this study (HOG0016559 in Additional file 9: Data S1). TTC28 is required for the cell cycle in humans [34]. The expanded TTC28-like gene family may also be involved in cell cycle in Montipora. Collagen is expressed in gastrodermis at a specific developmental stage of cnidarian larvae [45][46][47] and the expanded collagen-like gene family may function in early development of Montipora.
In this study, we identified 40 genes under positive selection in Montipora (Table 2). Positive selection has often been detected in genes involved in immunity in vertebrates [48]. In corals, genes related to immunity, such as lectins and antimicrobial peptides, are also under positive selection [23,49,50]. In the 40 rapidly evolving gene families found in this study, with one exception, no genes appeared homologous to immune-related genes ( Table 2). In addition, 28 of 30 rapidly evolving gene families restricted to Montipora have no annotated function (Table 2). Generally, genes with no homology to genes of other lineages are called orphan genes [51]. They are thought to arise principally by two processes: gene duplication or de novo evolution from non-coding regions [51]. If a gene originates by duplication, the protein domains tend to be conserved in the new genes, since a functional protein domain cannot easily be changed by mutations [52], suggesting that the 28 rapidly evolving gene families originated by de novo evolution from non-coding regions. Orphan genes are expected to interact specifically with the environment as a consequence of lineage-specific adaptation [51]. Therefore, orphan genes may contribute to adaptive evolution in Montipora. In particular, 18 rapidly evolving gene families with expression throughout the three early life stages, planula larvae, metamorphosed larvae, and recruits, may have important functions in symbiosis during early life stages of Montipora.

Conclusions
In this study, we highlighted possible genomic underpinnings of maternal transmission of symbionts in Montipora using high-quality genomic information of Montipora and Astreopora. We found that the driving force behind evolution of Montipora was lineage-specific gene families, rather than gene duplication, as among Acropora corals. The importance of rapidly evolving gene families in Montipora for maternal transmission of symbionts is inferred. Our dataset and findings offer novel insights into mechanisms of coral-algal symbiosis. Although genetic tools for manipulating corals have been established [53,54], development of more efficient methods to deliver gene-knockdown or -knockout reagents into large numbers of zygotes will facilitate rapid screening for relevant phenotypes of candidate genes. In addition, coral cell lines which have the capacity to incorporate algal symbionts has been developed [55], allowing us to observe ongoing symbiosis at the single cell level. Together, these advances will facilitate a deeper understanding of cellular and molecular mechanisms of coralalgal symbiosis.

Sample preparation, RNA extraction, and RNA-Seq
Colonies of M. cactus, M. efflorescens, and Astreopora myriophthalma were collected in Sekisei Lagoon, Okinawa, Japan in May 2015, and were maintained in aquaria at the Research Center for Subtropical Fisheries, Seikai National Fisheries Research Institute, until spawning. Permits for coral collection were kindly provided by the Okinawa Prefectural Government for research use (Permits #29-74). Coral fragments (~ 3 cm diameter) from adult colonies of M. cactus, M. efflorescens, and Astreopora myriophthalma were snap frozen in liquid nitrogen and stored at -80℃ until use. Fragments were then crushed in liquid nitrogen with an iron and hammer into powder. Total RNA was extracted from the powder using an RNeasy Plant Mini Kit (QIAGEN). A TruSeq Stranded mRNA Library Kit (Illumina) was used for mRNA sequencing library preparation, and each library was sequenced from 100-bp paired-end libraries using a NovaSeq 6000 (Illumina). For Montipora, eggs, sperm, planula larvae (1 and 4 d post-fertilization) were collected and preserved with TRIzol reagent (Thermo Fisher Scientific) at -80℃ until use. Total RNA was extracted from preserved eggs, sperm, and planula larvae as in Yoshioka et al. [56]. KAPA RNA HyperPrep Kits (Kapa Biosystems) and MGIEasy RNA Directional Library Prep Sets (MGI) were used for total RNA and mRNA sequencing library preparation, and each library was sequenced on a NovaSeq 6000 in 150-bp paired-end and a DNBSEQ-G400RS (MGI) in 100-bp paired-end mode. This information is summarized in Additional file 5: Table S5.
efflorescens gene models as a reference) using SALMON v1.0.0 [72]. Expression levels were quantified using SALMON v1.0.0. Genes with TPM > 1 were considered expressed. Then expressed genes were classified into corresponding gene families based on the above gene family inference.

Estimation of the ratio of nonsynonymous to synonymous substitutions
Protein sequences of putative single-copy orthologs between M. cactus and M. efflorescens were aligned pairwise with MAFFT [73]. Aligned nucleotide codon sequences without alignment gaps were retrieved using the PAL2NAL script [74]. Genes with nucleotide alignment lengths longer than 120 bp were used for further analysis. We calculated pairwise nonsynonymous (Ka) and synonymous (Ks) substitution ratios of singlecopy genes between M. cactus and M. efflorescens using KaKs_Calculator 2.0 [75] with option "-MA". Following Villanueva-Canas et al. [76], we discarded gene families showing Ks < 0.01, as such low Ks values may result in inaccurate Ka/Ks estimates, and gene families showing Ks or Ka > 2 indicating saturation of substitutions. Genes exhibiting Ka/Ks ratios with p < 0.05 (Fisher's exact test) were used for further analysis. Subcellular localization of gene families showing Ka/Ks > 1 was predicted using the DeepLoc-1.0 online server [77].

Statistical analysis
Pairwise proportion tests were conducted to compare lineage-specific gene families ("number of lineage-specific gene families" / "number of gene families in lineage") and gene annotation proportions of lineage-specific gene families ("number of genes with annotation" / "number of genes without annotation"). Fisher's exact test was conducted to identify expanded gene families in each group ("number of genes in one gene family in species A" / "number of genes in the rest of the gene family in species A" versus "number of genes in one gene family in species B" / "number of genes in the rest of the gene family in species B"). We considered a p < 0.05 as significantly expanded. The Wilcoxon rank sum test was conducted to compare median Ka/Ks values between gene family groups. All statistical tests were performed in R v4.0.3 [78].