Assessing the utility of Hsp90 gene for inferring evolutionary relationships within the ciliate subclass Hypotricha (Protista, Ciliophora)

Although phylogenomic analyses are increasingly used to reveal evolutionary relationships among ciliates, relatively few nuclear protein-coding gene markers have been tested for their suitability as candidates for inferring phylogenies within this group. In this study, we investigate the utility of the heat-shock protein 90 gene (Hsp90) as a marker for inferring phylogenetic relationships among hypotrich ciliates. A total of 87 novel Hsp90 gene sequences of 10 hypotrich species were generated. Of these, 85 were distinct sequences. Phylogenetic analyses based on these data showed that: (1) the Hsp90 gene amino acid trees are comparable to the small subunit rDNA tree for recovering phylogenetic relationships at the rank of class, but lack sufficient phylogenetic signal for inferring evolutionary relationships at the genus level; (2) Hsp90 gene paralogs are recent and therefore unlikely to pose a significant problem for recovering hypotrich clades; (3) definitions of some hypotrich orders and families need to be revised as their monophylies are not supported by various gene markers; (4) The order Sporadotrichida is paraphyletic, but the monophyly of the “core” Urostylida is supported; (5) both the subfamily Oxytrichinae and the genus Urosoma seem to be non-monophyletic, but monophyly of Urosoma is not rejected by AU tests. Our results for the first time demonstrate that the Hsp90 gene is comparable to SSU rDNA for recovering phylogenetic relationships at the rank of class, and its paralogs are unlikely to pose a significant problem for recovering hypotrich clades. This study shows the value of careful gene marker selection for phylogenomic analyses of ciliates.


Background
Evolutionary relationships of many ciliated protists (ciliates) remain unknown due to difficulties in species identification caused by their small size, complex morphological characters and high species diversity, although this topic is vital for understanding the evolution of life on Earth [1][2][3][4]. Most published molecular phylogenetic trees of ciliates are based on rDNA sequences, including the small subunit rDNA (SSU rDNA), large subunit rDNA (LSU rDNA) and the internal transcribed spacer (ITS) region [5][6][7][8]. All presently used markers give rise to artifacts in phylogenetic reconstructions resulting in the recovery of ambiguous relationships in gene trees [9,10]. Analysis of sequences of multiple unlinked loci having different evolutionary histories, as opposed to linked SSU rDNA, LSU rDNA, and ITS regions, is necessary since phylogenetic bias might be common to loci that are linked [11]. Hence, protein-coding genes, such as alpha-tubulin, actin, Hsp70 and histone H4 genes, are increasingly used to reconstruct ciliate phylogenies [1,3,[12][13][14][15][16][17]. With the development of phylogenomics, it is now possible to reconstruct phylogenetic relationships among ciliates based on analyses of hundreds of proteincoding genes [1,[18][19][20][21][22]. However, only a few of these protein-coding gene markers in comparatively few taxa have been tested for their suitability as candidates for inferring phylogenetic relationships among ciliates [23][24][25].
It has previously been shown that heat shock protein 90 (Hsp90) can resolve evolutionary relationships among eukaryotic taxa at high taxonomic rank, including alveolates and deep-branching dinoflagellates [26][27][28][29], and possibly also at genus level, e.g., Paramecium and Tetrahymena [30]. Furthermore, Hsp90 is ubiquitous and highly conserved in eukaryotic cells making it easy to amplify using PCR [31,32]. As in other protein-coding genes used for molecular phylogenies, paralogs caused by gene duplication, especially recent ones, might be the biggest disadvantage of using Hsp90 in phylogenetic analyses [3,33,34]. Notably, gene duplications in ciliates are common due to extremely high copy numbers of genes caused by unique features such as nuclear dimorphism [6,35,36]. Hence, it is necessary to test whether Hsp90 gene paralogs will confound phylogenetic analyses before this gene is widely used for determining evolutionary relationships among ciliates.
The subclass Hypotricha is mainly defined by the patterns of ventral cirri arranged either in longitudinal files or in scattered groups and represents one of the most diverse groups within the Ciliophora [37]. This subclass has been the subject of many taxonomic revisions, particularly in the past three decades [1,[38][39][40][41][42][43]. Phylogenetic relationships within the Hypotricha are still poorly understood and the rapidly growing molecular phylogenetic studies have consistently questioned monophylies and assignments of many taxa [44][45][46].
In the present study, phylogenetic analyses based on the Hsp90 gene were carried out for 10 species (11 populations) of hypotrichs. The main aims were to examine whether Hsp90 is a suitable gene marker for resolving evolutionary relationships among ciliates using Hypotricha as an example, and to re-evaluate phylogenetic relationships within the subclass Hypotricha based on new Hsp90 sequence data.

Hsp90 gene sequences of hypotrich ciliates
We obtained a total of 87 novel Hsp90 gene sequences from 11 populations of hypotrichs (Table 1). In order to avoid data redundancy we only used 85 distinct sequences in downstream analyses. The amplified fragment length of the Hsp90 gene was 1089-1119 base pairs (bp) with 41.2-60.1% GC content of which 543 sites (47.4%) were variable. Numbers of variable sites were highly divergent within each population, ranging from 16 (1.4%, Pseudokeronopsis rubra) to 216 (18.9%, Ponturostyla enigmatica). The third codon position exhibited the highest level of variation (42.0%). A total of 74 distinct amino acid sequences were detected among the 85 distinct nucleotide sequences. The 11 identical amino acid sequences were removed. The numbers of parsimony-informative and variable sites varied greatly at intra-population level. For instance, only one parsimony-informative site and five variable sites were present in Pseudokeronopsis rubra, while 48 and 67, respectively, were observed in Ponturostyla enigmatica (Fig. 1a).

Phylogenetic trees
In the Hsp90 amino acid (HSP90-Amino) tree, Oligohymenophorea and Spirotrichea were the only monophyletic classes with more than one representative species included in the analyses (Fig. 2). Another two classes, i.e. Heterotrichea and Phyllopharyngea, contain only one sequence each. Within Hypotricha, the orders Sporadotrichida and Urostylida were both monophyletic and order Stichotrichida was represented by only one species. Two out of six hypotrich families (Oxytrichidae and Urostylidae) represented by more than one species were polyphyletic. At the genus level, there was no sister relationship The topologies of the Hsp90 nucleotide (HSP90-Nuc) trees ( Fig. 3) and Hsp90 nucleotide trees without third codon positions (HSP90-Nuc12) (Fig. 4) corresponded closely with that of the HSP90-Amino tree (Fig. 2), although there were three notable differences. Firstly, sequences of Urosoma caudata p1 and U. caudata p2 were separated from each other within a large clade and both populations appeared to be monophyletic in both the HSP90-Nuc and HSP90-Nuc12 trees, whereas they clustered together in the HSP90-Amino tree. Secondly, the clade comprising sequences of Chilodonella uncinata (DQ662856) occupied the basal position in the HSP90-Nuc and HSP90-Nuc12 trees, but it grouped with Oligohymenophorea and Heterotrichea in the HSP90-Amino tree. Thirdly, the subclass Hypotricha was polyphyletic in the HSP90-Nuc and HSP90-Nuc12 trees, but monophyletic in the HSP90-Amino tree.
The topology of the SSU-rDNA tree (Fig. 5a) was almost congruent with the HSP90-Amino tree (Fig. 2). The major difference was that the two Urosoma species clustered together in the SSU-rDNA tree, whereas they were separated into different clades in the HSP90-Amino tree.
The topology of the SSU-HSPNuc12 tree (Fig. 5b) was nearly identical to that of the SSU-rDNA tree (Fig. 5a) except that Urostylida/Urostylidae was monophyletic only in the SSU-HSPNuc12 tree (Fig. 5b). The monophyly of the subclass Hypotricha was also recovered in SSU-HSPNuc12 tree which is consistent with the HSP90-Amino and SSU-rDNA trees (Figs. 2 and 5a).

Discussion
Is Hsp90 gene a suitable gene marker for inferring hypotrich phylogeny?
Previous studies have shown that the best way to evaluate the utility of a gene marker is to compare its congruence with other data [9]. Some researchers have suggested that Hsp90 is as good or better than any other single gene marker for inferring eukaryote phylogeny [29]. Here, the reliability of the Hsp90 gene is compared with other gene markers, especially SSU rDNA, based on accepted monophyletic groups at different taxonomic ranks. In the present investigation the class Oligohymenophorea was monophyletic, which is consistent with previous reports based on a variety of gene markers [3,6,54,55]. In contrast, the subclass Hypotricha was non-monophyletic in   [44,[47][48][49][50][51][52][53] suitable for the classification of ciliates. The subclass Hypotricha was monophyletic in most previous molecular phylogenetic trees based on a variety of gene markers [6,[56][57][58][59], although in a few studies it was nonmonophyletic [1,6,33]. The utility of the Hsp90 gene is comparable to that of SSU rDNA at class level. Out of three hypotrich genera for which Hsp90 gene sequences from more than one species were available, two (Urosoma and Ponturostyla) were paraphyletic ( Figs. 2 and 3). This suggests that the Hsp90 gene might lack sufficient phylogenetic signal for inferring evolutionary relationships at the genus level. Due to the lack of taxon coverage, we were unable to determine whether the Hsp90 gene yields sufficient phylogenetic signal to determine relationships at order or family levels. The order Sporadotrichida, and two families for which sequences from multiple species were available, were polyphyletic (Figs. 2, 3, 4, 5).  [44,[47][48][49][50][51][52][53] This is consistent with previous findings based on SSU rDNA, actin and alpha-tubulin gene markers [43,44,46].
Multiple distinct Hsp90 nucleotide sequences were detected in every species, probably as a result of gene duplications. Of the 11 populations represented by more than one sequence, eight formed a well-supported clade in the HSP90-Nuc tree, the exceptions being Pseudokeronopsis erythrina, Ponturostyla sp. and Ponturostyla enigmatica (Fig. 3). This indicates that the Hsp90 gene duplications are recent and might have occurred after the separation of populations, suggesting that Hsp90 gene paralogs are unlikely to pose a substantial problem in defining hypotrich clades. Similarly, only recent duplication events have been detected in ciliates for the alpha-tubulin gene [17], this being the most widely used protein-coding gene for inferring evolutionary relationships in a range of ciliate groups [1,13,16,17,25]. Some studies, however, have concluded that the alpha-tubulin gene might not provide sufficient phylogenetic signal to resolve evolutionary relationships within some groups, e.g., Spirostomum, possibly due to the especially pronounced purifying selection [16,60,61]. In the case of the actin gene, the presence of ancient gene duplications  [44,[47][48][49][50][51][52][53] and actin isoforms being too divergent means that it is not possible to resolve evolutionary relationships below the rank of class [3]. These findings indicate that gene duplication patterns of different protein-coding genes are variable depending on the gene and the ciliate group. More studies are therefore needed to test more genes and more ciliate groups.

Phylogenetic relationships within the subclass Hypotricha
In the present study, each of the orders Sporadotrichida and Stichotrichida were paraphyletic and support values for some clades were low (Figs. 2, 3, 4, 5). This is consistent with previous molecular phylogenetic studies [3,[62][63][64][65][66][67]. It seems that gene markers with sufficient phylogenetic signal to recover the monophyletic Hypotricha are currently unavailable, although several (e.g., SSU rDNA, LSU rDNA, ITS and alpha-tubulin) have been widely used. This is reasonable considering that classification systems based on morphology, morphogenesis and gene sequences are not concordant with each other, probably due to the high diversity of ciliary and other somatic structures, as well as various modes of cortical development, within the Hypotricha [38][39][40][41][42][43]68]. Currently, Hsp90 gene sequence data are available for only one species of Stichotrichida, namely Hypotrichidium paraconicum. Therefore, relationships within this order could not be analyzed. In contrast, several important phylogenetic relationships are recovered within the orders Sporadotrichida and Urostylida.

Sporadotrichida
It has been suggested that the family Oxytrichidae, which is characterized by typically having 18 frontventral-transverse (FVT) cirri, should be divided into  [48][49][50] two subfamilies, the Oxytrichinae, with a flexible body, and the Stylonychinae, with a rigid body [38,39]. All genera of Oxytrichidae included in the present study, i.e., Oxytricha, Urosoma, Ponturostyla and Hemigastrostyla belong to the subfamily Oxytrichinae. However, the subfamily Oxytrichinae was not monophyletic in any of the trees (Figs. 2, 3, 4, 5). This is consistent with previous studies based on SSU rDNA [69], actin [3] and alpha-tubulin genes [17], suggesting that the ontogenetic character which characterizes this group, viz. the participation of the posteriormost postoral ventral cirrus in primordia formation, might not be a synapomorphy. Furthermore, approximately Unbiased (AU) tests performed on HSP90-Nuc, HSP90-Nuc12, HSP90-Amino, SSU-rDNA and SSU-HSPNuc12 datasets rejected the monophyly of the subfamily Oxytrichinae (P < 0.05) ( Table 2). The monophyly of the genus Urosoma was not supported in the HSP90-Nuc, HSP90-Nuc12, HSP90-Amino and SSU-HSPNuc12 trees, although this was not rejected by AU tests (P > 0.05, Table 2) (Figs. 2, 3, 4, 5b). Furthermore, no support values were revealed for the clade containing two Urosoma species shown in ML analyses based on the SSU-rDNA dataset (Fig. 5a). It is noteworthy that the monophyly of the genus Urosoma was not recovered in previous studies based on SSU rDNA sequences [47,48]. The genus Urosoma might therefore be an artificial assemblage and its synapomorphies (i.e., frontoventral cirri arrange in a row with anterior cirrus (III/2) located slightly to the left, postoral ventral cirri in a dense cluster behind the buccal vertex, usually with two pretransverse ventral, five transverse cirri, one right and one left row of marginal cirri, and caudal cirri present) might be plesiomorphies of the subclass Hypotricha [38,48]. Morphological, morphogenetic and molecular data for more taxa are required in order to resolve the systematics of Sporadotrichida.

Urostylida
It is not surprising that Pseudoamphisiella was separated from the "core" Urostylida in most our trees (Figs. 2, 3, 4, 5a) since phylogenetic trees based on various gene markers have previously revealed the fragmentation of the order Urostylida into a "core" group and the rest [1,46,70,71]. Considering that the monophyly of the "core" urostylid group was consistently recovered and other urostylid taxa usually had no robust assignment in our phylogenetic trees, we suggest that the morphological definition of the order Urostylida needs to be further refined.

Conclusions
Our investigation indicates that, though lacking sufficient phylogenetic signal at the genus level, the Hsp90 gene is comparable to SSU rDNA for recovering evolutionary relationships at the rank of class and that its paralogs are unlikely to pose a significant problem for recovering hypotrich clades. The Hsp90 gene therefore has significant potential utility for determining the systematics of ciliates. The findings of this study also suggest that careful selection of nuclear protein-coding gene markers is needed for phylogenetic analyses of ciliates.

Methods
Culturing, DNA extraction, PCR amplification, and sequencing All the ciliates used in this study are listed in Table 3. Pure cultures were maintained in Petri dishes at room temperature (approximately 25°C) with rice grains to stimulate the growth of bacteria as food for the ciliates [72]. One or more cells of each culture were repeatedly washed in sterile water with the same salinity as that of the sampling site. Genomic DNA was extracted using REDExtract-NAmp Tissue PCR Kit (Sigma, St. Louis, MO, USA). PCR amplifications of the Hsp90 genes were performed using a TaKaRa Ex Taq DNA Polymerase Kit (TaKaRa Biomedicals, Japan). The primers used for Hsp90 gene amplification were Hsp90F4 (5′-CGGCAC GTTCTACWSNAAYAARGA-3′) and Hsp90R3 (5′-GGTCTTTCTTCTGGCGTGTTCAGTGTA-3′) [26]. PCR conditions were: 2 min initial denaturation (95°C), After confirmation of the amplified DNA by 1.0% agarose gel, a single bright band containing the target DNA was purified using the Universal DNA purification

Sequence analyses and construction of phylogenetic trees
Five datasets were included in the phylogenetic analyses: (1) HSP90-Nuc (98 sequences in total, i.e., Hsp90 nucleotide sequences including 85 that were distinct and newly obtained from the present study together with all 10 ciliate sequences available from the GenBank database); (2) HSP90-Amino (98 sequences in total, i.e., corresponding amino acid sequences of HSP90-Nuc); (3) HSP90-Nuc12 (98 sequences in total, i.e. HSP90-Nuc including only first two codon positions); (4) SSU-rDNA (15 sequences in total, i.e., corresponding SSU rDNA sequences of species in Dataset HSP90-Nuc, expect for three that were absent in GenBank, namely Oxytricha trifallax, Newallackia sp. and Blepharisma intermedium); (5) SSU-HSPNuc12 (15 sequences in total: two-gene combined dataset including SSU-rDNA and HSP90-Nuc including only first two codon positions). In each dataset, three species of Apicomplexa were chosen as the outgroup.
The Hsp90 nucleotide sequences were translated into amino acid sequences by DAMBE 6.3.0.1 software [73]. Intra-and inter-population genetic distances were calculated using MEGA 7.0 [74]. Mean pairwise nucleotide distances were calculated using the Kimura 2-parameter correction model [75]. Mean pairwise amino acid distances were calculated using the Poisson model [76]. Multiple sequence alignments of all nucleotide and amino acid sequences were performed with CLUSTAL W implemented in BioEdit v.7.0.1 [77] and then manually modified in order to trim both ends. Alignments used for subsequent phylogenetic analyses included the following numbers of positions: 1743 (HSP90-Nuc), 386 (HSP90-Amino), 1162 (HSP90-Nuc12), 1806 (SSU-rDNA), and 2968 (SSU-HSPNuc12). Models for nucleotide datasets were selected under the Akaike information criterion (AIC) by MrModeltest [78]. GTR + I + G was the best-fitted model for datasets HSP90-Nuc and SSU-rDNA. GTR + G was the best-fitted model for datasets HSPNuc12, SSU-HSPNuc12, as well as first and second codon positions of Hsp90 nucleotide sequences. Blo-sum62 + I + G was the best-fitted model for the amino acid dataset (HSP90-Amino) selected by AIC as implemented in ProtTest 1.4 [79]. Bayesian inference (BI) analyses were performed with MrBayes 3.1.2 [80]. Markov chain Monte Carlo (MCMC) simulations were run with two sets of four chains for 1,000,000 generations with trees sampled every 100 generations. The first 2500 trees were discarded as burn-in. The remaining trees were retained to generate a consensus tree and to calculate the posterior probabilities (PP) of all branches using a majority-rule consensus approach. A maximum likelihood tree (ML) was constructed using online software RAxML-HPC2 on XSEDE (http://www.phylo.org/). The branches of the resulting tree were evaluated by the GTRGAMMA model of nucleotide substitution and the PROTGAMMA model of amino substitution. MEGA7.0 [74] was used to visualize tree topologies. Terminology and systematic classification follow Gao et al., (2016) [1] and Lynn (2008) [43].
Additional file 1: Supplementary Table S1. Inter-specific genetic distances of Hsp90 nucleotide and amino acid sequences.