Skip to main content

Transcriptomics provides a robust framework for the relationships of the major clades of cladobranch sea slugs (Mollusca, Gastropoda, Heterobranchia), but fails to resolve the position of the enigmatic genus Embletonia

Abstract

Background

The soft-bodied cladobranch sea slugs represent roughly half of the biodiversity of marine nudibranch molluscs on the planet. Despite their global distribution from shallow waters to the deep sea, from tropical into polar seas, and their important role in marine ecosystems and for humans (as targets for drug discovery), the evolutionary history of cladobranch sea slugs is not yet fully understood.

Results

To enlarge the current knowledge on the phylogenetic relationships, we generated new transcriptome data for 19 species of cladobranch sea slugs and two additional outgroup taxa (Berthella plumula and Polycera quadrilineata). We complemented our taxon sampling with previously published transcriptome data, resulting in a final data set covering 56 species from all but one accepted cladobranch superfamilies. We assembled all transcriptomes using six different assemblers, selecting those assemblies that provided the largest amount of potentially phylogenetically informative sites. Quality-driven compilation of data sets resulted in four different supermatrices: two with full coverage of genes per species (446 and 335 single-copy protein-coding genes, respectively) and two with a less stringent coverage (667 genes with 98.9% partition coverage and 1767 genes with 86% partition coverage, respectively). We used these supermatrices to infer statistically robust maximum-likelihood trees. All analyses, irrespective of the data set, indicate maximal statistical support for all major splits and phylogenetic relationships at the family level. Besides the questionable position of Noumeaella rubrofasciata, rendering the Facelinidae as polyphyletic, the only notable discordance between the inferred trees is the position of Embletonia pulchra. Extensive testing using Four-cluster Likelihood Mapping, Approximately Unbiased tests, and Quartet Scores revealed that its position is not due to any informative phylogenetic signal, but caused by confounding signal.

Conclusions

Our data matrices and the inferred trees can serve as a solid foundation for future work on the taxonomy and evolutionary history of Cladobranchia. The placement of E. pulchra, however, proves challenging, even with large data sets and various optimization strategies. Moreover, quartet mapping results show that confounding signal present in the data is sufficient to explain the inferred position of E. pulchra, again leaving its phylogenetic position as an enigma.

Background

Marine Heterobranchia (Gastropoda) have become a major focus in monitoring reef biodiversity [1,2,3,4,5]. They mainly prey on a high variety of marine sessile organisms, from algae to sponges, cnidarians, bryozoans and tunicates, and very often take up the chemical compounds of the food for their own defence. These “stolen” compounds have become of high interest for pharmacists in finding new drug leads for medical applications [6,7,8,9,10]. However, they are also of high interest because of various unique biological phenomena. Sacoglossans and some nudibranchs incorporate chloroplasts or dinoflagellates and thus can serve as model organisms in understanding the evolution of photosymbiosis [11,12,13]. Within marine Heterobranchia, the shell-less Nudibranchia have developed a variety of biological strategies that make them unique within Metazoa. Of particular interest is the sequestration of cnidocysts from the cnidarian prey, storing them in special morphological structures (cnidosacs) in exposed body areas, and the ability to mature the stolen cnidocysts (cleptocnides) in the cnidosac [14,15,16,17]. This unique defence system seems to have evolved only in one of the major nudibranch clades, the Cladobranchia, within which there are likely two independent origins [17].

Nudibranchia, with the two clades Cladobranchia and Anthobranchia, form a monophyletic group that is well explained by morphological features [18]. The sister group relationship to Pleurobranchomorpha (Pleurobranchida) as well as monophyly of Nudibranchia was shown with transcriptomic data by Zapata and colleagues [19] and was again later confirmed by additional data [20]. The monophyly of Nudibranchia has also been confirmed in various molecular analyses using larger taxon sets, albeit small gene sets (see review in [21]). However, few studies have used both morphological and molecular methods to obtain and explain phylogenetic relationships within Cladobranchia. A comprehensive study of Anthobranchia (Doridida) applying both molecular phylogenetic and ontogenetic data was published recently [22]. Similar studies are still lacking for Cladobranchia.

Cladobranchia comprise seven superfamilies (World Register of Marine Species [23]) and two unassigned families, Goniaeolididae and Heroidae. Two of the superfamilies, Aeolidioidea and Fionoidea, are usually united under the name Aeolidida, with the possession of the cnidosac as one shared character [18]. Dendronotoidea and Tritonoidea were united in former times under the name Dendronotacea [24], and Proctonotoidea and Arminoidea under the name Arminacea [25]. However, many morphological and molecular studies have contested the hypotheses of Dendronotacea and Arminacea (see review in [21]), but this is not completely disregarded according to ancestral state reconstruction [26]. The enigmatic Doridoxoidea were considered for some time as the sister taxon of all other Cladobranchia, representing the Dexiarchia concept in the sense of [27]. However, it has been shown that Doridoxidae is not a sister taxon to, but part of the Nudibranchia s. str. (Cladobranchia) [28, 29].

Pola and Gosliner [30] aimed to resolve the phylogeny of Cladobranchia using one nuclear (Histone 3) and two mitochondrial genes (cytochrome c oxidase subunit I and 16S rRNA): the study resulted in a topology that primarily consisted of an unresolved polytomy on family level. Several further studies focussed on different parts of the cladobranch tree. Bleidissel [31] analysed the Aeolidida within the Cladobranchia, based on three genes (18S, 16S, and CO1), in order to investigate the evolution of the incorporation of algae from the genus Symbiodinium in certain sea slugs. In that study, for the first time, the paraphyly of the aeolidid family Facelinidae was shown. Recently, by the inclusion of the type species of the genus Facelina, Facelinidae sensu stricto was revealed and the name Myrrhinidae resurrected for the second “facelinid” clade [32]. Korshunova and colleagues [33] studied the relationships within the former Flabellinidae, including representatives of many Aeolidida. The authors provided much evidence for the paraphyly of the former Flabellinidae, which they then split into five different families. More recently, Korshunova and Martynov [26] were able to resurrect old genus names (e.g., Duvaucelia) and proposed a new genus Tritonicula after a careful analysis of the Tritoniidae. They also confirmed the inclusion of Doridoxa within the Cladobranchia. Detailed studies on the general taxonomic patterns of Fionoidea including the families Abronicidae, Calmidae, Cuthonellidae, Cuthonidae, Eubranchidae, Fionidae s. str., Tergipedidae, Trinchesiidae, Xenocratenidae, and Murmaniidae are presented in [34, 35]. All molecular analyses of these studies are based on only a few genes.

Recent analyses, using a large transcriptomic data set, provided the first robust cladobranch tree that enabled the study of evolution of food preferences [36, 37]. In a subsequent study, a broader data set with nearly 90 taxa was used to examine the evolution of the cnidosac [17], which is the main defence system of Aeolidida [35]. Within some taxa (e.g., members of the genus Phestilla of the family Trinchesiidae) cnidosacs are secondarily reduced [34]. Similar defence structures have evolved independently in Hancockia [14], a genus assigned to Dendronotida [17]. However, the authors based their interpretations on a phylogenetic tree with partly low statistical support. Moreover, a few taxa showed relatively long branches compared to other members of the family (Cerberilla) or even the same genus (Janolus). Therefore, bias due to possible long branch artefacts cannot be excluded. A reduced data set was used by Goodheart and Wägele [39] to study the taxonomic relationship of an enigmatic pelagic cladobranch, the genus Phylliroe, to analyse morphological traits enabling a shift from a benthic life style into a pelagic form. Robustly resolved and reliably inferred phylogenetic trees that are not affected by confounding signal, but driven by informative phylogenetic signal, are one prerequisite for answering questions about the evolutionary history of taxa and biological phenomena, such as the aforementioned evolution of the cnidosac and photosymbiosis. Therefore, only trees that reflect most likely the “true” history of species allow the inference of biological traits to understand biodiversity and its origin. Inferred trees resulting from methodological or computational inadequacy can lead to erroneous hypotheses (see, e.g., [40]). Taxa that diversified quickly and/or underwent rapid radiation events within a short period of time are especially difficult to analyse (see, e.g., [41] and several examples in [42]).

Here, we present a thorough study on 57 cladobranch and four outgroup transcriptomes in order to obtain a statistically highly supported phylogenetic tree and to check whether or not ambiguous splits in this tree might be based on confounding and thus erroneous signal. After discarding three species with a low coverage of the ortholog set as well as two species due to model violation, we compiled four final data sets. All data sets included 56 out of the original 61 species but differed in their alignment completeness and gene partition coverage. Based on the most complete data set, we comprehensively examined the ambiguously inferred position of Embletonia, which has been assigned to various groups in the past without any current consensus [15, 17, 31, 43, 44], for alternative topologies with approximately unbiased (AU) tests [45], Four-cluster Likelihood Mapping [46, 47], and sampling puzzling [48] approaches.

Results and discussion

Data preparation prior to phylogenetic analyses

A list with details on the 21 species with newly sequenced transcriptome data is provided in Additional file 2: Table S1. Accession numbers for all species are given in Additional file 2: Table S2. Higher species affiliation follows the World Register of Marine Species (WoRMS), with the exception of the clade of all aeolids, which we call Aeolidida.

Transcriptome sequencing and data processing

Paired-end sequencing resulted in approximately 7.5 Gbases of raw data per sample. For the newly generated transcriptomes, the number of complete read pairs ranged from 20,266,817 in Calmella cavolini to 43,524,035 in Facelina rubrovittata with a median of 24,882,673 (Hancockia cf. uncinata). After trimming of possible adapter sequences and sequence regions of low quality, the average read length of complete read pairs ranged from 118.1 bp in Hermissenda emurai to 139.6 bp in Doto sp. with a median of 133.8 bp in Polycera quadrilineata (Additional file 2: Table S3). Details on sequence processing are provided in Additional file 1. Transcriptome assembly using six different de novo assemblers per data set resulted in a total number of 366 assemblies, i.e., six assemblies for each of the 61 transcriptomic data sets (see Additional file 1 and Additional file 2: Table S4).

Evaluation of transcriptome assemblies, orthology prediction, and alignment procedures

Evaluation of assembled transcriptomes and subsequently applying BUSCO version 3.0.0 [49] with the Metazoa set including 978 orthologs revealed a median of 731 (75%) complete BUSCO genes per sample (maximum: 943 complete BUSCO genes [27 fragmented, 8 missing] in Caloria elegans; minimum: 158 complete BUSCO genes [123 fragmented, 697 missing] in Doris kerguelenensis). All quality assessment results of the transcriptomes using BUSCO are summarised in Additional file 2: Table S5.

We additionally evaluated the quality of all transcriptomes separately for each assembly method based on the results of orthology prediction and identified single-copy protein-coding genes with our custom-made ortholog set comprising 1992 orthologs (see “Methods” and Additional file 1). Results were ranked based on the cumulative length of transcripts that were successfully assigned to the reference genes used to identify single-copy orthologs (OGs) in the transcriptomes (see Additional file 2: Table S6). The cumulative lengths ranged from 82,409 bp in Pseudobornella orientalis (the genus was recently resurrected by Korshunova and colleagues [50]) (IDBA-Tran, 458 genes successfully assigned) to 784,043 bp in Caloria elegans (Shannon, 1904 genes successfully assigned). The median was 472,305 bp for the cumulative length and 1577 for the number of successfully assigned genes. The best assembly (according to the largest cumulative length) out of the six available per sample was selected as the representative transcriptome for the respective species and was submitted to NCBI. Transcriptomes accepted by NCBI after removal of possible foreign sequence contamination were used for all further downstream analyses (see Additional file 2: Tables S2 and S7). In order to reduce the amount of missing data in subsequent analyses we excluded three samples for which less than 60% of OGs included in the search had been identified: Pseudobornella orientalis, Dermatobranchus sp., and Tritoniopsis frydis. Furthermore, we only kept OGs for which at least 50% of the investigated 58 species had a positive hit. This resulted in 1767 OGs that we subsequently used to generate multiple sequence alignments (MSAs) on amino acid level. Checking the MSAs for outlier sequences (i.e., putatively misaligned or misassigned amino acid sequences), we identified 897 sequences in 112 MSAs that were subsequently removed. Outliers were found in sequences from all remaining 58 species with the highest number of 30 outlier sequences in Limenandra confusa and the lowest number of eight outlier sequences in Doris kerguelenensis (median: 15 outliers, all details are provided in Additional file 1 and Additional file 2: Table S8).

Alignment masking resulted in masking of alignment sites in 1,519 out of 1,767 genes (Additional file 1) leaving ~ 71% of aligned unmasked sites for subsequent analyses.

Compilation, evaluation and optimization of data sets

Analysing the concatenated supermatrix using MARE v. 1.2-rc [51], AliStat v. 1.6 [52] for information content and data coverage, and SymTest v. 2.0.47 [53] for putative violation of stationary, (time-)reversible and homogenous (SRH) model conditions [54, 55] using the implemented Bowker’s matched pairs test of symmetry [56] led to the results shown in Additional file 3: Figs. S1 and S2.

With respect to the amount and distribution of missing data we initially compiled two data sets as described in the methods section. The data set allowing for the highest amount of missing data, termed “original unreduced data set”, was not further reduced after concatenation and comprised 58 species, 771,739 aligned amino acid positions and 1767 gene partitions. The second data set with a full gene coverage for all 58 species (termed “original reduced data set”) comprised 143,859 aligned amino acid positions and 364 gene partitions. Analysing both data sets for violation of SRH model conditions with SymTest revealed that two species strongly violated the SRH conditions: Calmella cavolini and Doris kerguelenensis (Additional file 3: Fig. S2). Therefore, the sequences belonging to these two species were removed entirely from all MSAs from the original unreduced data set. This newly created data set (termed “unreduced data set”) spanned a superalignment length of 771,706 amino acid positions including 1767 gene partitions.

To reduce the amount of missing data, we compiled an "intermediate” data set featuring only those gene partitions for which at least one representative of the selected taxa was present (see “Methods”, Additional file 1, and Additional file 2: Table S9). This data set (termed “intermediate data set”) spanned a superalignment length of 271,732 amino acid positions and included 667 gene partitions. The third and most strict data set with full gene coverage for each of the 56 species (termed “strict data set”) had a superalignment length of 170,140 amino acid positions and included 446 gene partitions. This data set was further examined in an additional analysis using mixture models (i.e., without partition information). Finally, we compiled a fourth data subset with increased overall information content discarding less informative genes (hereafter called “strict SOS data set”) with all 56 taxa and an even further reduced number of 126,094 aligned sites and 335 gene partitions. Details on data matrix diagnostics are provided in Additional file 1, Additional file 2: Tables S10 and S11, and Additional file 3: Figs. S3–S8.

Phylogenetic relationships of sea slug taxa

All analyses irrespective of the data set indicate maximum statistical support for all major splits and phylogenetic relationships at the family level (Fig. 1, Additional file 3: Figs. S9–S17). Notably, low statistical support was inferred with regard to the phylogenetic position of the genus Embletonia. In the following, we discuss taxa relationships using the names according to the latest changes [57] that are implemented in World Register of Marine Species [23, 58], although we disagree with several assignments as discussed below.

Fig. 1
figure 1

Best ML tree (phylogram) from the strict data set. Maximum likelihood (ML) tree with bootstrap (BS) support values calculated on the strict data set (partitioned). The “men at work” sign illustrates the enigmatic position of Embletonia pulchra: The black dashed line shows the position of E. pulchra as it was recovered by the analysis of the strict (partitioned) and the unreduced data set. The red solid line shows the alternative position of E. pulchra as it was obtained from the analysis of the intermediate, the strict unpartitioned, and strict SOS data set. Black dots () indicate a BS support value of 100. The numbers represent splits that are discussed in the main text and the surrounding coloured circles represent Quartet Sampling (QS) scores for the corresponding split. QFreq. quartet frequencies, QC quartet concordance, QD quartet differential, QI quartet informativeness. Higher affiliation (except for the term Aeolidida) is according to the system as proposed in WoRMS

Phylogenetic relationships of major taxa and sea slug families

Out of the seven accepted superfamilies of Cladobranchia, we were able to include members of six superfamilies, whereas a representative of the rare Doridoxoidea, recently confirmed as sister to the Arminoidea [26], was not available to us. We inferred Aeolidida, Aeolidioidea (sensu WoRMS), Proctonotoidea, and Dendronotoidea, with representatives of various families and genera, as being monophyletic. This was fully supported by the quartet scores [48] for Aeolidida, Aeolidioidea, and Proctonotoidea, and strongly supported for Dendronotoidea (see Quartet Sampling scores, splits 1–3 and 8 in Fig. 1 and Additional file 2: Table S12). Arminoidea and Tritonioidea are only represented by one genus each. Thus, former results on monophyly [26] still need to be tested in future genomic analyses.

Our analyses revealed the following ambiguities: Flabellina affinis (type species of Flabellinidae), which is currently regarded as a representative of Fionoidea [17], is inferred as sister taxon to Aeolidioidea with maximal statistical support. Quartet sampling, on the other hand, showed only medium support (split 4 in Fig. 1, Additional file 2: Table S12) with the large majority of quartets (67%) supporting the focal branch (Aeolidioidea + Flabellina affinis), but the strong skew in discordance [quartet differential (QD) = 0] indicates the possibility of a single different evolutionary history supported by all remaining quartets. Although the type species was included in our analyses, additional data of further members of the family should be included in future studies to address a possible taxonomic revision.

The family Flabellinopsidae is currently listed as a member of the Aeolidioidea in WoRMS [58] with Flabellinopsis iodinea (Flabellinopsidae) being sister to all remaining taxa in this large clade, confirming previous results [17, 33, 36, 37]. Again, this position is statistically maximally supported by classical support values (BS, aLRT, aBayes) in our study and quartet sampling scores confirmed this position (split 5, Fig. 1) with strong support (94% of the non-uncertain quartets). Although a strong skew in discordance (QD = 0) indicates the possible presence of an alternative quartet relationship, this result is rather less meaningful due to the low number of discordant trees (5% of the non-uncertain quartets). Thus, our results on Flabellinidae and Flabellinopsidae partly contradict recent analyses and subsequent systematic assignments.

The majority of the family Facelinidae is inferred as being monophyletic, but the facelinid species Noumeaella rubrofasciata groups with Myrrhinidae in published analyses [17, 37] as well as in our study with nearly maximal classical statistical support. However, quartet sampling only shows weak support for this relationship (38% of the non-uncertain quartets; see split 6 in Fig. 1 and Additional file 2: Table S12). In fact, the quartet frequencies show no clear signal since all three quartet topologies are roughly equally supported (27% of the non-uncertain quartets support the second possible quartet topology, 36% support the third; QD = 0.85). Thus, the assignment of this species to Facelinidae [57] or Myrrhinidae (our results) should be reconsidered in future studies. Interestingly, this species did not cluster with several other Noumeaella species in a three-gene analysis of Aeolidida by Schillo and colleagues [59].

Within Aeolidioidea, the families Myrrhinidae—excluding Noumeaella rubrofasciata—and Aeolidiidae form a monophyletic sister group relationship in our study, thus confirming the results of [17, 37], and [32]. This is also consistent with recent morphological and molecular analyses using all acknowledged families of Aeolidida [60].

Fionoidea in the sense of Bouchet and colleagues [57] is polyphyletic, mainly due to the position of Flabellina affinis and Embletonia pulchra, the latter is discussed below. Within Fionoidea, the family Trinchesiidae, represented here with three genera, is monophyletic. Fionidae and Eubranchidae are related to Trinchesiidae. Unidentia is sister to these three families. A recent analysis [60] inferred Unidentiidae as sister taxon to Embletoniidae and a clade Embletoniidae + Unidentiidae as sister to all other clades of Aeolidida. Previous analyses did not reveal a robust placement of Unidentiidae and inferred this clade as sister to members of the Aeolidioidea (Facelinidae, Babakinidae, and Aeolidiidae), which in turn are sister to a larger group of Coryphellidae, Flabellinidae, Flabellinopsidae, and Paracoryphellidae [33]. In a subsequent study [61], Unidentiidae was inferred as sister taxon to a clade of Aeolidiidae, Babakinidae, Coryphellidae, Facelinidae, Flabellinidae, Flabellinopsidae, and Paracoryphellidae. In contrast, our quartet sampling analyses do not unambiguously support the relationship of the Unidentiidae as sister to other Fionoidea with a rather weak quartet support (52% of the non-uncertain quartets). The support for the other two possible quartet topologies is almost similar (QD = 0.99), which indicates that no alternative history is favoured (see split 7 in Fig. 1 and Additional file 2: Table S12). In this context, the results of Goodheart and colleagues [17] are quite noteworthy, because in their study, Unidentiidae is the sister taxon of Embletonia while the clade Embletonia + Unidentiidae is sister to all remaining Fionoidea. Results by Martynov and colleagues [60] suggest a sister group relationship to other aeolidoidean families, which is in part compatible with our results (see below).

The family Samlidae, represented by Luisella babai, is considered as being part of Fionoidea [17, 33]. In our study, however, it is inferred as sister to all remaining Aeolidida in all analyses with maximum classical statistical support as well as very strong quartet support (see split 8 in Fig. 1 and Additional file 2: Table S12): About 98% of the quartets support this relationship without evidence for alternative quartet topologies (QD = 1). Including more taxa representing more families of the Aeolidida is necessary to address and hopefully solve this incongruence of our data in comparison to published results.

With regard to Proctonotoidea, Tritonioidea, and Dendronotoidea, our results confirm the findings published by Goodheart and colleagues [17] with the family Embletoniidae being the only exception, as we will discuss below.

The phylogenetic position of Embletoniidae remains ambiguous

The monogeneric family Embletoniidae, which currently only comprises two recognized species, Embletonia pulchra and E. gracilis, has experienced a rich history since the first description of the genus Embletonia by Alder and Hancock [62], with Pterochilus pulcher Alder and Hancock, 1844 as type species. The authors considered this species as a link between cladobranch aeolids and panpulmonate sacoglossans, two taxa that are not closely related to each other, but show many convergent characters. Pruvot-Fol [43], who named the family for the first time, included members of Trinchesiidae, but assigned the whole clade as a “section” to the dendronotoid family Dotidae. The two recognized members of Embletonia share some characters with members of Fionoidea or Aeolidioidea, e.g., the reduction of the lateral teeth, the absence of rhinophoral sheaths [64], and the presence of a cnidosac at the end of the cerata, a synapomorphy of Aeolidida [18], which additionally favours a position within this clade. However, Martin and colleagues [15] and Goodheart and colleagues [17] have shown that this cnidosac differs to a great extent from the typical aeolidid cnidosac by lacking a proper sac-like structure with musculature around it, as well as a connection to the digestive gland, which is necessary for taking up sequestered cnidocysts. Nevertheless, cnidocysts were found in the structures investigated by Goodheart and colleagues [17]. The authors explain this atypical situation with a loss of characters or as constituting a transitional form in the evolution of the cnidosac. Most recently, Martynov and colleagues [60] provided evidence for paedomorphic processes (e.g., reduction of oral tentacles), which would explain a regressive evolution of Embletoniidae within Aeolidida. This phenomenon is known from various unrelated taxa inhabiting soft-bottom interstitial environments [60]. Embletonia feeds on hydrozoans, which is a typical food source of many aeolidids, but also of some dendronotoids. Unique to this genus are the cerata, which show bi- to quadrifid apices. Highly structured cerata are not known from any aeolidids. However, the digestive gland reaches far into these cerata, a character less pronounced in Proctonotoidea, and only present in one further non-aeolidid group, the genus Hancockia [65].

Embletonia also shares traits that are characteristic for non-aeolidid groups, a reason why Pruvot-Fol [43] included the genus into the family Dotidae (Dendronotoidea). This assignment to Dotidae, as well as grouping with Trinchesiidae was, however, rejected later by Schmekel [44], and the closer relationship to Dendronotoidea was emphasized by Miller and Willan [66]. The primary connecting character is the lack of oral tentacles, which are considered to be a synapomorphy of the traditional dendronotaceans [18, 26]. Furthermore, their oral gland ducts do not open into the oral tube by two separate ducts, but fuse into one common duct, which is described for Proctonotoidea. Proctonotoidea mainly feed on bryozoans, however, a few members also rely on hydrozoan prey, similar to Embletonia.

Few studies addressed the phylogenetic relationship of Embletoniidae using molecular data [17, 31, 60]. Bleidissel [31] focussed on Aeolidida and included Embletonia, because of its putative assignment to this group. Bleidissel’s analyses, based on three genes, inferred a sister group relationship of Embletoniidae with Notaeolidiidae, with the latter again being sister to all remaining Aeolidida. In the only study based on a large data set, Embletonia was inferred, along with Unidentia, within Aeolidida as sister to the remaining Fionoidea, thus excluding a closer relationship with Notaeolidia [17]. Martin and colleagues [15] included characters of the cnidosac into the morphological data matrix published by Wägele and Willan [18], and their analysis resulted in an assignment of Embletonia to Aeolidida (tree not shown in the publication). Likewise, our unpublished morphological analyses render Embletonia as a sister taxon to Aeolidida. However, it is more likely the lack of data that constrains the position than apomorphic characters of high phylogenetic information.

In our analyses comprising the unreduced and strict data set, Embletonia pulchra is inferred as sister to Proctonotoidea with full support in the unreduced data set (100 BS, 100 aLRT, 1 aBayes), but with negligible support in the strict data set (65 BS, 50.1 aLRT, 1 aBayes). When assuming that Embletonia is a sister taxon of the Proctonotoidea (see split 9 in Fig. 1 and position i in Fig. 2) and taking into consideration the studies on the evolution of prey preferences [37] and cnidocyst incorporation [17], we have to conclude that (1) feeding on Hydrozoa is an ancestral trait within Cladobranchia and has not changed in Embletonia (in contrast to Proctonotoidea) and (2) the evolution of the cnidosac might have started in the stemline of the clade Aeolidida/Proctonotoidea/Embletoniidae, with Janolus and Dirona probably representing a condition where the ability to store cnidocysts was lost due to a food switch to bryozoan prey. Both an independent evolution of cnidosacs and cnidocyst storage (in the genus Hancockia) as well as a loss or strong reduction of these complex structures has occurred within Dendronotoidea [17] and Aeolidioidea [34, 60].

Fig. 2
figure 2

Best ML tree (cladogram): AU tests + FcLM. Cladogram with summarised major families/clades and images of representative species. Splits for which additional testing was performed are marked with Roman numerals (i–vii) in a coloured circle (AU test) and a triangle (FcLM, splits i–iii). The original position of E. pulchra as obtained from the strict data set (partitioned) is marked by a blue branch (T1). Alternative positions of E. pulchra are indicated by a red (T2) and yellow branch (T3), respectively. We thank Craig A. Hoover for providing the picture of Flabellinopsis iodinea and Karen Cheney for permission to use the picture of Unidentia angelvaldesi

In our results from the intermediate data set, the unpartitioned strict data set analysed using a mixture model approach, and the strict SOS data set, Embletonia is a sister group to all remaining Aeolidida, but again with ambiguous support (intermediate data set: 51 BS, 33.1 aLRT, 1 aBayes; strict data set analysed with a mixture model approach (unpartitioned): 88 BS, 85.9 aLRT, 1 aBayes; strict SOS data set: 88 BS, 92.3 aLRT, 1 aBayes). Considering this relationship as a possible evolutionary scenario (Figs. 1, 2, position ii) means that the evolution of the cnidosac would have had to start in the stemline of Embletoniidae/Aeolidida, while the typical character of Dendronotoidea and Tritonioidea, the rhinophoral sheaths, had already been lost and oral tentacles had not yet evolved.

However, both discussed possibilities (see Figs. 1, 2, positions i and ii) are neither supported statistically by classical bootstrap values nor by our quartet analyses: Frequencies of the three possible quartet topologies are almost equal (33% vs. 35% vs. 31% of all non-uncertain quartets, split 9 in Fig. 1 and Additional file 2: Table S12), which indicates a highly complex evolution or rapid radiation.

Morphological analyses of important characters, like the positions of the anus, jaws, and radula also contradict both relationships discussed above with apomorphic features lacking for both hypotheses [60]. Instead, Embletoniidae shows an uniserial radula with central teeth more similar to various aeolidids [60].

Evaluation of alternative positions of Embletoniidae and possible confounding signal

To gain more insights into one of the obtained positions of Embletonia and to investigate alternative positions (see Fig. 2), further analyses were conducted. Note, that we consider the strict data set as most reliable, since it has full gene coverage for all species, following the rationale of Dell’Ampio and colleagues [67] and Misof and colleagues [47], who showed that inferred positions with high statistical support can be simply due to non-phylogenetic signal, e.g., the distribution of missing data. In addition, the strict data set shows the highest completeness scores of all data sets (see Additional file 2: Table S11). However, we also performed some of the analyses on the intermediate data set, the unpartitioned strict data set, and the strict SOS data set as described in the following.

We applied approximately unbiased (AU) tests [45] for alternative positions of Embletonia on the intermediate data set, both strict data sets (partitioned and unpartitioned), and the strict SOS data set. An AU test always takes the complete tree topology into account and not only single splits. Further, it does not test whether or not confounding signal is inherent in the data set, e.g., due to non-randomly distributed data and/or among-lineage heterogeneity violating SRH conditions. We therefore also applied Four-cluster Likelihood Mapping (FcLM) [46] along with a permutation approach on the strict (partitioned) data set. By testing all three possible quartet topologies around Embletonia we evaluated whether or not there was an alternative signal. Further, we checked for any sign of confounding signal (see [47]). To this end, we defined four groups (Additional file 2: Table S13) considering group 4 as an outgroup. We performed separate analyses for two outgroup variations: first, with 19 species including Anthobranchia and Pleurobranchomorpha and second, only with the 15 remaining cladobranch species. We drew quartets on the original data set and on three artificial data sets, from which any existing phylogenetic signal was removed in three different ways (see “Methods”, Additional file 1, and [47]): (a) by destroying the phylogenetic signal but leaving the distribution of missing data and the compositional heterogeneity, which can lead to violating SRH conditions, untouched; (b) by leaving the distribution of missing data untouched but making the data set completely homogenous (no SRH model violation possible), and (c) by randomizing the missing data distribution and making the data set completely homogenous. For all details see Additional file 1.

Interestingly, the results of the phylogenetic trees and the results of the FcLM (Additional file 2: Table S14) and AU tests (Additional file 2: Table S15) were quite contradicting:

(i) Although the best ML trees of the unreduced data set and strict (partitioned) data set suggest that Embletonia is sister to Proctonotoidea and although the AU test was unable to reject this topology (p > 0.05), it received the lowest proportion of quartets (< 20%) in the FcLM approach. Thus, this relationship can only be explained by confounding signal (see original and permutation results in Additional file 2: Table S14).

(ii) Although the best ML trees of the intermediate data set, the unpartitioned strict data set analysed with a mixture model approach, and the strict SOS data set suggest Embletonia to be sister to all remaining Aeolidida, a position that is not rejected by the AU test (p > 0.05), the FcLM results indicate only minimal support for such a relationship: the proportion of supporting quartets, excluding those that can be explained by confounding signal, was only around 3%. This also implies that AU tests, irrespective of whether or not a topology for the data set is significantly rejected, should not only be used to check if the signal is confounding.

(iii) A sister group relationship of Embletonia to a clade Aeolidida + Proctonotoidea, which received strongest support in the FcLM analyses (8–16% of all quartets after excluding the proportion of supporting quartets that can be explained by confounding signal, see Additional file 2: Table S14), was equally rejected by all AU tests.

There is only very little signal that is not confounding (around 3–8%, compare quartets of original with permuted approaches, Additional file 2: Table S14), which would support either Embletonia + Aeolidida (position ii in Fig. 2) or Embletonia as sister to a clade Aeolidida + Proctonotoidea (position iii in Fig. 2). Thus, these results clearly indicate that the position of Embletonia as a sister taxon of Proctonotoidea is most likely not due to informative phylogenetic signal, but mainly due to confounding signal in our data set, and again leaves the phylogenetic position of Embletonia as an enigma.

In order to analyse further possibilities of putative relationships of Embletonia, we tested four alternative positions (iv–vii, see Fig. 2) of Embletonia, which have been discussed in the literature before, by applying the AU test on the (partitioned) strict data set (see Fig. 2 and see below). Note that none of these positions were inferred in any of our ML analyses.

(iv) Since Embletonia exhibits characters, which are shared with the Dendronotoidea, we analysed a putative sister group relationship with this superfamily.

(v) Although an assignment to Tritonioidea is very unlikely, because Embletonia does not share all the characters special for this superfamily, the position of the Arminoidea is variable within the various published phylogenies [17, 68, 69] when including this superfamily. Nevertheless, we tested this possibility.

The last two tests imply a closer relationship of Embletonia with Fionoidea, a relationship that was assumed in former times and reflects the current systematics [57]. Therefore, we tested (vi) a position of Embletonia as sister to Fionoidea and (vii) Embletonia as sister to Unidentiidae and this clade being again sister to the remaining Fionoidea in restricted sense [17, 60].

AU tests significantly rejected (p < 0.05) all four alternative positions (iv–vii, see Fig. 2) of Embletonia (see Additional file 2: Table S15).

Despite our extensive molecular data sets and tests, we still cannot unambiguously assign Embletonia to one of the superfamilies in our tree. Beyond only small putative phylogenetic signal as indicated by our FcLM analyses, which is also in line with the negligible support considering classical statistical support, a reason could be the lack of relevant taxa in our data set that could positively influence the position of Embletoniidae in the cladobranch tree (e.g., Doridomorpha, Curnonidae, Notaeolidiidae). Achieving congruence between morphological and molecular data within the Cladobranchia is a task for future research.

Conclusions

Due to the high number of orthologous single-copy genes that could be successfully extracted from the transcriptomes, the high information content and up to full gene coverage of the supermatrices, and the high resolution of all phylogenies, we conclude that the use of transcriptomic data is a valuable tool for analysing phylogenetic relationships within Cladobranchia. Nevertheless, analyses of large data sets can be error-prone to systematic bias and classical support values might be inflated as has been shown and discussed [70,71,72,73]. Beyond careful data processing prior to phylogenetic tree inference, additional thorough tests, e.g., AU tests, quartet approaches like FcLM and quartet sampling as well as checks for confounding signal on a variety of different data matrices become more and more indispensable. Our study has revealed that, despite previous efforts, the position of some families within this group, especially the Embletoniidae, requires further investigation and possibly taxonomic revision. In future studies, the present data set should be extended by increasing the number of group-specific orthologous single-copy genes and by including Curnonidae, Notaolidiidae, and other relevant species to shed light on the relationships between families and superfamilies in Cladobranchia in order to draw a more complete image of the evolution of this enigmatic group.

Methods

An overview of the complete workflow is displayed in Fig. 3. Major steps are described here while all details and settings can be found in Additional file 1.

Fig. 3
figure 3

Analysis workflow. Schematic workflow representing all steps from NGS data to the testing of alternative topologies with major steps being highlighted in shades of gray

Taxon sampling and sampling of transcriptome data

For this study, we used recently published transcriptome data and generated new transcriptome data for 21 species. We collected 19 species of Cladobranchia and two more distantly related species of heterobranch sea slugs from different locations in the Mediterranean Sea and the Sea of Japan (Additional file 2: Table S1). The specimens were preserved in RNAlater (Qiagen) or IntactRNA (Evrogen) and stored at − 80 °C. The specimens collected on Elba island (Additional file 2: Table S1) were stored at − 20 °C for approximately two weeks and then transferred to − 80 °C until RNA extraction. RNA extraction of mostly whole organisms (for Armina tigrina only foot tissue was used) was performed using the Macherey & Nagel NucleoSpin RNA II kit. Preparation and amplification of the cDNA libraries were performed by StarSeq GmbH, Mainz using the Illumina TruSeq Stranded RNA HT kit. Paired-end sequencing was also conducted at StarSeq with a read length of 150 base pairs on an Illumina NextSeq 500 sequencing platform. Raw reads were submitted to the NCBI SRA database. All accession numbers are provided in Additional file 2: Table S2.

Our newly generated transcriptome samples were combined with the published transcriptome data of another 40 samples that we downloaded from the NCBI SRA database (Additional file 2: Table S2) [19, 36, 37, 74]. The published data comprised 37 species of Cladobranchia as well as two dorids, Prodoris clavigera and Doris kerguelenensis, and one pleurobranchid, Pleurobranchaea californica (Additional file 2: Table S2).

De novo transcriptome assembly

All raw sequence reads of published and newly generated samples were quality-checked prior to and after adapter trimming using FastQC Version 0.11.5 [75]. Adapter trimming and quality filtering were performed with Trimmomatic v. 0.36 [76] using a custom adapter file (see Additional file 4). Reads that were shorter than 80 bp after trimming, were removed from the read set.

Data from all 61 samples were assembled using six assembly tools: BinPacker v. 1.1 [77], IDBA-Tran v. 1.1.1 [78], Shannon v. 0.0.2 [79], SOAPdenovo-Trans v. 1.04 [80], Trans-ABySS v. 1.5.5 [81], and Trinity v. 2.4.0 [82]. All assemblers were run with default settings and all paired-end reads that survived the trimming process were used as input. We additionally provided surviving single-end reads to those assemblers that were capable of processing them (IDBA-Tran, SOAPdenovo-Trans, and Trans-ABySS).

Following identification of the best transcriptome assembly per species (see below), possible foreign contaminants were identified upon submission of the newly sequenced transcriptomes to NCBI Transcriptome Shotgun Assembly (TSA) database and subsequently removed from the sequences. Details are provided in Additional file 1 and Additional file 2: Table S7. The five alternative assemblies for each sample that has been sequenced in frame of this study are provided here: https://figshare.com/articles/dataset/Additional_file__of_Transcriptomics_provides_a_robust_framework_for_the_relationships_of_the_major_clades_of_cladobranch_sea_slugs_Mollusca_Gastropoda_Heterobranchia_but_fails_to_resolve_the_position_of_the_enigmatic_genus_Embletonia/17701594.

Orthology prediction and generation of data matrices

We designed a custom-made ortholog set by selecting all genes that were listed by OrthoDB version 9 [83] to be single-copy at the hierarchical level “Lophotrochozoa” and downloaded the respective table with the IDs of the ortholog groups (called OGs hereinafter). We additionally downloaded the official gene sets of three species with well-sequenced and annotated genomes, which we selected as reference species: Biomphalaria glabrata, Official Gene Set (OGS) version 1.2 vectorbase [84], Crassostrea gigas, OGS version Sep-2012 (ENA genebuild) [85], and Lottia gigantea, OGS version Jan-2013 (JGI genebuild) [86]. We excluded five genes from this set due to defective sequence headers, leading to a custom-made ortholog set comprising 1,992 orthologs. Orthology prediction was performed using Orthograph v.0.6.2 [87], for which we used the aforementioned ortholog set (Additional file 5). Details are provided in Additional file 1. To reduce the amount of missing data per species, three transcriptome assemblies that covered less than 60% of the ortholog set were excluded from further analyses: Pseudobornella orientalis (53% of the ortholog set missing), Dermatobranchus sp. (46% missing), and Tritoniopsis frydis (51% missing). We then removed all OGs for which less than 50% of the investigated species had a positive hit. This resulted in 1,767 OGs for further analyses.

The quality of all transcriptome assemblies was further assessed with BUSCO v3.0.0 using the metazoa_odb9 reference set genes comprising 978 BUSCO groups [49] and default settings (Additional file 2: Table S5). Because BUSCO’s general Metazoa data set is not very specific for nudibranchs and since there is no way to easily compile a nudibranch-specific reference data set (R. Waterhouse, personal communication), we devised a method that makes use of the output generated by Orthograph. For each Orthograph run, we calculated the number of sequences that were assigned to OGs by Orthograph as well as the cumulative length of these sequences. With the aim to maximize the amount of data, the latter was used as a criterion to determine the best assembly for each species (for details see Additional file 1, Additional file 2: Table S6).

Multiple sequence alignments on amino acid level (as automatically produced by Orthograph) were generated using DIALIGN-TX version 1.0.2 [88] and checked for outlier sequences using a newly implemented version of the outlier script described in [47] (see Additional file 1 for details; unfiltered alignments are provided in Additional file 6). Sequences identified as outliers as well as all sequences belonging to the three reference taxa were removed from the alignments.

The amino acid multiple sequence alignments were examined with the program Aliscore version 2.0 [89, 90] in order to identify ambiguous or randomly similar aligned sites. All positions flagged by Aliscore (~ 29% of the originally aligned sites, see Additional file 1) were discarded using AliCut version 2.31 [91] (Additional file 7). The resulting masked amino acid alignments were concatenated into a supermatrix along with the creation of a partition file using FASconCAT-G version 1.04 [92].

Compilation, evaluation and optimization of data sets

Based on the concatenated supermatrix, we compiled another matrix with the aid of the Perl script fasta2hypo (see Supplement of [47]) and kept only those gene partitions with sequence data for all 58 species, ensuring 100% partition coverage for each included species. Both amino acid supermatrices were analysed using the software tool MARE version 1.2-rc [51] in order to assess the potential information content (IC) of each gene partition, the overall information content of the matrices, and the coverage in terms of gene partitions. The tool AliStat version 1.6 [52] was used to calculate alignment diagnostics and the software SymTest version 2.0.47 [53,54,55] was used to analyse the compositional heterogeneity of the supermatrices in order to detect possible violations of stationary, (time-)reversible, and homogeneous (SRH) conditions [56].

To reduce especially among-lineage heterogeneity (see “Results and discussion”), we excluded the species Doris kerguelenensis and Calmella cavolini from our data (see Additional file 3: Figs. S2 and S8).

We repeated analyses with MARE, AliStat, and SymTest and compiled four final data sets, allowing different levels of missing data (Additional file 2: Table S11): (1) an unreduced data set with 56 species and all 1,767 gene partitions with 771,707 aligned amino-acid sites and allowing ~ 39% missing data; (2) an intermediate data set in which data for at least one representative of the defined groups (Additional file 2: Table S9) had to be present, which led to a data matrix of 56 species and 667 gene partitions (271,732 aligned sites) with 98% gene coverage and 18% of missing data, and (3) our most strict data set only including genes present in all 56 species. This led to a data matrix with 170,140 aligned sites, 446 gene partitions and less than 13% of missing data. Note that we used this data set without partitioning information in an additional analysis using mixture models (see below). Finally, we again increased the overall information content (IC) using MARE with default settings. By discarding less informative genes as identified by MARE this resulted in (4) a strict selected optimal subset (SOS) with all 56 taxa, 126,094 aligned sites, 335 gene partitions and circa 15% of missing data. Missing data can lead to confounding signal in phylogenetic inference [47, 51, 67]. We therefore consider our strict data set as most reliable. Details are provided in Additional file 1. All supermatrices are provided in Additional file 8.

Phylogenetic tree inference

For all four data sets, maximum likelihood (ML) trees were calculated using IQ-TREE version 1.6.12 [93]. The best fitting amino acid models for each partition were identified using ModelFinder [94], which was run using an edge-link partitioned approach [95]. Additionally, we performed a tree inference using a mixture model approach for the strict data set (i.e., without gene partitioning) using IQ-TREE. Again, ModelFinder was used to select the best mixture model. Out of 20 tree searches per data set, we selected the best ML tree according to the best log-likelihood. Statistical support was derived from non-parametric bootstrap replicates (BS) ensuring bootstrap convergence. Additionally, we calculated SH-like approximate likelihood ratio test (aLRT) support [96] and approximate Bayes test (aBayes) support [97]. The best ML tree of each of the three data sets was tested for the presence of rogue taxa using RogueNaRok v.1.0 [98]. Details for each step including used settings are provided in Additional file 1.

Testing for alternative topologies

Quartet sampling

To analyse phylogenetic discordance, we applied the Quartet Sampling (QS) method [48], which aims to identify the lack of branch support due to low phylogenetic information, discordance due to lineage sorting or introgression, and misplaced or erroneous taxa (rogue taxa). Details on the analysis and interpretation of scores are provided in Additional file 1, Additional file 2: Table S12, and Additional file 9.

Testing the position of Embletonia

Since the inferred position of Embletonia pulchra was not stable comparing the best ML trees of our various data sets, we tested various possible topologies with AU tests (see Fig. 2) [45] as implemented in IQ-TREE version 1.6.12 (see “Results and discussion” Additional file 1 and Additional file 10). To further analyse whether or not the placement of Embletonia in our best tree inferred from the strict data set was influenced by confounding signal and violating SRH conditions, and whether or not there was putative phylogenetic signal for alternative positions of Embletonia, we additionally performed Four-cluster Likelihood Mapping (FcLM), which is outlined in the results section and in detail in Additional file 1 (see also Additional file 11). In summary, we tested the following seven alternative hypotheses concerning the position of Embletonia:

  1. (i)

    Embletonia is sister to Proctonotoidea (AU test + FcLM)

  2. (ii)

    Embletonia is sister to all Aeolidida (AU test + FcLM)

  3. (iii)

    Embletonia is sister to (Aeolidida, Proctonotoidea) (AU test + FcLM)

  4. (iv)

    Embletonia is sister to Dendronotoidea (AU test)

  5. (v)

    Embletonia is sister to Arminoidea (AU test)

  6. (vi)

    Embletonia is sister to Fionoidea (AU test)

  7. (vii)

    Embletonia is sister to Unidentiidae and this clade is sister to remaining Fionoidea (AU test).

Availability of data and materials

The data sets supporting the conclusions of this article are available as additional files. The datasets generated and/or analysed during study are available at NCBI GenBank. All accession numbers are provided in Additional file 2: Table S2. The newly implemented version of the outlier script is available on github: https://github.com/alexdonath/checker_complete-v2. All additional scripts developed as part of this study are available on github: https://github.com/alexdonath/Embletonia

Abbreviations

aLRT:

Approximate likelihood ratio test

AU test:

Approximately unbiased test

BS:

Bootstrap

FcLM:

Four-cluster likelihood mapping

IC:

Information content

ML:

Maximum likelihood

MSA:

Multiple sequence alignment

OG:

Ortholog group

OGS:

Official gene set

QD:

Quartet differential

QS:

Quartet sampling

SRH:

Stationary, (time-)reversible and homogenous

TSA database:

Transcriptome Shotgun Assembly database

WoRMS:

World Register of Marine Species

References

  1. 1.

    Nimbs MJ, Larkin M, Davis TR, Harasti D, Willan RC, Smith SDA. Southern range extensions for twelve heterobranch sea slugs (Gastropoda: Heterobranchia) on the eastern coast of Australia. Mar Biodivers Rec. 2016;9:27.

    Google Scholar 

  2. 2.

    Eisenbarth J-H, Undap N, Papu A, Schillo D, Dialao J, Reumschüssel S, et al. Marine Heterobranchia (Gastropoda, Mollusca) in Bunaken National Park, North Sulawesi, Indonesia—a follow-up diversity study. Diversity. 2018;10:127.

    Google Scholar 

  3. 3.

    Nimbs M, Smith S. Beyond Capricornia: tropical sea slugs (Gastropoda, Heterobranchia) extend their distributions into the Tasman Sea. Diversity. 2018;10:99.

    Google Scholar 

  4. 4.

    Ompi M, Lumoindong F, Undap N, Papu A, Wägele H. Monitoring marine Heterobranchia in Lembeh Strait, North Sulawesi (Indonesia), in a changing environment. AACL Bioflux. 2019;12:664–77.

    Google Scholar 

  5. 5.

    Papu A, Undap N, Martinez NA, Segre MR, Datang IG, Kuada RR, et al. First study on Marine Heterobranchia (Gastropoda, Mollusca) in Bangka Archipelago, North Sulawesi, Indonesia. Diversity. 2020;12:52.

    Google Scholar 

  6. 6.

    Fisch K, Hertzer C, Böhringer N, Wuisan Z, Schillo D, Bara R, et al. The potential of Indonesian heterobranchs found around Bunaken Island for the production of bioactive compounds. Mar Drugs. 2017;15:384.

    PubMed Central  Google Scholar 

  7. 7.

    Gavagnin M, Carbone M, Ciavatta ML, Mollo E. Natural products from marine heterobranchs: an overview of recent results. Chem J Mold. 2019;14:9–31.

    CAS  Google Scholar 

  8. 8.

    Avila C. Terpenoids in marine heterobranch molluscs. Mar Drugs. 2020;18:162.

    CAS  PubMed Central  Google Scholar 

  9. 9.

    Avila C, Angulo-Preckler C. Bioactive compounds from marine heterobranchs. Mar Drugs. 2020;18:657.

    CAS  PubMed  PubMed Central  Google Scholar 

  10. 10.

    Avila C, Núñez-Pons L, Moles J. From the tropics to the poles: chemical defense strategies in sea slugs (Mollusca: Heterobranchia). In: Puglisi MP, Becerro MA, editors. Chemical ecology: the ecological impacts of marine natural products. 1st ed. CRC Press; 2018. p. 71–163.

    Google Scholar 

  11. 11.

    Burghardt I, Wägele H. The symbiosis between the ‘solar-powered’ nudibranch Melibe engeli Risbec, 1937 (Dendronotoidea) and Symbiodinium sp. (Dinophyceae). J Molluscan Stud. 2014;80:508–17.

    Google Scholar 

  12. 12.

    Melo Clavijo J, Donath A, Serôdio J, Christa G. Polymorphic adaptations in metazoans to establish and maintain photosymbioses: evolution of photosymbiosis. Biol Rev. 2018;93:2006–20.

    PubMed  Google Scholar 

  13. 13.

    Laetz EMJ, Wägele H. Comparing amylose production in two solar-powered sea slugs: the sister taxa Elysia timida and E. cornigera (Heterobranchia: Sacoglossa). J Molluscan Stud. 2019;85:166–71.

    Google Scholar 

  14. 14.

    Martin R, Heß M, Schrödl M, Tomaschko K-H. Cnidosac morphology in dendronotacean and aeolidacean nudibranch molluscs: from expulsion of nematocysts to use in defense? Mar Biol. 2009;156:261–8.

    Google Scholar 

  15. 15.

    Martin R, Tomaschko K-H, Heß M, Schrödl M. Cnidosac-related structures in Embletonia (Mollusca, Nudibranchia) compared with dendronotacean and aeolidacean species. Open Mar Biol J. 2010;4:96–100.

    Google Scholar 

  16. 16.

    Obermann D, Bickmeyer U, Wägele H. Incorporated nematocysts in Aeolidiella stephanieae (Gastropoda, Opisthobranchia, Aeolidoidea) mature by acidification shown by the pH sensitive fluorescing alkaloid Ageladine A. Toxicon. 2012;60:1108–16.

    CAS  PubMed  Google Scholar 

  17. 17.

    Goodheart JA, Bleidißel S, Schillo D, Strong EE, Ayres DL, Preisfeld A, et al. Comparative morphology and evolution of the cnidosac in Cladobranchia (Gastropoda: Heterobranchia: Nudibranchia). Front Zool. 2018;15:43.

    CAS  PubMed  PubMed Central  Google Scholar 

  18. 18.

    Wägele H, Willan RC. Phylogeny of the Nudibranchia. Zool J Linn Soc. 2000;130:83–181.

    Google Scholar 

  19. 19.

    Zapata F, Wilson NG, Howison M, Andrade SCS, Jörger KM, Schrödl M, et al. Phylogenomic analyses of deep gastropod relationships reject Orthogastropoda. Proc R Soc Lond B Biol Sci. 2014;281:20141739.

    Google Scholar 

  20. 20.

    Pabst EA, Kocot KM. Phylogenomics confirms monophyly of Nudipleura (Gastropoda: Heterobranchia). J Molluscan Stud. 2018;84:259–65.

    Google Scholar 

  21. 21.

    Wägele H, Klussmann-Kolb A, Verbeek E, Schrödl M. Flashback and foreshadowing—a review of the taxon Opisthobranchia. Org Divers Evol. 2014;14:133–49.

    Google Scholar 

  22. 22.

    Korshunova T, Fletcher K, Picton B, Lundin K, Kashio S, Sanamyan N, et al. The Emperor’s Cadlina, hidden diversity and gill cavity evolution: new insights for the taxonomy and phylogeny of dorid nudibranchs (Mollusca: Gastropoda). Zool J Linn Soc. 2020;189:762–827.

    Google Scholar 

  23. 23.

    WoRMS Editorial Board. World Register of Marine Species. 2020. https://doi.org/10.14284/170.

  24. 24.

    Odhner NH. Nudibranchia Dendronotacea. A revision of the system. Mém Mus R Hist Nat Belg Deux Sér. 1936;3:1057–128.

  25. 25.

    Odhner NH. The Nudibranchiata. British Antarctic (“Terra Nova”) Expedition, 1910. Natural History Report. Zoology. 1934;7:229–310.

  26. 26.

    Korshunova T, Martynov A. Consolidated data on the phylogeny and evolution of the family Tritoniidae (Gastropoda: Nudibranchia) contribute to genera reassessment and clarify the taxonomic status of the neuroscience models Tritonia and Tochuina. PLoS ONE. 2020;15:e0242103.

    CAS  PubMed  PubMed Central  Google Scholar 

  27. 27.

    Schrödl M, Wägele H, Willan RC. Taxonomic redescription of the doridoxidae (Gastropoda: Opisthobranchia), an enigmatic family of deep water nudibranchs, with discussion of basal nudibranch phylogeny. Zool Anz J Comp Zool. 2001;240:83–97.

    Google Scholar 

  28. 28.

    Mahguib J, Valdés Á. Molecular investigation of the phylogenetic position of the polar nudibranch Doridoxa (Mollusca, Gastropoda, Heterobranchia). Polar Biol. 2015;38:1369–77.

    Google Scholar 

  29. 29.

    Martynov A. From, “Tree-Thinking” to “Cycle-Thinking”: ontogenetic systematics of nudibranch molluscs. Thalassas. 2011;27:193–224.

    Google Scholar 

  30. 30.

    Pola M, Gosliner TM. The first molecular phylogeny of cladobranchian opisthobranchs (Mollusca, Gastropoda, Nudibranchia). Mol Phylogenet Evol. 2010;56:931–41.

    CAS  PubMed  Google Scholar 

  31. 31.

    Bleidissel S. Molekulare Untersuchungen zur Evolution der Aeolidida (Mollusca, Gastropoda, Nudibranchia, Cladobranchia) und zur Evolution einer sekundären Symbiose mit Symbiodinium (Dinoflagellata) in den Aeolidida. Dissertation. Bergische University of Wuppertal; 2010. http://nbn-resolving.de/urn/resolver.pl?urn=urn:nbn:de:hbz:468-20110509-151022-7.

  32. 32.

    Martynov A, Mehrotra R, Chavanich S, Nakano R, Kashio S, Lundin K, et al. The extraordinary genus Myja is not a tergipedid, but related to the Facelinidae s. str. with the addition of two new species from Japan (Mollusca, Nudibranchia). ZooKeys. 2019;818:89–116.

    Google Scholar 

  33. 33.

    Korshunova T, Martynov A, Bakken T, Evertsen J, Fletcher K, Mudianta IW, et al. Polyphyly of the traditional family Flabellinidae affects a major group of Nudibranchia: aeolidacean taxonomic reassessment with descriptions of several new families, genera, and species (Mollusca, Gastropoda). ZooKeys. 2017;717:1–139.

    Google Scholar 

  34. 34.

    Korshunova T, Martynov A, Picton B. Ontogeny as an important part of integrative taxonomy in tergipedid aeolidaceans (Gastropoda: Nudibranchia) with a description of a new genus and species from the Barents Sea. Zootaxa. 2017;4324:1–22.

    Google Scholar 

  35. 35.

    Korshunova TA, Sanamyan NP, Sanamyan KE, Bakken T, Lundin K, Fletcher K, et al. Biodiversity hotspot in cold waters: a review of the genus Cuthonella with descriptions of seven new species (Mollusca, Nudibranchia). Contrib Zool. 2020;90:216–83.

    Google Scholar 

  36. 36.

    Goodheart JA, Bazinet AL, Collins AG, Cummings MP. Relationships within Cladobranchia (Gastropoda: Nudibranchia) based on RNA-Seq data: an initial investigation. R Soc Open Sci. 2015;2:150196.

    PubMed  PubMed Central  Google Scholar 

  37. 37.

    Goodheart JA, Bazinet AL, Valdés Á, Collins AG, Cummings MP. Prey preference follows phylogeny: evolutionary dietary patterns within the marine gastropod group Cladobranchia (Gastropoda: Heterobranchia: Nudibranchia). BMC Evol Biol. 2017;17:221.

    PubMed  PubMed Central  Google Scholar 

  38. 38.

    Putz A, König GM, Wägele H. Defensive strategies of Cladobranchia (Gastropoda, Opisthobranchia). Nat Prod Rep. 2010;27:1386–402.

    CAS  PubMed  Google Scholar 

  39. 39.

    Goodheart JA, Wägele H. Phylogenomic analysis and morphological data suggest left-right swimming behavior evolved prior to the origin of the pelagic Phylliroidae (Gastropoda: Nudibranchia). Org Divers Evol. 2020. https://doi.org/10.1007/s13127-020-00458-9.

    Article  Google Scholar 

  40. 40.

    Wägele JW, Letsch H, Klussmann-Kolb A, Mayer C, Misof B, Wägele H. Phylogenetic support values are not necessarily informative: the case of the Serialia hypothesis (a mollusk phylogeny). Front Zool. 2009;6:12.

    PubMed  PubMed Central  Google Scholar 

  41. 41.

    Kück P, Wägele JW. Plesiomorphic character states cause systematic errors in molecular phylogenetic analyses: a simulation study. Cladistics. 2016;32:461–78.

    PubMed  Google Scholar 

  42. 42.

    Wägele JW, Bartolomaeus T, editors. Deep Metazoan Phylogeny: the Backbone of the Tree of Life: new insights from analyses of molecules, morphology, and theory of data analysis. Berlin, Boston: DE GRUYTER; 2014. https://doi.org/10.1515/9783110277524.

  43. 43.

    Pruvot-Fol A. Faune De France n° 58, Mollusques Opisthobranches. Paris: Paul Lechevalier; 1954. https://www.abebooks.com/book-search/title/faune-de-france-58-mollusques-opisthobranches/author/mme-alice-pruvot-fol/. Accessed 21 Aug 2020.

  44. 44.

    Schmekel L. Anatomie der Genitalorgane von Nudibranchiern (Gastropoda Euthyneura). Pubbl Staz zool Napoli. 1970;38:120–217.

  45. 45.

    Shimodaira H. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 2002;51:492–508.

    PubMed  Google Scholar 

  46. 46.

    Strimmer K, von Haeseler A. Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proc Natl Acad Sci U S A. 1997;94:6815–9.

    CAS  PubMed  PubMed Central  Google Scholar 

  47. 47.

    Misof B, Liu S, Meusemann K, Peters RS, Donath A, Mayer C, et al. Phylogenomics resolves the timing and pattern of insect evolution. Science. 2014;346:763–7.

    CAS  PubMed  Google Scholar 

  48. 48.

    Pease JB, Brown JW, Walker JF, Hinchliff CE, Smith SA. Quartet Sampling distinguishes lack of support from conflicting support in the green plant tree of life. Am J Bot. 2018;105:385–403.

    PubMed  Google Scholar 

  49. 49.

    Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–2.

    PubMed  Google Scholar 

  50. 50.

    Korshunova T, Bakken T, Grøtan VV, Johnson KB, Lundin K, Martynov A. A synoptic review of the family Dendronotidae (Mollusca: Nudibranchia): a multilevel organismal diversity approach. Contrib Zool. 2020;1–61.

  51. 51.

    Misof B, Meyer B, von Reumont BM, Kück P, Misof K, Meusemann K. Selecting informative subsets of sparse supermatrices increases the chance to find correct trees. BMC Bioinformatics. 2013;14:348.

    PubMed  PubMed Central  Google Scholar 

  52. 52.

    Wong TKF, Kalyaanamoorthy S, Meusemann K, Yeates DK, Misof B, Jermiin LS. A minimum reporting standard for multiple sequence alignments. NAR Genomics Bioinforma. 2020. https://doi.org/10.1093/nargab/lqaa024.

    Article  Google Scholar 

  53. 53.

    Jermiin LS, Ott M. SymTest. C++. 2017. https://github.com/ottmi/symtest. Accessed 28 May 2020.

  54. 54.

    Ho SYW, Jermiin LS. Tracing the decay of the historical signal in biological sequence data. Syst Biol. 2004;53:623–37.

    PubMed  Google Scholar 

  55. 55.

    Ababneh F, Jermiin LS, Ma C, Robinson J. Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics. 2006;22:1225–31.

    CAS  PubMed  Google Scholar 

  56. 56.

    Bowker AH. A test for symmetry in contingency tables. J Am Stat Assoc. 1948;43:572–4.

    CAS  PubMed  Google Scholar 

  57. 57.

    Bouchet P, Rocroi J-P, Hausdorf B, Kaim A, Kano Y, Nützel A, et al. Revised classification, nomenclator and typification of gastropod and monoplacophoran families. Malacologia. 2017;61:1–526.

    Google Scholar 

  58. 58.

    MolluscaBase eds. MolluscaBase. MolluscaBase. 2020. Accessed at http://www.molluscabase.org on 2020-08-25. Accessed 31 Aug 2020.

  59. 59.

    Schillo D, Wipfler B, Undap N, Papu A, Böhringer N, Eisenbarth J-H, et al. Description of a new Moridilla species from North Sulawesi, Indonesia (Mollusca: Nudibranchia: Aeolidioidea)—based on MicroCT, histological and molecular analyses. Zootaxa. 2019;4652:265–95.

    Google Scholar 

  60. 60.

    Martynov A, Lundin K, Picton B, Fletcher K, Malmberg K, Korshunova T. Multiple paedomorphic lineages of soft-substrate burrowing invertebrates: parallels in the origin of Xenocratena and Xenoturbella. PLoS ONE. 2020;15:1–24.

    Google Scholar 

  61. 61.

    Korshunova T, Mehrotra R, Arnold S, Lundin K, Picton B, Martynov A. The formerly enigmatic Unidentiidae in the limelight again: a new species of the genus Unidentia from Thailand (Gastropoda: Nudibranchia). Zootaxa. 2019;4551:556.

    PubMed  Google Scholar 

  62. 62.

    Alder J, Hancock A. A monograph of the British nudibranchiate Mollusca: with figures of all the species. Part 5. London: The Ray Society; 1851.

  63. 63.

    Alder J, Hancock A. Descriptions of Pterochilus, a new genus of nudibranchiate mollusca, and two new species of Doris. Ann Mag Nat Hist. 1844;14:329–31.

    Google Scholar 

  64. 64.

    Edmunds M. Opisthobranchiate mollusca from Ghana: Flabellinidae, Piseinotecidae, Eubranchidae & Embletoniidae. J Conchol. 2015;42:105–24.

    Google Scholar 

  65. 65.

    Thompson TE, Brown GH. Biology of opisthobranch molluscs, vol. 2. London: Ray Society; 1984.

    Google Scholar 

  66. 66.

    Miller MC, Willan RC. Redescription of Embletonia gracile Risbec, 1928 (Nudibranchia: Embletoniidae): relocation to suborder Dendronotacea with taxonomic and phylogenetic implications. J Molluscan Stud. 1992;58:1–11.

    Google Scholar 

  67. 67.

    Dell’Ampio E, Meusemann K, Szucsich NU, Peters RS, Meyer B, Borner J, et al. Decisive data sets in phylogenomics: lessons from studies on the phylogenetic relationships of primarily wingless insects. Mol Biol Evol. 2014;31:239–49.

    PubMed  Google Scholar 

  68. 68.

    Wägele H, Vonnemann V, Wägele J-W. Toward a phylogeny of the Opisthobranchia. In: Lydeard C, Lindberg D, editors. Molecular systematics and phylogeography of mollusks. Smithsonian Books; 2003. p. 185–228. https://www.researchgate.net/profile/Heike_Waegele/publication/284477221_Toward_a_phylogeny_of_the_Opisthobranchia/links/566039f808ae1ef929857b4d.pdf. Accessed 18 Sep 2016.

  69. 69.

    Wägele H, Klussmann-Kolb A. Opisthobranchia (Mollusca, Gastropoda)—more than just slimy slugs. Shell reduction and its implications on defence and foraging. Front Zool. 2005;2:3.

    PubMed  PubMed Central  Google Scholar 

  70. 70.

    Simon S, Blanke A, Meusemann K. Reanalyzing the Palaeoptera problem—the origin of insect flight remains obscure. Arthropod Struct Dev. 2018;47:328–38.

    PubMed  Google Scholar 

  71. 71.

    Vasilikopoulos A, Balke M, Beutel RG, Donath A, Podsiadlowski L, Pflug JM, et al. Phylogenomics of the superfamily Dytiscoidea (Coleoptera: Adephaga) with an evaluation of phylogenetic conflict and systematic error. Mol Phylogenet Evol. 2019;135:270–85.

    PubMed  Google Scholar 

  72. 72.

    Vasilikopoulos A, Misof B, Meusemann K, Lieberz D, Flouri T, Beutel RG, et al. An integrative phylogenomic approach to elucidate the evolutionary history and divergence times of Neuropterida (Insecta: Holometabola). BMC Evol Biol. 2020;20:64.

    PubMed  PubMed Central  Google Scholar 

  73. 73.

    Szucsich NU, Bartel D, Blanke A, Böhm A, Donath A, Fukui M, et al. Four myriapod relatives—but who are sisters? No end to debates on relationships among the four major myriapod subgroups. BMC Evol Biol. 2020;20:144.

    CAS  PubMed  PubMed Central  Google Scholar 

  74. 74.

    Senatore A, Edirisinghe N, Katz PS. Deep mRNA sequencing of the Tritonia diomedea brain transcriptome provides access to gene homologues for neuronal excitability, synaptic transmission and peptidergic signalling. PLoS ONE. 2015;10:e0118321.

    PubMed  PubMed Central  Google Scholar 

  75. 75.

    Andrews S. FastQC: a quality control tool for high throughput sequence data. 2010. http://www.bioinformatics.babraham.ac.uk/projects/fastqc.

  76. 76.

    Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–20.

    CAS  PubMed  PubMed Central  Google Scholar 

  77. 77.

    Liu J, Li G, Chang Z, Yu T, Liu B, McMullen R, et al. BinPacker: packing-based de novo transcriptome assembly from RNA-seq sata. PLOS Comput Biol. 2016;12:e1004772.

    PubMed  PubMed Central  Google Scholar 

  78. 78.

    Peng Y, Leung HCM, Yiu S-M, Lv M-J, Zhu X-G, Chin FYL. IDBA-tran: a more robust de novo de Bruijn graph assembler for transcriptomes with uneven expression levels. Bioinformatics. 2013;29:i326–34.

    CAS  PubMed  PubMed Central  Google Scholar 

  79. 79.

    Kannan S, Hui J, Mazooji K, Pachter L, Tse D. Shannon: an information-optimal de novo RNA-Seq assembler. bioRxiv. 2016;039230.

  80. 80.

    Xie Y, Wu G, Tang J, Luo R, Patterson J, Liu S, et al. SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics. 2014;30:1660–6.

    CAS  PubMed  Google Scholar 

  81. 81.

    Robertson G, Schein J, Chiu R, Corbett R, Field M, Jackman SD, et al. De novo assembly and analysis of RNA-seq data. Nat Methods. 2010;7:909–12.

    CAS  PubMed  Google Scholar 

  82. 82.

    Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29:644–52.

    CAS  PubMed  PubMed Central  Google Scholar 

  83. 83.

    Kriventseva EV, Tegenfeldt F, Petty TJ, Waterhouse RM, Simão FA, Pozdnyakov IA, et al. OrthoDB v8: update of the hierarchical catalog of orthologs and the underlying free software. Nucleic Acids Res. 2015;43:D250–6.

    CAS  PubMed  Google Scholar 

  84. 84.

    DeJong RJ, Emery AM, Adema CM. The mitochondrial genome of Biomphalaria glabrata (Gastropoda: Basommatophora), intermediate host of Schistosoma mansoni. J Parasitol. 2004;90:991–7.

    CAS  PubMed  Google Scholar 

  85. 85.

    Zhang G, Fang X, Guo X, Li L, Luo R, Xu F, et al. The oyster genome reveals stress adaptation and complexity of shell formation. Nature. 2012;490:49–54.

    CAS  PubMed  Google Scholar 

  86. 86.

    Simakov O, Marletaz F, Cho S-J, Edsinger-Gonzales E, Havlak P, Hellsten U, et al. Insights into bilaterian evolution from three spiralian genomes. Nature. 2013;493:526–31.

    CAS  PubMed  Google Scholar 

  87. 87.

    Petersen M, Meusemann K, Donath A, Dowling D, Liu S, Peters RS, et al. Orthograph: a versatile tool for mapping coding nucleotide sequences to clusters of orthologous genes. BMC Bioinformatics. 2017;18:111.

    PubMed  PubMed Central  Google Scholar 

  88. 88.

    Subramanian AR, Kaufmann M, Morgenstern B. DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment. Algorithms Mol Biol. 2008;3:6.

    PubMed  PubMed Central  Google Scholar 

  89. 89.

    Misof B, Misof K. A Monte Carlo approach successfully identifies randomness in multiple sequence alignments: a more objective means of data exclusion. Syst Biol. 2009;58:21–34.

    CAS  PubMed  Google Scholar 

  90. 90.

    Kück P, Meusemann K, Dambach J, Thormann B, von Reumont BM, Wägele JW, et al. Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees. Front Zool. 2010;7:10.

    PubMed  PubMed Central  Google Scholar 

  91. 91.

    Kück P. AliCUT. 2019. https://github.com/PatrickKueck/AliCUT.

  92. 92.

    Kück P, Longo GC. FASconCAT-G: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies. Front Zool. 2014;11:81.

    PubMed  PubMed Central  Google Scholar 

  93. 93.

    Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32:268–74.

    CAS  Google Scholar 

  94. 94.

    Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods. 2017;14:587–9.

    CAS  PubMed  PubMed Central  Google Scholar 

  95. 95.

    Chernomor O, von Haeseler A, Minh BQ. Terrace aware data structure for phylogenomic inference from supermatrices. Syst Biol. 2016;65:997–1008.

    PubMed  PubMed Central  Google Scholar 

  96. 96.

    Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010;59:307–21.

    CAS  PubMed  PubMed Central  Google Scholar 

  97. 97.

    Anisimova M, Gil M, Dufayard J-F, Dessimoz C, Gascuel O. Survey of branch support methods demonstrates accuracy, power, and robustness of fast likelihood-based approximation schemes. Syst Biol. 2011;60:685–99.

    PubMed  PubMed Central  Google Scholar 

  98. 98.

    Aberer AJ, Krompass D, Stamatakis A. Pruning rogue taxa improves phylogenetic accuracy: an efficient algorithm and webservice. Syst Biol. 2013;62:162–6.

    PubMed  Google Scholar 

Download references

Acknowledgements

KM thanks Ondrej Hlinka (CSIRO) for HPC support and Thomas Wong (ANU) for kindly providing Unique Tree. KM and DK thank Robert Waterhouse for details on official gene sets used in OrthoDB 8. DK thanks Malte Petersen for suggestions on using Orthograph. We sincerely appreciate the insightful editing of Prof. Jeffrey Townsend and the helpful comments and suggestions of two anonymous reviewers.

Funding

Open Access funding enabled and organized by Projekt DEAL. AD, HW, and MS received funding by the German Research Foundation (DFG DO 1781/1-1, DFG WA 618/18-1, SCHR 667/13-1). AM received support by the MSU Zoological Museum (18-1-21 No. 121032300105-0). The work of TK was conducted under the IDB RAS Government basic research program in 2021 No. 0088-2021-0008. The funding bodies played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Author information

Affiliations

Authors

Contributions

AD, DK, and HW designed the study. DK, HW, MS, AM, and TK collected and provided material. AM and TK performed taxonomic assessments. DK, KM, AD, and HW performed all data analyses. DK and AD developed scripts. AD performed sequence data management. MS, AM, and TK provided pictures. JG provided access to unpublished data. All authors contributed in writing the manuscript, with AD, DK, KM, and HW taking the lead. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Alexander Donath.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Additional text.

Additional file 2. Table S1:

Sampling information for the species collected for this study. Table S2: NCBI accession numbers for all species used in this study. Table S3: Statistics of raw sequence reads before and after trimming. Table S4: Assembly statistics. Table S5: BUSCO results. Table S6: Results of the Quality Checker script and selection of the best assembly. Table S7: Information on sequences removed during contamination filtering. Table S8: Number of removed outlier sequences per species. Table S9: Group definitions to compile the intermediate data set. Table S10: Information content and evolutionary rates of the orthologs included in the strict data set. Table S11: Supermatrix diagnostics of data sets used in this study. Table S12: Results of the Quartet Sampling analysis. Table S13: Group definitions used for Four-cluster Likelihood Mapping (FcLM) analyses. Table S14: FcLM results testing the position of Embletonia. Table S15: AU test results on the intermediate and strict (partitioned, unpartitioned, SOS) data sets.

Additional file 3. Figure S1.

Species-pairwise site-coverage of the original unreduced and reduced data sets. Heat maps indicate species-pairwise amino acid site-coverage of the sequences of 58 species in the original data sets inferred with AliStat. Low shared site-coverage is in shades of red and high shared site-coverage is in shades of green. AliStat scores are given in Additional file 2: Table S11. a) original unreduced data set. b) original reduced data set. Figure S2. Heat maps calculated with SymTest applying the Bowker’s test on the original unreduced and reduced data sets. Heat maps show the results of pairwise Bowker’s test as implemented in SymTest 2.0.47 analysing the original data sets unreduced and reduced. The percentage of pairwise p-values < 0.05 rejecting SRH conditions are given in parentheses. a) original unreduced data set (p-values < 0.05: 83.36%). b) original reduced data set (p-values < 0.05: 42.65%). Note that especially Calmella and Doris are obvious with respect to violating SRH conditions. Figure S3. Heat map visualising the information content of the final unreduced data set calculated with MARE. The information content (IC) is colour-coded in shades of blue, with darker shades representing higher IC and white squares indicating missing data. Red squares indicate gene partitions with an IC = 0. Species are displayed in rows (x-axis) and gene partitions are displayed in columns (y-axis). Supermatrix diagnostics of MARE are provided in Additional file 2: Table S11. Figure S4. Heat map visualising the information content of the final intermediate data set calculated with MARE. The information content (IC) is colour-coded in shades of blue, with darker shades representing higher IC and white squares indicating missing data. Red squares indicate gene partitions with an IC = 0. Species are displayed in rows (x-axis) and gene partitions are displayed in columns (y-axis). Supermatrix diagnostics of MARE are provided in Additional file 2: Table S11. Figure S5. Heat map visualising the information content of the final strict data set calculated with MARE. The information content (IC) is colour-coded in shades of blue, with darker shades representing higher IC and white squares indicating missing data. Red squares indicate gene partitions with an IC = 0. Species are displayed in rows (x-axis) and gene partitions are displayed in columns (y-axis). Supermatrix diagnostics of MARE are provided in Additional file 2: Table S11. Figure S6. Heat map visualising the information content of the strict SOS data set calculated with MARE. The information content (IC) is colour-coded in shades of blue, with darker shades representing higher IC and white squares indicating missing data. Red squares indicate gene partitions with an IC = 0. Species are displayed in rows (x-axis) and gene partitions are displayed in columns (y-axis). Supermatrix diagnostics of MARE are provided in Additional file 2: Table S11. Figure S7. Species-pairwise site-coverage of the final unreduced, intermediate, strict, and strict SOS data set. Heat maps indicate species-pairwise amino acid site-coverage of the sequences of 56 species in the final data sets inferred with AliStat. Low shared site-coverage is in shades of red and high shared site-coverage is in shades of green. AliStat scores are given in Additional file 2: Table S11. a) unreduced data set. b) intermediate data set. c) strict data set. d) strict SOS data set. Figure S8. Heat maps calculated with SymTest applying the Bowker’s test on the final unreduced, intermediate, strict, and strict SOS data sets. Heat maps show the results of pairwise Bowker’s test as implemented in SymTest 2.0.47 analysing the final data sets unreduced, intermediate, strict, and strict SOS. The percentage of pairwise p-values < 0.05 rejecting SRH conditions are given in parentheses. a) unreduced data set (p-values < 0.05: 82.14%). b) intermediate data set (p-values < 0.05: 63.96%). c) strict data set (p-values < 0.05: 46.17%). d) strict SOS data set (p-values < 0.05: 21.17%). Figure S9. Best ML tree of the strict data set with aLRT and aBayes support. The phylogram is identical to the phylogram in Fig. 1 without the alternative position of Embletonia pulchra. The first value displays branch support based on 10,000 SH-aLRT replicates, the second value displays support derived from the approximate Bayes test. Figure S10. Best ML tree of the intermediate data set with non-parametric bootstrap support. Statistical support was inferred from 300 non-parametric bootstrap replicates. Figure S11. Best ML tree of the intermediate data set with aLRT and aBayes support. The first value displays branch support based on 10,000 SH-aLRT replicates, the second value displays support derived from the approximate Bayes test. Figure S12. Best ML tree of the unreduced data set with non-parametric bootstrap support. Statistical support was inferred from 100 non-parametric bootstrap replicates. Figure S13. Best ML tree of the unreduced data set with aLRT and aBayes support. The first value displays branch support based on 10,000 SH-aLRT replicates, the second value displays support derived from the approximate Bayes test. Figure S14. Best ML tree of the strict unpartitioned data set analysed with a mixture model approach with non-parametric bootstrap support. Statistical support was inferred from 100 non-parametric bootstrap replicates. Figure S15. Best ML tree of the strict unpartitioned data set analysed with a mixture model approach with aLRT and aBayes support. The first value displays branch support based on 10,000 SH-aLRT replicates, the second value displays support derived from the approximate Bayes test. Figure S16. Best ML tree of the strict SOS data set with non-parametric bootstrap support. Statistical support was inferred from 100 non-parametric bootstrap replicates. Figure S17. Best ML tree of the strict SOS data set with aLRT and aBayes support. The first value displays branch support based on 10,000 SH-aLRT replicates, the second value displays support derived from the approximate Bayes test.

Additional file 4. Archive S1

: Illumina adapters used for adapter trimming.

Additional file 5. Archive S3

: This archive includes official gene sets of the three reference species Biomphalaria glabrata, Crassostrea gigas, and Lottia gigantea on translational and transcriptional level, the list of all orthologous sequence clusters (OGs) as required for Orthograph, and an exemplary Orthograph config file.

Additional file 6. Archive S4:

Unmasked multiple sequence alignments on amino acid level including Doris kerguelenensis and Calmella cavolini prior to the removal of outliers.

Additional file 7. Archive S5

: 1,767 multiple sequence alignments (FASTA format) on amino acid level, from which sequences belonging to Doris kerguelenensis and Calmella cavolini as well as ambiguously aligned sections and gap-only sites were removed. These served as the basis for compiling the final unreduced supermatrix.

Additional file 8. Archive S6

: The unreduced, intermediate, strict, and strict SOS supermatrix (FASTA format) plus respective gene partition information including the selected substitution model used in the phylogenetic analyses.

Additional file 9. Archive S7:

The tree with QS identifiers and quartet fidelity (qf) score resulting from the quartet sampling method (NEWICK format).

Additional file 10. Archive S8:

The tree topologies displaying the different positions of Embletonia pulchra that were tested using the approximately unbiased (AU) test with IQ-TREE. This archive contains four directories: intermediate, strict partitioned, strict unpartitioned, and strict SOS. Each directory contains the tree topologies that were tested for this specific data set (see Additional file 2: Table S15).

Additional file 11. Archive S9:

Data used for Four-cluster Likelihood Mapping (FcLM). This archive includes two directories [one per approach, with (a) 19 species included in Group 4 and (b) 15 species included in Group 4; see Additional file 1]. Each directory includes four subdirectories: original, permutationI, permutationII, and permutationIII. In each subdirectory, the following files that served as input for the FcLM with IQ-TREE are provided: superalignment (FASTA format) of the strict data set, partition file with gene boundaries and respective models, and the group file (NEXUS format) listing the species included in the defined groups (see Additional file 2: Table S13).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Karmeinski, D., Meusemann, K., Goodheart, J.A. et al. Transcriptomics provides a robust framework for the relationships of the major clades of cladobranch sea slugs (Mollusca, Gastropoda, Heterobranchia), but fails to resolve the position of the enigmatic genus Embletonia. BMC Ecol Evo 21, 226 (2021). https://doi.org/10.1186/s12862-021-01944-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12862-021-01944-0

Keywords

  • Phylogenomics
  • Cladobranchia
  • RNA-Seq
  • Transcriptomes
  • Phylogeny
  • Embletoniidae