Apicomplexan Set8 is derived from animal Set8
The histone methyltransferase Set8 of apicomplexan parasites was recently shown to display strong sequence similarity to the orthologous protein in animals [16], raising the possibility that the gene encoding this protein was acquired horizontally in the distant past. We therefore undertook more extensive phylogenetic analyses of the origins of apicomplexan Set8. Using the canonical SET domain of Homo sapiens Set8 (PR-Set7) as the query, we searched for the sequences most similar to the Set8 SET domain across all major eukaryotic phyla. In many organisms no conclusive Set8 ortholog could be identified. In all taxa other than animals and apicomplexans, the sequences retrieved with human Set8 proved, in reciprocal blast searches of the complete NCBI protein database, to be most similar to other Set families, suggesting that they might not be bona fide Set8 orthologs. Nonetheless, to avoid erroneously excluding highly divergent sequences, we included the nearest blastp matches to human Set8 in our analyses. Our complete dataset therefore comprised potential Set8 sequences from all major eukaryotic lineages for use in phylogenetic analyses.
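The reciprocal blastp criterion used above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: it assumes BLAST results in the default 12-column tabular format (`-outfmt 6`), and the sequence identifiers and the `family_of` lookup (mapping database sequences to Set families) are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query: str
    subject: str
    evalue: float
    bitscore: float


def parse_outfmt6(text):
    """Parse default BLAST tabular output (-outfmt 6, 12 columns)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Columns 11 and 12 of the default format are evalue and bitscore.
        hits.append(Hit(fields[0], fields[1], float(fields[10]), float(fields[11])))
    return hits


def best_hit(hits, query):
    """Subject of the highest-bitscore hit for `query`, or None if there is none."""
    candidates = [h for h in hits if h.query == query]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.bitscore).subject


def is_reciprocal_best_hit(forward, reverse, query, family_of):
    """True if the best forward hit of `query`, searched back against the
    database, returns a sequence from the same family as `query`."""
    candidate = best_hit(forward, query)
    if candidate is None:
        return False
    back = best_hit(reverse, candidate)
    return back is not None and family_of.get(back) == family_of.get(query)


# Hypothetical example: the candidate's best reverse hit is a Set2-family
# protein, so the candidate is not accepted as a Set8 ortholog.
family_of = {"hs_Set8": "Set8", "hs_SETD2": "Set2"}
forward = [Hit("hs_Set8", "taxon_x_prot1", 1e-20, 90.0)]
reverse = [Hit("taxon_x_prot1", "hs_SETD2", 1e-30, 120.0),
           Hit("taxon_x_prot1", "hs_Set8", 1e-20, 90.0)]
print(is_reciprocal_best_hit(forward, reverse, "hs_Set8", family_of))  # False
```

Ranking hits by bitscore rather than E-value avoids ties among vanishingly small E-values; either choice is an assumption of this sketch.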
A phylogenetic tree was inferred from this broad sampling of candidate Set8 sequences using combined maximum-likelihood (ML) and Bayesian methods; the results are shown in Figure 1. This tree differs significantly from current consensus phylogenies derived from larger datasets [19], which clearly show the great evolutionary distances separating apicomplexans from the animal and plant kingdoms. Specifically, we recovered apicomplexan Set8 sequences as a monophyletic group that does not occupy the expected position on the tree: instead of grouping with closely related protists such as Perkinsus marinus and ciliates [19], apicomplexan Set8 sequences nest strongly within the animal Set8 clade. Consistent with this, no other phylum or taxon shows any phylogenetic affinity for apicomplexan Set8; this includes algal and plant sequences, which represent alternative potential sources for HGT. Moreover, the other SET domain-bearing sequences (the lower clade in Figure 1) do not appear to represent bona fide Set8 proteins. Tetrahymena proteins are found in this clade, for instance, although Tetrahymena is known to lack Set8 [16]; as noted above, none of these sequences retrieved Set8 when used as queries in reciprocal blastp searches.
The recent completion of the P. marinus genome provided a particularly valuable resource for investigating whether the Set8 transfer likely occurred specifically in the ancestor of extant apicomplexans. Perkinsus marinus, a parasite of oysters, is the nearest known relative of dinoflagellates and therefore serves as a representative of the sister group of apicomplexans in our analysis. Because dinoflagellates are predominantly free-living, Perkinsus must have evolved its parasitic lifestyle independently of the apicomplexans. The SET domain-bearing sequence from the P. marinus genome closest to human Set8 (E ≤ 10⁻¹³) did not return Set8 when used as a query in a reciprocal blastp search, and it nested deeply within the clade of non-Set8 sequences in phylogenetic analyses (Figure 1). Although caution is always warranted when concluding that a sequence is absent from a particular genome, the P. marinus genome database includes over 23,000 inferred proteins, suggesting a relatively complete dataset for proteomic comparisons.
In an expanded survey of eukaryotes that included the partial genomic resources available to date, blast queries with human Set8 failed to identify any significant hits (E ≤ 10⁻⁵) in other organisms closely related to the apicomplexans, namely dinoflagellates, colpodellids and Chromera velia, a recently discovered photosynthetic autotroph. The same was true for a variety of other protists, including kinetoplastids, Trichomonas and Giardia, as well as for fungi (including budding yeast) and certain amoebozoans, including Acanthamoeba.
As shown in Figure 1, statistical support for apicomplexan Set8 grouping with animals is very strong (Bayesian posterior probability of 1.0 and ML bootstrap of 86%), whereas the sequences closest to Set8 in all other non-animal species grouped in a separate clade, also with very strong support. The only exception is the slime mold Dictyostelium, which also groups within the animal Set8 clade. Given that the complete genomes of other members of the Amoebozoa (e.g. Entamoeba and Acanthamoeba) lack Set8, this provides evidence of a similar horizontal transfer of animal Set8 into slime molds. Considering the vast evolutionary distances separating animals, the Apicomplexa and slime molds, and the absence of Set8 from all other eukaryotic lineages, it appears almost certain that apicomplexans and slime molds acquired Set8 independently via HGT from animals. The highly unlikely alternative would be vertical descent of apicomplexan Set8 from a common ancestor shared with animals, followed by repeated, independent loss of the gene in all other eukaryotic lineages except some cellular slime molds.
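The parsimony argument above can be made concrete by counting events on a tree. The Python sketch below uses a deliberately simplified, hypothetical topology (not the consensus eukaryotic phylogeny and not the tree in Figure 1) and a Dollo-style assumption that a gene present in the root ancestor can be lost but never regained. It counts the minimum number of independent losses required under vertical descent and compares that with the number of transfers required if Set8 originated in animals.

```python
# Hypothetical, simplified topology for illustration only: tuples are clades,
# strings are tips (whole lineages collapsed to single tips).
TREE = (
    (("animals", "fungi"), ("Dictyostelium", "Entamoeba")),
    (("apicomplexans", "dinoflagellates"), "ciliates"),
    ("plants", "kinetoplastids"),
)
HAS_SET8 = {"animals", "Dictyostelium", "apicomplexans"}


def tips(node):
    """All tip names below `node`."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))


def losses_if_ancestral(node, present):
    """Minimum number of losses if the gene was present at `node` and could
    only be lost afterwards (Dollo-style: no regains)."""
    if not tips(node) & present:
        return 1  # the whole clade lacks the gene: one loss on its stem
    if isinstance(node, str):
        return 0  # a tip that retains the gene
    return sum(losses_if_ancestral(child, present) for child in node)


def transfers_if_animal_origin(present, donor="animals"):
    """Transfers needed if only the donor had the gene originally; assumes each
    other carrier is a separate lineage requiring its own transfer."""
    return len(present - {donor})


print(losses_if_ancestral(TREE, HAS_SET8))   # 5
print(transfers_if_animal_origin(HAS_SET8))  # 2
```

Adding further Set8-lacking lineages to such a tree can only raise the loss count, while the transfer count stays at two. Simple event counting treats a loss and a transfer as equally likely, which is itself an assumption.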
A nematode appears to be the source of the host transfer to an ancient apicomplexan
The finding that apicomplexan Set8 is likely of animal origin raises questions about the approximate time of the transfer event and the source of the acquired sequence. As noted above, the transfer appears to have occurred after the divergence of apicomplexans from dinoflagellates and colpodellids [20], since these organisms appear to lack Set8, but before the radiation of the known apicomplexans, all of which have Set8. To identify more precisely which animal taxon was the likely source of the gene, Set8 sequences from extant apicomplexans (P. falciparum, T. annulata, T. gondii, C. parvum and B. bovis) were analyzed phylogenetically together with only the animal and Dictyostelium sequences. To improve the recovery of phylogenetic relationships within the Set8 clade, we discarded the highly divergent, non-orthologous SET sequences from other eukaryotes, which could otherwise produce phylogenetic artifacts within that clade.
As shown in Figure 2, we consistently observed that apicomplexan sequences branch within the nematodes, specifically with Set8 from Trichinella spiralis. ML bootstrap support for this relationship is weak (32%), but the Bayesian posterior probability is strong (0.99). Importantly, this tree recovers the same relationship between nematodes and Apicomplexa as the expanded analysis did, indicating that this topology is robust to the differences in taxon sampling between the two analyses.
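Bootstrap support and Bayesian posterior probability for a clade are both estimated as the fraction of a sample of trees that contain it: bootstrap replicate trees in the ML case, post-burn-in MCMC samples in the Bayesian case. Because the two samples are generated differently, the values can diverge, as with the 32% and 0.99 reported above. The Python sketch below assumes each sampled tree has already been reduced to the set of its clades (tip-name sets) on a rooted tree; the tip names and sampled trees are hypothetical.

```python
def clade_support(sampled_trees, clade):
    """Fraction of sampled trees (each a set of frozenset clades) containing `clade`."""
    target = frozenset(clade)
    count = sum(1 for tree_clades in sampled_trees if target in tree_clades)
    return count / len(sampled_trees)


# Hypothetical sample of four trees, each summarized by its non-trivial clades.
samples = [
    {frozenset({"Apicomplexa", "Trichinella"}), frozenset({"Caenorhabditis", "Brugia"})},
    {frozenset({"Apicomplexa", "Brugia"})},
    {frozenset({"Apicomplexa", "Trichinella"})},
    {frozenset({"Caenorhabditis", "Brugia"})},
]
print(clade_support(samples, {"Apicomplexa", "Trichinella"}))  # 0.5
```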
Furthermore, it is telling that both the slime mold Dictyostelium and the Apicomplexa branch with nematodes, but separately from each other. If the positions of the Dictyostelium and apicomplexan sequences were long-branch attraction artifacts, these two most divergent branches of the Set8 tree would be expected to attract each other. Moreover, neither is attracted to the base of the Set8 clade, or even to the base of the nematode clade, but rather to individual sub-clades within the nematodes. This contrasts with the clearly artifactual rooting of the overall Set8 clade, in which long-branch outgroups are attracted to the rapidly evolving sequence from the parasitic trematode Schistosoma mansoni. Thus, although statistical support for a specific relationship between apicomplexan and nematode sequences is not strong, there appears to be no basis for dismissing it as a typical phylogenetic artifact. While the precise timing of the emergence of nematodes is debated, all estimates place their origin between 600 and 1200 million years ago [21]. Thus, nematodes were extant at the time of the apicomplexan radiation, making the proposed transfer plausible.
Analysis of the reciprocal H3K36 modifiers Set2 and JmjC1 in Apicomplexa
Like Set8, Set2 is a histone methyltransferase that contributes to the regulation of higher-order chromatin assembly and the epigenetic control of gene expression. Set2 and its cognate demethylase JmjC1 add and remove methyl groups on H3K36, respectively [22]. With the exception of the rodent malaria parasites, all apicomplexan parasites, regardless of host, possess a Set2-like protein that appears to be orthologous across the apicomplexans. At least two ciliates (Tetrahymena and Paramecium) also possess putative Set2 homologs, suggesting that the Set2 family could have been present in the alveolate lineage before the Apicomplexa diverged from ciliates.
Aravind and colleagues have argued that a major family of transcription factors in the Apicomplexa (ApiAP2) was laterally transferred from the algal endosymbiont harbored intracellularly by all members of this group of parasites [3]. This led us to ask whether the chromatin modifiers Set2 and JmjC1 could also have originated from a similar horizontal transfer event. To determine whether Set2 and JmjC1 might have been acquired from an algal endosymbiont, or from any other higher eukaryotic cell, we performed bioinformatic and phylogenetic analyses as described above for Set8.
Horizontal transfer of an H3K36 modifier into the Apicomplexa
In addition to Set8, discussed above, phylogenetic analyses clearly identify Plasmodium proteins similar to the Set2 subfamily, as well as to the Set1 and Set3 subfamilies (Figure 3). Interestingly, although a Set2 subfamily including animal, yeast and higher plant exemplars is well resolved in our analyses, the putative Set2 sequence from Plasmodium (PF3D7_1322100) groups with Ashr3, a related H3K36 methyltransferase from green plants. An initial, broader analysis performed with putative Set2 sequences from all apicomplexans recovered the same topology, but with weaker support (Additional file 1). To clarify the SET protein subfamily to which apicomplexan Set2 belongs, we produced a tree with more balanced sampling, using only sequences from P. falciparum along with Set family exemplars from plants, animals and yeast, and Set proteins from ciliates and P. marinus. The resulting tree (Figure 3) provides strong support in both ML and Bayesian analyses for an association between the Plasmodium sequence and Ashr3 homologs from green plants. In contrast, the nearest SET protein from P. marinus, the expected sister group to apicomplexans, is clearly defined as a member of the Set1 subfamily.
Because only the SET domain itself can be aligned across all of these diverse sequences, a relatively short alignment (231 aa) is available for tree reconstruction, which could potentially lead to spurious phylogenetic associations. In this case, however, there is additional corroborating evidence outside the SET domain for a close relationship between PF3D7_1322100 and the plant Ashr3 family. In both PF3D7_1322100 and plant Ashr3 proteins, the SET domain is positioned at the C-terminal end of the protein. In contrast, Set2 orthologs from animal, yeast and green plant models have SET domains in more N-terminal locations [23]. Potential Set2 homologs from ciliates, the nearest relatives of apicomplexans in the Set2 sub-clade, also have SET domains positioned at their extreme N-termini. More significantly, the Simple Modular Architecture Research Tool (SMART; [24]) identifies a shared plant homeodomain (PHD) as the first identifiable domain upstream of the SET domain (Figure 3). Although the complete Plasmodium gene has undergone a dramatic expansion, this PHD region is conserved strongly enough between plant Ashr3 and Plasmodium genes to be found in reciprocal blastp searches; in contrast, no comparable match is found among the numerous PHD and other RING-like domains in various SET proteins in the NCBI database.
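The architectural comparison above (the position of the SET domain and the nearest domain upstream of it) can be expressed as two small checks over domain annotations such as those reported by SMART. The Python sketch below is hypothetical: it does not call SMART, the coordinates and example names are invented for illustration, and classifying the SET domain by whether its midpoint lies in the N- or C-terminal half of the protein is an assumed, deliberately crude rule.

```python
from typing import NamedTuple


class Domain(NamedTuple):
    name: str
    start: int  # 1-based residue coordinates, inclusive
    end: int


def set_domain_position(domains, length):
    """'C-terminal' if the SET domain midpoint lies in the second half of the protein."""
    set_dom = next(d for d in domains if d.name == "SET")
    midpoint = (set_dom.start + set_dom.end) / 2
    return "C-terminal" if midpoint > length / 2 else "N-terminal"


def nearest_upstream_domain(domains):
    """Name of the annotated domain ending closest before the SET domain, or None."""
    set_dom = next(d for d in domains if d.name == "SET")
    upstream = [d for d in domains if d.end < set_dom.start]
    return max(upstream, key=lambda d: d.end).name if upstream else None


# Invented annotations for two hypothetical proteins.
example_a = [Domain("PHD", 500, 550), Domain("SET", 600, 750)]  # length 800
example_b = [Domain("SET", 50, 200)]                            # length 900
print(set_domain_position(example_a, 800), nearest_upstream_domain(example_a))  # C-terminal PHD
print(set_domain_position(example_b, 900), nearest_upstream_domain(example_b))  # N-terminal None
```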
The combination of this conserved PHD in a comparable position relative to the SET domain, a similar overall domain architecture, and strong phylogenetic support for SET domain monophyly between Plasmodium and plant Ashr3 is unlikely to be coincidental. Rather, together these represent credible evidence of an orthologous relationship. The Ashr3 Set subfamily is not found in animals or fungi [23] and, based on our survey of the complete NCBI protein databases, appears to be restricted to green plants and apicomplexans. Although we were unable to identify Ashr3 genes in either green or red algae, they are present in early land plants, and phylogenetic analyses, both ours and previous ones [23], indicate that Ashr3 is a relatively ancient SET protein family. In addition, all apicomplexan Ashr3/Set2 sequences are recovered as a monophyletic group (Additional file 1), indicating a single transfer event before the radiation of extant apicomplexans. It is therefore possible that Ashr3 was present in the algal ancestor of the apicoplast and moved into apicomplexans via endosymbiotic gene transfer.
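The inference of a single transfer rests on the apicomplexan sequences forming one clade. On a rooted tree this is a simple test: a set of taxa is monophyletic if it equals the tip set of some subtree. The Python sketch below assumes a rooted tree written as nested tuples; the tree and tip labels are hypothetical placeholders, not the topology in Additional file 1, and testing monophyly on an unrooted tree would instead require checking bipartitions.

```python
def tip_set(node):
    """Frozen set of tip names below `node` (tuples are clades, strings are tips)."""
    if isinstance(node, str):
        return frozenset({node})
    return frozenset().union(*(tip_set(child) for child in node))


def all_clades(node):
    """Yield the tip set of every subtree, including single tips."""
    yield tip_set(node)
    if not isinstance(node, str):
        for child in node:
            yield from all_clades(child)


def is_monophyletic(tree, taxa):
    """True if `taxa` is exactly the tip set of some subtree of the rooted `tree`."""
    return frozenset(taxa) in set(all_clades(tree))


# Hypothetical rooted tree of Ashr3-like sequences.
tree = ((("P_falciparum", "T_gondii"), "C_parvum"), ("A_thaliana", "O_sativa"))
print(is_monophyletic(tree, {"P_falciparum", "T_gondii", "C_parvum"}))  # True
print(is_monophyletic(tree, {"P_falciparum", "A_thaliana"}))            # False
```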
At present, whether the apicoplast descends from the same endosymbiont that gave rise to the plastids of other photosynthetic alveolates remains under debate, as does the taxonomic affiliation of that algal endosymbiont [25, 26]. Given this uncertainty, and the relative paucity of genomic data from dinoflagellates and red algae (a possible source of alveolate plastids), HGT of Ashr3 from an algal endosymbiont is purely speculative at this juncture and remains one of several possibilities for the acquisition of this protein. Nevertheless, the known presence of an algal endosymbiont in the ancestor of apicomplexans provides a reasonable biological explanation for the presence of a plant-specific chromatin-modifying protein in this lineage. Whatever the vector, the horizontal transfer appears to predate the origin of extant apicomplexans, as Cryptosporidium contains a four-PHD SET protein that groups with PF3D7_1322100 and all related apicomplexan sequences in expanded phylogenetic analyses (Additional file 1).
We also completed similar analyses for JmjC1 homologs (Figure 4). In this case, however, we recovered an association of apicomplexan sequences with sequences from ciliates, although not with the sequence from P. marinus. Although support for these relationships is reasonably strong, it depends on how JmjC1 sequences are sampled. For the analyses shown in Figure 4, we chose from each taxon the closest blastp match to JmjC1 from P. falciparum or T. gondii. In contrast, when we sampled sequences using human and yeast exemplars as queries, phylogenetic analyses tended to group apicomplexan JmjC1 with sequences from green algae. Nevertheless, we found no compelling evidence at present that the history of JmjC1 in apicomplexans is complicated by HGT.
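The sensitivity to sampling described above arises because "the closest blastp match in each taxon" depends on which query is used. The Python sketch below, using hypothetical taxa, sequence identifiers and bitscores, keeps the highest-scoring hit per taxon for each query and reports the taxa whose selected representative changes between queries.

```python
def closest_per_taxon(hits):
    """hits: iterable of (taxon, subject_id, bitscore) for a single query.
    Returns {taxon: subject_id} for the highest-bitscore subject in each taxon."""
    best = {}
    for taxon, subject, score in hits:
        if taxon not in best or score > best[taxon][1]:
            best[taxon] = (subject, score)
    return {taxon: subject for taxon, (subject, _) in best.items()}


# Hypothetical hits for an apicomplexan query and for a human query.
apicomplexan_query = [("ciliate", "cil_1", 80.0), ("ciliate", "cil_2", 60.0),
                      ("green_alga", "alg_1", 55.0)]
human_query = [("ciliate", "cil_1", 40.0), ("ciliate", "cil_2", 70.0),
               ("green_alga", "alg_1", 65.0)]

a = closest_per_taxon(apicomplexan_query)
b = closest_per_taxon(human_query)
changed = sorted(t for t in a if a[t] != b.get(t))
print(a)        # {'ciliate': 'cil_1', 'green_alga': 'alg_1'}
print(changed)  # ['ciliate']
```

Because a different representative sequence enters the alignment for each changed taxon, the downstream tree can differ even though the taxon list is identical.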