1. Malin is conserved in all classes of vertebrates
We recently defined the evolutionary lineage of the laforin gene [18–20]. The laforin gene is conserved in all vertebrate genomes, but it is absent from the genomes of most non-vertebrate organisms, including the standard model organisms yeast, flies, and worms. Surprisingly, we found that laforin is conserved in the cephalochordate Branchiostoma floridae and in the cnidarian Nematostella vectensis, as well as in the following five protozoans: Cyanidioschyzon merolae, Toxoplasma gondii, Eimeria tenella, Tetrahymena thermophila, and Paramecium tetraurelia [18, 19]. Thus, laforin possesses an ancient and unique evolutionary lineage. Given that the functions of laforin and malin are linked in vertebrates, we sought to determine whether the two proteins co-occur in all organisms. If they are not conserved in the same species, then these data would argue for malin- and/or laforin-independent functions.
Malin is a RING-type E3 ubiquitin ligase and contains six NHL domains (Figure 1A). NHL domains form a six-bladed β-propeller and direct protein-protein interactions, similar to WD40 domains [7, 21]. In order to define the evolutionary lineage of malin, we first searched vertebrate databases using the criteria that a malin ortholog must contain a RING domain and six NHL domains, in any orientation (i.e. amino- versus carboxy-terminal), and that it cannot contain any other domains. We utilized the NCBI non-redundant (nr) protein database, organism-specific databases, and the SUPERFAMILY database [22, 23]. We performed BLASTp and PSI-BLAST searches of these databases and considered any protein with an E value < 1e-90 a probable malin ortholog and any protein with an E value < 1e-50 a possible ortholog. We analyzed each of these sequences using PROSITE, PFAM, JPRED3, and CDD [24, 25] to define the predicted secondary structure and to determine whether additional domains were present, and we excluded proteins with any additional domains.
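The selection criteria above can be summarized as a small decision rule. The following is a minimal sketch rather than the pipeline actually used: the `Hit` record, its field names, and the labels returned are hypothetical, and the domain annotations are assumed to have already been collected from tools such as PFAM or CDD. Only the two E value cutoffs and the domain requirement come from the text.

```python
from typing import NamedTuple, Tuple

# Cutoffs stated in the text.
PROBABLE_CUTOFF = 1e-90
POSSIBLE_CUTOFF = 1e-50


class Hit(NamedTuple):
    """Hypothetical record for one BLAST hit plus its predicted domains."""
    accession: str            # illustrative identifier, not a real accession
    evalue: float             # E value against the malin query
    domains: Tuple[str, ...]  # predicted domain names, e.g. ("RING", "NHL", ...)


def classify_hit(hit: Hit) -> str:
    """Label a hit as 'probable', 'possible', or 'rejected'."""
    names = list(hit.domains)
    # A malin ortholog cannot contain any domain other than RING or NHL.
    if any(name not in ("RING", "NHL") for name in names):
        return "rejected"
    # It must contain one RING domain and six NHL domains, in any order.
    if names.count("RING") != 1 or names.count("NHL") != 6:
        return "rejected"
    if hit.evalue < PROBABLE_CUTOFF:
        return "probable"
    if hit.evalue < POSSIBLE_CUTOFF:
        return "possible"
    return "rejected"
```

Because the domain check runs before the E value check, a protein with a strong E value but an extra domain (for example a B-box) is still rejected, which mirrors the exclusion step described above.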
We identified 28 vertebrate genomes containing a protein with a RING domain and six NHL domains (Figure 1B and additional file 1, Figure S1). For each genome that contains a malin ortholog, we found that malin was the only protein in the genome that contained only these two domain types. Each malin ortholog identified had an E value < 1e-90, contained a RING domain and six NHL domains with boundaries similar to those of human malin, and possessed a similar predicted secondary structure. We searched each of the malin orthologs for additional domains, but did not identify any other putative domains. To ensure that we had identified malin orthologs, we generated a MAFFT [26] alignment of the 28 proteins to determine their percent identity and similarity to H. sapiens malin (Hs-malin), and we compared their predicted secondary structures. The similarities ranged from 40% for Actinopterygii (fish) versions of malin to 100% for other primate versions of malin (Figure 1B). Of note, we identified a malin ortholog in at least one member of each class of vertebrates. Thus, malin is conserved in all five classes of vertebrates, and it is the only protein in vertebrate genomes that contains only a RING and six NHL domains.
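As an illustration of how a percent identity can be read off two rows of a multiple alignment, the sketch below counts identical residues over the columns in which neither sequence has a gap. The choice of denominator is an assumption, since the text does not state which convention was used, and the function name is hypothetical. Percent similarity would additionally require a substitution matrix and is not sketched here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of the same alignment.

    Assumption: only columns where both rows carry a residue are compared.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("rows of one alignment must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either row
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy rows, not real malin sequences: 3 of the 4 compared columns match.
print(percent_identity("MK-LV", "MKALI"))  # 75.0
```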
2. Malin is conserved in all vertebrate genomes and one cephalochordate genome
In addition to vertebrate genomes, we also probed the NCBI databases (BLASTP) for a malin ortholog in invertebrate and protozoan genomes. We did not uncover a malin ortholog in any non-vertebrate genome in the NCBI databases. Given the lack of non-vertebrate sequences obtained in our BLASTP searches, we wondered whether the malin gene was truly absent from other taxa or had simply not been observed because of incomplete sequencing data.
To answer this question, the same queries were used to perform BLASTP, TBLASTN, PSI-BLAST, PHI-BLAST, and domain searches against species-specific databases. We probed 215 species-specific databases from Bilateria, Cnidaria, Ctenophora, Placozoa, Porifera, and Choanoflagellata, as well as single-celled protozoan genomes, and 1,408 bacterial genomes (additional file 2, Table S1). Only sequences showing similarity at both the RING and NHL domains were considered positive hits. To increase the breadth of our search, data from the ENSEMBL gene database [27] were also investigated. These efforts identified a malin ortholog in one taxon outside of vertebrates: the cephalochordate Branchiostoma floridae (Figure 2A). Although we closely examined the genomes of all metazoans, protists, plants and bacteria, we did not identify a malin ortholog in any of these other genomes. We did identify proteins with either a RING domain or NHL domains in genomes as ancient as archaea (additional file 3, Figure S2 and additional file 4, Figure S3). However, these proteins contain either a RING domain or NHL domains, and none of them contains both. Thus, RING and NHL domains are found independently in the most basal genomes, but they are not observed within the same protein until the emergence of malin.
One of the reasons for performing these analyses was to determine if malin shared a similar evolutionary lineage with laforin. Laforin is a glucan phosphatase that contains a carbohydrate binding module (CBM) and a dual-specificity phosphatase (DSP) domain, and it physically interacts with malin to form a functional complex. When we compared the species distribution of malin with that previously described for laforin [18, 19], we observed that the species distributions of laforin and malin do not coincide (Figure 2B and 2C). Since laforin is present in the genomes of more evolutionarily basal organisms than malin, these results suggest that laforin may perform additional functions independent of malin. It is possible that these functions are conserved from red algae to humans, but our results indicate that, at least in lower eukaryotes, laforin must possess malin-independent functions.
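The comparison of species distributions amounts to a set difference. The sketch below is only an illustration built from the taxa named in this section; it collapses the vertebrate genomes into a single "vertebrates" entry, abbreviates genus names, and uses hypothetical variable names.

```python
# Taxa in which each gene was reported in this section
# (vertebrate genomes are collapsed into one entry for brevity).
laforin_taxa = {
    "vertebrates", "B. floridae", "N. vectensis", "C. merolae",
    "T. gondii", "E. tenella", "T. thermophila", "P. tetraurelia",
}
malin_taxa = {"vertebrates", "B. floridae"}

# Taxa carrying laforin but no detectable malin ortholog.
laforin_without_malin = laforin_taxa - malin_taxa
# Taxa carrying malin but no laforin (empty for the taxa listed here).
malin_without_laforin = malin_taxa - laforin_taxa

print(sorted(laforin_without_malin))
print(sorted(malin_without_laforin))
```

For the taxa listed here, the first set holds the cnidarian and the five protozoans and the second is empty, which is the pattern that argues for malin-independent functions of laforin in those organisms.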
3. Malin is phylogenetically related to the TRIM family of proteins
While malin is the only protein that we found in vertebrate genomes containing only a RING and six NHL domains, our bioinformatic analyses did recover other proteins with RING and NHL domains. The recovered protein with the highest identity to malin corresponds to TRIM32, an E3 ubiquitin ligase that belongs to the tripartite motif-containing (TRIM) family of proteins [3]. TRIM32 is characterized by the presence of the conserved TRIM core domains (a RING, B-box, and coiled coil) and also by the presence of several NHL domains, which resemble the NHL domains of malin.
Given the similarity between malin and TRIM32, we decided to further investigate the phylogenetic relationships between malin and TRIM proteins so that there would be no confusion between TRIM and malin orthologs. Sardiello et al. [28] recently defined the TRIM protein family and classified its members into Group 1 and Group 2 based on relatedness and rates of evolution. Malin is more similar to the 34 TRIM proteins in Group 1, some of which contain NHL domains. We chose four members of Group 1 to analyze further and compare with malin. TRIM2 and TRIM32 each contain a RING, B-box, coiled-coil, and six NHL domains, and TRIM2 also contains a Filamin domain (Figure 3A). TRIM56 and TRIM71 each contain different combinations of these five domains (Figure 3A). First, we analyzed the RING domains of TRIM2, TRIM32, and TRIM56, and found that the TRIM32 RING shares the highest degree of similarity and identity with the malin RING (Figure 3B and 3C). In fact, when we performed a BLASTP search with the human malin RING domain against the NCBI H. sapiens nr database, the RING domain from TRIM32 was the first non-malin hit.
Next, we performed BLASTP searches with human TRIM2, TRIM32, TRIM56, and TRIM71, with the aim of unveiling the phylogenetic relationships between malin and this group of TRIM proteins. We verified orthologs of each protein by analyzing their domain boundaries and domain arrangement, and generated an alignment from the sequences of malin, TRIM2, TRIM32, TRIM56, and TRIM71 from H. sapiens, P. troglodytes, R. norvegicus, and M. musculus (additional file 5, Figure S4). Using the alignment, we generated a maximum-likelihood phylogenetic tree of the mouse, rat, chimp and human orthologs (Figure 3D). The tree confirms that malin and TRIM32 orthologs are the closest homologs, with TRIM71, TRIM56, and TRIM2 orthologs being more divergent from malin.
To gain insight into the evolution of the malin and TRIM orthologs, we analyzed the gene structure and intron-exon boundaries of each. The gene encoding malin (EPM2B) is a single exon in all mammalian, bird, fish, and amphibian genomes (27 genomes in total), but the malin gene contains two exons in the reptile genome (additional file 6, Figure S5). The B. floridae malin ortholog is also a single exon. The genes encoding TRIM2 and TRIM71 both contain multiple exons (3-11 exons) in all genomes investigated, including mammals, birds, fish, amphibians, reptiles, B. floridae, C. elegans, C. intestinalis, D. melanogaster, and N. vectensis (additional file 6, Figure S5). Therefore, it seems unlikely that the malin gene is most closely related to either TRIM2 or TRIM71. In contrast, TRIM32 and TRIM56 are both single-exon genes in all mammalian genomes. We were only able to identify TRIM56 in mammalian genomes, but it is a single-exon gene in all twelve genomes where we identified it. Similarly, TRIM32 is a single-exon gene in all fourteen mammalian genomes where it was identified, and in the two amphibian genomes. However, the TRIM32 gene contains two exons in bird, fish, and reptile genomes (additional file 6, Figure S5). Due to the limited sequence data available for TRIM56, it is difficult to definitively determine if malin is more similar at the gene structural level to TRIM32 or TRIM56. However, TRIM56 does not contain NHL repeats, whereas TRIM32 does contain NHL domains, and malin and TRIM32 share many similarities even at the gene level. Thus, it seems likely that malin is more similar to TRIM32 at both the protein and gene level.
Regarding species representation of TRIM proteins, our analyses confirm previous reports on TRIM family conservation [28]. Although all four TRIM proteins have several mammalian orthologs (Figure 4), there are some exceptions that are likely due to incomplete genome sequencing or gene loss: TRIM32 is absent in Gallus gallus and Xenopus laevis (a TRIM32 bird ortholog annotated in ENSEMBL is more similar to TRIM2). TRIM56, present in all mammalian genomes, is absent in monotremes, aves, amphibians and fish. TRIM71 and TRIM2 are clearly the most widely distributed proteins in the analysed group. However, sequence divergence makes it difficult to define the correct orthology between Nematostella vectensis sequences similar to TRIM2 and TRIM71. In addition, TRIM71 is uniquely conserved in nematodes and insects. No significant identity with malin or any of these four TRIM proteins was found in protozoan (Figure 4) or fungal genomes (not shown). Cumulatively, these data define the differences between the TRIM family and the RING-NHL protein malin.
4. Malin and TRIM32 share sequence and structural features
Given the similarities between malin and TRIM32, we decided to further investigate these two proteins. An alignment of human TRIM32 and malin illustrates a high degree of identity at both the N-terminus (corresponding to the RING domain in both proteins) and the C-terminus (corresponding to the NHL repeats) (Figure 5A). Malin and TRIM32 are 27% similar overall, and 52% and 38% similar between their RING and NHL domains, respectively. However, TRIM32 possesses a segment spanning amino acids 198 to 285 (between the B-box domain and the first NHL domain) that is absent in malin. We then mapped mutations in the malin gene found in Lafora disease patients onto the protein alignment (Figure 5A, highlighted in blue). Twenty-one of the thirty-seven Lafora disease missense mutations in malin affect an amino acid that is conserved in TRIM32. One of those conserved residues (D233 in malin, highlighted in red) aligns with D487 in TRIM32. Interestingly, this amino acid is mutated to malin-D233A in Lafora disease patients [29] and to TRIM32-D487N in Limb-Girdle muscular dystrophy patients [30].
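Mapping a mutated residue from one protein onto its counterpart in a pairwise alignment can be sketched as follows. This is an illustrative helper, not the procedure used to produce Figure 5A: the function names are hypothetical, and the toy rows below are not malin or TRIM32 sequences. Residue numbers are 1-based and count only non-gap characters.

```python
from typing import Optional, Tuple


def column_of_residue(aligned_seq: str, residue_number: int, gap: str = "-") -> int:
    """Return the 0-based alignment column holding the given 1-based residue."""
    seen = 0
    for column, char in enumerate(aligned_seq):
        if char != gap:
            seen += 1
            if seen == residue_number:
                return column
    raise ValueError("residue number exceeds sequence length")


def residue_number_at(aligned_seq: str, column: int, gap: str = "-") -> Optional[int]:
    """Return the 1-based residue number at a column, or None for a gap."""
    if aligned_seq[column] == gap:
        return None
    return sum(1 for char in aligned_seq[: column + 1] if char != gap)


def conserved_partner(
    aligned_query: str, aligned_target: str, residue_number: int
) -> Optional[Tuple[str, int]]:
    """Return (amino acid, residue number) in the target if the query
    residue is aligned to an identical residue, otherwise None."""
    column = column_of_residue(aligned_query, residue_number)
    query_aa = aligned_query[column]
    target_aa = aligned_target[column]
    if target_aa == "-" or target_aa != query_aa:
        return None
    return target_aa, residue_number_at(aligned_target, column)


# Toy alignment rows: residue 2 of the query (D) aligns with residue 1 of the target (D).
query_row = "MD-AK"
target_row = "-DGAR"
print(conserved_partner(query_row, target_row, 2))  # ('D', 1)
```

Applied to a real alignment, the same lookup would report a pairing such as malin D233 with TRIM32 D487, and counting the non-None results over all missense positions would give the fraction of mutated residues that are conserved.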
To further assess these observations, we generated structural models for the NHL domains of both proteins using the crystallised NHL domain of M. tuberculosis PknD (PDB:1rwl [31]) as a template, which has an identity of 23.5% with malin and 22% with TRIM32. An alignment of the predicted structures (Figure 5B) shows that both the malin and TRIM32 models contain six repeats of the characteristic three- to four-stranded β-sheet found in NHL domains [21]. The malin mutations analysed in Figure 5A are located mainly in segments of the structure corresponding to β-sheets (Figure 5C). Given their location, it seems probable that these deleterious mutations disrupt the structure of the NHL domains and likely result in non-functional protein-protein interaction modules. It was especially interesting that residues D233 in malin and D487 in TRIM32 are located in equivalent positions not only in the primary structure (Figure 5A), but also in the predicted three-dimensional structure (Figure 5C and 5D). The similarity of these two E3 ubiquitin ligases and the overall conservation of pathologically relevant amino acids prompted us to compare both proteins at a functional level.
5. Malin and TRIM32 are related at the functional level
In order to determine whether malin and TRIM32 could have redundant functions, we first studied a physiological substrate of malin that is related to alterations found in Lafora disease. The PP1 regulatory subunit R5/PTG is ubiquitinated by the laforin-malin complex and labelled for degradation [9, 17]. Thus, we analysed whether TRIM32 was able to ubiquitinate R5/PTG. With this aim, we transfected HEK293 cells with His-tagged ubiquitin constructs, myc-R5/PTG and pCINEO-TRIM32, and purified ubiquitinated proteins by metal-affinity column purification [16, 32]. R5/PTG was only ubiquitinated in cells transfected with TRIM32 (Figure 6A). In order to analyze the specificity of the reaction, the same assay was conducted with a catalytically inactive form of TRIM32 carrying an H42A mutation in the RING domain (TRIM32 H42A). In this case, no ubiquitination of R5/PTG was observed (Figure 6A). In order to rule out the possibility that the action of TRIM32 on R5/PTG was mediated by endogenous malin, we repeated the ubiquitination experiments in mouse embryonic fibroblast (MEF) cells from a mouse model lacking malin (epm2b-/- mouse). As shown in Figure 6B, the expression of TRIM32 in these cells still promoted the ubiquitination of R5/PTG. Together, these results provide the first functional link between malin and TRIM32.
To determine the extent of this functional link, we decided to study the effect of TRIM32 on another known substrate of the laforin-malin complex, namely the AMP-activated protein kinase (AMPK). AMPK is composed of three subunits (α, β and γ) and we previously demonstrated the specific ubiquitination of all subunits by the laforin-malin complex [16]. Using the same methodology described above, we observed that Flag-TRIM32 produced a robust ubiquitination of both AMPKα and AMPKβ subunits, but not AMPKγ (Figure 7A).
In addition, we examined the topology of the ubiquitin chains that TRIM32 attaches to the AMPK subunits. We used ubiquitin constructs mutated at either Lys48 (K48R) or Lys63 (K63R), which cannot form chains through the mutated residue. We recently described that the laforin-malin complex promotes the incorporation of K63-linked polyubiquitin chains onto AMPK [16]. In contrast, TRIM32 produced a different polyubiquitin chain topology, since it was able to generate ubiquitination in the presence of either K48R- or K63R-ubiquitin (Figure 7B). These results suggest a diversification in the activities of these two E3 ubiquitin ligases, despite their sharing common substrates.
Since TRIM32 can ubiquitinate malin substrates, we decided to investigate whether malin could ubiquitinate TRIM32 substrates. It has been described that TRIM32 ubiquitinates, among other substrates, dysbindin (a protein involved in endosomal-lysosomal trafficking and in the genetic aetiology of schizophrenia) [33] and PIASy [Protein Inhibitor of Activated STAT (Signal Transducer and Activator of Transcription) isoform y, an E3 SUMO ligase] [34]. Following an approach similar to that described above, but using myc-dysbindin and myc-PIASy as substrates, we observed that only wild-type TRIM32, and not the laforin-malin complex, was able to ubiquitinate myc-dysbindin (Figure 8A) and myc-PIASy (Figure 8B). We also tested an unrelated E3 ubiquitin ligase, Mdm2 (an E3 ligase for p53), but found that it could not ubiquitinate PIASy either under these conditions (Figure 8B). These results indicate that the functional link between malin and TRIM32 is not completely reciprocal.