Comprehensive computational analysis of Hmd enzymes and paralogs in methanogenic Archaea

Goldman, Aaron D; Leigh, John A; Samudrala, Ram

doi:10.1186/1471-2148-9-199

Research article
Open access
Published: 11 August 2009

BMC Evolutionary Biology volume 9, Article number: 199 (2009)

Abstract

Background

Methanogenesis is the sole means of energy production in methanogenic Archaea. H₂-forming methylenetetrahydromethanopterin dehydrogenase (Hmd) catalyzes a step in the hydrogenotrophic methanogenesis pathway in class I methanogens. At least one hmd paralog has been identified in nine of the eleven complete genome sequences of class I hydrogenotrophic methanogens. The products of these paralog genes have thus far eluded any detailed functional characterization.

Results

Here we present a thorough computational analysis of Hmd enzymes and paralogs that includes state-of-the-art phylogenetic inference, structure prediction, and functional site prediction techniques. We determine that the Hmd enzymes are phylogenetically distinct from Hmd paralogs but share a common overall structure. We predict that the active site of the Hmd enzyme is conserved as a functional site in Hmd paralogs and use this observation to propose possible molecular functions of the paralog that are consistent with previous experimental evidence. We also identify an uncharacterized site in the N-terminal domains of both proteins that is predicted by our methods to directly impart function.

Conclusion

This study contributes to our understanding of the evolutionary history, structural conservation, and functional roles of the Hmd enzymes and paralogs. The results of our phylogenetic and structural analysis constitute datasets that will aid in the future study of the Hmd protein family. Our functional site predictions generate several testable hypotheses that will guide further experimental characterization of the Hmd paralog. This work also represents a novel approach to protein function prediction in which multiple computational methods are integrated to achieve a detailed characterization of proteins that are not well understood.

Background

The methanogens are a diverse, but phylogenetically related, group of Archaea. Methanogenic Archaea have been isolated from habitats ranging from mammalian gut flora to deep sea hydrothermal vents. Methanogens comprise two taxonomic classes known as class I and class II [1–3]. Class I methanogens include the orders Methanococcales, Methanobacteriales, and Methanopyrales, while class II methanogens include the orders Methanosarcinales and Methanomicrobiales.

The three known methanogenesis pathways are distinguished with regard to the electron source. These are hydrogenotrophic methanogenesis, acetoclastic methanogenesis, and methylotrophic methanogenesis [4]. Hydrogenotrophic methanogenesis involves the reduction of CO₂ to CH₄, utilizing H₂ and reduced cofactors as electron donors through a seven-step pathway (Figure 1). Many hydrogenotrophic methanogens are autotrophic, requiring only CO₂, H₂, and inorganic salts to produce energy through methanogenesis and synthesize biomass through CO₂ fixation [5].

The fourth step in the hydrogenotrophic methanogenesis of class I methanogens involves the reduction of N⁵,N¹⁰-methenyltetrahydromethanopterin (methenyl-H₄MPT) to N⁵,N¹⁰-methylene-H₄MPT. Class II methanogens differ in their use of methanosarcinapterin rather than H₄MPT as the C₁ carrier. This step in class I methanogens can be carried out by either of two different enzymes. Coenzyme F₄₂₀-dependent methylene-H₄MPT dehydrogenase (Mtd) reduces methenyl-H₄MPT using reduced coenzyme F₄₂₀ as the electron donor. H₂-forming methylene-H₄MPT dehydrogenase (Hmd) reduces methenyl-H₄MPT to methylene-H₄MPT using H₂ as an electron source. Afting et al. [6] observed in Methanothermobacter marburgensis that Hmd has a specific activity greater than that of Mtd under nickel-limited, ammonia-limited, and non-limited conditions, while Mtd has a specific activity greater than that of Hmd under hydrogen-limited conditions. Hendrickson et al. [7] observed in Methanococcus maripaludis that hmd is upregulated in proportion to growth rate and mtd is upregulated under hydrogen limitation.

The Hmd holoenzyme is comprised of a homodimer of 38 kDa subunits, two pyridone derivative cofactor molecules, and two iron atoms [8]. Each iron atom coordinates the reduction of methenyl-H₄MPT and oxidation of H₂ while bound to both Hmd and a cofactor molecule [8, 9]. The apoenzyme of Hmd is stable and can be restored to active holoenzyme by the addition of cofactor [9]. Hmd is the only known hydrogenase that lacks an iron-sulfur cluster and is sometimes referred to as the 'iron-sulfur cluster-free hydrogenase'.

Almost all genomes of class I hydrogenotrophic methanogens contain both an hmd enzyme gene and at least one hmd paralog gene. Several species have two copies of the hmd paralog (referred to in this manuscript with arbitrary numeration as paralog₁ and paralog₂; see Additional file 1). Afting et al. [6] first showed in M. marburgensis that the protein products of hmd paralogs are present in the cell. Their study also revealed that Hmd paralog₁ is detectable at low H₂, while Hmd paralog₂ is detectable at high H₂, and that neither paralog shows any observable hydrogenase activity. Recent unpublished work mentioned in a review by Shima and Thauer [10] indicates that Hmd paralog₁ from Methanocaldococcus jannaschii can competitively bind cofactor and inhibit the activation of Hmd apoenzyme. Curiously, Hmd paralog₁ in M. jannaschii was shown by Lipman et al. [11] to specifically bind prolyl-tRNA synthetase. While these results taken together constitute a partial characterization of Hmd paralogs, our understanding of these proteins and their role in methanogenesis is far from complete.

Here we present advanced computational analyses of Hmd enzymes and their paralogs from the genomes of sixteen class I hydrogenotrophic methanogens. The relationship of hmd enzyme and paralog sequences is demonstrated through phylogenetic analysis. The tertiary structures of Hmd enzymes and paralogs from five representative species are predicted using the top ranking modeling server of the last two CASP competitions [[12]; http://predictioncenter.org/casp8/]. Functional characterization of the Hmd paralogs is performed using a state-of-the-art method recently developed by our group [13]. Taken together, these analyses form a thorough computational characterization of the Hmd enzymes and paralogs and generate several testable hypotheses regarding the molecular functions of both Hmd enzymes and paralogs.

Results and discussion

Sequence analysis

An exhaustive search for hmd genes was performed using PSI-BLAST [14] and the MetaCyc multi-genome browser [15]. This process identified thirty hmd enzyme and paralog sequences from sixteen species and strains of class I hydrogenotrophic methanogens. Several methanogen prephenate dehydrogenase genes were also identified by our search. We use these genes as a phylogenetic outgroup in the subsequent analysis. Complete genome sequences are available for eleven of the sixteen species and strains. Of these eleven, only the genomes of Methanocorpusculum labreanum and Methanobrevibacter smithii contain an hmd enzyme but not an hmd paralog. All Methanococcus spp. have only one hmd paralog gene, while Methanocaldococcus jannaschii, Methanothermobacter marburgensis, Methanothermobacter thermautotrophicus, and Methanopyrus kandleri have two hmd paralog genes. No species was found to have an hmd paralog without also having an hmd enzyme. Features of these genes, their GenInfo Identifiers, and their associated references [[16–23]; Copeland et al., unpublished data; Hartmann and Thauer, direct submission to NCBI databases 1996] are presented in Additional file 1. A ClustalW2 alignment of the protein sequences of these genes is included as Additional file 2.

Phylogenetic analysis

Phylogenetic analysis of the thirty Hmd enzyme and paralog sequences was performed by three independent methods. In each tree, the three prephenate dehydrogenase sequences were used as an outgroup. Figure 2 shows the three trees and specifies the software, calculation algorithm, amino acid substitution matrix, and confidence score calculation method used to generate them. Though branch lengths differ between trees, the overall topology is identical between the PhyML [24] and MrBayes [25] trees and differs in only three terminal nodes of the Phylip [26] tree.
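Topological agreement of this kind, ignoring branch lengths, can be checked by comparing the sets of clades implied by each rooted tree. The Python sketch below is illustrative only: the nested-tuple tree encoding, the leaf names, the toy trees, and the function names are all assumptions made for the example and do not reproduce the trees in Figure 2 or the procedure actually used to compare them.

```python
# Illustrative only: rooted trees are written as nested tuples of leaf names.
# Leaf names and toy trees are hypothetical, not the trees shown in Figure 2.

def clades(tree, found=None):
    """Return (leaf set of `tree`, set of all internal-node clades beneath it)."""
    if found is None:
        found = set()
    if isinstance(tree, str):            # a leaf
        return frozenset([tree]), found
    leaves = frozenset()
    for child in tree:
        child_leaves, _ = clades(child, found)
        leaves |= child_leaves
    found.add(leaves)                    # record the clade at this internal node
    return leaves, found


def clade_difference(tree_a, tree_b):
    """Clades that appear in exactly one of the two rooted trees."""
    _, clades_a = clades(tree_a)
    _, clades_b = clades(tree_b)
    return clades_a ^ clades_b


# Two trees with the same topology (child order differs) and one that conflicts.
tree_1 = ((("enzA", "enzB"), ("parA", "parB")), "outgroup")
tree_2 = ((("enzB", "enzA"), ("parB", "parA")), "outgroup")
tree_3 = ((("enzA", "parA"), ("enzB", "parB")), "outgroup")

same_topology = len(clade_difference(tree_1, tree_2)) == 0
n_conflicts = len(clade_difference(tree_1, tree_3))
```

In this toy case, tree_1 and tree_2 share every clade, while tree_1 and tree_3 disagree on which sequences group together. Enzyme and paralog sequences forming separate clades in tree_1 is the pattern that corresponds to two distinct monophyletic groups.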

In all three trees, Hmd enzymes and paralogs form two distinct monophyletic groups. Curiously, the Hmd enzyme and paralog subtrees are considerably dissimilar regarding the placement of M. jannaschii sequences. These sequences are more basal in the paralog subtree than the enzyme subtree (with the exception of Hmd paralog₁ in the Phylip tree). Bifurcation patterns in the tree suggest that paralog duplication has taken place independently in the lineages leading to M. jannaschii, M. kandleri, and the last common ancestor of M. marburgensis and M. thermautotrophicus. The two Hmd paralogs of M. jannaschii are paraphyletic in the PhyML and MrBayes trees and polyphyletic in the Phylip tree. The paralog duplicates of M. kandleri and the last common ancestor of M. marburgensis and M. thermautotrophicus both produce monophyletic topologies. It should be noted that M. marburgensis and M. thermautotrophicus were considered strains of a single species until recently [21].

These trees do not provide a conclusive explanation for the lack of a paralog sequence in M. labreanum or M. smithii. M. labreanum and M. smithii enzyme sequences are not basally branching, but were inherited from the last common ancestor of these species and the Methanothermobacter genus. Given that the M. kandleri paralog sequences appear in a subtree with the other paralog sequences, rather than branching from the base of the tree, it is likely that both M. labreanum and M. smithii lost the Hmd paralog late in evolution. It is therefore probable, but not certain, that the last common ancestor of all class I methanogens had both an Hmd enzyme and paralog.

Structure modeling

Tertiary structure models of fourteen representative Hmd enzymes and paralogs were generated with I-TASSER [27, 28], which was the best performing structure modeling server in the two most recent CASP competitions [[12], http://predictioncenter.org/casp8/]. The I-TASSER algorithm is an advanced modeling method that searches the SCOP database [29] for parent template structures, uses these parent structures to comparatively model short segments of the query protein, and connects these segments using de novo modeling techniques. Because the modeling is not dependent on comparison to a single homolog, this method can be considered a form of de novo structure modeling.

The structure of the Hmd enzyme from M. jannaschii has previously been solved by X-ray diffraction [[8]; PDB ID = 2b0j]. This structure was the most often used parent template of the top C-scoring [27] model of each protein. The next three most often used parent structures were dehydrogenases. These parent structures were arogenate dehydrogenase from Synechocystis sp., hydroxyisobutyrate dehydrogenase from Homo sapiens, and prephenate dehydrogenase from Aquifex aeolicus. The resulting I-TASSER models were evaluated by both the C-score [27] and residue-specific all-atom probability discriminatory function (RAPDF) [30] scoring functions. These scoring functions measure the relative accuracy of a given model compared to other models of the same protein. C-score is determined by clustering the thousands of intermediate models generated during the I-TASSER run. Structures in the center of the largest clusters are assumed to be the most accurate. RAPDF determines the quality of a model by calculating the sum of log-odds scores for all interatomic distances within the model, with the log-odds values derived from frequencies observed in diffraction structures. The model with the highest C-score also had the best RAPDF score in the case of all five Hmd enzymes and two of the nine Hmd paralogs. Figure 3 shows all top C-scoring and RAPDF-scoring models mapped onto a PhyML [24] phylogeny of the corresponding sequences. A summary of features of these models is given in Table 1. A concatenated file of all top C-scoring and RAPDF-scoring models in PDB format is available as Additional file 3.
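The idea of summing log-odds scores over interatomic distances can be illustrated with a toy calculation. The sketch below is not the RAPDF implementation: the atom types, the distance binning, the log-odds table, and the sign convention are all invented for illustration, whereas the real function uses tables derived from experimentally determined structures [30].

```python
import math
from itertools import combinations

# Hypothetical log-odds table: (atom type, atom type, distance bin) -> log-odds.
# The values are invented; real tables come from observed distance frequencies.
LOG_ODDS = {
    ("C", "C", 0): -0.2, ("C", "C", 1): 0.5,
    ("C", "N", 0): 0.1,  ("C", "N", 1): 0.3,
    ("N", "N", 0): -0.4, ("N", "N", 1): 0.2,
}
BIN_WIDTH = 4.0   # angstroms per distance bin (assumed)
N_BINS = 2        # atom pairs beyond the last bin are ignored (assumed)


def log_odds_score(atoms):
    """Sum table log-odds over all atom pairs; atoms = [(type, (x, y, z)), ...]."""
    total = 0.0
    for (type_a, xyz_a), (type_b, xyz_b) in combinations(atoms, 2):
        bin_index = int(math.dist(xyz_a, xyz_b) // BIN_WIDTH)
        if bin_index >= N_BINS:
            continue
        # Sort the two atom types so that (C, N) and (N, C) use the same entry.
        key = tuple(sorted((type_a, type_b))) + (bin_index,)
        total += LOG_ODDS[key]
    return total


# A three-atom toy "model" with made-up coordinates.
toy_model = [("C", (0.0, 0.0, 0.0)), ("N", (3.0, 0.0, 0.0)), ("C", (0.0, 5.0, 0.0))]
score = log_odds_score(toy_model)
```

Because the score depends only on distances within a single model, competing models of the same protein can be ranked against one another without reference to a known structure.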

Table 1 Features of Hmd enzyme and paralog structure models

All models are composed of two distinct folding regions, a 200–300 amino acid N-terminal domain which contains both α-helices and β-sheets and a ~50 amino acid C-terminal domain containing only α-helices. According to the diffraction structure of the Hmd enzyme, catalytic activity takes place within the N-terminal domains while dimerization occurs between the C-terminal domains of subunits [8, 9]. To gauge the structural conservation between Hmd enzymes and paralogs, root mean square deviations (RMSDs) between the models and the diffraction structure were calculated with respect to the whole protein, the N-terminal domain only, and the C-terminal domain only.
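A minimal sketch of the three RMSD calculations follows. It assumes that the model and reference coordinates (for example, one Cα atom per residue) are already matched residue by residue and already superposed; the superposition step, the domain boundary index, and the function names are assumptions of this example rather than details reported here.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def domain_rmsds(model, reference, boundary):
    """RMSD over the whole chain and over each side of a domain boundary.

    `boundary` is the index of the first C-terminal residue (hypothetical here).
    """
    return {
        "whole": rmsd(model, reference),
        "n_terminal": rmsd(model[:boundary], reference[:boundary]),
        "c_terminal": rmsd(model[boundary:], reference[boundary:]),
    }


# Toy four-residue example: only the first two residues deviate from the reference.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
result = domain_rmsds(model, reference, boundary=2)
```

Note that superposing each domain separately before computing its RMSD would generally give lower per-domain values than reusing a single whole-protein superposition, so the choice should be stated when such numbers are compared.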

The RMSD between model and diffraction structure is significantly lower with respect to C-terminal domains than N-terminal domains for 10 out of 21 models. These models are Hmd enzyme_RC from M. kandleri, Hmd paralog_1-R, Hmd paralog_1-C, and Hmd paralog_2-R from M. thermautotrophicus, Hmd paralog_2-R and Hmd paralog_2-C from M. marburgensis, Hmd paralog_R from M. maripaludis, Hmd paralog_1-R, Hmd paralog_2-R, and Hmd paralog_2-C from M. jannaschii, and Hmd paralog_2-R and Hmd paralog_2-C from M. kandleri. The RMSD of the C-terminal domains of the Hmd enzyme_RC from M. maripaludis and the diffraction structure of Hmd was higher than that of the N-terminal domain. ClustalW2 multiple sequence alignments [31] of the query protein with its I-TASSER parent structures are available as Additional file 4. Visual analysis of these alignments suggests that the modeling is not biased towards one of the two domains due to sequence similarity with the parent structures. These results therefore indicate that the C-terminal domain is more structurally conserved between Hmd enzyme and paralog than the N-terminal domain.

Function prediction by Protinfo MFS comparison

The Meta-Functional Signature score (MFS) was used in conjunction with multiple sequence alignment to predict functional sites and functional similarity between Hmd enzymes and paralogs. MFS is part of the Protinfo suite of algorithms (http://protinfo.compbio.washington.edu/) and predicts the functional sites of a protein with higher accuracy than other currently available algorithms [13]. For a given protein, the MFS algorithm quantifies and measures multiple orthogonal features of each amino acid pertaining to either the evolutionary conservation of the amino acid, the contribution of the amino acid to structural integrity, or the frequency with which the residue type itself is found in known functional sites. These features are combined to give the MFS score, which represents the probability that a given amino acid contributes directly to function.

MFS scores were calculated for each model summarized in Table 1. The raw MFS data are available as Additional file 5. Any residue with an MFS score in the top ten out of the whole protein was considered a putative functional residue. A ClustalW2 multiple sequence alignment [31] was used to tally the number of putative functional sites that appear in the same alignment position across multiple species (Figure 4). This analysis served two purposes. First, the comparison of putative functional sites across either Hmd enzymes or paralogs provided an ad hoc bootstrapping of the MFS predictions. Second, the comparison of putative functional sites between Hmd enzymes and paralogs was used to ascertain whether they share common functional attributes. The unabridged superimposition of MFS data onto a full ClustalW2 alignment of all modeled Hmd proteins is available as Additional file 6.
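The tallying step can be sketched as follows: select the top-scoring residues of each protein, translate their positions into alignment columns, and count how many proteins contribute a putative functional residue to each column. The sequences, scores, and function names below are hypothetical, and the toy example uses the top two residues rather than the top ten so that the result can be followed by hand.

```python
from collections import Counter


def top_scoring_residues(scores, n=10):
    """0-based indices of the n highest-scoring residues of one protein."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:n])


def residue_to_column(aligned_seq, gap="-"):
    """Map each ungapped residue index to its column in the alignment."""
    mapping = {}
    residue_index = 0
    for column, char in enumerate(aligned_seq):
        if char != gap:
            mapping[residue_index] = column
            residue_index += 1
    return mapping


def tally_columns(alignment, mfs_scores, n=10):
    """Count, per alignment column, the proteins with a top-n residue there."""
    tally = Counter()
    for name, aligned_seq in alignment.items():
        column_of = residue_to_column(aligned_seq)
        for residue in top_scoring_residues(mfs_scores[name], n):
            tally[column_of[residue]] += 1
    return tally


# Hypothetical two-sequence alignment with one gap, and made-up per-residue scores.
alignment = {"enzyme_A": "MH-CT", "paralog_A": "MNACT"}
mfs_scores = {
    "enzyme_A": [0.1, 0.9, 0.8, 0.2],
    "paralog_A": [0.1, 0.7, 0.05, 0.9, 0.3],
}
tally = tally_columns(alignment, mfs_scores, n=2)
```

Here both proteins place a top-scoring residue in columns 1 and 3 even though the residue types in column 1 differ (H versus N), which is the kind of positional agreement the comparison is designed to detect.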

In fifteen such alignment positions, putative functional residues were predicted in at least 40% of either Hmd enzymes or paralogs. In five of these fifteen alignment positions, putative functional sites were predicted in at least 40% of Hmd enzymes and at least 40% of Hmd paralogs. Figure 5A shows representative residues from these fifteen alignment positions mapped onto the diffraction structure of Hmd enzyme [8] and the structure model of Hmd paralog₁ from M. jannaschii. All fifteen residues are located within the N-terminal domain of the protein. The paucity of these residues in the C-terminal domain of either protein is most likely due to its involvement in dimerization rather than enzymatic function.
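The 40% criterion can be expressed as a simple filter over per-column tallies. In this sketch the group sizes of five enzymes and nine paralogs follow the numbers of modeled proteins reported above, while the column indices, the tallies, and the function name are made up for illustration.

```python
def conserved_positions(enzyme_tally, paralog_tally, n_enzymes, n_paralogs,
                        threshold=0.4):
    """Return (columns passing in either group, columns passing in both groups)."""
    either, both = [], []
    for column in sorted(set(enzyme_tally) | set(paralog_tally)):
        in_enzymes = enzyme_tally.get(column, 0) / n_enzymes >= threshold
        in_paralogs = paralog_tally.get(column, 0) / n_paralogs >= threshold
        if in_enzymes or in_paralogs:
            either.append(column)
        if in_enzymes and in_paralogs:
            both.append(column)
    return either, both


# Hypothetical tallies: alignment column -> number of proteins with a putative
# functional residue at that column.
enzyme_tally = {12: 4, 30: 2, 41: 1}
paralog_tally = {12: 5, 30: 1, 57: 4}
either, both = conserved_positions(enzyme_tally, paralog_tally,
                                   n_enzymes=5, n_paralogs=9)
```

With these toy numbers, three columns pass the threshold in at least one group and one column passes in both, mirroring the distinction drawn between the fifteen positions and the five shared positions.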

Four of the five alignment positions in which multiple putative functional residues are conserved between Hmd enzymes and paralogs cluster into a single distinct region (Figure 5B). This cluster is comprised of H174, C176, T177, and H201 in Hmd enzyme and N125, C127, T128, and H154 in Hmd paralog₁ from M. jannaschii (Figure 5C). In the Hmd enzyme from M. jannaschii, C176 was previously demonstrated to bind the cofactor and coordinate the iron and substrate [8, 9]. This cluster of putative functional sites therefore represents the active site of the Hmd enzyme. The H174 residue of the Hmd enzyme corresponds to the N125 residue of Hmd paralog₁. Thus the functional importance of this site appears to be conserved while the residue type itself is not. These results are consistent with the independent observations that the Hmd paralog₁ of M. jannaschii is able to competitively bind the Hmd cofactor [10] and that both Hmd paralogs of M. marburgensis are unable to catalyze a hydrogenase/dehydrogenase reaction [6] (see Background). A second predicted common functional site between Hmd enzymes and paralogs is comprised of a single amino acid, D143 in Hmd enzyme and E94 in Hmd paralog₁ (Figure 5D). The functional relevance of this region is as yet unknown. There is no experimental evidence that all Hmd paralogs are functionally equivalent. Our analysis, however, is not dependent on all Hmd paralogs having a single common function. Rather, all Hmd paralogs are predicted here to have a common ancestral function and to still maintain common features of function, such as the locations of functional sites.

Lipman et al. [11] demonstrated that Hmd paralog₁ from M. jannaschii specifically binds prolyl-tRNA synthetase. The biological significance of this binding has not been examined in a published study since this initial work. Lipman et al. observed that mutations V248A and L252A reduced this binding 4-fold. In our MFS calculation for Hmd paralog₁ from M. jannaschii, V248 has a score of 0.05 and L252 has a score of 0.22. Val and Leu are typically not conserved within protein-protein binding "hot spots" [32]. It may be the case that V248 and L252 represent structurally important residues in Hmd paralog₁ that do not contribute directly to function. Thus, our MFS analysis cannot confirm the biological relevance of Hmd paralog₁ binding to prolyl-tRNA synthetase in M. jannaschii.

Conclusion

This study offers an in-depth computational analysis of the relationship between the sequences, structures, and functional features of Hmd enzymes and paralogs in class I hydrogenotrophic methanogens. Phylogenetic analysis of thirty hmd enzyme and paralog genes from sixteen species and strains confirms that the genetic predecessors of modern Hmd enzymes and paralogs were present in the last common ancestor of all class I hydrogenotrophic methanogens. Structural modeling of fourteen representative Hmd enzymes and paralogs reveals a common structural arrangement comprised of one large N-terminal domain containing α-helices and β-sheets and one smaller C-terminal domain containing only α-helices.

Functional site prediction was performed by the calculation of Meta-Functional Signature (MFS) scores for the fourteen modeled Hmd enzymes and paralogs [13]. MFS comparison across a multiple sequence alignment revealed five functional sites conserved between Hmd enzymes and paralogs. The superimposition of these sites onto representative structures of the Hmd enzyme and paralog showed that the enzyme active site is maintained as a functional site in the paralog. One of the four functionally conserved residues in this functional site is a His in Hmd enzymes and an Asn in most Hmd paralogs. We conclude from these observations that the molecular function of the Hmd paralog is similar but not identical to the enzyme. Our analysis also predicted a second site of common function between Hmd enzymes and paralogs that is yet uncharacterized. Our MFS data did not substantiate the observation of Lipman et al. [11] that Hmd paralog₁ in M. jannaschii specifically binds to prolyl-tRNA synthetase.

Previous experimental work has demonstrated that Hmd paralogs do not enzymatically catalyze hydrogenase/dehydrogenase reactions [6], but are able to competitively bind the Hmd enzyme cofactor [10]. Our results indicate that the catalytic site of the Hmd enzyme is conserved as a functional site in Hmd paralogs, but that the molecular function of the paralog differs from that of the enzyme due to at least one key amino acid substitution. Given these observations, it is possible that the Hmd paralog is responsible for acting as a reservoir for the Hmd enzyme cofactor when H₂ is low and the Mtd reaction is favored over the Hmd reaction (see Background). Alternatively, the Hmd paralog may act as a scaffold for cofactor synthesis. These hypotheses warrant experimental verification.

The datasets and predictions generated in this study provide a guide for future experimental characterization of the Hmd protein family. This work also serves as an example of detailed protein function prediction that can be achieved by the combination of multiple independent computational techniques. We are currently working to optimize and generalize the method presented here. Such an approach will increase the accuracy of protein function prediction and help to guide the early steps of experimental protein characterization.