University of Huddersfield Repository A phylogenomic profile of hemerythrins, the nonheme diiron binding respiratory proteins

Background: Hemerythrins, are the non-heme, diiron binding respiratory proteins of brachiopods, priapulids and sipunculans; they are also found in annelids and bacteria, where their functions have not been fully elucidated. Results: A search for putative Hrs in the genomes of 43 archaea, 444 bacteria and 135 eukaryotes, revealed their presence in 3 archaea, 118 bacteria, several fungi, one apicomplexan, a heterolobosan, a cnidarian and several annelids. About a fourth of the Hr sequences were identified as N- or C-terminal domains of chimeric, chemotactic gene regulators. The function of the remaining single domain bacterial Hrs remains to be determined. In addition to oxygen transport, the possible functions in annelids have been proposed to include cadmium-binding, antibacterial action and immunoprotection. A Bayesian phylogenetic tree revealed a split into two clades, one encompassing archaea, a signal peptide and 2 introns (81862), a signal peptide and 3 introns (174822), and no signal peptide and 3 introns (100875). The seven residues involved in coordina-tion with the two Fe are starred.


Background
Three types of respiratory proteins occur in present day metazoans: hemoglobin, ubiquitous among vertebrates and found in most prokaryotes and eukaryotes [1,2], hemocyanin, present mostly in arthropods and molluscs [3], and hemerythrin (Hr) [4]. The latter occurs in coelomocytes in circulating coelomic fluid and in muscle tissue as MHr, and was originally thought to be limited to three minor protostome phyla, the Sipuncula, Brachiopoda and Priapulida, and one annelid species [4][5][6]. Over the last twenty years, cytoplasmic Hrs have been reported in all three annelid groups, polychaetes [7][8][9], oligochaetes [10], and hirudinae [11][12][13]. A recent molecular phylogenetic study of sipunculan Hrs has shown them to have a close relationship to annelid Hrs [14]. A Hr sharing > 43% identity with annelid Hrs, was found in a search for antigenrelated genes expressed in the heterolobosan Naegleria fowleri, the causative agent of primary amoebic meningoencephalitis [15,16]. In the last few years, Hrs have been found in bacteria, as a single domain protein in the γ-proteobacterium Methylococcus capsulatus [17], and as a C-terminal domain of a chimeric, methyl accepting chemotaxis protein in the sulfate-reducing δ-proteobacterium Desulfovibrio vulgaris [18].
The crystal structures of metazoan Hrs and MHrs are very similar [19,20], a four helix bundle of antiparallel α-helices (A through D) formed by polypeptide chains of 113aa and 118aa, respectively. The active site consists of two oxo-/hydroxo-bridged Fe atoms (Fig. s1 in Additional file 1). Fe1 is coordinated to three His side-chain groups in helices C and D, and Fe2 is coordinated to two His sidechain groups in helices A and B; the carboxylate side-chain groups of a Glu in helix C and an Asp in helix D, bridge both irons. Although the D. vulgaris Hr domain is somewhat longer than metazoan Hrs, 130aa, it has a very similar structure [21].
We report below the results of an exhaustive search for putative Hrs within the available genomes from the three kingdoms of life and the isolation of Hr genes in several annelids. Furthermore, we describe for the first time the intron-exon structure of metazoan Hr genes, provide evidence for an extracellular occurrence of leech Hr, and discuss the implications of the phylogenomic distribution of Hrs.

Eukaryote Hrs
The previously known and the newly identified Hr sequences are listed in Additional file 1 in Table s1, together with their manual alignments shown in Fig. s2. In addition to the metazoan Hrs identified earlier [14], we have sequenced putative Hr genes from the sipunculan S. nudus (Hr: AM886444 and MHr AM886445), the deep-sea hydrothermal vent vestimentiferan R. pachyptila (AM886446) and the polychaete S. armiger (AM886447). Blastp searches revealed putative Hrs in the apicomplexan Plasmodium yoelii, the heterolobosan Naegleria gruberi, the cnidarian Nematostella vectensis (Radiata), the oligochaete Lumbricus rubellus, the polychaete Periserrula leucophryna, and the hirudineans Haementeria depressa and Helobdella robusta. Although most eukaryotes have one or two Hrs, the genomes of N. gruberi and H. robusta have 5 and 13 Hrs, respectively. No Hrs were found in the genome of the polychaete Capitella sp.I http://www.jgi-psf.org/Capca1/ Capca1.info.html. Putative Hrs were also found in 10 Ascomycota and 3 Basidiomycota, out of a total of > 50 fungal genomes: all have very similar sequences, substantially different from other Hrs. We have used FUGUE, which recognizes sequence-structure homology using environment-specific substitution tables and structuredependent gap penalties [22] to define whether they should be considered to be Hrs. Although their FUGUE Z scores range from 6 to 8, interpreted as a certain assignment [22], they all share the following alterations in the Hr motif ( Fig. s2 in Additional file 1): absence of the conserved Trp in the pre-helix A and of the Asp in helix A, substitution of Asp for His in helix C, and of Glu for Asp in helix D. Of the WLV triplet in helix D, only the Leu residue (corresponding to L103 in the eukaryote sequences), which is known to play an important role in Hr function [23], is conserved. It remains to be determined whether the foregoing alterations compromise the structural or functional integrity of the fungal Hrs.

Intron-exon structure of metazoan genes
Since the intron-exon structure of Hr genes was unknown, we determined the locations of introns in Hr genes from the cnidarian N. vectensis (XP_001622541.1|GI:1563515 02), R. pachyptila, and S. nudus. An alignment of the sequences showing the different intron locations is given in Fig. 1. There are 2 introns in S. nudus Hr and MHr, the first one located just prior to helix A and the other at the end of helix B, both in phase 0. The annelid Hr genes have 2 or 3 introns: the locations of the first two introns are identical in the polychaete vestimentiferan R. pachyptila and the hirudinae H. robusta, and correspond to the locations in S. nudus Hr. A third intron (in phase 2) occurs in the middle of helix D in some members of the multigenic Hr family of H. robusta (Fig. 1). Although two introns are also found in the N. vectensis Hr gene, they occur at different locations (Fig. 1). No introns were found in the apicomplexan and protozoan Hrs.

Signal peptide identification
SignalP 3.0 http://www.cbs.dtu.dk/services/SignalP was employed to locate probable signal peptide cleavage sites [24]. Of the 13 putative Hrs found in the genome of the leech H. robusta, 8 appear to have atypical N-terminals with a clearly identifiable signal peptide cleavage site (  Table 1 shows the distribution of single-domain and chimeric Hrs in the main bacterial groups that have Hrs: 242 (74%) are single-domain Hrs and 84 (26%) are domains in chimeric proteins. No Hrs were found in the genomes of Bacteroidetes/Chlorobi, Chlamydiae/Verrumicrobia, Chloroflexi, Deinococcus/Thermus, Fusobacteria, Nitrospirae and Thermotogales. The number of Hrs per genome varies widely, from 1 to as many as 31 in

Intron 2 Annelids and sipunculans
Helix A Helix B Helix D Helix C  Table s3 (Additional file 1) did not reveal any correlation with Hr presence: only 9 were host associated. Table s4 in Additional file 1 lists the Hr sequences found to deviate from the canonical Hr sequence, either through alteration of one or more residues involved in iron coordination or loss of a helical segment: 58 out of the 327 bacterial sequences (18%) in 34 genomes, and one annelid. The alterations are listed in Table s5 in Additional file 1. Of the 59 deviant sequences, 11 have alterations in two helices and 4 lack a helical segment. The number of alterations in each of the four helices A, B, C and D, is 10, 11, 45 and 6, respectively. The overwhelming majority are substitutions of one of the 5 His residues whose sidechain groups coordinate the Fe atoms; only 5 alterations in the two acidic residues are evident. Most are found in helix C (45/71 = 63%), with several co-occurring with alterations in one other helix. The most common His substitutions are by Gln (24/71 = 34%), by a hydrophobic residue (A/V/L/I/M/Y) (19/71 = 27%), by Asn (7/71 = 10%) and by Glu/Asp (7/71 = 10%).

Molecular phylogeny
A global Bayesian phylogenetic tree of 92 Hr sequences, comprising 42 metazoan, 3 protozoan, 16 fungal and 31 prokaryote Hrs, is shown in Fig. 2. Independent clusters are formed by the prokaryote and fungal Hrs on one hand, and the apicomplexan, heterolobosan and metazoan Hrs on the other, supported by a posterior probability of 0.88. In the prokaryote clade there is extensive polytomy which does not allow discrimination between archaea and bacteria. Furthermore, the putative fungal Hrs are closely clustered with the prokaryote Hrs with a posterior probability of 1. In the eukaryote branch, the apicomplexan (Plasmodium yoelii) and the heterolobosan (N. fowleri and N. gruberi) Hrs are basal to the protostome phyla, also with high posterior probabilities. The annelid, sipunculan, brachiopod and cnidarian (N. vectensis) Hrs are not resolved into individual clades. Furthermore, there is also a polytomy at the base of the metazoan clade, including the cnidarian Hr, expected to occur at the base of the Bilateria, together with several annelid Hrs. It should be pointed out that Bayesian phylogenetic trees constructed using subsets of the total number of Hr sequences, also gave topologies identical to that obtained above (see Figs. s3 and s4 in Additional file 1).

Distribution and function in eukaryotes
The distribution of Hrs in eukaryotes is limited to fungi, the apicomplexan Plasmodium yoelii, the heterolobosan Naegleria and five metazoan phyla-the cnidarian N. vectensis, annelids and three minor phyla, the sipunculans, brachiopods and priapulids. The presence of Hrs in all three major annelid groups, the hirudinae, oligochaetes and polychaetes, suggests that they may be ubiquitous in Annelida. However, given their absence in the genome of the polychaete Capitella sp.I, the extent of Hr occurrence in annelids remains to be determined.
The intron-exon structures of the MHr and Hr genes of S. nudus suggest that they emerged via a duplication event.
Although no oligochaete Hr gene structure is known to date, the identical polychaete and hirudinean intron locations supports the notion of a common Hr ancestor to the sipunculans and annelids [14]. The presence of a third intron in some members of the H. robusta multigenic Hr family suggests an intron gain during the emergence of this species. The presence of two introns in N. vectensis Hr, inserted in positions different from the other metazoan Hrs (Fig. 1), indicates a different evolution of Hr genes in the Radiata relative to the Bilateria. Overall, it appears that the Hr gene was lost in the ancestor to the deuterostomes and conserved only in a few protostomes after the Radiata-Bilateria transition and the protostome-deuterostome split. The unexpected identification of signal peptide cleavage sites in some Hrs from the leech H. robusta (Fig.  1), implies that these Hrs are directly released into coelomic or vascular compartments, similar to the extracellu- lar annelid hemoglobins [25]: to our knowledge this is the first known instance of possible extracellular Hr location.
The Hrs in circulating, nucleated coelomocytes within the coelomic and tentacular fluid compartments and the cytoplasmic MHrs in Sipuncula, Brachiopoda, Priapulida and the polychaete Magelona papillicornis, have O 2 binding properties consonant with physiological roles of O 2 transport and storage [4,26]. Since annelids generally have intracellular or extracellular Hbs or both [27], their Hrs are likely to have functions other than O 2 transport. The Hrs of the polychaete N. diversicolor and the oligochaete A. caliginosa have been proposed to function as scavengers of heavy metals, such as Cd [8,9,28] and an antibacterial function has been proposed for the former [29]. In the leech Hirudo medicinalis, Hr occurs in neural and other tissues and is upregulated in response to septic injury [12]. A Hr was identified as a major component of mature oocytes in the leech T. tessulatum [30]: its presence throughout oogenesis suggests a more complex function than just a nutrient for the embryo, perhaps in iron storage and detoxification. In the leech H. medicinalis, Hr plays a role in the innate immune response of the nervous system to bacterial invasion [11]. The binding of sulfide by the Hr in the hemolymph of the priapulid Halicryptus spinulosus [31], suggests a possible role in sulfide detoxification. Hrs are also antigenic [32]: the Hr in the amoeba N. fowlerii was discovered in a search for the antigenrelated activity of this parasite [12].

Distribution and function in prokaryotes
Our survey demonstrates the presence of putative Hrs in < 10% of archaeal genomes (4 out of 43) and in < 30% of bacterial genomes (118 out of 444). In Archaea, Hrs occurs only in one of the two major groups, the Euryarchaea, and only in the Halobacteria, Methanococci and Methanomicrobia. In Bacteria, about 80% of the genomes containing Hrs belong to the Proteobacteria. Furthermore, we find that one of 6 archaeal and about one fifth (18%) of the putative bacterial Hr sequences have one or more alterations potentially affecting the integrity of the diiron binding site. Although we do not know how many of the altered sequences listed in Table s4 in Additional file 1 retain their function, we are left with a very sparse and episodic distribution of Hrs among the prokaryotes, of which one fifth appear to have mutated away from the canonical Hr motif. The overwhelming majority of the altered sequences are single domain Hrs, implying that their function may be less important to the survival of the organism than the chimeric Hrs.
Karlsen et al. [17] have cloned the gene for a 131aa Hr from the methanotrophic γ-proteobacterium M. capsulatus, and found that its in vivo expression increased with increase in the copper content of the growth medium, implying a possible function as O 2 -provider to the O 2requiring, membrane-associated methane monooxygenase, the enzyme responsible for oxidizing methane in M. capsulatus grown at high copper concentrations. Although nothing is known about the role of other SDHrs in bacteria, the 959aa ChHr from the sulfate-reducing δ-ptoteobacterium D. vulgaris, has been shown to be a chemotactic protein with a C-terminal Hr domain [34]. Chemotactic proteins generally comprise a periplasmic N-terminal sensor domain, linked via a trans-membrane domain to a Cterminal cytoplasmic transmitter domain. A phosphorylation/methylation cascade triggered by an environmental stimulus is transduced from the sensor to the transmitter domain, resulting in an alteration of the flagellar motion, allowing movement up or down a concentration gradient of the stimulus [35,36]. D. vulgaris is microaerobic and prefers to swim to a specific O 2 concentration range [37]. On the basis of a crystal structure of the expressed Hr domain of DcrH, and consistent with its cytoplasmic localization, Kurtz et al. [18] proposed that DcrH functions as an anaerotactic O 2 sensor. There appear to be at least three more putative chimeric proteins with C-terminal Hr domains as well as two SDHrs in D. vulgaris (Table  s3 in Additional file 1).
One final interesting observation resulting from our survey, is the presence of multiple SDHrs and ChHrs in the genomes of several magnetotactic bacteria, e.g. Magnetococcus sp., Magnetospirillum magneticum and M. magnetotacticum, with 14 (6SDHrs, 8ChHrs), 37 (27SDHrs, 10ChHrs) and 31 (22SDHrs, 9ChHrs) Hrs, respectively (Table s3 in Additional file 1), also observed earlier [33]. There are however, many magnetotactic bacteria which apparently do not have Hrs. Magnetotaxis, the ability to align and move along geomagnetic field lines, enables bacteria to be more efficient in locating a desired position in the vertical O 2 concentration gradient in their aquatic environments: it depends on the presence of specialized organelles, magnetosomes, comprised of Fe 3 O 4 /Fe 3 S 4 crystals enclosed in a lipid bilayer membrane derived from the cytoplasmic membrane [38,39]. It remains to be determined whether Hrs have any role in magnetosome formation or function.
Overall our results are in agreement with the results of a very recent review of bacterial Hrs by French et al. [33], published while this manuscript was in preparation. These authors suggest that single domain Hrs may function in the delivery of O 2 to oxygenases and respiratory oxidases, implied by the findings of Karlsen et al. [17] and consonant with the retention by the bacterial Hrs of the complete molecular signature of the O 2 binfing Hrs in sipunculans and brachiopods.

Molecular phylogeny and evolution of Hrs
The global Bayesian phylogenetic tree shown in Fig. 2, shows that the Opisthokont (animal and fungal) Hrs do not cluster together, as would be expected according to the consensus phylogeny of Baldauf [40]. Furthermore, the metazoan Hrs group together with two evolutionarily distant groups, the Alveolates (Apicomplexa) and the Discicristates (Heterolobosa) [41]. The clustering of fungal Hrs with the bacterial sequences suggests the possibility of horizontal gene transfer from bacteria to fungi. Alternatively, the Long Branch attraction effect during the molecular phylogeny reconstruction process could have resulted in an artefactual clustering with bacteria [42]. The radial phylogenetic tree representation with distances provided in Fig. 3, clearly shows the long distance separating the fungal and prokaryote clusters.
It is plausible to assume that α-helical bundles were among the earliest protein folds to emerge since the beginning of life, well-adapted to the binding of metal ions and small organic molecules. Consequently, both Hrs and globins are two very ancient protein families, which emerged as adaptations to possible environmental challenges to the last universal common ancestor (LUCA) or populations of microbial organisms representing LUCA. These adaptations would include the need to sequester reduced iron, which was probably abundant on early Earth, the ability to control locally excessive O 2 concentrations , which would have been lethal to anaerobic life, and the need to detoxify nitric oxide produced in O 2rich environments [43]. Another, equally plausible early function, would have been chemotactic sensing, enabling anaerobic organisms to avoid high O 2 concentrations; both aerophilic and aerophobic responses would have survival value throughout bacterial evolution (K. Van Holde, personal communication). This alternative is supported by the presence of chemotactic Hr-containing proteins and of globin-coupled sensors capable of eliciting either an aerophilic or aerophobic response [44]. However, only 39 of 118 (33%) Hr-containing bacterial genomes have ChHrs (Table s4 in Additional file 1) and 93 of 264 (35%) globin-containing bacterial genomes have globin-coupled sensors [43]. Thus, in extant prokaryotes, chemotactic sensing appears not to be a major function in the two protein families; what then is the function of the single domain Hrs in prokaryotes? The similarity of the amino acid sequences of the prokaryote and metazoan Hrs indicates that O 2 binding is likely to be involved in the function of the former, mentioned earlier [33].
Comparison of the phylogenomic profile of Hrs and globins (2), underscores the contrast in the evolutionary fates of the two protein families: presence in < 10% versus 25% of archaeal genomes, < 20% versus ~60% of bacterial genomes and ~13% versus > 80% of eukaryote genomes, respectively. In particular, the ~13% Hr presence in eukaryotes is greatly exaggerated because of the overrepresentation of fungi in the sequenced eukaryote genomes. Furthermore, unlike Hrs, globins are found in every major bacterial group, occur widely in eukaryotes and are ubiquitous among plants and vertebrates. Compared to globins, Hrs have barely maintained a foothold in living organisms, particularly multicellular ones. The apparent lack of evolutionary success of Hrs versus globins could be due to the greater probability of potentially damaging mutations in the former relative to the latter: seven residues binding the two Fe versus only the proximal His binding to the heme group. Alterations affecting one or more of the Fe-coordinating amino acid residues as well as the structure of the O 2 -binding cavity can be expected to have a direct deleterious effect on Hr function [45].

Conclusion
A survey of putative Hrs demonstrated a limited occurrence in bacteria and archaea and a marked absence in the vast majority of multicellular organisms. Among the metazoa, Hrs have survived in a cnidarian and in a few protostome groups; hence, it appears that in metazoans the Hr gene was lost in deuterostome ancestor(s) after the radiata/bilateria split. Signal peptide sequences in several Hirudinea Hrs suggest for the first time, the possibility of extracellular localization. Since the α-helical bundle is likely to have been among the earliest protein folds, Hrs represent an ancient family of iron-binding proteins, whose primary function in bacteria may have been that of an oxygen sensor, enabling aerophilic or aerophobic responses. Although Hrs evolved to function as O 2 transporters in brachiopods, priapulids and sipunculans, their function in annelids remains to be elucidated. Overall Hrs exhibit a considerable lack of evolutionary success in metazoans.

Identification of Hr squences
Two approaches were used to identify putative Hrs in the genomes of 37 archaea, 440 bacteria and 135 eukaryotes. In one, we examined the gene assignments based on a library of hidden Markov models [46], listed on the SUPERFAMILY site http://supfam.mrc-lmb.cam.ac.uk, discarding sequences shorter than 100aa. In the other, we performed blastp and tblastn (version 9.2.2) and psiblast searches, using the improved version with composition based statistics [47], of completed and unfinished genomes in the GenBank http://www.ncbi.nlm.nih.gov/ BLAST/. In cases of borderline sequences, searches employing PFAM [48]http://pfam.sanger.ac.uk and FUGUE [22]http://tardis.nibio.go.jp/fugue were used to determine whether they should be accepted as a Hr.
Radial representation of the Bayesian phylogenetic tree of the reduced set of Hr sequences, comprising 31 bacterial, 16 fungal, 3 protozoan and 42 metazoan Hrs