BMC Evolutionary Biology BioMed Central BMC 22002, Evolutionary Biology

Background The complete genomes of three animals have been sequenced by global research efforts: a nematode worm (Caenorhabditis elegans), an insect (Drosophila melanogaster), and a vertebrate (Homo sapiens). Remarkably, their relationships have yet to be clarified. The confusion concerns the enigmatic position of nematodes. Traditionally, nematodes have occupied a basal position, in part because they lack a true body cavity. However, the leading hypothesis now joins nematodes with arthropods in a molting clade, Ecdysozoa, based on data from several genes. Results We tested the Ecdysozoa hypothesis with analyses of more than 100 nuclear protein alignments, under conditions that would expose biases, and found that it was not supported. Instead, we found significant support for the traditional hypothesis, Coelomata. Our result is robust to different rates of sequence change among genes and lineages, different numbers of taxa, and different species of nematodes. Conclusion We conclude that insects (arthropods) are genetically and evolutionarily closer to humans than to nematode worms.


Background
Pyridine-2,6-bis(thiocarboxylic acid) (pdtc) is an iron-regulated metabolite produced by certain pseudomonads that is highly reactive towards metals by virtue of a pair of thiocarboxylate groups and a ring nitrogen that combine to form a non-specific, high-affinity, tridentate ligand (Figure 1) [1,2]. The formation constants for Co, Cu, and Fe complexes of pdtc were determined to be greater than 10^33 [3]. Uptake studies using 59Fe showed that addition of pdtc to the growth medium of two different pdtc-producing pseudomonads increased the efficiency of iron uptake by 20% [1]. The contribution of pdtc to iron uptake may be due to the ability of the copper complex of pdtc (Cu:pdtc) to reduce both soluble and mineral forms of ferric iron [1]. Not surprisingly, in light of its high affinity for essential metals, pdtc also exhibits antimicrobial activity [4].
MoeB, the molybdopterin synthase activating enzyme (MSAE), is part of the well-characterized molybdopterin cofactor (MPT-Mo) synthesis pathway. A dimer of MoeB adenylates the MoaD subunits of molybdopterin synthase that, after the conversion of these AMP moieties to thiocarboxylates by a sulfurtransferase, goes on to convert precursor Z to molybdopterin (MPT) [5,6]. Molybdopterin synthase is a heterotetrameric protein composed of two units each of MoaD and MoaE. The crystal structure of the E. coli MoeB-MoaD complex clearly shows the interaction between these two proteins and unambiguously confirms the adenylation role of MoeB [7]. After a thiocarboxylate moiety is formed on the terminal glycine of each MoaD subunit, the two sulfur atoms are transferred to precursor Z as sulfhydryl groups in a dithiolene configuration [8-11]. Molybdenum is added to MPT in subsequent steps to generate MPT-Mo, the active form of the molybdopterin cofactor. Since MPT-Mo is required for nitrate reductase activity [12], the activity of this enzyme can be used as a diagnostic indicator for the presence of the MPT-Mo synthetic pathway [7].
There are a number of genomic sequences deposited in GenBank as MoeB proteins, although only a few have been physiologically or genetically proven to function in MPT-Mo synthesis. Two moeB sequences have evidence for function: knockout of the open reading frame encoding GenBank accession number BAA35521 was shown to cause the accumulation of precursor Z in E. coli [13], and accession number Z95150 was found to be located within a cluster of other MPT-Mo synthesis genes of Mycobacterium tuberculosis [14]. MSAEs from eukaryotic organisms have also been characterized: MOCS3 (GenBank accession number NM_014484) from humans and cnx5 from Arabidopsis thaliana [15] have been sequenced, and cnxF from Aspergillus nidulans [16] has been both sequenced and physiologically characterized.
MoeZ is one of several MSAE-like conceptually translated protein sequences identified by the M. tuberculosis genome sequencing group at Sanger Center [14]. Although coding for a putative protein with high sequence similarity to MoeB, moeZ was so named because it had no genetic linkage to other MPT-Mo synthesis genes. Cole et al. [14] state in their GenBank entry (accession number Z95120) that MoeZ is possibly involved in the synthesis of molybdopterin, but there has been no biochemical or physiological characterization of this protein. An open reading frame (ORF-F) identified as a necessary part of the Pseudomonas stutzeri pdt locus coding for the synthesis pathway of pdtc ( Figure 1) was found to have 57% identity to M. tuberculosis MoeZ at the amino acid (AA) level. Further analysis revealed that a 521 base pair region of P. stutzeri ORF-F shares 80% homology with M. tuberculosis moeZ [17]. The products of the pdt locus probably function by activating, then sulfurating, a carboxylic acid precursor. The end product of this synthesis is pdtc, a molecule with two thiocarboxylic acid groups that is secreted from the cell.
Recently, data obtained from genome sequencing projects have made it clear that bacteria exchange genetic information by means of transducing bacteriophages, conjugative plasmids, and natural transformation [18] at a much higher rate than previously thought [19]. There seem to be few insurmountable barriers to interspecies gene transfer, with transfers being documented between such diverse organisms as archaea and eubacteria [20]. Synechococcus sp. [21] and some P. stutzeri strains [22] are two examples of the many bacterial species known to undergo natural transformation. All of the completely sequenced genomes show some level of lateral gene transfer (LGT). For example, 3.3% and 16.6% of the M. tuberculosis and Synechocystis sp. PCC6803 genomes, respectively, are estimated to be composed of sequences acquired by LGT [19].
We report here on the structural, functional, and evolutionary relationships of moeZ and discuss the possibility that the discontinuous distribution of moeZ is the result of an LGT event between unrelated bacteria.

Structure
We assembled a set of sequences similar to the product of Pseudomonas stutzeri pdt locus ORF-F (Ps-MoeZ) (Table 1). Protein BLAST searches using Ps-MoeZ as the query yielded more than 100 sequences with significant similarity (BLAST E value < 0.01). Interestingly, Ps-MoeZ had higher similarity to seven sequences of mycobacterial, streptomycete, and cyanobacterial origin than to the most similar pseudomonad sequence (Table 2). NA BLAST searches showed that Ps-moeZ had higher homology, as measured by length of homologous overlap, to four of these same sequences than to pseudomonad sequences (Table 2). To characterize the structure, function, and evolution of Ps-MoeZ, we chose to study these seven genes along with the P. stutzeri sequence. Ec-moeB was included in the study because its function has been extensively characterized. There were also sequences in the unfinished microbial genomes of M. avium, M. bovis, and M. smegmatis with significant NA homology to Ps-moeZ, but these were not included in this study because of their potentially provisional nature, pending completion of the genome sequencing projects. Inclusion of these sequences in the analyses presented here did not change any of our conclusions.
Using each protein sequence listed in Table 1 as a query to search the Conserved Domain Databases, we found that each contained an N-terminal domain with high similarity (BLAST E value < 6 × 10^-30) to the ThiF family domain (Pfam00899) (Table 2, Figure 2c), suggesting that this portion of all the proteins in this study performed a similar function. The ThiF protein adenylates ThiS prior to sulfuration of that protein by a separate sulfurtransferase [23,24]. Pfam00899 encompasses the nucleotide binding site of ThiF. All 12 sequences in this region have high similarity to the derived consensus sequence GXGXXG(7X)A(SX)GXGXXG(4X)D(9X)R, which defines a nucleotide binding motif [16,25]. In addition, all sequences have five residues, or conservative substitutions in the case of Pa-MoeB, shown to participate in the adenylation reaction catalyzed by Ec-MoeB (Figure 3) [7]. The fact that such high similarities are retained in the ThiF coding region across three bacterial divisions suggests that this domain functions more or less autonomously. If the ThiF domain interacts only with dinucleotides and had evolved to a high degree prior to the speciation of the strains in our study, then there would be little selective pressure to fix additional modifications. The high degree of AA similarity that all 12 sequences have to the ThiF family domain probably indicates that these proteins, at least in part, perform a similar function, most likely activating a substrate by adenylation.
All of the predicted amino acid sequences except Ec-MoeB, Mlo-MoeZdR, and Pa-MoeB also have C-terminal similarity (BLAST E value < 1 × 10^-12) to a rhodanese-like domain (Pfam00581) (Figure 2c, Table 2) [26]. Rhodanese, a thiosulfate:cyanide sulfurtransferase (EC 2.8.1.1), is composed of two domains that are similar in three-dimensional structure but divergent in sequence [27]. All residues contributing to the active site of rhodanese are contained within its C-terminal domain. The cysteine that covalently bonds to the transferred S atom is conserved in all nine of the sequences that contain the rhodanese-like domain included in this study (data not shown). The similarity scores to the rhodanese-like domain, for those sequences that contain it, were lower than the ThiF similarity scores. However, since the rhodanese domain is characterized by conservation in terms of three-dimensional structure rather than primary sequence, these lower similarity scores are not surprising.
Examination of the Clustal X alignment of all the protein sequences revealed significant structural differences (Figure 3). In addition to the ThiF family domain, all sequences except Mlo-MoeZdR also possessed a region similar to the C-terminal half of ThiF. Ec-MoeB, Mt-MoeBR, Pa-MoeB, Pa-MoeBR, and Sc-MoeBR all contain two pairs of cysteine residues, each arranged in a CXXC motif and spaced 68-70 residues apart, like those in ThiF proteins (arrows in Figures 2c and 3). But the Ps-MoeZ sequence, along with Mlo-MoeZdR, Mle-MoeZ, Mt-MoeZ, Nos-MoeZ, Syc-MoeZ, and Syy-MoeZ, lacks these four cysteine residues. In all these sequences except Mlo-MoeZdR, the first CXXC motif is replaced with NYRD. In Mlo-MoeZdR the first CXXC is substituted with GYRD. All these sequences except Mlo-MoeZdR also have a gap in place of the second CXXC. Mlo-MoeZdR ends prior to the position where the second CXXC would occur. This dual CXXC motif, found in MoeB, ThiF, and HesA proteins, forms a metal center with a zinc atom [7,9]. The presence of all four of these cysteines has been shown to be required for activity in MoeB [5]. Therefore, the proteins lacking them are unlikely to function as MSAEs. We designated the region containing these two cysteine pairs as 2X CXXC for the purposes of our discussion (Figure 2c).
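The dual CXXC criterion described above can be illustrated with a short scan of a protein sequence. This is only a minimal sketch, not the procedure used in this study (the motifs here were identified from the Clustal X alignment): the function name `find_cxxc_pairs`, the toy sequences, and the choice to measure the 68-70 residue spacing from motif start to motif start are all assumptions made for illustration.

```python
import re


def find_cxxc_pairs(seq, min_gap=68, max_gap=70):
    """Return (first_start, second_start) pairs of CXXC motifs.

    Positions are 0-based. The spacing is measured from the start of the
    first motif to the start of the second (an assumption; the text only
    says the motifs are "spaced 68-70 residues apart").
    """
    # A lookahead is used so that overlapping motifs are also reported.
    starts = [m.start() for m in re.finditer(r"(?=C..C)", seq.upper())]
    return [(i, j) for i in starts for j in starts
            if min_gap <= j - i <= max_gap]


# Hypothetical toy sequences, not real MoeB or MoeZ proteins.
moeb_like = "A" * 5 + "CAAC" + "G" * 65 + "CGGC" + "A" * 5
moez_like = "A" * 5 + "NYRD" + "G" * 65 + "A" * 5

print(find_cxxc_pairs(moeb_like))  # one pair of motifs
print(find_cxxc_pairs(moez_like))  # no pair: first motif replaced, second absent
```

A sequence that returns no pair under this scan would fall into the MoeZ-like class only if the alignment also confirms the NYRD (or GYRD) replacement and the gap; the scan by itself is a quick screen, not a classification.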
MoeBR, MoeZdR, and MoeZ proteins can be further differentiated from MoeBs by the inclusion of a polyproline motif immediately following the first CXXC location (boxed area in Figure 3). This polyproline tract is immediately adjacent to a disordered surface loop found in MoeB proteins thought to be involved in protein-protein interactions [7]. The consensus for this motif (RXLYPEPPPP) closely corresponds to a sequence (RSLYPPPPPP) found in rat and human metabotropic 1 glutamate receptors [28,29]. This motif is reminiscent of protein-protein interaction domains found in some proteins [30]; however, interactions of this type have not been shown to be facilitated by the polyproline motif in rat and human metabotropic 1 glutamate receptors.
An AA substitution found in MoeBRs and MoeZs may alter the substrate specificity of these enzymes. In Ec-MoeB, β sheet 5 (residues 149-155 of Ec-MoeB, Figure 3) forms part of the pocket which accepts the C-terminal portion of MoaD, of which the terminal glycine is subsequently adenylated [7]. Ec-MoeB I155, located at the C-terminal end of this beta loop, is substituted in MoeBR proteins by Y and in MoeZ proteins by F or Y. These bulky substitutions may restrict access of the C-terminus of MoaD-like proteins and may indicate that a different type of substrate is adenylated by MoeBR and MoeZ.
The presence or absence of two features, the two pairs of cysteines in the 2X CXXC domain and the rhodanese-like domain, forms the primary basis for differentiation between the four different protein structures among the sequences found in It is included in this study only because of its homology to Ps-MoeZ.) Once these structural classes were defined, we added Nos-MoeZ and Syy-MoeZ to our data set, since they were MoeZs that had not been selected on the basis of homology. We also included Mt-MoeBR because we were interested in the relationships between MoeBR and MoeZ sequences that were coresident in the same genome.
The Conserved Domain Database was screened for additional sequences related to the Ps-MoeZ sequence. This search yielded 19 proteins structurally similar to MoeBR and MoeZ proteins. Five of these were eukaryotic sequences of unknown function; five others were eukaryotic MSAE proteins; and one was a ThiF ORF within an operon containing two other thiamin biosynthesis genes in Erwinia amylovora [31]. The remaining eight sequences were of bacterial origin and were already included in our dataset. A similar search using Pa-MoeB as the query yielded 146 sequences with a domain structure similar to MoeB, most of which were entered into the database as MoeB, ThiF, HesA, or ubiquitin-activating enzymes from archaea, bacteria, and eukaryota. In terms of distribution, the MoeB structure (146 sequences) was much more common than the MoeBR structure (12 sequences), while MoeZ was represented by only six sequences in the GenBank non-redundant (nr) database.
Based on our sequence analysis, some moeZ sequences have been misidentified. Syc-MoeZ, Syy-MoeZ, and Nos-MoeZ are identified as MoeBs in GenBank accession numbers CAA76284, BAA18438, and NP_486946, respectively. The lack of all essential cysteines in both copies of the CXXC motif suggests that these MoeZs probably do not function like MSAE proteins.

Conserved NA sequences
When the regions of greater than 75% NA homology between each individual NA sequence and Ps-moeZ were plotted, it became apparent that these homologies clustered in defined regions (Figure 2a). Five sequences had a homologous region within the N-terminal half of the ThiF domain; six sequences had a homologous area in the C-terminal half of the same domain; and four sequences showed homology to areas of the 2X CXXC domain. Five sequences with greater than 200 bp of more than 75% NA identity to Ps-moeZ (Mle-moeZ, Mt-moeZ, Pa-moeBR, Sc-moeBR, and Syc-moeZ) along with Ps-moeZ were designated the "high homology group" and used in subsequent analyses of the homologous regions.
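The idea of locating stretches with more than 75% NA identity can be sketched on a pair of pre-aligned sequences. In this study the regions were obtained from BLAST 2 Sequences output, so the fixed-width sliding window below is a substituted, simplified technique; the function names and the toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identical positions between two aligned sequences.

    Columns where both sequences have a gap are ignored; a gap opposite
    a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * matches / len(pairs)


def high_identity_windows(a, b, window=200, threshold=75.0):
    """Start positions of windows whose identity exceeds the threshold."""
    return [start for start in range(len(a) - window + 1)
            if percent_identity(a[start:start + window],
                                b[start:start + window]) > threshold]


# Toy example with a short window so the behavior is easy to follow.
query = "ACGTACGT"
subject = "ACGTACGA"
print(percent_identity(query, subject))
print(high_identity_windows(query, subject, window=4, threshold=75.0))
```

With the defaults (`window=200`, `threshold=75.0`) the function mirrors the "greater than 200 bp of more than 75% NA identity" cutoff in spirit, but BLAST local alignments and fixed windows will not always agree on region boundaries.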
Using a codon-based alignment of the NA sequences in the high homology group, we analyzed the occurrence of synonymous and nonsynonymous substitutions over the entire coding region. The plot of the ratio of synonymous/nonsynonymous substitutions (dS/dN) identified five regions where the synonymous substitution rate exceeds the nonsynonymous substitution rate (dS/dN > 1), suggesting that homology in these areas was driven by the need for AA conservation (Figure 2b). Analysis of a plot of the degree of identical amino acid residues among the high homology group showed that a high degree of AA similarity existed in these same regions (data not shown). Visual inspection of the NA alignment of the high homology group revealed that substitutions in the areas where dS exceeded dN tended to involve the second and third bases of the codons, confirming the previous suggestion (data not shown). There was also a correlation between the regions of homology to Ps-moeZ and the areas where synonymous substitutions exceeded nonsynonymous substitutions (Figures 2a and 2b). These correlations support the supposition that the NA homologies among the sequences are driven by functional constraints of the proteins and not by the need to maintain secondary nucleic acid structure. Analysis of predicted secondary mRNA structure did not reveal any conserved stems or loops in the regions of NA conservation among the high homology group (data not shown).

Function
Functions are known for MoeB, MoeBR, and MoeZ. MoeB, exemplified by Ec-MoeB, and MoeBR, exemplified by cnxF of Aspergillus nidulans [16], participate in the synthesis of MPT-Mo as MSAEs. This is a housekeeping function for most organisms. MoeZ, on the other hand, has been shown to be necessary for the synthesis of pdtc, an unusual metabolite. However, since CTN1, a P. stutzeri mutant missing a 171 kb genomic fragment that includes moeZ [17], reduces nitrate as the wild type does (data not shown), it is evident that Ps-moeZ is not necessary for MPT-Mo synthesis.
Both moeBR and moeZ coexist in the M. tuberculosis genome, which exhibits nitrate reductase activity, but moeBR is missing from M. leprae, which has no nitrate reductase activity. The fact that these two separate genes are maintained in M. tuberculosis supports the supposition that they perform different functions. On the other hand, the sequencing of the nitrate-reducing cyanobacteria Synechocystis sp. PCC6803 and Nostoc sp. PCC7120 is complete, but no MSAE sequences have been found. Perhaps MoeZ fulfills MSAE function as part of a modified MPT-Mo pathway in these species.
MoeBR incorporates a rhodanese-like domain, but MoeB does not. Examples of both MoeB (E. coli) and MoeBR (CNXF of A. nidulans) have been characterized as MSAEs, indicating that the C-terminal rhodanese-like domain addition does not inhibit MSAE activity. The presence of a rhodanese domain in MoeBR and MoeZ suggests that these members of the family may perform an additional function compared to MoeB. P. aeruginosa has both a moeB and a moeBR, also suggesting that the products of these two genes have different functions or perform the same function in a slightly different manner. However, there is no physiological evidence for the functional significance of the rhodanese domain in MoeBR. Rhodanese extracts a sulfur from sulfane sulfur-containing anions (e.g., thiosulfate), forms a temporary covalent linkage between the S atom and a conserved cysteine residue, then transfers the sulfur to a thiophilic acceptor (e.g., cyanide). Whether sulfurtransferase functionality is retained in the rhodanese-like domains of MoeBR and MoeZ, and whether these proteins can both activate a substrate and transfer a sulfur-containing moiety, is unknown.

Evolution
Figure 4
Maximum likelihood, neighbor joining trees of moeZ ThiF domain and associated SSU sequences used in this study. The incongruent placement of P. stutzeri between the trees is highlighted with a dotted line. Confidence estimates are placed near the branches that they apply to.

Because of the definitive structural features shared among MoeZs and the homologies among their coding regions, we decided to investigate the phylogenetic relationships of moeZ sequences. To infer baseline interspecies evolutionary relationships we performed a phylogenetic analysis of the 16S rDNA (SSU) sequences from the seven MoeZ-containing strains included in our study. We obtained sequences for all SSU alleles for each strain from GenBank. Analysis of all the SSU alleles for strains with multiple copies revealed that there was no substantial variation among them. We chose one representative full-length SSU sequence for each strain for our analyses (Table 1). The tree of SSU sequences shows the two proteobacteria (P. stutzeri and M. loti) at one end of the tree with cyanobacteria (Synechococcus sp. and Synechocystis sp.) and mycobacteria (M. leprae and M. tuberculosis) clades branching off from a common node (Figure 4). This tree topology agrees with those that have been previously published [32].
The moeB/BR/Z sequences under study are a mixture of orthologs and paralogs, and a combined study of their phylogeny is not possible, so we limited our study to moeZ sequences. We constructed the tree from a gapless region of the NA alignment spanning the entire ThiF domain except for the last two bases. The topology of the ThiF domain tree matches the SSU tree except that Ps-moeZ, which was adjacent to M. loti in the SSU tree, appeared between the cyanobacteria and mycobacteria clades (Figure 4). Parsimony, likelihood, and distance methods all resulted in trees with the same topology. Moreover, the placement of Ps-moeZ in the ThiF tree was not changed when full-length MoeZ AA sequences were analyzed using the same methods.
Additional support for the topology of the ThiF domain tree was sought by conducting a manual analysis of the insertions and deletions (indels) in the alignment of full-length moeZ sequences. Ps-moeZ contains five indels in the codon-based NA alignment (Table 3). Of these, one is unique to Ps-moeZ; four are shared with members of the mycobacteria clade; and none are shared with members of the cyanobacteria clade or Mlo-moeZdR. Thus the results of the indel analysis supported the position of Ps-moeZ in the ThiF domain tree.
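The bookkeeping behind such an indel comparison can be sketched as follows. The analysis reported here was done manually; the code below is only an illustrative simplification that treats an indel as a run of gap characters in the reference sequence and calls it shared when another sequence has a gap run with exactly the same boundaries. Insertions in the reference (gaps in all other sequences) are not handled, and the function names and toy alignment are hypothetical.

```python
import re


def gap_runs(seq):
    """Return the set of (start, end) column ranges covered by gap runs."""
    return {(m.start(), m.end()) for m in re.finditer(r"-+", seq)}


def shared_gap_runs(alignment, ref):
    """Map each gap run of the reference to the other sequences sharing it.

    `alignment` is a dict of name -> aligned sequence (equal lengths).
    A run shared with no other sequence maps to an empty list, i.e. it is
    unique to the reference.
    """
    runs_by_name = {name: gap_runs(seq) for name, seq in alignment.items()}
    return {
        run: sorted(name for name, runs in runs_by_name.items()
                    if name != ref and run in runs)
        for run in sorted(runs_by_name[ref])
    }


# Toy alignment, not the moeZ data.
toy = {
    "ref": "ATG---CCCGG-",
    "x":   "ATG---CCCGGA",
    "y":   "ATGAAACCCGGA",
}
print(shared_gap_runs(toy, "ref"))
```

In the toy alignment the first gap run of `ref` is shared with `x`, while the terminal single-column gap is unique to `ref`, which is the same kind of tally summarized in Table 3.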
The incongruence between the SSU and the ThiF domain trees suggests that Ps-moeZ may have been acquired by P. stutzeri strain KC by lateral gene transfer (LGT) from an ancestor more closely related to mycobacteria and cyanobacteria than to proteobacteria. The results of the indel analysis also suggest that Ps-moeZ is more closely related to the mycobacteria moeZ sequences than to the cyanobacteria and proteobacteria moeZ sequences. BLAST homology results also support this close relationship to the mycobacteria.
Although they were not included in the analysis presented here, the homologies of Ps-moeZ to sequences in the unfinished microbial genomes of M. avium, M. bovis, and M. smegmatis were also very high and thus provide additional support for the placement of Ps-moeZ in the ThiF tree and suggest that the direction of LGT was from a mycobacterial species to P. stutzeri.
Comparison of the GC/AT percentages and codon bias profiles of Ps-moeZ to data from other known P. stutzeri strain KC coding regions indicates that acquisition of Ps-moeZ was ancient. It is not unreasonable that a pseudomonad coexisting in the soil with cyanobacteria, mesorhizobium, streptomycete, and free-living mycobacteria could acquire moeZ from one of these species. P. stutzeri has been characterized as having an exceptionally plastic genome with high genotypic diversity within the species [33].
LGT events are thought to contribute to this plasticity and, consistent with this, many P. stutzeri strains also exhibit a natural transformation ability [34]. The fact that incorporation of DNA into the genome through homologous recombination requires a region of homology usually limits LGT mediated by natural transformation to closely related species [34,35]. The ThiF domains of the high homology group do provide a region of homology suitable for homologous recombination. This homology is shared across a wide range of species and thus could have contributed to a natural transformation-mediated LGT event in P. stutzeri. This mechanism of acquisition is supported by the fact that no phage genes, transposase coding sequences, IS elements, or repeat sequences were found in the 25.7 kb cosmid clone that contains Ps-moeZ [17].
Three other P. stutzeri isolates, obtained from the American Type Culture Collection, have been shown by Southern analysis to lack pdtc synthesis genes. These same strains plus P. aeruginosa PAO1, P. putida mt2, and P. fluorescens F113 were also shown not to produce pdtc [17]. Pdtc production in bacteria other than P. stutzeri strain KC is limited to P. putida strain To8 and one other unclassified pseudomonad strain [36]. P. stutzeri and P. putida are closely related, both belonging to rRNA homology group I [37], so they may be descendants of the original pseudomonad that acquired moeZ by LGT. The relative rarity of pdtc production among pseudomonads, especially the P. stutzeri strains, is consistent with the hypothesis that Ps-moeZ was acquired through LGT by strain KC. Alternatively, a pseudomonad ancestor could have contained the genes for and produced pdtc, but this function could have been lost from most strains. Loss of this function is plausible if pdtc production is a highly specialized function or is only required for survival in extreme or competitive environments.

Conclusions
It is not known what function MoeZ performs in the mycobacteria and cyanobacteria species that contain it. Indications are that in mycobacteria, MSAE function is performed by MoeBR. But in the fully sequenced cyanobacteria there are only MoeZs and no MSAE, so in these organisms MSAE function may be fulfilled by MoeZ. In either case, MoeZ is essential to pdtc synthesis in P. stutzeri strain KC, and pdtc is a potent and versatile molecule. Its capabilities of enhancing iron uptake and iron reduction make it a good candidate for a pathogenicity factor. The role of MoeZ in the pathogenic mycobacteria and its impact on virulence of these strains should be investigated.

Materials and Methods
Sequences were obtained from the GenBank non-redundant (nr) sequence database at the National Center for Biotechnology Information. BLAST searches were used to obtain the initial set of sequences similar to P. stutzeri strain KC ORF-F [38]. The program, BLAST 2 Sequences, was used to obtain the degrees of similarity and homology among amino acid (AA) and nucleic acid (NA) sequences [39]. Protein domains, as codified by Pfam [26], were identified by searching the Conserved Domain Database using the Domain Architecture Retrieval Tool provided by GenBank.
Protein alignments were generated using the Clustal X program and checked manually [40]. SSU NA sequences were aligned using Clustal W as implemented by the OMIGA package and checked manually [41,42]. NA alignments of full-length moeZ sequences for indel analysis were aligned manually based on the Clustal X protein sequence alignment.
Codon bias tables and GC/AT percentages were calculated using the Countcodon program available from the Codon Usage Database [43].
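As an illustration of what such a tabulation involves, the following minimal sketch counts codons and computes a GC percentage for a coding sequence. It is not the Countcodon program and makes no attempt to reproduce its output format; the function names and the example sequence are assumptions for illustration only.

```python
from collections import Counter


def codon_counts(cds):
    """Count codons in a coding sequence read in frame from position 0.

    Trailing bases that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def gc_percent(seq):
    """Percentage of G and C among the unambiguous bases A, C, G, T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / total if total else 0.0


# Hypothetical 12-base coding sequence.
example = "ATGGCCGCCTAA"
print(codon_counts(example))
print(round(gc_percent(example), 1))
```

Comparing such tables between a candidate gene and other coding regions of the same genome is the kind of check used above to judge whether an acquired gene has had time to take on the composition of its host.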
Synonymous and non-synonymous substitution rates were calculated by the Synonymous/non-synonymous Analysis Program (SNAP) provided by Los Alamos National Laboratory [http://hiv-web.lanl.gov/] using the codon-based NA alignment of the high homology group. Substitution rates were averaged over an 18 codon window to smooth the data for graphing.
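The smoothing step can be sketched as a simple sliding-window average. The example assumes that per-codon dS and dN values are already available as two lists; the actual SNAP output format is not reproduced here, and the function name `windowed_ratio` and the decision to report `None` where the window-averaged dN is zero are assumptions.

```python
def windowed_ratio(ds, dn, window=18):
    """Average dS and dN over a sliding codon window and return dS/dN.

    One value is returned per window position; windows whose mean dN is
    zero yield None because the ratio is undefined there.
    """
    if len(ds) != len(dn):
        raise ValueError("dS and dN must cover the same codons")
    ratios = []
    for start in range(len(ds) - window + 1):
        mean_ds = sum(ds[start:start + window]) / window
        mean_dn = sum(dn[start:start + window]) / window
        ratios.append(mean_ds / mean_dn if mean_dn > 0 else None)
    return ratios


# Toy per-codon values: 20 codons give 3 windows of 18 codons each.
ds_values = [2.0] * 20
dn_values = [1.0] * 20
print(windowed_ratio(ds_values, dn_values))
```

Windows with a ratio above 1 correspond to the regions described in the text where synonymous changes exceed nonsynonymous changes.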
Phylogenetic trees were inferred using PAUP 4.0 [44]. Three data sets were analyzed: those portions of the NA alignment coding for the ThiF domains of the MoeZ conceptual proteins, the alignment of the complete MoeZ sequences, and the alignment of the full-length 16S rDNA (SSU) sequences. Maximum likelihood, parsimony, and distance methods were used for analysis. Confidence estimates were determined by bootstrap analysis based on 1000 replicates.
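The resampling that underlies bootstrap confidence estimates can be sketched as drawing alignment columns with replacement. This is a generic illustration of the nonparametric bootstrap, not PAUP's implementation, and it stops before tree inference; the function names, the fixed seed, and the toy alignment are assumptions.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    `alignment` is a dict of name -> aligned sequence (equal lengths).
    The same sampled columns are used for every sequence so that each
    resampled column is a column of the original alignment.
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n=1000, seed=0):
    """Return n pseudo-replicate alignments (1000 matches the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]


# Toy alignment; a tree would be inferred from each replicate separately.
toy = {"a": "ACGT", "b": "ACGA"}
replicates = bootstrap_replicates(toy, n=10)
print(len(replicates), replicates[0])
```

The bootstrap value for a branch is then the fraction of replicate trees that contain that branch, a step that depends on the tree-building program and is not shown.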
RNA secondary structure of the ThiF domain coding regions of the sequences in the high homology group was