Genome size and arrangement
Both pseudoscorpion mitochondrial genomes are circular and encode the typical set of 37 mitochondrial genes. The mt genome arrangements of both pseudoscorpions differ from the ancestral chelicerate genome arrangement (Figure 1). The mt genome of Pseudogarypus banksi (Feaelloidea; Pseudogarypidae) [GenBank accession number: JQ040544] differs in the location of three of its 13 protein-coding genes (ND4, ND5, Cytb) and the SSU rRNA gene, relative to the ancestral arrangement. Additionally, 9 of the tRNA genes are in different locations (tRNAs specific for Arg, Glu, Val, Leu(CUN), Gln, His, Pro, Ser(UCN), and Phe). Of these tRNA genes, those coding for tRNA-Arg, tRNA-Glu, and tRNA-Gln are on the strand opposite to that in the ancestral condition. We also found a large region (~2650 nts) of repeated sequence located between the tRNA-Leu(CUN) and ND4 genes. Much of this region appears to consist of repeat units similar to the tRNA-Lys gene, although this gene is not located in the adjacent region. This repeat region is responsible for a substantial increase in genome size compared with other chelicerates, which typically have genomes close to 15 kb. The largest chelicerate mt genomes have been found in acariform mites in the family Trombiculidae (the largest mt genome is from Leptotrombidium pallidum at 16,779 nts) [34], and the genome of Pseudogarypus approaches this size at 16,546 nts.
In contrast, the mt genome of Paratemnoides (Cheliferoidea; Atemnidae) [GenBank accession number: JQ040543] is 14,368 nts in length, similar to the sizes of spider and mite mt genomes, but about 2200 nts shorter than that of Pseudogarypus. The mt genome of Paratemnoides is rearranged only with respect to six tRNA genes (tRNAs specific for Glu, Thr, Pro, Ser(UCN), Gln or Met, and Tyr). Of these, the genes coding for tRNA-Pro and tRNA-Tyr are located on the strand opposite to that of the ancestral condition.
None of the gene boundaries or inversions of genes onto the opposite strand that differ from the ancestral arrangement are shared between these two species (see Figure 1). This indicates that translocation of the genes must have occurred independently in these lineages, after their divergence from an ancestral pseudoscorpion.
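The comparison of derived gene boundaries described above can be illustrated with a minimal sketch. The gene orders below are toy, hypothetical placeholders (not the arrangements shown in Figure 1), the function names are our own, and strand is ignored for simplicity; the sketch only shows how boundaries that differ from an ancestral circular order might be tallied and intersected between two species.

```python
def boundaries(order):
    """Return the set of unordered adjacent gene pairs in a circular gene order."""
    n = len(order)
    return {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}


def derived_boundaries(order, ancestral):
    """Boundaries present in `order` but absent from the ancestral arrangement."""
    return boundaries(order) - boundaries(ancestral)


# Toy, hypothetical gene orders (placeholders, not real mt gene arrangements).
ancestral = ["A", "B", "C", "D", "E", "F"]
species_1 = ["A", "C", "B", "D", "E", "F"]  # B and C exchanged
species_2 = ["A", "B", "C", "D", "F", "E"]  # E and F exchanged

derived_1 = derived_boundaries(species_1, ancestral)
derived_2 = derived_boundaries(species_2, ancestral)

# An empty intersection means no derived boundary is shared, which is the
# pattern expected if the rearrangements arose independently.
shared = derived_1 & derived_2
print(len(derived_1), len(derived_2), len(shared))
```

In this toy case each species has two derived boundaries and none is shared between them.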
The typical model to explain how mitochondrial gene rearrangements occur in animals is via tandem duplication of a part of the genome followed by random deletion of genes [35]. However, tandem duplication followed by random deletion cannot explain how genes come to be encoded on the opposite strand of a mitochondrial genome. Shao et al. hypothesized that inter-mitochondrial DNA recombination may explain how genes of chigger mites came to be encoded on the opposite mitochondrial strand [8, 34]. Likewise, excision of a piece of the mitochondrial genome, followed by circularization, breakage of the circle, then recombination back into the original mitochondrial genome has been suggested to explain the current gene arrangement for the harvestman Phalangium opilio [14]. Recombination could also explain the current gene arrangements found in both of the pseudoscorpion mt genomes in the present study. Although recombination has not been thought to be important in the evolution of vertebrate mitochondrial genomes, it appears that it must occur among at least some chelicerate lineages.
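The limitation of the tandem duplication and random deletion model noted above can be made concrete with a small simulation. This is a simplified sketch under our own assumptions: genes are hypothetical (name, strand) pairs, the genome is treated as a linear list, and exactly one copy of each duplicated gene is lost at random. Whatever the outcome, gene order may change but no gene changes strand.

```python
import random


def tdrl(order, start, end, rng):
    """Tandem duplication of order[start:end], then random loss of one copy
    of each duplicated gene. Genes are (name, strand) tuples."""
    segment = order[start:end]
    duplicated = order[:start] + segment + segment + order[end:]
    # For every duplicated gene, pick which copy survives (0 = first, 1 = second).
    keep = {gene: rng.randrange(2) for gene in segment}
    first_copy_end = start + len(segment)
    second_copy_end = start + 2 * len(segment)
    result = []
    for i, gene in enumerate(duplicated):
        if start <= i < second_copy_end:
            copy_index = 0 if i < first_copy_end else 1
            if keep[gene] == copy_index:
                result.append(gene)
        else:
            result.append(gene)
    return result


# Hypothetical gene names and strands, for illustration only.
rng = random.Random(0)
order = [("g1", "+"), ("g2", "+"), ("g3", "-"), ("g4", "+"), ("g5", "-")]
rearranged = tdrl(order, 1, 4, rng)
print(rearranged)
```

Because each surviving copy is an unmodified duplicate, the strand of every gene in `rearranged` is identical to its strand in `order`; an inversion requires some additional mechanism, such as recombination.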
Most chelicerate mitochondrial genomes maintain the same arrangement of protein-coding genes [13, 14]. However, all mitochondrial genomes of acariform mites possess rearranged genes, many with rearranged protein-coding genes [21]. The pseudoscorpion Pseudogarypus also possesses rearranged protein-coding genes. It shares this feature with acariform mites, but not with all pseudoscorpions, as Paratemnoides has the same protein-coding gene order as the ancestral chelicerate.
tRNA genes
All mt tRNA genes of Paratemnoides lack sequence to encode a cloverleaf tRNA secondary structure. Instead, 21 of these genes are inferred to encode tRNAs that lack a T-arm (Figure 2a), while the gene coding for tRNA-Ser(AGN) lacks the sequence to encode a D-arm.
In Pseudogarypus, only 3 or 4 of the mt tRNA genes have the potential to form a cloverleaf tRNA. It is possible that the tRNA-Leu(CUN) gene encodes a tRNA with a cloverleaf secondary structure, but if so, there are two mismatches in the acceptor stem, and a weak 2-bp T-arm stem. Alternatively, the tRNA-Leu(CUN) gene encodes a tRNA that lacks a T-arm, but has a perfectly paired acceptor stem. Both alternatives are illustrated in Figure 2b. If tRNA-Leu(CUN) lacks a T-arm, then a total of 13 tRNA genes are inferred to encode tRNAs that lack T-arms. The other 6 mt tRNA genes are inferred to encode tRNAs that lack D-arms, including the gene coding for tRNA-Ser(AGN) (Figure 2b).
Although some arachnid lineages contain species whose mitochondria possess many tRNA genes without sequence to encode a cloverleaf-shaped tRNA, the pseudoscorpions in this study show some of the most widespread losses of tRNA arms yet documented among metazoans. The only other chelicerate lineages to show this extent of loss from their tRNA genes are acariform mites. Some acariform mites have been found to possess mitochondrial genomes in which all 22 of their tRNA genes are inferred to encode tRNAs that lack either a D-arm or a T-arm [10, 21, 36]. Paratemnoides joins the short list of taxa with mt genome-wide losses of canonical tRNA genes. In addition to some chelicerates and secernentean nematodes, gall midge insects have also been found to have severely truncated tRNA genes [37], similar to those of opisthothele spiders. So far, genome-wide losses of T- or D-arm-encoding mitochondrial tRNA sequences have been restricted to the Ecdysozoa.
Within spiders, there is variation among taxa in the identity and number of tRNA genes that lack T-arms. When this variation was traced onto a phylogenetic tree, it was found that once a tRNA gene loses sequence to encode a D- or T-arm, sequence to encode these arms is not regained [5]. We find variation among these two pseudoscorpion lineages in whether the D- or T-arm sequence has been lost from a given tRNA gene. These two pseudoscorpions show shared losses of T-arms from 12 or 13 tRNA genes that are specific for Asp, Glu, Gln, Gly, His, Leu(UUR), Pro, Ser(UCN), Thr, Trp, Tyr, Val, and likely Leu(CUN). However, in Pseudogarypus the tRNA genes specific for Ala, Asn, Cys, Ile, and Phe all lack D-arms. The loss of D-arm sequence from these 5 tRNA genes must have occurred after the divergence of the major pseudoscorpion lineages, because the tRNA homologues in Paratemnoides possess D-arm sequence, but lack T-arm sequence. The tRNA-Arg, tRNA-Lys, tRNA-Met, and perhaps the tRNA-Leu(CUN) genes, code for cloverleaf-shaped tRNAs in Pseudogarypus, therefore we can infer that the ancestral pseudoscorpion also possessed full-length tRNA genes for these tRNAs. In sum, based upon these two genomes from the deeply diverged pseudoscorpion lineages Feaelloidea and Cheliferoidea, we can infer that the common ancestor of extant pseudoscorpions had likely lost sequences to encode canonical secondary structures for multiple different mt tRNAs.
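The tallies given above can be checked with simple set operations. The sketch below encodes only the assignments stated in the text, under the assumption that tRNA-Leu(CUN) of Pseudogarypus lacks a T-arm (the alternative reading would move it into the cloverleaf set); the variable names are our own.

```python
ALL_TRNAS = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu(CUN)", "Leu(UUR)", "Lys", "Met", "Phe", "Pro", "Ser(AGN)",
    "Ser(UCN)", "Thr", "Trp", "Tyr", "Val",
}

# Paratemnoides: tRNA-Ser(AGN) lacks a D-arm, the other 21 lack a T-arm.
para_no_d = {"Ser(AGN)"}
para_no_t = ALL_TRNAS - para_no_d

# Pseudogarypus, assuming tRNA-Leu(CUN) lacks a T-arm.
pseudo_cloverleaf = {"Arg", "Lys", "Met"}
pseudo_no_d = {"Ala", "Asn", "Cys", "Ile", "Phe", "Ser(AGN)"}
pseudo_no_t = {
    "Asp", "Glu", "Gln", "Gly", "His", "Leu(UUR)", "Pro", "Ser(UCN)",
    "Thr", "Trp", "Tyr", "Val", "Leu(CUN)",
}

# The three Pseudogarypus categories should partition all 22 tRNAs.
assert pseudo_cloverleaf | pseudo_no_d | pseudo_no_t == ALL_TRNAS
assert len(pseudo_cloverleaf) + len(pseudo_no_d) + len(pseudo_no_t) == 22

shared_t_loss = para_no_t & pseudo_no_t    # T-arm lost in both species
d_versus_t = pseudo_no_d & para_no_t       # D-arm lost in one, T-arm in the other
print(len(shared_t_loss), sorted(d_versus_t))
```

Under this assumption the two species share 13 T-arm losses, and the five genes that lack a D-arm in Pseudogarypus but a T-arm in Paratemnoides are those for Ala, Asn, Cys, Ile, and Phe.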
It is not clear why some lineages of chelicerates have lost sequence to encode either the D-arm or T-arm from their mt tRNA genes. In general, chelicerates that lack canonical mt tRNA genes also tend to have rearranged mt genomes. For example, all opisthothele spiders and acariform mites have mt tRNA genes that are rearranged with respect to the ancestral chelicerate, and all possess many mt tRNA genes that lack the ability to encode cloverleaf tRNAs. However, while this trend is also true within pseudoscorpions, it does not explain the variation in tRNA gene reduction. The more rearranged mt genome of Pseudogarypus has fewer non-canonical tRNA genes (19 out of 22), whereas Paratemnoides exhibits less rearrangement, yet all its tRNA genes lack sequence to encode cloverleaf tRNAs.
rRNA genes
The ribosomal RNA genes of both pseudoscorpions are extremely short for chelicerates and other arthropods. Pseudogarypus has a smaller mt LSU rRNA than Paratemnoides (about 986 nts versus 1013 nts). This is one of the shortest mt LSU rRNA lengths reported for any chelicerate; only some acariform mites and some spiders possess equally short LSU rRNA genes. Typically, chelicerates have LSU rRNA genes of 1200 to 1300 nts. The three orders of chelicerates that have historically been inferred as most closely related to pseudoscorpions (based on morphology) are harvestmen, solifuges, and scorpions. The mitochondrial genomes of representative taxa from these lineages have LSU rRNA genes ranging from about 1150 nts (Scorpiones: Buthus occitanus) to 1250 nts (Opiliones: Phalangium opilio; Solifugae: Eremobates palpisetulosus group).
Decreases in size of the LSU rRNA genes could be due to reductions spread throughout the gene, or reductions concentrated in specific areas, such as the ends of the gene. Reductions that occurred throughout the gene would result in decreases of the sizes of many to most of the helices in the secondary and tertiary structures of the rRNA. Reductions concentrated in specific areas could cause losses of entire helices, similar to the inferred losses of helices that have occurred in pseudoscorpion mt tRNAs. We inferred the secondary structures of both pseudoscorpion LSU rRNAs, to determine where within the molecule the reductions had occurred. These structures are depicted in Figures 3A and 3B. To assess the location and extent of reductions, we compared the secondary structures we inferred for the LSU rRNA genes from both pseudoscorpions to that of the opilionid Phalangium opilio [14]. We found that both of the pseudoscorpions have lost helices D14-D15 (also referred to as helix 38 of domain II in the bacterial LSU rRNA), which should be located between helices D13 and D16 (Figures 3A and 3B). This two-part D14-D15 helix is typically present in the arthropod LSU rRNA secondary structures that have been inferred [38]. Within chelicerates, helices D14-D15 are present in the amblypygid Damon diadema [39] and in the harvestman Phalangium opilio [14]. These helices are missing in some of the acariform mites that have been analyzed (e.g. Panonychus citri [10]), but appear to be present in other Acariformes (e.g. Dermatophagoides pteronyssinus [21] and Leptotrombidium [8]).
The crystal structure of the bacterial ribosome of Thermus thermophilus has been used to infer the contact sites between the RNA core of the ribosome and tRNAs during translation [11]. Based on our comparison of inferred LSU rRNA secondary structures of pseudoscorpions to the structure of the LSU rRNA of T. thermophilus, the nucleotides at the end of helix D15 should make contact with the D and T loops of the A-site tRNA within the ribosome during translation. The D14-D15 helices are lacking in both pseudoscorpion LSU rRNAs, suggesting either that this contact point has been lost and is not essential for protein synthesis, or that this tRNA-ribosome contact site has moved to another location.
The G4 helix of the LSU rRNA is also not present in either pseudoscorpion (Figures 3A and 3B). This helix should be encoded immediately downstream of the G3 helix, but instead, only a short G3 helix is present. The G4 helix (referred to as helix 77 of domain V in the bacterial LSU rRNA) is located at the core of the bacterial ribosome, and makes contact with the D and T loops of the E-site tRNA as the tRNA exits the ribosome during translation [11].
Both pseudoscorpions lack a large region of the 5' end of the LSU rRNA gene, which typically codes for a series of helices, referred to as domain I in the bacterial LSU rRNA, or following the notation in [40], the B region. The secondary structure for this region was difficult to infer accurately, so we have less confidence in the accuracy of our reconstruction. However, it appears that helices B3, B10, and B12 have all been lost relative to those of Phalangium. Nucleotides in the B10 helix (or helix 11, domain I in the bacterial LSU rRNA) base pair with the E-site tRNA as the tRNA exits the ribosome [11].
Two other regions of helices have been lost in one or both pseudoscorpion LSU rRNAs, but all of the other tRNA-rRNA contact sites appear to be present in both pseudoscorpions. Both pseudoscorpions lack the D23 helix, which should be located immediately adjacent to the D22 helix, or it may instead be very reduced compared to that of Phalangium. The LSU rRNA of Pseudogarypus lacks helices E1-E8 that are present in Paratemnoides. These helices are present in harvestmen [14], and in amblypygids [39].
In summary, we find that the reduction in length of the LSU rRNA gene was accompanied by the loss of some helices from the secondary structure of the LSU rRNA. Both pseudoscorpions share the losses of the same 4 helices, at least two of which appear to be RNA-RNA contact points within the ribosome. We infer that the loss of these helices must have occurred after the divergence of the pseudoscorpion ancestor from harvestmen, as Phalangium opilio possesses these helices.
The lengths of the small ribosomal RNA genes are also reduced in pseudoscorpions, although proportionally less than the LSU rRNA genes. The absolute length of these genes in chelicerates can be more difficult to determine than that of the LSU rRNA gene, because the 5' end of the SSU rRNA gene typically abuts the non-coding region (although this is not the case in Pseudogarypus), which is poorly conserved in chelicerates. Additionally, while the 5' end of the SSU rRNA is structurally fairly conserved, its sequence is not conserved. Therefore, only approximations of gene length are possible when we have only DNA sequence data as a guide. Pseudogarypus has a smaller mt SSU rRNA than Paratemnoides (about 687 nts versus 727 nts). These are somewhat shorter lengths than the mt SSU rRNA genes of harvestmen, solifuges, and scorpions. The mt SSU rRNA genes in those taxa range from 725 nts in solifuges (Eremobates palpisetulosus group) to 768 nts in harvestmen (Phalangium opilio) to about 790 nts in scorpions (Uroctonus mordax and Buthus occitanus).
During protein synthesis, the SSU rRNA region of the ribosome makes contact primarily with the anticodon region of the tRNA, while the LSU rRNA has contact sites with the T-arm and with the D-arm of the tRNA (for specific details, see [11]). Therefore, if the tRNAs of pseudoscorpions are structurally coevolving with the ribosome that they interact with, we may predict that we would see length differences primarily in the LSU of the ribosome, but not necessarily in the SSU of the ribosome. Additionally, we would predict that the length differences would primarily be concentrated in the rRNA-tRNA contact sites of the LSU of the ribosome.
It has previously been shown that the substitution rate for the nucleotide sites in the LSU rRNA gene scales with their distance from the center of the ribosome, across all domains of life [40]. This finding corroborates how important the core of the ribosome is for all organisms. The loss of tRNA-ribosome contact sites in pseudoscorpion LSU rRNA suggests that pseudoscorpion mt ribosomes may have evolved some remarkable changes in their structure and interactions with their tRNAs.
We have previously suggested that shortened ribosomes are correlated with shortened tRNA genes in chelicerates [6, 41]. The finding of shortened tRNA and rRNA genes in pseudoscorpions provides an additional example of such a co-occurrence. The co-occurrence of short RNA-encoding genes may be due to pressures to decrease genome size overall, or due to compensatory evolution of the tRNAs with the ribosome with which they interact. If tRNA-ribosome contact sites were lost or mutated in the ribosome, it would have allowed the tRNA genes to accrue mutations at these contact sites. Non-functional sites could be deleted from the tRNA genes via random genetic drift. Alternatively, if there is a replicative advantage to having a small genome, then selection could act to eliminate non-essential regions of the genome.
However, the compact tRNA and LSU rRNA genes of Pseudogarypus are not mirrored by a compact genome. Its mt genome possesses an apparently non-coding region that increases the size of the genome substantially over that of Paratemnoides. It is plausible that this non-coding region is due to a relatively recent insertion, and that selection for small genome size has not occurred long enough for us to see the evolutionary reduction in, or loss of, this insert. Alternatively, it is possible that the insert is more ancient, and that selection has acted to a greater degree to eliminate RNA helices in the ribosome and tRNAs in a concerted manner. Our findings of losses of tRNA-rRNA contact sites in both the tRNAs and the LSU rRNA suggest that natural selection has acted such that these structures have coevolved to maintain function during protein synthesis.
Phylogenetic analyses
The data set used for phylogenetic analyses consisted of 2907 amino acids, with about 73% of the initial alignment of 3975 amino acids retained after Gblocks trimming of the 13 protein-coding alignments. The level of sequence conservation varied among genes. The most conserved genes retained over 90% of their amino acids (CO1, CO3, and Cytb), many genes retained 60-76% of their amino acids (ATP6, CO2, ND1, ND3, ND4, and ND5), and the ATP8, ND2, ND4L, and ND6 genes possessed the lowest levels of sequence conservation (44-54% amino acids retained).
Phylogenetic analyses of all 13 mitochondrial protein-coding gene sequences from a diverse group of chelicerates (see Additional file 1 for a taxonomic summary) yield phylogenetic trees that consistently recover the pseudoscorpions as a monophyletic group (Figure 4). Surprisingly, pseudoscorpions are found as sister group to acariform mites, with 100% bootstrap support. The arachnid order Acari is found to be paraphyletic, with one well-supported clade of Acariformes and one clade of Parasitiformes. This result is in agreement with other recent analyses that have recovered Acari as diphyletic [21, 32, 42]. We evaluated the alternative hypothesis, that Acari is a monophyletic group, but this hypothesis was rejected by the Approximately Unbiased (AU) test (P = 0.0005), with a Ln likelihood score of -182198.24 for the unconstrained tree, and a Ln likelihood score of -182247.47 for the monophyletic Acari topology. The other orders for which we have data from multiple mitochondrial genomes also are recovered as monophyletic: the spiders (Araneae), scorpions (Scorpiones), sea spiders (Pycnogonida), and ticks and parasitic mites (Parasitiformes). Hence, except for Acari, our analyses always recover all named orders of arachnids as monophyletic. Our analyses also always recover the subphylum Chelicerata with high support. In contrast, we do not find Arachnida to be monophyletic, due to the placement of Xiphosura (horseshoe crabs) and Pycnogonida (sea spiders) amongst the arachnids. We examined the alternative hypothesis of arachnid monophyly, but this hypothesis was rejected by the AU test (P = 0.001; Ln likelihood of -182338.12 for the monophyletic Arachnida tree). We also did not find support for Pseudoscorpiones as the sister lineage to Solifugae, as the AU test strongly rejected this hypothesis (P << 0.001; Ln likelihood score of -182358.64).
We found that the protein-coding genes of both pseudoscorpions and acariform mites are characterized by high numbers of substitutions, resulting in long branches in phylogenetic analyses. Additionally, we know that variation in amino acid skew is found among chelicerates [43]. Therefore, we undertook a series of analyses to determine whether the placement of these two lineages as sister groups could be caused by an artificial grouping due to saturation of the sequences or due to similarities in amino acid skew.
To determine whether long-branch-attraction due to sequence saturation could be causing Acariformes to group with Pseudoscorpiones, we first eliminated acariform taxa with particularly long branches from our analyses. We found that this had no effect on the placement of pseudoscorpions as sister to acariform mites (trees not shown). In addition to likelihood analyses, we also undertook a series of Bayesian analyses, using the same models of evolution as for likelihood. All Bayesian analyses recovered Acariformes as the sister clade to Pseudoscorpiones with high posterior probabilities.
We also recovered the sister-group relationship of Acariformes with Pseudoscorpiones in all 34 chains we ran with Phylobayes using a CAT site-heterogeneous mixture model. In 18 of these chains, pycnogonids were recovered as the sister group to other chelicerates, whereas in 16 analyses they placed as derived within the arachnids. In no analysis did we recover a monophyletic Arachnida, as horseshoe crabs always were recovered as the sister lineage to solifuges (trees not shown).
Coding mitochondrial nucleotides as either purines or pyrimidines (RY recoding) has been shown to recover deep-level relationships and to increase phylogenetic signal [44, 45]. We implemented RY recoding at 3rd codon positions only and at 1st and 3rd codon positions, and in both cases we recovered Acariformes as the sister clade to Pseudoscorpiones with 92% and 70% bootstrap support, respectively (trees not shown). We found 62% bootstrap support for this grouping when we implemented a variation of RY recoding termed the Neutral Transitions Excluded (NTE) method [46].
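As an illustration of RY recoding restricted to particular codon positions, the following minimal sketch recodes an in-frame coding sequence. The function name and the example sequence are hypothetical, and the sketch does not implement the NTE variant [46] or reproduce the software used for the analyses reported here.

```python
# Purines (A, G) become R; pyrimidines (C, T) become Y.
RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def ry_recode(seq, positions=(3,)):
    """RY-recode the given codon positions (1-based) of an in-frame sequence.
    Other positions, gaps, and ambiguity codes are left unchanged."""
    out = []
    for i, base in enumerate(seq.upper()):
        codon_pos = i % 3 + 1
        if codon_pos in positions:
            out.append(RY.get(base, base))
        else:
            out.append(base)
    return "".join(out)


example = "ATGCCATTA"  # hypothetical in-frame fragment: ATG CCA TTA
third_only = ry_recode(example, positions=(3,))
first_and_third = ry_recode(example, positions=(1, 3))
print(third_only)       # ATRCCRTTR
print(first_and_third)  # RTRYCRYTR
```

Recoding removes transitions from the recoded positions, so only transversions at those positions contribute signal.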
It was striking that both Acariformes and Pseudoscorpiones show long branches on the amino acid phylogenetic trees. This indicates that a large number of amino acid substitutions have occurred within these lineages. To reduce the possibility that amino acids are saturated and causing long-branch-attraction artifacts, we recoded amino acids into 6 physicochemical functional groups. We have previously found this method successful at recovering some nodes among chelicerate lineages [43]. The resultant phylogenetic tree (not shown) still recovered Acariformes and Pseudoscorpiones as sister lineages with 100% bootstrap support.
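Recoding amino acids into six groups can be sketched as follows. The text above does not list the groups that were used, so this example assumes the commonly used six Dayhoff categories (AGPST, DENQ, HKR, ILMV, FWY, C); the grouping, the numeric state labels, and the function name are assumptions for illustration only.

```python
# Assumed grouping: the six Dayhoff categories. The groups actually used in
# the analysis are not specified in the text.
DAYHOFF6 = {
    "AGPST": "0",
    "DENQ": "1",
    "HKR": "2",
    "ILMV": "3",
    "FWY": "4",
    "C": "5",
}
LOOKUP = {aa: code for group, code in DAYHOFF6.items() for aa in group}


def recode_protein(protein):
    """Map each amino acid to its group code; keep gaps, mark unknowns as '?'."""
    recoded = []
    for aa in protein.upper():
        if aa == "-":
            recoded.append("-")
        else:
            recoded.append(LOOKUP.get(aa, "?"))
    return "".join(recoded)


print(recode_protein("MKC-"))  # 325-
```

Substitutions within a group become invisible after recoding, which reduces the effect of saturation at the cost of discarding some signal.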
The mitochondrial genome comprises only a portion of an organism's overall genetic makeup, and some researchers advocate using a combination of both organelle and nuclear loci when making phylogenetic inferences (e.g. [47]). When using only a region of an organism's genetic makeup to infer its evolutionary history, several potential problems could exist. The genetic marker may show a pattern of retention of an ancestral polymorphism if it is used to infer recent divergences, and therefore may not accurately reflect species divergence. However, incomplete sorting of ancestral polymorphisms should not be a problem when making phylogenetic inferences among ancient lineages, such as among chelicerates. Genetic markers for inferring ancient divergences may be problematic if the sequences are so diverged that they are saturated, or alternatively, if mutational bias or natural selection has influenced sequence composition. If any of these occur, it may lead to incongruence among different data sets, and to incorrect phylogenetic hypotheses. Although mtDNA has generally been viewed as evolving neutrally, and therefore not likely to cause incorrect phylogenetic inferences due to selection, pronounced convergent selection has been found to influence the evolution of some mitochondrial genomes [48].
Base composition and nucleotide bias
The nucleotide composition of both pseudoscorpion mt genomes shows an overall AT bias. The mt genome composition of the major coding strand of Pseudogarypus possesses an AT frequency of 76.9%, while that of Paratemnoides is 73.8%. Both mt genomes also show CG nucleotide skews on their major coding strands. The overall CG skews for each of the major coding strands of the mitochondrial genomes are 0.23 for Paratemnoides and 0.28 for Pseudogarypus. This is almost identical to the CG skews for all 13 of the protein-coding genes, which showed a CG skew of 0.23 in Paratemnoides, and 0.27 for Pseudogarypus. The CG and AT skews of each of the protein-coding genes, linearly arranged along the lengths of their genomes, are graphically depicted in Figure 5. Third-codon positions tend to show a greater skew than either first or second positions (Figure 5), with an average CG skew of 0.46 for Paratemnoides and 0.55 for Pseudogarypus. However, there is gene-by-gene variation in CG skew for each codon position (Figure 5 and Additional file 2).
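The composition statistics reported above can be computed along the following lines. This sketch assumes the conventional definitions, CG skew = (C - G)/(C + G) and AT skew = (A - T)/(A + T), so that a positive CG skew indicates an excess of C over G on the strand analyzed; the text does not state the formula explicitly. The function names and test sequences are hypothetical.

```python
from collections import Counter


def composition(seq):
    """Return AT content, CG skew, and AT skew for a nucleotide sequence.
    Assumed definitions: CG skew = (C - G)/(C + G), AT skew = (A - T)/(A + T)."""
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    total = a + c + g + t
    return {
        "at_content": (a + t) / total if total else 0.0,
        "cg_skew": (c - g) / (c + g) if (c + g) else 0.0,
        "at_skew": (a - t) / (a + t) if (a + t) else 0.0,
    }


def by_codon_position(cds):
    """Compute the same statistics separately for codon positions 1, 2, and 3
    of an in-frame coding sequence."""
    return {pos: composition(cds[pos - 1::3]) for pos in (1, 2, 3)}


whole = composition("AACCCG")              # hypothetical sequence
per_position = by_codon_position("ACGACGACG")
print(whole["cg_skew"])                    # 0.5
print(per_position[3]["cg_skew"])          # -1.0
```

Applying `by_codon_position` to each protein-coding gene separately would give the kind of gene-by-gene, position-by-position values summarized in Figure 5.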
Distribution and evolution of nucleotide bias among chelicerates
A CA bias on the major strand of the mt genome is typical of many chelicerates, and of arthropods in general. The amount of variation in nucleotide skew that has been found within Chelicerata is among the greatest of any arthropod lineage. Some distantly related orders of arachnids, such as spiders, scorpions, and some acariform mites, possess mt genomes with a pronounced GT bias (e.g. [21, 43, 49]). However, many chelicerate lineages, including xiphosurans, amblypygids, vinegaroons, and camel spiders, possess a CA nucleotide bias on their major coding strand [43]. Within the chelicerate group Pycnogonida (sea spiders), there exists wide variation in mt genome nucleotide skew, ranging from a pronounced GT bias to a CA bias [19].
To better understand the distribution of nucleotide skew, we plotted its distribution onto a phylogenetic tree. To more fully comprehend the nuances in nucleotide skew among different protein-coding genes, we examined skew separately for each of the 13 genes, and at each of the three codon positions. Because this is an immense amount of information, we present a visual overview of the skew distribution for each codon position and for each gene in Figure 6. Because gene order varies among the different chelicerate taxa, the genes are arranged in alphabetical order, to allow gene-by-gene comparisons between taxa. Additional file 2 provides the data that was used to create these graphics, and the protein-coding gene arrangement found in these taxa.
We find that skew varies dramatically among chelicerate taxa. Some taxa show a clear CA bias (colored blue in Figure 6) for all of their protein-coding genes located on the major strand. These include the taxa Walchia and Pseudocellus from the divergent Acariformes and Ricinulei lineages. Other taxa have a strong GT bias (colored red in Figure 6) on their major coding strand, including all scorpions and all true (opisthothele) spiders. Most taxa, including both pseudoscorpions, exhibit a CA bias at their 1st and 3rd codon positions, but a CT bias at 2nd positions. Some taxa show little bias in their nucleotide composition at 3rd codon positions (branches colored purple in Figure 6).
Among many chelicerate lineages, there appears to be a seemingly random phylogenetic distribution of nucleotide skew. The Pycnogonida display everything ranging from a strong CA bias to a marked GT bias to an almost non-biased nucleotide distribution. This degree of variation exists even within a single genus, with one species of Nymphon possessing a strong CA bias and another a strong GT bias. Acariformes also show a range of skew values, from CA-biased to GT-biased to very little nucleotide bias.
In contrast, skew seems to be conserved in some chelicerate lineages. A GT bias apparently arose after the divergence of true spiders from their common ancestor with mesothele spiders, and the opisthothele lineage has remained GT-biased since then. Likewise, all scorpions sequenced so far exhibit a GT bias, suggesting that this trait was inherited from the common ancestor of scorpions, and retained throughout scorpion evolutionary history.
These patterns of skew distribution do not seem to coincide with gene rearrangements (see Additional file 2). It has been suggested that inversions of the origin of replication could have caused changes in mutational biases on the major strand from CA to GT in multiple arthropods [46, 50]. If this is the case, then it may be expected that genomes prone to rearrangements are also the ones that tend to have major coding strands that are no longer CA-biased. Additionally, if changes in nucleotide skew on the major strand are due to inversion of the control region or origin of replication onto the minor strand, then recombination must have occurred. Given the high degree of variation in nucleotide skew in chelicerates, this would suggest that recombination has occurred fairly frequently within chelicerate mitochondrial genomes. It is not known with certainty where the origin of replication is located within chelicerate mitochondrial genomes, or even whether there are multiple origins. The origin of replication was determined for multiple species of insects, and found to differ in location between holometabolous and hemimetabolous insects [51]. Additional studies such as these are needed in chelicerates in order to understand how the origin of replication influences the nucleotide biases of the mt genome.
It has previously been found that bacteria show a mutational bias toward AT [52]. It is argued that variation in AT nucleotide content found among bacterial genomes is best explained either by selection acting on the probability of fixation of mutations, or by selection favoring biased gene conversion [52, 53]. Mitochondria are derived from bacteria, so it may not be surprising that most mitochondrial genomes show a similar AT bias. Likewise, the variation in nucleotide content found among chelicerate mitochondrial genomes on the major coding strand either must be due to natural selection affecting which mutations become substituted, or must reflect neutral substitutions governed by mutational bias. Previous reports have assumed that the variation in nucleotide skew found among chelicerates is due to changes in the mutational bias within the mitochondrial genome [46]. However, it is also plausible that selection has helped to shape nucleotide use in genomes that do not have a CA skew. A seemingly random phylogenetic distribution of a trait is consistent with the pattern that selection may leave if it has acted upon that trait. The evolutionary reversal of a CA bias to a GT bias found in some lineages of chelicerates may be due to selection, rather than to neutral processes, although it is not clear what the selective force may be.
Phylogenetic implications
It is somewhat surprising that all our phylogenetic analyses of mt protein-coding genes are in agreement that pseudoscorpions and acariform mites are sister lineages. Previous analyses using nuclear-encoded genes have found either conflicting placements of pseudoscorpions depending upon the method of analysis, or less than 50% bootstrap support for the placement of pseudoscorpions [32, 33]. While a plethora of different phylogenetic hypotheses for the relationships of pseudoscorpions have been proposed by previous workers, morphological hypotheses have tended to place pseudoscorpions as most closely allied to solifuges, scorpions, and harvestmen. No phylogenetic hypothesis proposed thus far has had a high level of assessable, statistical support. However, a quandary exists in how to interpret our phylogenetic data, as our results would require independent origins of some morphological traits that may be expected not to exhibit homoplasy.
The structure of sperm and male genitalia suggests that Pseudoscorpiones are not closely related to Solifugae [54], but instead, the similarities of sperm structures in Solifugae suggest that they are the sister group to Acariformes [55]. Recent analyses of nuclear rRNA sequences also support a sister-group relationship of Solifugae with Acariformes [32]. These authors also discuss several morphological synapomorphies that Solifugae shares with Acariformes.
We do not recover such a grouping with our mitochondrial data, and instead repeatedly find strong support for an Acariformes + Pseudoscorpiones clade, as well as a distantly related Xiphosura + Solifugae clade. In light of the well-supported relationship between pseudoscorpions and Acariformes, we are led to ask whether any morphological characters might also support this surprising relationship. Both pseudoscorpions and mites possess tracheae, but other chelicerate lineages, such as Opiliones, are known to possess tracheae as well. In fact, there is some historical precedent for grouping the tracheate arachnids, but this grouping has been largely dismissed by arachnologists because tracheae are known to have arisen independently within some arthropod lineages. Dunlop and Alberti [56] summarized the similarities of the parasitiform and acariform lineages with other arachnids. They describe two features that are shared among these lineages and pseudoscorpions: the presence of a ventrally moving finger on the chelicera, and the presence of an anterior projection termed the epistomal plate, which possesses flap-like lips. However, these features are also shared with Solifugae, a group that we recover as most closely related to horseshoe crabs.
There are features at the molecular level that are shared by Acariformes and Pseudoscorpiones. Intriguingly, the mt genomes of acariform mites also exhibit widespread loss of arms from their tRNA genes. In fact, no acariform mite or pseudoscorpion sequenced so far possesses a complete set of "typical" tRNA genes; instead, some or all of their tRNA genes lack the sequence encoding either the D-arm or the T-arm. As discussed previously, these taxa vary as to which tRNAs are missing an arm, and whether it is a T-arm or D-arm. Therefore, it is not likely that acariform mites and pseudoscorpions share the loss of a specific tRNA helix as a synapomorphy. Instead, it is more likely that they share the propensity to lose helices from their tRNA genes.
Both pseudoscorpions and acariforms have extremely short ribosomal RNA genes. We have shown that losses of helices of specific regions of the LSU rRNA are shared by both pseudoscorpions, and therefore these losses likely pre-date the loss of sequence to encode cloverleaf shaped tRNAs. Further analyses of additional pseudoscorpion and acariform mite rRNA secondary structures will allow insight into whether these lineages share the loss of helices that may be involved in tRNA D- or T-loop contact with the ribosome.
Acariform mites and pseudoscorpions also exhibit extensive gene rearrangements. However, these taxa do not share the same gene arrangement, and even the two pseudoscorpions differ in their genome organization. To date, the only other chelicerates that have been found to have genome rearrangements of protein-coding genes are some Parasitiformes, and pycnogonids in the family Nymphonidae. Intriguingly, these orders were recovered as a monophyletic group in our phylogenetic analyses (Figures 4 and 6).
Rate of molecular evolution
The pseudoscorpion amino acid sequences are on long branches on our phylogenetic trees. Because branch length is proportional to the number of substitutions, these long branches indicate that many substitutions separate Pseudogarypus from Paratemnoides, and both from their most recent common ancestor with other chelicerates. These branches are longer than those found in any other chelicerate lineage, except for Acariformes (see Figure 4). This indicates either that pseudoscorpions and Acariformes are more ancient lineages than other chelicerate lineages, or that they have elevated rates of sequence evolution. These two alternative scenarios are difficult to tease apart.
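To make the notion of a "long branch" concrete, the sketch below sums branch lengths from the root to each tip of a small rooted tree. It is only an illustration under stated assumptions: the tree topology, the tip names, and the branch lengths are hypothetical and are not taken from Figure 4, and the function `root_to_tip` is our own helper rather than part of any phylogenetics package.

```python
def root_to_tip(tree, node="root", depth=0.0, out=None):
    """Return {tip: summed branch length from the root} for a rooted tree.

    `tree` maps each internal node to a list of (child, branch_length)
    pairs; any node that is not a key of `tree` is treated as a tip.
    """
    if out is None:
        out = {}
    children = tree.get(node, [])
    if not children:
        out[node] = depth
        return out
    for child, length in children:
        root_to_tip(tree, child, depth + length, out)
    return out


# Hypothetical tree; lengths are in expected substitutions per site.
toy_tree = {
    "root": [("n1", 0.1), ("outgroup", 0.3)],
    "n1": [("long_branch_tip", 0.9), ("short_branch_tip", 0.4)],
}

distances = root_to_tip(toy_tree)
for tip, dist in sorted(distances.items(), key=lambda kv: -kv[1]):
    print(f"{tip}: {dist:.2f}")
```

In this toy example the first tip has accumulated twice as many substitutions per site as its sister since their common ancestor. Branch lengths alone record only the amount of change, which is why a long branch cannot by itself distinguish greater age from a faster rate.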
Fossil pseudoscorpions dated to about 380 million years old have been found in Devonian deposits [1]. Even older acariform mite, opilionid, and scorpion fossils have been found, and are dated to 410-428 mya (reviewed by Dunlop and Selden [57]). Therefore, fossil data alone do not allow us to infer that pseudoscorpions have accumulated more mutations than scorpions and spiders simply because they are older. The lack of an older pseudoscorpion fossil does not necessarily indicate that the lineage is not older, as it is possible that older fossils remain to be discovered.
Pseudoscorpion mt genomes show some gene rearrangements, including rearranged protein-coding genes in Pseudogarypus. Some studies have linked the degree of gene rearrangements with the rate of molecular evolution [58]. The degree of mitochondrial protein-coding gene rearrangements and the rate of evolution may indeed be somewhat interrelated in chelicerates. The acariform mites, pycnogonids in the family Nymphonidae, and pseudoscorpions all have protein-coding genes that are rearranged from the ancestral condition, and all show many amino acid substitutions, i.e. long branches, on the phylogenetic trees we reconstructed, consistent with elevated rates of substitution.
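One simple way to express the "degree" of gene rearrangement as a number is the breakpoint distance: the number of gene adjacencies in a reference order that are absent from a second order. The sketch below is a minimal, unsigned version for circular genomes; it ignores strand, and the gene labels and orders are hypothetical placeholders rather than the arrangements shown in Figure 1. It is not the measure used in [58], which we do not reproduce here.

```python
def adjacencies(order):
    """Return the set of unordered neighbor pairs in a circular gene order."""
    n = len(order)
    return {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}


def breakpoint_distance(reference, other):
    """Count reference adjacencies that are missing from `other`.

    Assumes both orders contain the same genes exactly once. Strand is
    ignored, so inversions that preserve neighbors are not detected.
    """
    if sorted(reference) != sorted(other):
        raise ValueError("gene orders must contain the same genes")
    return len(adjacencies(reference) - adjacencies(other))


# Hypothetical gene orders (placeholders, not real arrangements).
ancestral = ["g1", "g2", "g3", "g4", "g5", "g6"]
derived = ["g1", "g2", "g5", "g3", "g4", "g6"]  # g5 moved to a new position

print(breakpoint_distance(ancestral, derived))
```

Moving a single gene in this toy example breaks three adjacencies, whereas rotating the starting point of a circular order breaks none. A signed version that tracks strand would be needed to capture inversions such as those described for tRNA genes.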