Ferns have generally been neglected in studies of chloroplast genomics. Before this study, only one polypod and two basal ferns had their complete chloroplast (cp) genome reported. Tree ferns represent an ancient fern lineage that first occurred in the Late Triassic. In recent phylogenetic analyses, tree ferns were shown to be the sister group of polypods, the most diverse group of living ferns. Availability of cp genome sequence from a tree fern will facilitate interpretation of the evolutionary changes of fern cp genomes. Here we have sequenced the complete cp genome of a scaly tree fern Alsophila spinulosa (Cyatheaceae).
The Alsophila cp genome is 156,661 base pairs (bp) in size, and has a typical quadripartite structure with the large (LSC, 86,308 bp) and small single copy (SSC, 21,623 bp) regions separated by two copies of an inverted repeat (IRs, 24,365 bp each). This genome contains 117 different genes encoding 85 proteins, 4 rRNAs and 28 tRNAs. Pseudogenes of ycf66 and trnT-UGU are also detected in this genome. A unique trnR-UCG gene (derived from trnR-CCG) is found between rbcL and accD. The Alsophila cp genome shares some unusual characteristics with the previously sequenced cp genome of the polypod fern Adiantum capillus-veneris, including the absence of 5 tRNA genes that exist in most other cp genomes. The genome shows a high degree of synteny with that of Adiantum, but differs considerably from two basal ferns (Angiopteris evecta and Psilotum nudum). At one endpoint of an ancient inversion we detected a highly repeated 565-bp-region that is absent from the Adiantum cp genome. An additional minor inversion of the trnD-GUC, which is possibly shared by all ferns, was identified by comparison between the fern and other land plant cp genomes.
By comparing four fern cp genome sequences it was confirmed that two major rearrangements distinguish higher leptosporangiate ferns from basal fern lineages. The Alsophila cp genome is very similar to that of the polypod fern Adiantum in terms of gene content, gene order and GC content. However, there exist some striking differences between them: the trnR-UCG gene represents a putative molecular apomorphy of tree ferns; and the repeats observed at one inversion endpoint may be a vestige of some unknown rearrangement(s). This work provided fresh insights into the fern cp genome evolution as well as useful data for future phylogenetic studies.
The chloroplast (cp) genome has long been a focus of research in plant molecular evolution and systematics due to its small size, high copy number, conservation and extensive characterization at the molecular level . More recently, with technical advances in DNA sequencing, the number of completely sequenced cp genomes has grown rapidly. Aside from providing information on genome structure, gene content, gene order and nucleotide composition, complete cp genome sequences also offer a unique opportunity to explore the evolutionary changes of the genome itself. In general, cp genomes are structurally highly conserved across land plants. However, structural rearrangements, e.g. gene loss, inverted repeat (IR) loss or expansion and inversion, do occur in certain lineages and have been shown to be extremely informative in resolving deep phylogenetic relationships because they may exhibit less homoplasy than sequence data . For example, a 30-kb inversion shared by all vascular plants except lycopsids identifies the lycopsids as the basal lineage in the vascular plants . Two inversions and an IR expansion can be used to clarify basal nodes in the leptosporangiate ferns [3, 4].
Currently, one limiting factor in comparative chloroplast genomics is the sparse taxon sampling in spore-bearing land plants. Genome sequencing efforts almost always favor plants of economic interest. Complete cp genomes have been sequenced for more than one hundred seed plants. Amongst these, more than 10 completed sequences each come from cereals (13), crucifers (12) and conifers (12) (see Additional file 1). For land plants other than seed plants, however, only 10 cp genome sequences have been completed in total, of which only 3 were from ferns prior to this study (see Additional file 1). For further insights into the evolutionary dynamics of cp genome organization, more data are needed from plant species representing other crucial evolutionary nodes.
Ferns (monilophytes), with more than 10,000 living species, are the most diverse group of seed-free vascular plants [6, 7]. Previous studies have uncovered considerable genomic rearrangements in fern cp genomes, but the details and exact series of these events have not yet been fully characterized [3, 4, 8]. The completed cp genome sequence of the polypod fern Adiantum capillus-veneris shows some unusual features not seen in vascular plants before, including tRNA gene losses, which had only been observed in cp genomes of non-photosynthetic plants [9, 10]. For example, a putative tRNA-selenocysteine (tRNA-Sec) gene in Adiantum  replaces the typical trnR-CCG gene. Unfortunately, because Adiantum is the only sequenced representative of leptosporangiates, the most diverse fern lineage, it is difficult to tell which characteristics are unique to Adiantum or diagnostic of a much larger clade. Therefore, complete cp genome data from more fern clades are necessary to better resolve these issues.
As part of an effort to shed more light on cp genome evolution in ferns, we have sequenced the complete plastid genome of a scaly tree fern, Alsophila spinulosa (hereafter Alsophila) (Cyatheaceae). This taxon was chosen because it is an easily available representative of an ancient lineage, the tree ferns, for which no cp genome had previously been sequenced. In addition to tree ferns, heterosporous and polypod ferns are the other two main lineages within the "core leptosporangiates". The three major lineages of "core leptosporangiates" are thought to have originated from a Late Triassic diversification. Recent phylogenetic studies further demonstrated a sister relationship between tree ferns and polypods [6, 11, 12]. After the Late Triassic diversification, polypods re-diversified remarkably alongside angiosperms in the Cretaceous [6, 11]. Similarly, the scaly tree ferns (Cyatheaceae) also radiated very recently and diversified at an exceptionally high rate. A comparison of the complete cp genome sequences of Alsophila and the polypod fern Adiantum will aid interpretation of unusual characters observed in Adiantum, such as some missing and novel genes [9, 10].
Moreover, sequences of all four published fern cp genomes (including that of Alsophila) will enable more detailed comparisons of the organization and evolution of the chloroplast genomes in ferns. Our comparative analyses corroborate that fern cp genomes have undergone substantial changes in gene orders during evolution: two main rearrangements contribute to major differences between "higher" and basal ferns. In addition, the comparisons also identify some unique characteristics in the Alsophila cp genome including a novel tRNA, interesting pseudogenes and a highly repeated 565-bp-region spanning one endpoint of an ancient inversion.
Results and Discussion
The chloroplast (cp) genome of Alsophila spinulosa [GenBank: FJ556581] is 156,661 base pairs (bp) with a large single copy (LSC) region of 86,308 bp separated from a 21,623-bp small single copy (SSC) region by two inverted repeats (IRs), each of 24,365 bp (Figure 1). The genome is the largest amongst the four sequenced fern cp genomes (Table 1), but is smaller than previous estimates of other Cyatheaceae species, e.g. Alsophila bryophila (165 kb), Cyathea furfuracea (179.2 kb) and Sphaeropteris cooperi (164.3 kb), using the mapping method . When the IR is considered only once, the Alsophila cp genome contains 117 genes, encoding 85 proteins, 4 rRNAs and 28 tRNAs (Table 1). Pseudogenes of ycf66 and trnT-UGU were also detected in this genome (Figure 1). More than half of the Alsophila cp genome is composed of coding regions (92,691 bp, 59.17%) with the protein-coding regions accounting for the major portion (81,111 bp, 51.77%) followed by rRNA genes (9,086 bp, 5.80%) and tRNA genes (2,494 bp, 1.59%) (counting both IRs).
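The size and proportion figures reported above can be cross-checked with simple arithmetic. The following Python sketch uses only the numbers given in this section; the variable and function names are illustrative and are not part of the original analysis.

```python
# Region lengths (bp) as reported for the Alsophila spinulosa cp genome.
LSC, SSC, IR = 86_308, 21_623, 24_365
genome = LSC + SSC + 2 * IR  # quadripartite structure: the IR is present twice

# Coding lengths (bp) as reported, counting both IR copies.
coding = {"protein": 81_111, "rRNA": 9_086, "tRNA": 2_494}
total_coding = sum(coding.values())


def percent(part, whole=genome):
    """Share of the genome in percent, rounded to two decimals."""
    return round(100 * part / whole, 2)


print("genome size:", genome)
print("coding total:", total_coding, percent(total_coding))
for name, bp in coding.items():
    print(name, bp, percent(bp))
```

The region lengths sum to the reported genome size, and the three coding categories sum to the reported total coding length.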
The Alsophila cp genome has an overall GC content of 40.43%, which is lower only than Adiantum capillus-veneris amongst the four sequenced fern cp genomes (Table 1) and is the fourth highest amongst sequenced land plant cp genomes (see Additional file 1). Like other land plants [15, 16], GC content is unevenly distributed across the Alsophila cp genome by location, functional group and codon position. The GC content in rRNA genes (55.18%) and tRNA genes (54.55%) is much higher than in protein coding regions (40.87%). The GC percentage in IRs is the highest (Table 1), reflecting the high GC content of rRNA genes. Amongst the protein genes, photosynthetic genes possess the highest GC content (43.85%), followed by genetic system genes (40.80%), whilst NADH genes have the least (39.54%). The GC content also varies by codon position with the first (47.75%) > second (40.94%) > third (33.91%) position in turn.
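GC content by codon position, as reported above, can be computed by slicing a coding sequence with a stride of three. The sketch below is a minimal illustration in Python, not the procedure actually used in this study; the function names and the toy sequence are hypothetical.

```python
def gc_percent(seq):
    """GC content of a nucleotide string, in percent."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100 * sum(base in "GC" for base in seq) / len(seq)


def gc_by_codon_position(cds):
    """GC content at the first, second and third codon positions of a CDS."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    cds = cds[:usable]
    return tuple(gc_percent(cds[offset::3]) for offset in range(3))


toy_cds = "ATGGCTAAAGGATTTTAA"  # hypothetical sequence, not taken from the genome
print(gc_percent(toy_cds))
print(gc_by_codon_position(toy_cds))
```

As a consistency check on the reported values, the mean of the three codon-position figures, (47.75 + 40.94 + 33.91)/3 ≈ 40.87%, equals the GC content reported for protein-coding regions.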
The start codons of the 85 protein genes were inferred by comparison with previously annotated land plant cp genomes. Sixty-three of these genes start with AUG, 20 with ACG and 2 with GUG (psbC and rps12). An ACG codon may be restored to a canonical start codon (AUG) by RNA editing, whereas a GUG initiation codon has been reported in other cp genomes [17, 18]. Translation start positions inferred from genome sequences alone remain hypothetical. Future determination of sequences from complementary DNA (cDNA) and/or proteins will help to substantiate the putative translation start positions as well as RNA editing sites.
There are 27,046 codons in total in all protein-coding regions (including coding regions in both IRs) (Table 2), representing the total coding capacity of the Alsophila cp genome; of these, 2,771 (10.25%) are for leucine, 2,365 (8.74%) for serine, 2,154 (7.96%) for isoleucine, and 1,847 (6.83%) for glycine. Together, these four amino acids account for about one third (33.78%) of all codons. The codon usage of the Alsophila cp genome reflects an apparent AT bias: most codons (66.13%) end in A or U. As shown in Figure 2, both codon numbers and RSCU (relative synonymous codon usage) values are negatively correlated with codon GC content (represented by the number of G+C in a given codon). Nucleotide composition bias therefore appears to have a significant influence on codon usage.
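For readers unfamiliar with RSCU, the value for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The following Python sketch illustrates the calculation for the leucine codon family under that standard definition; the input fragment is hypothetical and the function names are illustrative, not those of any tool used in this study.

```python
from collections import Counter

# The six leucine codons of the standard genetic code.
LEU_CODONS = ("UUA", "UUG", "CUU", "CUC", "CUA", "CUG")


def count_codons(cds):
    """Count in-frame codons of a coding sequence (T is treated as U)."""
    cds = cds.upper().replace("T", "U")
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))


def rscu(counts, family):
    """RSCU = observed count / mean count over the synonymous family."""
    total = sum(counts.get(codon, 0) for codon in family)
    if total == 0:
        return {codon: 0.0 for codon in family}
    expected = total / len(family)
    return {codon: counts.get(codon, 0) / expected for codon in family}


toy_counts = count_codons("UUAUUAUUACUGUUGCUU")  # hypothetical CDS fragment
leu_rscu = rscu(toy_counts, LEU_CODONS)
print(leu_rscu)
```

An RSCU value above 1 indicates a codon used more often than expected under equal usage, and the values within one family sum to the number of synonymous codons.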
The Alsophila cp genome shares three key inversions with other ferns relative to bryophytes (Figure 3): 1) a 30-kb inversion at the beginning of LSC (close to IRA) ; 2) an approximately 3 kb inversion involving trnT, psbD, psbC, trnS, psbZ and trnG [8, 10, 19]; and 3) a minor inversion containing a single gene trnD-GUC. The first of these inversions is also shared by all vascular plants except lycophytes, whereas the latter two are restricted to ferns.
To our knowledge, the trnD-GUC inversion has not been previously documented. Three conserved and consecutive tRNA genes, trnD-GUC, trnY-GUA and trnE-UUC, have been identified in all land plant cp genomes. Excluding ferns, the three genes have the same directions of transcription. However, in ferns trnD is inverted relative to trnY and trnE (Figure 3). The simplest interpretation of this change is a single minor inversion involving only trnD. Based on current data, it remains unknown whether the 3-kb inversion or the trnD inversion occurred first in ferns.
Overall, the Alsophila cp genome shows a high degree of synteny with the previously sequenced cp genome of Adiantum (Figure 4A). In contrast, there exist striking differences between Alsophila and Angiopteris (Figure 4B) as well as between Alsophila and Psilotum (Figure 4C). A set of complex rearrangements in the IRs, involving a rare duplication of psbA gene, was found in "higher" ferns using physical mapping [3, 4]. The IR gene orders of "higher" ferns, such as Adiantum, Cyathea and Polystichum, are highly rearranged in comparison to that of basal leptosporangiate Osmunda [3, 4, 20]. Complete cp genome data from Angiopteris, Psilotum, Adiantum and Alsophila detail these rearrangements. The IR gene order in Alsophila appears to be the same as that in Adiantum, while Angiopteris and Psilotum have the Osmunda gene order. To explain the complex rearrangements, a "two inversions" hypothesis was proposed . Figure 5 illustrates the great gene order changes within these rearrangements and the updated version of the "two inversions" model incorporating gene order data from the Alsophila and Angiopteris cp genomes. Recently, Wolf and Roper  indicated that the two major inversions did occur in turn and the second inversion (Figure 5, Inversion II) took place on the branch leading to the common ancestor of the heterosporous fern clade and its sister group. Thus, it seems reasonable to hypothesize that the Adiantum gene order represents a common feature of the three lineages within core leptosporangiates (including heterosporous ferns, tree ferns and polypod ferns).
Interestingly, in the Adiantum cp genome, an intron-containing trnT-UGU was identified between trnR-ACG and ndhB (Table 3) . The Alsophila cp genome possesses no intact intron-containing trnT. However, two fragments that are similar to the two exons within the Adiantum trnT were annotated as a ΨtrnT-UGU in this study (Table 3). This new trnT or ΨtrnT is just at one endpoint of the Inversion II (Figure 5). Therefore, the generation of intron-containing trnT-UGU may be associated with the IR rearrangements.
Alsophila and Adiantum share another rearranged region between rpoB and psbZ in LSC relative to Angiopteris and Psilotum (Figure 3). For the latter two, gene order in this region is "rpoB-trnC-petN-psbM-trnD-trnY-trnE-trnG-psbZ", whereas in Alsophila and Adiantum it is "rpoB-trnD-trnY-trnE-psbM-petN-trnC-trnG-psbZ" (Genes with boldface are unchanged) (Figure 3). Roper et al.  noted that this gene order change is not caused by a single inversion. Two alternative pathways may account for this rearrangement (Figure 6), but more data are needed to determine the order of the two inversions.
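The statement that a single inversion cannot account for this gene order change can be illustrated with a small exhaustive search. The Python sketch below treats the region as an unsigned gene order, which is a simplifying assumption: transcription direction is ignored, so the pathways it finds are only candidates and need not correspond exactly to those in Figure 6. The function names are hypothetical.

```python
def single_reversals(order):
    """Yield ((start, end), new_order) for each reversal of >= 2 adjacent genes."""
    n = len(order)
    for start in range(n):
        for end in range(start + 2, n + 1):
            flipped = order[start:end][::-1]
            yield (start, end), order[:start] + flipped + order[end:]


def two_step_paths(source, target):
    """All ordered pairs of reversals that turn source into target."""
    paths = []
    for first, intermediate in single_reversals(source):
        for second, final in single_reversals(intermediate):
            if final == target:
                paths.append((first, second))
    return paths


# Gene orders quoted in the text (strand information is not modelled).
basal = tuple("rpoB trnC petN psbM trnD trnY trnE trnG psbZ".split())
derived = tuple("rpoB trnD trnY trnE psbM petN trnC trnG psbZ".split())

one_step = any(new == derived for _, new in single_reversals(basal))
paths = two_step_paths(basal, derived)
print("reachable by one reversal:", one_step)
print("two-reversal pathways (half-open index ranges):", paths)
```

Under this unsigned model, no single reversal converts the basal order into the derived one, whereas two reversals suffice: reversing the trnC to trnE block and the trnD-trnY-trnE block, in either order. A signed analysis that tracks gene orientation would be needed to discriminate between the pathways.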
A total of 117 different genes are present in the Alsophila cp genome (Table 1). This gene content is similar to that of most land plants . However, there are some interesting differences amongst the four sequenced fern cp genomes (Table 3). The Alsophila cp genome possesses the least number of tRNA genes due to 5 missing tRNA genes in comparison to basal ferns (Psilotum nudum and Angiopteris evecta). Its protein gene number is equal to that of Angiopteris, but higher than that of both Adiantum and Psilotum. Details of these differences are discussed below.
Novel tRNA gene
A unique trnR-UCG gene, encoding tRNA-Arg, is found between rbcL and accD in the Alsophila cp genome (Figure 1; Table 3). Another type of tRNA-Arg gene trnR-CCG resides in the same locus in non-flowering land plants including Angiopteris  and Psilotum . In Adiantum, an apparent tRNA gene is annotated as trnSeC . It is uncertain whether the occurrence of trnR-UCG in the Alsophila cp genome represents a unique feature for this species or is an apomorphy for a larger clade such as Cyatheaceae or tree ferns. To address this question, we collected all fern rbcL-accD intergenic sequences deposited in GenBank and examined the tRNA genes within them using ARAGORN . The results indicate that trnR-UCG is restricted to tree ferns, whereas trnR-CCG is widespread in non-core leptosporangiates and basal ferns (Table 4). However, neither trnR-UCG nor the trnR-CCG gene is identified at this locus in polypod ferns. Therefore, the existence of trnR-UCG may reflect a putative molecular apomorphy of tree ferns.
Sequence alignment indicates that trnR-UCG and trnR-CCGs have quite similar primary sequences with 44 of 74 nucleotides unchanged across 7 representative land plants (Figure 7A). In addition, the Adiantum trnSeC shares 51, 41 and 40 identical nucleotides with the Alsophila trnR-UCG, the Psilotum trnR-CCG and the Angiopteris trnR-CCG respectively (Figure 7A). Due to their similarities and conserved loci, we propose that Alsophila trnR-UCG, Adiantum trnSeC as well as trnR-CCGs in other land plants are orthologous. Tree fern trnR-UCG can transfer arginine even though its anticodon alters from CCG to UCG. However, Adiantum trnSeC has undergone major changes: 1) its anticodon is UCA (unmatchable for an Arg codon), and 2) it contains up to 18 nucleotide differences relative to all other land plant trnR genes (Figure 7A). Our findings imply that the trnR-UCG is derived from the trnR-CCG by the alteration of one anticodon base; then the Adiantum trnSeC evolves from the trnR-UCG by altering one anticodon base further, becoming a trnR-UCG pseudogene (Figure 7B). If this is the case, the Adiantum trnSeC should be annotated as ΨtrnR. Sugiura and Sugita  argued that the trnR-CCG is not essential for plastid function although it is conserved in non-flowering plants. The evolutionary scenario of trnR-CCG in ferns (Figure 7B) tends to support this view.
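The anticodon argument above can be made concrete with a short sketch. Assuming strict Watson-Crick pairing only (wobble pairing and base modification are deliberately ignored), the codon read by an anticodon is its reverse complement. The Python code below applies this to the three anticodons discussed; the names are illustrative, and the small codon table lists only the standard-code meanings needed here.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Only the codons needed here, with their standard genetic code meaning.
CODON_MEANING = {"CGG": "Arg", "CGA": "Arg", "UGA": "stop"}


def cognate_codon(anticodon):
    """Codon (5'->3') paired by strict Watson-Crick rules with an anticodon (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(anticodon))


def base_changes(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


for anticodon in ("CCG", "UCG", "UCA"):
    codon = cognate_codon(anticodon)
    print(anticodon, "->", codon, CODON_MEANING[codon])

print(base_changes("CCG", "UCG"), base_changes("UCG", "UCA"))
```

Both CCG and UCG pair with arginine codons, whereas UCA pairs with UGA, which is not an arginine codon. Each step in the proposed series CCG, UCG, UCA differs from the previous one by a single base.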
At the locus between rps4 and ndhJ, the Alsophila and Adiantum cp genomes encode a trnL-CAA (tRNA-Leu) rather than a trnL-UAA gene (Table 3). However, they have lost the other trnL-CAA gene (Table 3), which is found downstream (3') of ndhB in almost all other land plant plastid genomes. Consequently, Alsophila and Adiantum possess only the trnL-CAA, whereas the Angiopteris and Psilotum cp genomes contain both the trnL-UAA and the trnL-CAA. In the Adiantum chloroplast, the function of the missing trnL-UAA in reading the heavily used UUA codon could be supplied by a partial C-to-U edit in the trnL-CAA anticodon. Since UUA is also a preferred leucine codon in the Alsophila cp genome (RSCU = 1.70), the same editing event might occur in the Alsophila chloroplast as well.
Missing tRNA gene
Only 28 tRNA genes are encoded in the Alsophila cp genome, whereas 29, 32 and 33 are annotated in Adiantum, Angiopteris and Psilotum, respectively (Table 1). For cp genomes, it is believed that a set of 30 tRNA species is sufficient for the translation of chloroplast mRNAs. In the Angiopteris and Psilotum chloroplasts, tRNAs can read all codons by using two-out-of-three and wobble mechanisms. However, in Alsophila and Adiantum chloroplasts, both lysine codons lack a corresponding tRNA-Lys (encoded by trnK) (Table 2; Table 3). The loss of trnK suggests that cytosolic tRNAs may be imported into chloroplasts, although experimental evidence is lacking. As an incidental consequence of the trnK loss, the matK open reading frame (ORF) is not nested in a trnK intron (Figure 1).
Apart from the trnK and the trnL-CAA, the Alsophila cp genome shares three other tRNA gene losses with Adiantum relative to the basal ferns Angiopteris and Psilotum: the trnS-CGA, the trnV-GAC and the intron-free trnT-UGU (Table 3). The shared absence of these tRNA genes in Alsophila and Adiantum suggests that the losses may have been inherited from a common ancestor.
The Alsophila cp genome contains a psaM gene encoding photosystem I reaction center subunit M. This gene has been detected in Psilotum  and Angiopteris , but not in Adiantum  (Table 3). Besides ferns, psaM also exists in bryophytes, lycophytes and gymnosperms, but not in angiosperms, implying its independent loss in ferns and angiosperms . Alsophila and Adiantum represent tree ferns and polypods, respectively. Due to their sister relationship, we speculate that the loss of psaM in ferns occurred after the split of polypods and tree ferns.
A putative pseudogene of ycf66 is identified in the Alsophila cp genome (Figure 1; Table 3). The 5' ends of both of its exons are disrupted. Among the four sequenced fern cp genomes, only Angiopteris contains an intact ycf66 gene. In other land plants, this gene occurs only in Marchantia polymorpha (liverworts), Physcomitrella patens subsp. patens (mosses), Syntrichia ruralis (mosses) and Huperzia lucidula (lycophytes). These findings suggest that ycf66 has been lost independently in multiple clades of land plants, including hornworts, ferns and seed plants.
Inversion Endpoint as Hotspot for Repeats
A total of 133 pairs of repeats (≥ 30 bp) were identified in the Alsophila cp genome by using REPuter , of which 106 are direct and 27 are inverted repeats. This number of repeats is less than are found in some highly rearranged cp genomes (e.g. Trachelium caeruleum) but more than are present in unrearranged ones (e.g. Nicotiana tabacum) [29, 30]. Up to 66 direct repeats (no inverted repeat) are restricted to a region spanning only 565 bp (153,682–154,246 bp in IRA or 88,724–89,288 bp in IRB) between trnR-ACG and ψtrnT-UGU in the IRs (Figure 1). The GC content of this 565-bp-region (35.93%) is lower than that of IRs and the overall GC content of the whole genome. Detailed sequence analyses revealed that this region is composed of tandem iterations of 11 similar segments ranging from 40 to 58 bp (Figure 8). The core repeated motif is AAAATCCTAGTAGTTAgaGCTTTATCcaGGGtaTaGgACT (the lowercase letters denote variable bases) with variant lengths of heads and/or tails (Figure 8).
In contrast to Alsophila, dispersed repeats (≥ 30 bp) are rare in the Adiantum cp genome, with only 5 short inverted repeats (30–36 bp); and none of these occurs between the trnR-ACG and the trnT-UGU. In the Alsophila cp genome, the length of the intergenic region between trnR-ACG and ψtrnT-UGU is 1467 bp, whereas in Adiantum it is 913 bp, the difference being 554 bp. We noted that this length is very similar to that of the highly repeated 565-bp-region, and speculate that the difference is caused by the presence of the highly repeated region. To test this hypothesis, we extracted the sequence from trnR-ACG to ψtrnT-UGU in Alsophila and from trnR-ACG to trnT-UGU in Adiantum. The sequence alignment indicates that the highly repeated 565-bp-region is indeed lost in the Adiantum cp genome (Figure 9).
In the Alsophila cp genome, the location of the highly repeated 565-bp-region is exactly at the endpoint of the second inversion of the IR rearrangements (Figure 5, Inversion II). Dispersed repeated sequences have been reported from several cp genomes. These are associated with numerous DNA rearrangements, particularly inversions [31–33]. In extensively rearranged cp genomes, the endpoints of rearranged gene clusters are usually flanked by repeated sequences [29, 30, 34]. If repeat-mediated recombination is the major mechanism generating inversions in cp genomes [35, 36], the preservation of repeats would destabilize the genome structure. After inversions, the repeats should be deleted to guarantee genome stability (like the situation in Adiantum). The repeats observed at the endpoint of the ancient inversion (Figure 5, Inversion II) may be a vestige of recent rearrangement(s) that are undiscovered. The existence of these repeats implies that the region is a potential hotspot for genomic reconfiguration.
In this study, we present the first complete cp genome sequence from a tree fern and provide a comprehensive comparative analysis of cp genomes in ferns. The cp genome size of Alsophila is larger than that of Adiantum, Psilotum and Angiopteris. Besides 117 genes, two pseudogenes Ψycf66 and ΨtrnT-UGU are also detected in the Alsophila cp genome. An intact ycf66 is identified in Angiopteris, while an intron-containing trnT-UGU is found in Adiantum. Based on the findings, we speculate that Ψycf66 reflects an intermediate during ycf66 gene loss, and the genesis of trnT-UGU may be associated with the IR rearrangements. A trnR-UCG gene was detected between rbcL and accD in Alsophila, and this seems to be a molecular apomorphy of tree ferns. In the Adiantum cp genome, the trnR-UCG gene degenerates to a pseudogene. The Alsophila cp genome shares several unusual characteristics with the previously sequenced Adiantum (a polypod fern) cp genome, such as five missing tRNA genes and two major rearranged regions. These common characters probably derive from their common ancestor. In the Alsophila cp genome, a highly repeated 565-bp-region, which is composed of tandem iterations of 11 similar segments, occurs at one endpoint of an ancient inversion, but it is not detected in the genome of Adiantum. Nonetheless, the origin and function of these repeats remain to be characterized in future studies.
Genome sequencing and assembly
Young leaves of Alsophila spinulosa were collected from a plant growing in the greenhouse in Wuhan Botanical Garden, Chinese Academy of Sciences. A voucher specimen was deposited at Wuhan Botanical Garden. Total DNA was extracted using the CTAB-based method . The cp genome was amplified using polymerase chain reaction (PCR). In brief, the coding sequences were extracted from known chloroplast genomic sequences of three ferns [GenBank: NC_003386, NC_008829 and NC_004766], three bryophytes [GenBank: NC_001319, NC_005087 and NC_004543] and one lycophyte [GenBank: NC_006861] according to their annotations in GenBank. PCR primers were developed from alignments of the above coding sequences. Overlapping regions of each pair of adjacent PCR fragments exceeded 150 bp. We did not clone two inverted repeats (IRs) separately, but designed primers to amplify the regions spanning the junctions of LSC/IRA, LSC/IRB, SSC/IRA and SSC/IRB. Using these primers, we covered the entire cp genome of Alsophila with PCR products ranging in size from 500 bp to 5 kb. All PCR reactions were performed using TaKaRa LA taq (TaKaRa Bio Inc, Shiga, Japan). Amplified cp genome fragments were cloned into TaKaRa pMD19-T plasmids (TaKaRa Bio Inc, Shiga, Japan), which were then used to transform E. coli DH5α. Multiple (≥ 6) clones were randomly selected and commercially sequenced using ABI 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA). For long fragments (> 1.4 kb), walking primers were designed based on acquired sequences and used for sequencing remaining sequences step by step. Gap regions (caused by unsuccessful PCR amplification or failed primer walking sequencing) were amplified using primers that flank the gaps, then cloned and sequenced as above. From the individual reads we excluded vector, primer and low-quality sequences, then we assembled the reads using Phrap . 
Since automated assembly methods cannot distinguish the two IRs, we input the reads as two parts and acquired two large contigs, each including one IR and its adjacent partial large and small single copy (LSC and SSC) regions. The two large contigs were then manually assembled into the complete circular genome sequence. Inverted repeats were identified through alignment of the final complete genome sequence against itself via Blast 2 sequences at the National Center for Biotechnology Information. In total, we accumulated 1,415,559 bp of sequence, which corresponds to about 9-fold coverage.
Annotation and related study
Annotation of the Alsophila cp genome was performed using DOGMA (Dual Organellar GenoMe Annotator) . Genes that were undetected by DOGMA, such as ycf1, ycf2, rps16, ndhF, ndhG and matK, were identified by Blastx http://blast.ncbi.nlm.nih.gov/Blast.cgi. From this initial annotation, putative starts, stops, and intron positions were determined by comparisons with homologous genes in other cp genomes and by considering the possibility of RNA editing, which can modify the start and stop positions. tRNA genes were annotated using DOGMA and ARAGORN v1.2 http://188.8.131.52/ARAGORN/, and then confirmed by ERPIN http://tagc.univ-mrs.fr/erpin/ and TFAM Webserver v1.3 . The circular gene map of the Alsophila cp genome was drawn by GenomeVx  followed by manual modification.
Overall GC content was calculated for 118 land plant plastid genomes (see Additional file 1). For the Alsophila cp genome, GC content was further determined for three groups of genes: protein-coding genes (85), rRNA genes (4) and tRNA genes (28). For protein-coding genes, GC content was calculated for the entire gene and for the first, second and third codon positions. Protein-coding genes were partitioned into three main functional groups, and the GC content of each group was then determined: (1) photosynthetic genes (rbcL, atp*, pet*, psa* and psb*); (2) genetic system genes (rpl*, rps*, rpo*, clpP, infA and matK); and (3) NADH genes (ndh*).
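The functional grouping described above is defined by gene-name patterns, so it can be expressed as a simple lookup. The Python sketch below is one possible way to encode it, using the standard-library fnmatch module; the dictionary and function names are hypothetical and do not reflect the scripts actually used.

```python
from fnmatch import fnmatchcase

# Gene-name patterns for the three functional groups listed in the text.
GROUPS = {
    "photosynthetic": ("rbcL", "atp*", "pet*", "psa*", "psb*"),
    "genetic system": ("rpl*", "rps*", "rpo*", "clpP", "infA", "matK"),
    "NADH": ("ndh*",),
}


def functional_group(gene):
    """Return the functional group of a gene name, or None if it fits no group."""
    for group, patterns in GROUPS.items():
        if any(fnmatchcase(gene, pattern) for pattern in patterns):
            return group
    return None


for gene in ("psbC", "rps12", "ndhB", "matK", "ycf1"):
    print(gene, functional_group(gene))
```

Case-sensitive matching is used because gene names such as matK and rbcL are case-specific; genes that fit none of the patterns (for example ycf1) are left unassigned.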
Direct and inverted repeats in the Alsophila and Adiantum [GenBank: NC_004766] cp genomes were determined by using REPuter  at a repeat length ≥ 30 bp with a Hamming distance of 3. The entire genome was used to detect repeats in order to map them in both copies of the IR, but numbers of repeats were based on results from only one IR copy.
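To illustrate what the search criteria mean (repeat length ≥ 30 bp, Hamming distance of at most 3), the following Python sketch performs a naive pairwise window comparison. It is not REPuter and does not reproduce its algorithm or output: it runs in quadratic time, reports every qualifying pair of fixed-length windows separately, and does not merge them into maximal repeats. All names and the toy sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_repeats(seq, length=30, max_mismatch=3):
    """Naive scan for direct ('D') and inverted ('I') repeat window pairs."""
    windows = [seq[i:i + length] for i in range(len(seq) - length + 1)]
    hits = []
    for i, first in enumerate(windows):
        first_rc = reverse_complement(first)
        for j in range(i + 1, len(windows)):
            if hamming(first, windows[j]) <= max_mismatch:
                hits.append(("D", i, j))
            if hamming(first_rc, windows[j]) <= max_mismatch:
                hits.append(("I", i, j))
    return hits


# Toy input: a 30-bp unit, a spacer, and a copy with two substitutions.
unit = "AAAATCCTAGTAGTTAGAGCTTTATCCAGG"
variant = unit[:5] + "G" + unit[6:20] + "A" + unit[21:]
direct_toy = unit + "CGCGCGCGCG" + variant
inverted_toy = unit + "CGCGCGCGCG" + reverse_complement(unit)

print(find_repeats(direct_toy))
print(find_repeats(inverted_toy))
```

Positions are 0-based window start coordinates. For whole-genome use, an index-based tool such as REPuter is the appropriate choice; the sketch only makes the length and mismatch thresholds explicit.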
LSC: large single copy
SSC: small single copy
RSCU: relative synonymous codon usage.
Raubeson LA, Jansen RK: Chloroplast genomes of plants. Plant diversity and evolution: genotypic and phenotypic variation in higher plants. Edited by: Henry RJ. 2005, London: CABI Publishing, 45-68.
Stein DB, Conant DS, Ahearn ME, Jordan ET, Kirch SA, Hasebe M, Iwatsuki K, Tan MK, Thomson JA: Structural rearrangements of the chloroplast genome provide an important phylogenetic link in ferns. Proc Natl Acad Sci USA. 1992, 89: 1856-1860. 10.1073/pnas.89.5.1856.
Wolf PG, Roper JM: Structure and evolution of fern plastid genomes. Biology and evolution of ferns and lycophytes. Edited by: Ranker TA, Haufler CH. 2008, Cambridge, UK: Cambridge University Press, 159-174.
Wolf PG, Rowe CA, Sinclair RB, Hasebe M: Complete nucleotide sequence of the chloroplast genome from a leptosporangiate fern, Adiantum capillus-veneris L. DNA Res. 2003, 10: 59-65. 10.1093/dnares/10.2.59.
Pryer KM, Schuettpelz E, Wolf PG, Schneider H, Smith AR, Cranfill R: Phylogeny and evolution of ferns (monilophytes) with a focus on the early leptosporangiate divergences. Am J Bot. 2004, 91: 1582-1598. 10.3732/ajb.91.10.1582.
Pryer KM, Schneider H, Smith AR, Cranfill R, Wolf PG, Hunt JS, Sipes SD: Horsetails and ferns are a monophyletic group and the closest living relatives to seed plants. Nature. 2001, 409: 618-622. 10.1038/35054555.
Kuroda H, Suzuki H, Kusumegi T, Hirose T, Yukawa Y, Sugiura M: Translation of psbC mRNAs starts from the downstream GUG, not the upstream AUG, and requires the extended shine-dalgarno sequence in tobacco chloroplasts. Plant Cell Physiol. 2007, 48: 1374-1378. 10.1093/pcp/pcm097.
Wakasugi T, Tsudzuki T, Sugiura M: The genomics of land plant chloroplasts: Gene content and alteration of genomic information by RNA editing. Photosynth Res. 2001, 70: 107-118. 10.1023/A:1013892009589.
Wolf PG, Rowe CA, Hasebe M: High levels of RNA editing in a vascular plant chloroplast genome: analysis of transcripts from the fern Adiantum capillus-veneris. Gene. 2004, 339: 89-97. 10.1016/j.gene.2004.06.018.
Shinozaki K, Ohme M, Tanaka M, Wakasugi T, Hayashida N, Matsubayashi T, Zaita N, Chunwongse J, Obokata J, Yamaguchi-Shinozaki K, et al: The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression. EMBO J. 1986, 5: 2043-2049.
Chumley TW, Palmer JD, Mower JP, Fourcade HM, Calie PJ, Boore JL, Jansen RK: The complete chloroplast genome sequence of Pelargonium × hortorum: organization and evolution of the largest and most highly rearranged chloroplast genome of land plants. Mol Biol Evol. 2006, 23: 2175-2190. 10.1093/molbev/msl089.
Ogihara Y, Terachi T, Sasakuma T: Intramolecular recombination of chloroplast genome mediated by short direct-repeat sequences in wheat species. Proc Natl Acad Sci USA. 1988, 85: 8573-8577. 10.1073/pnas.85.22.8573.
Pombert JF, Lemieux C, Turmel M: The complete chloroplast DNA sequence of the green alga Oltmannsiellopsis viridis reveals a distinctive quadripartite architecture in the chloroplast genome of early diverging ulvophytes. BMC Biol. 2006, 4: 3-10.1186/1741-7007-4-3.
Pombert JF, Otis C, Lemieux C, Turmel M: The chloroplast genome sequence of the green alga Pseudendoclonium akinetum (Ulvophyceae) reveals unusual structural features and new insights into the branching order of chlorophyte lineages. Mol Biol Evol. 2005, 22: 1903-1918. 10.1093/molbev/msi182.
Cai Z, Guisinger M, Kim HG, Ruck E, Blazier JC, McMurtry V, Kuehl JV, Boore J, Jansen RK: Extensive reorganization of the plastid genome of Trifolium subterraneum (Fabaceae) is associated with numerous repeated sequences and novel DNA insertions. J Mol Evol. 2008
We give special thanks to Paul G. Wolf for his constructive comments and suggestions. We are grateful to Bobby A Brown and George Littlejohn for significantly improving the English, to Ling-Ling Gao, Cynthia Gleason and Su-Min Guo for helpful comments on the manuscript, and to Chen Lu and Gong Xiao for providing valuable assistance with computer hardware and software. This work was supported by the "100 Talent Project" of the Chinese Academy of Sciences (Grant No.: 0729281F02), the "Outstanding Young Scientist Project" of the Natural Science Foundation of Hubei Province, China (Grant No.: O631061H01) and the Open Project of the State Key Laboratory of Biocontrol, China (Grant No.: 2007-01). We also gratefully acknowledge the valuable comments of three anonymous reviewers.
Authors and Affiliations
Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, Hubei, 430074, PR China
Lei Gao, Xuan Yi, Yong-Xia Yang & Ting Wang
Graduate School, Chinese Academy of Sciences, Beijing, 100039, PR China
Lei Gao, Xuan Yi & Yong-Xia Yang
State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, PR China
LG participated in the conception of this study, carried out part of the genome sequencing, performed all sequence analyses, annotated the genome, generated tables and figures, and drafted the manuscript. XY and YXY participated in genome sequencing. YJS and TW conceived and supervised the project, contributed to the interpretation of the data and prepared the manuscript. All authors read and approved the final manuscript.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.