Evolutionary directions of single nucleotide substitutions and structural mutations in the chloroplast genomes of the family Calycanthaceae
BMC Evolutionary Biology volume 20, Article number: 96 (2020)
Chloroplast genome sequence data is very useful in studying/addressing the phylogeny of plants at various taxonomic ranks. However, there are no empirical observations on the patterns, directions, and mutation rates, which are the key topics in chloroplast genome evolution. In this study, we used Calycanthaceae as a model to investigate the evolutionary patterns, directions and rates of both nucleotide substitutions and structural mutations at different taxonomic ranks.
There were 2861 polymorphic nucleotide sites on the five chloroplast genomes, and 98% of polymorphic sites were biallelic. There was a single-nucleotide substitution bias in chloroplast genomes. A → T or T → A (2.84%) and G → C or C → G (3.65%) were found to occur significantly less frequently than the other four transversion mutation types. Synonymous mutations kept balanced pace with nonsynonymous mutations, whereas biased directions appeared between transition and transversion mutations and among transversion mutations. Of the structural mutations, indels and repeats had obvious directions, but microsatellites and inversions were non-directional. Structural mutations increased the single nucleotide mutations rates. The mutation rates per site per year were estimated to be 0.14–0.34 × 10− 9 for nucleotide substitution at different taxonomic ranks, 0.64 × 10− 11 for indels and 1.0 × 10− 11 for repeats.
Our direct counts of chloroplast genome evolution events provide raw data for correctly modeling the evolution of sequence data for phylogenetic inferences.
Genome evolution is a major theme of biology in the genomics era. The topics cover patterns, directions and rates of substitutions, repeats, rearrangements and recombinations, hybridization and polyploidy, lateral gene transfer, gene families, etc., with varying depth and scope [1, 2]. Genome evolution can be easily demonstrated by genome structure mutations and nucleotide substitutions. The mutations of genome structure include insertions/deletions (indels) and inversions. The nucleotide substitutions are classified into transition (Ts) and transversion (Tv). A genome consists of coding regions and noncoding regions (including introns and intergenic spacers). The nuclear genome is usually very large and complicate. Moreover, plant nuclear genome sequencing is still a bottle-neck due to high costs, bacteria or fungi contaminations, high heterozygosity, etc. [3, 4]. Only economically important or model plants have their complete nuclear genomes sequenced. Instead, the chloroplast genomes unique to plants are very much smaller and easier to manipulate, and it is more likely to give a complete picture of plant genome evolution. Therefore, chloroplast genomes are currently a right choice for genome evolution studies.
Considerable attention has been paid to the evolution rate variations among genes or lineages [5, 6]. The chloroplast genes such as ndhF, matK, rbcL and trnL-F have been well studied [7, 8]. Chloroplast DNA shows a biased transition (Ts) mutation toward A and T. For example, the frequency of A and T at the 3rd codon position is fourfold of other nucleotides in the rbcL gene of angiosperms . This explains why the chloroplast genomes are usually A/T rich. The rates of the single nucleotide mutations are not uniform among different genes. Transition/transversion ratios (Ts/Tv) are 0.9 for rbcL and 1.4 for matK [9, 10]. Among the eight transversion mutation possibilities, from A to T and C to G are significantly less frequent than the other four possibilities [11,12,13]. Whether this observation is a general pattern or only a special case remains to be tested at genome level.
Structural mutations of genomes convey important evolution information of organisms. The indels and inversions are rich in chloroplast genomes and can be reliably identified and used to reveal the evolution of organisms [14, 15]. Structural mutations are not randomly distributed throughout the chloroplast genome . Tandem repeat-induced indels showed a statistically significant bias towards A/T-rich and the indel mutation rate was estimated to be approximately 0.8 ± 0.04 × 10− 9 per site per year in Poaceae . Short inversions also have a widespread occurrence in chloroplast genomes and often form stem-loop structures [17,18,19]. Understanding the evolution of such structural mutations is crucial for making full and correct use of the genome information .
The rates of DNA mutation is one of the core questions in molecular evolution . The mutation rates can be estimated from either mutation accumulation (MA) lines [21, 22] or phylogenetic inference . For the latter method, if the branch age is known, the absolute substitution rate can be calculated. The branch age is usually dated using calibrated molecular clocks. It is a common practice to infer the directions of mutations according to a phylogeny which, unfortunately, is usually based on the same dataset. It would be better if the phylogeny is independent to dataset . The family Calycanthaceae serves as an ideal reference because the phylogenetic relationships within the family are self-evident. Calycanthaceae is a small family holding a position at the base of Laurales . There are in total about 10 species belonging to three genera of two subfamilies (see below for more details). The phylogenetic topology of the family has only one possibility at subfamily, genus and species ranks (Fig. 1), which enables us to infer the chloroplast genome evolution in this family at different taxonomic ranks. In this study, we use Calycanthaceae as a model to empirically observe the directions and to estimate the rates of different mutations imprinted in the chloroplast genomes at different taxonomic ranks. More specifically, we are going to answer (1) if there are significant mutation rate differences between the two subfamilies, between the two genera within subfamily Calycanthoideae, and among species within Calycanthus and within Chimonanthus; and (2) what the evolutionary patterns, directions of nucleotide substitutions and structural mutations in the Calycanthaceae chloroplast genomes are like. Such kind of empirical data is of values for precisely modeling the chloroplast genome evolution.
The family Calycanthaceae and sampling strategies
Calycanthaceae holds a basal position in Laurales . The family is subdivided into two subfamilies: subfamily Idiospermoideae [one genus, one species, Idiospermum australiense (Diels) S. T. Blake] and subfamily Calycanthoideae (two genera, ca. nine species). Subfamily Idiospermoideae was sometimes considered a distinct family  but it is more natural in Calycanthaceae. There are three species in Calycanthus, one in China (C. chinensis Cheng & S. Y. Chang) and two in USA (C. floridus L. and C. occidentalis Hook. & Arn.). Calycanthus chinensis was once separated and put in the monotypic genus Sinocalycanthus. There are about six species in Chimonanthus. There are deciduous species like Ch. praecox L., and evergreen species like Ch. nitens Oliv. (see Zhou et al. 2006 for details).
We selected five species, C. chinensis, C. floridus, Ch. nitens, Ch. praecox and Idiospermum australiense, to represent two subfamilies, three genera and deciduous and evergreen species within Chimonanthus. Thus their phylogenetic relationships are intuitively quite clear (Fig. 1). The chloroplast genome of C. floridus has been determined  and the genomes of other four species need to be determined. Young and healthy leaves of these species were collected. The voucher details of the samples are given in Supplementary Table S1. According to APG IV , more basal species Liriodendron tulipifera L. and Magnolia kwangsiensis Figlar & Noot. are used as outgroups.
Chloroplast genome sequencing and annotation
Genomic DNA was extracted from the silica gel-dried leaves of four species using mCTAB method  and purified using the Wizard DNA Clean-Up System (A7280, Promega). The fragments covering the whole chloroplast genomes were amplified using the universal primers provided by Dong et al. . Specific primers were designed based on the chloroplast genome of C. floridus using Primer Premier v. 5.0 (Premier Biosoft International, CA, USA) and Oligo v. 6.71 (Molecular Biology Insights, CO, USA) to bridge the gaps (Supplementary Table S2).
The genomes were assembled using Sequencher ver. 4.7 (Gene Code) with the genome of C. floridus as a reference. The resulting genomes were annotated using Dual Organellar Genome Annotator (DOGMA) . All tRNA genes were further verified using the corresponding structures predicted by tRNAscan-SE 1.21 .
Codon usage analysis
Codon usage was determined for all protein-coding genes. The frequency of codon usage and the amino acid composition was determined using MEGA version 7 .
Chloroplast genome alignments and molecular dating
To estimate the divergence times of subfamilies, genera and species within Calycanthaceae, we conducted molecular dating analyses by adding 22 more genome sequences from GenBank (Supplementary Table S3) representing the major monophyletic branches of basal angiosperms, monocots, rosids, Saxifragales, and asterids. Sequences of the eighty-three coding genes were extracted from the genomes, aligned and concatenated into a super matrix. Rare insertions and unreliably aligned regions were excluded from analyses. The dating methods were the same as Xue et.al . For calibration, three constraints were used: (i) the angiosperm crown group was set to a minimum age of 131.8 Mya ; (ii) the eudicots crown was set to a minimum age of 125 Mya; (iii) the crown group of Calycanthaceae and Magnoliaceae was set to a maximum age of 140 Mya based on the onset of angiosperm radiation, and the minimum age of Calycanthaceae was set to a minimum age of 90 Mya according to the fossil record of Jerseyanthus calycanthoides .
Genome partition and mutation identification
The genome sequence data were subject to a series of grouping to test the variations of mutation rates and directionality: (1) coding regions, introns and intergenic spacers; (2) transition and transversion of coding genes; (3) synonymous (dS) and nonsynonymous (dN) substitution of coding genes; and (4) gene clusters of the same functions. Hierarchical outgroup usage was adopted to infer the directions of changes. At subfamily rank, L. tulipifera and M. kwangsiensis was used as outgroups; at genus rank within subfamily Calycanthoideae, I. australiense was used as an outgroup; and at species rank within Calycanthus and Chimonanthus, they were used as outgroups reciprocally. The directions of mutations were identified according to the outgroup(s). If a state is the same as the outgroup(s), the state is considered plesiomorphic, and accordingly, the state different from that of outgroup(s) is considered apomorphic.
We classified structural mutations into four categories: insertions or deletions (indels, Fig. 2a), repeats (Fig. 2b), microsatellites (Fig. 2c & d), and inversions (Fig. 2e). Although microsatellites are also indels or repeats, we classified them out as a distinct category for their very high variabilities. Inversions were sought out using the REPtuter program  first and then confirmed by reexamining the alignments. Each inversion is always accompanied by an inverted repeat at the opposite franking end, which forms a stem-loop structure. Conventional statistics were used to give estimates and to test the significance of the numbers and ratios of the mutations. The Chi-square test, Fisher’s exact test, T-test, significance test of correlation coefficient, and Wilcoxon signed rank test were done with R packages. Genetic distances between any taxa were calculated using MEGA 5 .
Estimation of mutation rate
The rate of mutation per site per year (μ) and its variance (ν) were estimated using the following formula .
Where m is the number of observed mutation, n is the number of total sites, and T is the divergence time of a node.
The μ and ν values of structural mutations were calculated using the method of Saitou and Ueda . In their method, the total number of structural mutations was divided by the additive time based on branch lengths and by the length of the nucleotide sequences.
Availability of supporting data
The data sets supporting the results of this article are given as supporting files. All nucleotide sequences were deposited in the NCBI GenBank repository (GenBank accession numbers: MH377056- MH377059).
General features of the Calycanthaceae chloroplast genomes
The structures of the four new chloroplast genomes were very similar to that of C. floridus (Supplementary Figure S1, Table 1). The genome sizes varied from 153,250 bp (Ch. praecox) to 154,746 bp (I. australiense). The overall G + C content was 39.23–39.30%, and the coding regions accounted for 58.99 to 59.24%. All five genomes encoded 114 unique genes with identical gene order and gene clusters, 17 or 18 of them were duplicated in the inverted repeat (IR) region (Supplementary Table S4 and Figure S2). Each chloroplast genome comprised four rRNA genes (6S, 23S, 5S, 4.5S), 30 tRNA genes and 80 protein-coding genes including ycf1, ycf2, ycf3 and ycf4. Each of 16 genes contained a single intron, and each of two genes (ycf3 and clpP) contained two introns. The mRNA gene of rps12 was generated by transsplicing.
The transition areas between the large single copy region (LSC) and the IR (LSC/IRb and IRa/LSC) were identical in Calycanthoideae but different in Idiospermoideae. In Calycanthoideae, the rpl2 gene was in the IR region and the rps19 gene was in LSC region (Supplementary Figure S2). However, in Idiospermoideae, both the rpl2 and rps19 genes were in IR region.
The codon usage and amino acid composition frequencies were not significantly different among the five Calycanthaceae chloroplast genomes (Supplementary Figure S3). The total protein coding genes comprised 68,394–68,463 bp that encoded 22,798–22,821 codons. Of these codons, the ATT (3.47%) was the most frequent codon and the TGC (0.30%) was the least. About 10.24% codons encoded leucine, whereas only 1.14% codon encoded cysteine, which were the most and the least frequently used amino acids in the chloroplast genome, respectively.
Evolution of the single nucleotide polymorphic loci
There were 2861 single nucleotide polymorphism (SNP) in the five Calycanthaceae chloroplast genomes. Among them, 2781 were biallelic, 22 (0.79%) SNPs were triallelic, and 58 (2.03%) were parallel mutation SNPs at different taxonomic ranks. The trialletic and parallel mutation SNPs were excluded from subsequent analyses. Of the 2781 sites, the mutation directions of 2573 sites were inferable according to the outgroups. The ratios of Ts to Tv were high at subfamily and species ranks (1.53 and 1.53 or 1.91, respectively) but low at genus rank (1.09), with an overall value of 1.61. The proportions of two transitional mutations and four transversional mutations varied significantly among one another at family, genus and species ranks (G test, P < 0.01). Among the four transversion mutations, A → C + T → G mutations were much more common than other three possibilities (Fig. 3). A → T + T → A and G → C + C → G were much fewer. However, the overall ratio of nonsynonymous mutations to synonymous mutations (dN/dS) in the coding regions nearly equaled 1 (Table 2). When the SNPs of the eight functional gene clusters (atp, ndh, pet, psa, psb, rpl, rpo, and rps) were sorted into photosynthetic metabolism- (atp and ndh), apparatus- (pet, psa and psb), and ribosomal protein-related (rpl, rpo and rps) groups, there was a tendency that the photosynthetic apparatus-related genes (pet, psa and psb) had the highest Ts/Tv (= 2.86 ~ 4.83) but the lowest dN/dS (= 0.17 ~ 0.38) values (Fig. 4), indicating these genes were under high selection pressure.
The 2781 biallelic SNPs were subdivided into coding, intron and intergenic spacer regions and sorted into subfamily, genus and species ranks (Table 2). At subfamily rank, there were 1996 SNPs in total: 980 in coding regions, 185 in intron regions, and 831 in intergenic spacer regions. The percentages of SNP to the total lengths were 1.43, 1.31 and 1.93%, respectively, indicating sequences of intergenic spacers were relatively more variable, but not the intron regions. In the coding regions, the ratio of 489 nonsynonymous mutations to 491 synonymous mutations (dN/dS) was very close to 1.0.
The SNPs at genus rank within Calycanthoideae or at species rank within Calycanthus and within Chimonanthus were much fewer than those at subfamily rank (Table 2). At genus rank, there were 269 mutations, 118 (43.9%) in coding regions, 30 (11.1%) in intron regions, and 121 (45.0%) in intergenic spacer regions. Deviation of dN/dS (=0.82) to synonymous mutations was obvious at genus rank. At species rank, sequence variability was similar to genus rank, with 283 mutations within Calycanthus and 233 within Chimonanthus (Table 2). Again the intergenic spacers were more variable than introns and coding regions. The ratios of dN/dS were larger than 1, indicating nonsyonymous mutations had been fixed in the genomes.
Evolution of structural mutations
There were 178 indels in total (Table 3). Among them 143 (80.3%) were observed at the subfamily rank, 11 (6.2%) at the genus rank, 11 (6.2%) within Calycanthus, and 13 (7.3%) within Chimonanthus. Most indels occurred in intergenic spacers. Only 10 indels were observed in coding regions (seven in ycf1, one in rpl33, rpoC1 and accD, respectively). The length of the indels ranged from one to 364 base pairs, but 75.8% of them were not longer than 10 bp. The direction of indel mutations was deletion-biased (Fisher’s exact test, P < 0.001). The deletions were about 8.5 times as much as the insertions (Table 3).
There were 116 repeating events (Table 3). Among them 89 (76.7%) were observed at the subfamily rank, 17 (14.7%) at the genus rank, 7 (6.0%) within Calycanthus, and 3 (2.6%) within Chimonanthus. Most repeats occurred in intergenic spacers. Only 6 repeats were observed in coding regions (three in ycf1, one in ndhF, rpoC1, and rpoC2, respectively). The length of the repeats ranged from three to 22 base pairs, and four to six base pairs were the most common (84.48%, Fig. 3). Unlike indels, the direction of repeat mutations was insertion-biased (Fisher’s exact test, P < 0.001). Approximately 85.3% of the total repeats were insertions.
There were 129 microsatellites, 96 were poly A/T, 13 were poly C/G, eight were 2-bp loci and two were 3-bp loci. All microsatellite loci located in noncoding regions and were A/T-rich indels (Fisher’s exact test, P < 0.001). Their mutation directions were not inferable.
There were 13 inversions of 2-bp to 92-bp in length in the five Calycanthaceae chloroplast genomes (Table 4). All inversions were accompanied by a pair of inverted repeats immediately flanking the inversion (Fig. 2e). Seven inversions occurred at the subfamily rank, two at the genus rank, three at the species rank, and one at both the subfamily and the species ranks. Eight inversions occurred in the LSC region, three in the SSC region, and two in the IR region. A 4-bp inversion occurred within the coding region of ndhG, an 8-bp inversion happened in the ndhA intron, and 11 inversions were found in intergenic spacers. The franking inverted repeats were from 7 bp to 36 bp in length. No correlation was inferred between the length of the inversions and the inverted repeats (p = 0.31). Convergence happened on inversions. For example, both C. chinensis and Ch. nitens shared an inversion at trnNGUU-trnRACG, and C. floridus and Ch. praecox shared the inversion at psbC-trnSUGA (Table 4).
Association of structural mutations to nucleotide substitutions
There would exist certain relationships between structural mutation (S) and nucleotide mutations (N). We calculated genetic distances (gd) between regions with and without structural mutations using SNP at three taxonomic ranks (Table 5). It was obvious that the ratio of N to S (N/S) was the smallest at subfamily rank and the largest at species rank. On the contrary, the genetic distances between regions with S (gdS) and without S (gdN) were the largest at subfamily rank and smallest at species rank. Similar patterns existed when only coding or noncoding regions were considered. Noticeably, the gdS was always larger than gdN, suggesting that the nucleotides in the regions with structural mutations were more variable than those in the regions without structural mutations.
Evolutionary rates of nucleotide and structural mutations
The divergence times of the Calycanthaceae were estimated using a matrix of 83 genes from 27 species of angiosperms (Supplementary Table S3). The total length of the matrix was 63,525 bp. The divergence time of the family was estimated to be 110 Mya from other members in Laurales, 108.8 Mya between the Calycanthoideae and the Idiospermoideae, 18.7 Mya between Calycanthus and Chimonanthus, 7.6 Mya between the two species of Calycanthus, and 6.2 Mya between the two species of Chimonanthus (Supplementary Figure S4). The evolutionary rates of the genomes were calculated using the lengths of the genomes, the number of substitutions and the times since divergence. The rates of nucleotide substitution were 0.14 × 10− 9 per site per year at the subfamily rank, 0.19 × 10− 9 per site per year at the genus rank, 0.34 × 10− 9 per site per year in Calycanthus, 0.32 × 10− 9 per site per year in Chimonanthus, and 0.25 × 10− 9 per site per year in the Calycanthaceae on average. The mutation rates of indels and repeats were estimated to be 1.73 × 10− 11 per site per year. Indels had a rate of 0.64 × 10− 11 per site per year and repeats had a rate of 1.0 × 10− 11 per site per year. The rate of microsatellites and the inversions were not considered because these mutations lacked phylogenetic information and the directions were not inferable.
Mutations are at least the raw materials if not a drive in evolution . It is tedious to count the molecular mutations even in the small genomes such as chloroplast genomes of plants and most of our understanding of evolution is based on small regions instead of the whole genome. What seems more difficult is to determine the directions of mutations, a critical concept in evolution. The inherently obvious phylogenetic relationships within Calycanthaceae serve as an ideal reference for studying the chloroplast genome evolution at subfamily, genus and species ranks.
General patterns of chloroplast genome evolution
Mutations in Calycanthaceae had experienced a long history of natural selection and exhibited some unique patterns that changed our general notion on genome evolution. It is believed that there are sites which mutate more frequently than other sites. Such kind of sites were very rare in Calycanthaceae and perhaps in other species as well. About 98% of single nucleotide mutant sites had two alleles and only 22 (0.79%) sites had more than two alleles. Reverse mutations should be even rarer (2.03%). This implies that modeling the evolution of such uncertain sites does not add much knowledge to the study.
Nucleotide mutations are of unequal possibilities. Changes from C or G to A or T were more common in the chloroplast genomes of Calycanthaceae (and other seed plants) [12, 13]. This explains why A or T is about 10% more than G or C. However, such kind of biased mutations happened only in the noncoding regions [12, 44]. The mutations biased to the opposite side in the coding regions. The G and C content of the coding genes was approximately 60%. Of course there are a few exceptions, esp. non-seed plants, for example, Selaginella . It is postulated that the mutation rate of noncoding regions is higher than that of the coding regions. In our case, it is true for intergenic spacer regions. The nucleotide mutation rate in introns was almost the same as that of coding regions. Synonymous substitutions which do not change the function of a gene, are believed to be more frequently present than nonsynonymous substitutions. However, in the coding genes of the chloroplast genomes of Calycanthaceae, dN/dS equaled one, indicating a balanced synonymous and nonsynonymous mutations.
There are mutation peaks and valleys in the plastid genomes owing to functional constrains , and nonsynonymous mutations are believed to be a genetic force that generates heterogeneity . The dN/dS, the ratio indicating natural selection pressure, varies across the plastid genomes of the Calycanthaceae. Photosynthetic apparatus genes have the lowest ratio of dN/dS, but the rpo-genes show relatively high dN/dS (Fig. 4). Therefore, “lucky genes” might sometimes be found even from coding genes for phylogeny .
Structural mutations are not as common as nucleotide substitutions. Structural mutations occur mostly in noncoding regions, especially for microsatellite mutations (Table 2 and Table 3). Motifs of a single nucleotide repeats are much more common than the typical motifs of two to six nucleotides [14, 49, 50]. Structural mutations are characterized by insertions, deletions, repeats, inversions and microsatellites of small length, usually shorter than 10 bases or base pairs .
Direction of mutations
Although there are eight possible transversional changes and only four transitional changes, transitions are much more common in the chloroplast genomes. Among the eight transversions, their frequencies are different. Mutations of A → C + T → G and C → A + G → T were much more common than other mutations in the chloroplast genomes of Calycanthaceae. There is a nucleotide T bias in chloroplast genomes. Methylation-deamination and uncorrected post-replicative transition mutation lead to the replacement of C with T. The same phenomenon was observed on nuclear genome data [21, 51] and animal mitochondrial genome data .
Structural mutations are potential informative phylogenetic characters of low homoplasy if handled properly. For example, Arachis , yam , and Cynara  have 85, 43, 34 indels in their whole chloroplast genomes, respectively. However, the direction of structural mutations are of less focus . Same indels and repeats do not occur by chance in different taxa and data of correctly coded indels and repeats are of very low homoplasy in phylogeny (Table 3). In Calycanthacee the direction of indels and repeats were easily inferable. For indels, deletions were much more common than insertions; and for repeats, repeat gains were much more than repeat losses. Microsatellites are sometimes used in phylogenetic analyses and a repeat number increase was observed in maize short term evolution . However, in this study the directions of microsatellite mutations in the chloroplast genomes of Calycanthaceae did not have a general tendency either to increase or to decrease at family, genus or species ranks. Therefore, inclusion of the gaps induced by microsatellites above species rank is likely to introduce homoplasy into datasets. Parallel mutations of inversions among individuals were observed in Calycanthaceae. Inversions often happen due to repetitive structures by parallel or back mutations during chloroplast genome evolution [17, 19]. Inversions must be identified when doing alignment but the gaps induced by inversions should be used with caution above species rank. In short, the structural mutations are often a mixture of phylogenetic signals and noises and only phylogenetic informative structural mutation information is considered when nucleotide data are insufficient to resolve a phylogeny [57, 58].
Taxonomic rank- and genome position-related mutations
It is now a consensus that mutation rate varies among lineages . Many studies showed that annual plants evolve faster than perennials and trees [5, 60]. Our study demonstrated an accelerated evolution from subfamily to species in Calycanthaceae. The mutation rate is the lowest at subfamily rank and highest at species rank. Such a heterogeneity of mutation rates among taxonomic ranks is probably due to life history, time of selection and random mutations [5, 61]. In Calycanthaceae, the ancestors were likely to be large trees of very long life history such as I. australiense. Shrubs of relatively short life history are a derived character state . From the branch lengths it is easy to understand that ancestors of higher taxonomic ranks experienced longer natural selection that might have eliminated quite many variations. Evolution might accelerate in one historical period and decelerate in the other.
Although it is well known that mutation rates vary significantly among genes of genomes . Some regions have been proven more variable than others [63,64,65], for example, ycf1, trnH-psbA, rpl32-trnL. Generally ribosomal protein-related and photosynthetic metabolic genes had higher mutation rates than photosynthetic apparatus genes in Calycanthaceae chloroplast genome (Fig. 4). There are two explanations. The first one is that the ribosomal protein-related and photosynthetic metabolic genes have relaxed evolutionary constraint. For example, the rpl32 have been completely lost from the chloroplast genome in multiple lineages within the land plants [66, 67]. The second one is the functions of some of these genes have been taken over by some genes in the nuclear genome and strong coevolution between plastid- and nuclear-encoded subunits that have accelerated mutation rates [67,68,69,70]. The regions of rich structural mutations also had higher nucleotide diversity in the chloroplast genomes [44, 71, 72].
A detailed understanding of the characteristics of the chloroplast genome evolution is helpful for correct use of chloroplast genome data in evolutionary biology. We observed the taxonomic and genomic distributions of mutations existed in the five chloroplast genomes of Calycanthaceae, inferred their directions, and estimated the rate of evolution. These direct observations provide raw data for computer simulation and modeling true evolution of sequence data.
Availability of data and materials
The datasets supporting the results of this article are included within additional files. All nucleotide sequences were deposited in the NCBI GenBank repository (MH377056 - MH377059).
Million years ago
Cetyl trimethylammonium bromide
Dual organellar genome annotator
Polymerase chain reaction
Large single copy
Small single copy
Number of substitutions per nonsynonymous site
Number of substitutions per synonymous site
Soltis PS, Marchant DB, Van de Peer Y, Soltis DE. Polyploidy and genome evolution in plants. Curr Opin Genet Dev. 2015;35:119–25.
Yang Z, Rannala B. Molecular phylogenetics: principles and practice. Nat Rev Genet. 2012;13(5):303–14.
Li Z, Baniaga AE, Sessa EB, Scascitelli M, Graham SW, Rieseberg LH, Barker MS. Early genome duplications in conifers and other seed plants. Sci Adv. 2015;1(10):e1501084.
Wu GA, Terol J, Ibanez V, Lopez-Garcia A, Perez-Roman E, Borreda C, Domingo C, Tadeo FR, Carbonell-Caballero J, Alonso R, et al. Genomics of the origin and evolution of citrus. Nature. 2018;554(7692):311–6.
Gaut B, Yang L, Takuno S, Eguiarte LE. The patterns and causes of variation in plant nucleotide substitution rates. Annu Rev Ecol Evol Syst. 2011;42:245–66.
Lynch M, Ackerman MS, Gout J-F, Long H, Sung W, Thomas WK, Foster PL. Genetic drift, selection and the evolution of the mutation rate. Nat Rev Genet. 2016;17(11):704–14.
Morton BR. Neighboring Base composition and Transversion transition Bias in a comparison of Rice and maize chloroplast noncoding regions. Proc Natl Acad Sci U S A. 1995;92(21):9717–21.
Clegg MT, Gaut BS, Learn GH, Morton BR. Rates and patterns of chloroplast DNA evolution. Proc Natl Acad Sci U S A. 1994;91(15):6795–801.
Hilu KW, Liang HP. The matK gene: sequence variation and application in plant systematics. Am J Bot. 1997;84(6):830–9.
Manen JF, Cuenoud P, Martinez MDP. Intralineage variation in the pattern of rbcL nucleotide substitution. Plant Syst Evol. 1998;211(1–2):103–12.
Bakker FT, Culham A, Gomez-Martinez R, Carvalho J, Compton J, Dawtrey R, Gibby M. Patterns of nucleotide substitution in angiosperm cpDNA trnL (UAA)-trnF (GAA) regions. Mol Biol Evol. 2000;17(8):1146–55.
Dong W, Liu H, Xu C, Zuo Y, Chen Z, Zhou S. A chloroplast genomic strategy for designing taxon specific DNA mini-barcodes: a case study on ginsengs. BMC Genet. 2014;15(1):138.
Dong W, Xu C, Li D, Jin X, Li R, Lu Q, Suo Z. Comparative analysis of the complete chloroplast genome sequences in psammophytic Haloxylon species (Amaranthaceae). PeerJ. 2016;4:e2699.
Borsch T, Quandt D. Mutational dynamics and phylogenetic utility of noncoding chloroplast DNA. Plant Syst Evol. 2009;282(3):169–99.
Morrison D. A framework for phylogenetic sequence alignment. Plant Syst Evol. 2009;282(3):127–49.
Yamane K, Yano K, Kawahara T. Pattern and rate of indel evolution inferred from whole chloroplast intergenic regions in sugarcane, maize and rice. DNA Res. 2006;13(5):197–204.
Kim KJ, Lee HL. Widespread occurrence of small inversions in the chloroplast genomes of land plants. Molecules Cells. 2005;19(1):104–13.
Kelchner SA, Wendel JF. Hairpins create minute inversions in non-coding regions of chloroplast DNA. Curr Genet. 1996;30(3):259–62.
Catalano SA, Saidman BO, Vilardi JC. Evolution of small inversions in chloroplast genome: a case study from a recurrent inversion in angiosperms. Cladistics. 2009;25(1):93–104.
Kelchner SA. The evolution of non-coding chloroplast DNA and its application in plant systematics. Ann Mo Bot Gard. 2000;87(4):482–98.
Ossowski S, Schneeberger K, Lucas-Lledo JI, Warthmann N, Clark RM, Shaw RG, Weigel D, Lynch M. The rate and molecular Spectrum of spontaneous mutations in Arabidopsis thaliana. Science. 2010;327(5961):92–4.
Haag-Liautard C, Dorris M, Maside X, Macaskill S, Halligan DL, Charlesworth B, Keightley PD. Direct estimation of per nucleotide and genomic deleterious mutation rates in Drosophila. Nature. 2007;445(7123):82–5.
Lanfear R, Welch JJ, Bromham L. Watching the clock: studying variation in rates of molecular evolution between species. Trends Ecol Evol. 2010;25(9):495–503.
Wolfe KH, Li WH, Sharp PM. Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc Natl Acad Sci U S A. 1987;84(24):9054–8.
Zhou S, Renner SS, Wen J. Molecular phylogeny and intra- and intercontinental biogeography of Calycanthaceae. Mol Phylogenet Evol. 2006;39(1):1–15.
Renner SS. Circumscription and phylogeny of the Laurales: evidence from molecular and morphological data. Am J Bot. 1999;86(9):1301–15.
Kubitzki K. Calycanthaceae. In: Kubitzki K, Rohwer JG, Bittrich V, editors. Flowering Plants · Dicotyledons: Magnoliid, Hamamelid and Caryophyllid Families. Berlin, Heidelberg: Springer Berlin Heidelberg; 1993. p. 197–200.
Goremykin V, Hirsch-Ernst K, lfl S W, Hellwig F. The chloroplast genome of the “basal” angiospermCalycanthus fertilis–structural and phylogenetic analyses. Plant Syst Evol. 2003;242(1):119–35.
The Angiosperm Phylogeny G. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Bot J Linn Soc. 2016;181(1):1–20.
Li J, Wang S, Jing Y, Wang L, Zhou S. A modified CTAB protocol for plant DNA extraction. Chin Bull Bot. 2013;48(1):72–8.
Dong W, Xu C, Cheng T, Lin K, Zhou S. Sequencing angiosperm plastid genomes made easy: a complete set of universal primers and a case study on the phylogeny of Saxifragales. Genome Biol Evol. 2013;5(5):989–97.
Wyman SK, Jansen RK, Boore JL. Automatic annotation of organellar genomes with DOGMA. Bioinformatics. 2004;20(17):3252–5.
Schattner P, Brooks AN, Lowe TM. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 2005;33:W686–9.
Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33(7):1870–4.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792–7.
Rambaut A. Se-Al: Sequence alignment editor Ver. 2.0a11. 2002. http://tree.bio.ed.ac.uk/software/seal/.
Xue JH, Dong WP, Cheng T, Zhou SL. Nelumbonaceae: systematic position and species diversification revealed by the complete chloroplast genome. J Syst Evol. 2012;50(6):477–87.
Doyle JA. Revised Palynological correlations of the lower Potomac group (USA) and the Cocobeach sequence of Gabon (Barremian-Aptian). Cretac Res. 1992;13(4):337–49.
Crepet WL, Nixon KC, Gandolfo MA. An extinct calycanthoid taxon, Jerseyanthus calycanthoides , from the Late Cretaceous of New Jersey. Am J Bot. 2005;92(9):1475–85.
Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 2001;29(22):4633–42.
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011;28(10):2731–9.
Denver DR, Dolan PC, Wilhelm LJ, Sung W, Lucas-Lledo JI, Howe DK, Lewis SC, Okamoto K, Thomas WK, Lynch M, et al. A genome-wide view of Caenorhabditis elegans base-substitution mutation processes. Proc Natl Acad Sci U S A. 2009;106(38):16310–4.
Saitou N, Ueda S. Evolutionary rates of insertion and deletion in noncoding nucleotide sequences of primates. Mol Biol Evol. 1994;11(3):504–12.
Ahmed I, Biggs PJ, Matthews PJ, Collins LJ, Hendy MD, Lockhart PJ. Mutational dynamics of aroid chloroplast genomes. Genome Biol Evol. 2012;4(12):1316–23.
Smith DR. Unparalleled GC content in the plastid DNA of Selaginella. Plant Mol Biol. 2009;71(6):627–39.
Guisinger MM, Kuehl JNV, Boore JL, Jansen RK. Genome-wide analyses of Geraniaceae plastid DNA reveal unprecedented patterns of increased nucleotide substitutions. Proc Natl Acad Sci U S A. 2008;105(47):18424–9.
Matsuoka Y, Yamazaki Y, Ogihara Y, Tsunewaki K. Whole chloroplast genome comparison of rice, maize, and wheat: implications for chloroplast gene diversification and phylogeny of cereals. Mol Biol Evol. 2002;19(12):2084–91.
Logacheva M, Penin A, Samigullin T, Vallejo-Roman C, Antonov A. Phylogeny of flowering plants by the chloroplast genome sequences: in search of a “lucky gene”. Biochem Mosc. 2007;72(12):1324–30.
Li YC, Korol AB, Fahima T, Beiles A, Nevo E. Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review. Mol Ecol. 2002;11(12):2453–65.
Bhargava A, Fuentes FF. Mutational dynamics of microsatellites. Mol Biotechnol. 2010;44(3):250–66.
Keightley PD, Trivedi U, Thomson M, Oliver F, Kumar S, Blaxter ML. Analysis of the genome sequences of three Drosophila melanogaster spontaneous mutation accumulation lines. Genome Res. 2009;19(7):1195–201.
Xu S, Schaack S, Seyfert A, Choi EJ, Lynch M, Cristescu ME. High mutation rates in the mitochondrial genomes of Daphnia pulex. Mol Biol Evol. 2012;29(2):763–9.
Yin D, Wang Y, Zhang X, Ma X, He X, Zhang J. Development of chloroplast genome resources for peanut (Arachis hypogaea L.) and other species of Arachis. Sci Rep. 2017;7(1):11649.
Cao J, Jiang D, Zhao Z, Yuan S, Zhang Y, Zhang T, Zhong W, Yuan Q, Huang L. Development of chloroplast genomic resources in Chinese yam (Dioscorea polystachya). Biomed Res Int. 2018;2018:6293847.
Curci PL, De Paola D, Sonnante G. Development of chloroplast genomic resources for Cynara. Mol Ecol Resour. 2016;16(2):562–73.
Ochoterena H. Homology in coding and non-coding DNA sequences: a parsimony perspective. Plant Syst Evol. 2009;282(3–4):151–68.
Ingvarsson PK, Ribstein S, Taylor DR. Molecular evolution of insertions and deletion in the chloroplast genome of silene. Mol Biol Evol. 2003;20(11):1737–40.
Hamilton MB, Braverman JM, Soria-Hernanz DF. Patterns and relative rates of nucleotide and insertion/deletion evolution at six chloroplast intergenic regions in new world species of the Lecythidaceae. Mol Biol Evol. 2003;20(10):1710–21.
Amanda R, Li Z, Van de Peer Y, Ingvarsson PK. Contrasting rates of molecular evolution and patterns of selection among gymnosperms and flowering plants. Mol Biol Evol. 2017;34(6):1363–77.
Smith SA, Donoghue MJ. Rates of molecular evolution are linked to life history in flowering plants. Science. 2008;322(5898):86–9.
Sloan DB, Taylor DR. Evolutionary rate variation in organelle genomes: the role of mutational processes. In: Organelle Genetics. Springer Berlin Heidelberg; 2012. p. 123–46.
Bromham L, Cowman PF, Lanfear R. Parasitic plants have increased rates of molecular evolution across all three genomes. BMC Evol Biol. 2013;13(1):126.
Dong W, Liu J, Yu J, Wang L, Zhou S. Highly variable chloroplast markers for evaluating plant phylogeny at low taxonomic levels and for DNA barcoding. PLoS One. 2012;7(4):e35071.
Dong W, Xu C, Li C, Sun J, Zuo Y, Shi S, Cheng T, Guo J, Zhou S. ycf1, the most promising plastid DNA barcode of land plants. Sci Rep. 2015;5:8348.
Xu C, Dong W, Li W, Lu Y, Xie X, Jin X, Shi J, He K, Suo Z. Comparative analysis of six Lagerstroemia complete chloroplast genomes. Front Plant Sci. 2017;8(15):15.
Weglohner W, Subramanian AR. Nucleotide sequence of maize chloroplast rpl32: completing the apparent set of plastid ribosomal protein genes and their tentative operon organization. Plant Mol Biol. 1993;21(3):543–8.
Ueda M, Fujimoto M, Arimura S, Murata J, Tsutsumi N, Kadowaki K. Loss of the rpl32 gene from the chloroplast genome and subsequent acquisition of a preexisting transit peptide within the nuclear gene in Populus. Gene. 2007;402(1–2):51–6.
Jansen RK, Saski C, Lee SB, Hansen AK, Daniell H. Complete plastid genome sequences of three Rosids (Castanea, Prunus, Theobroma): evidence for at least two independent transfers of rpl22 to the nucleus. Mol Biol Evol. 2011;28(1):835–47.
Park S, Jansen RK, Park S. Complete plastome sequence of Thalictrum coreanum (Ranunculaceae) and transfer of the rpl32 gene to the nucleus in the ancestor of the subfamily Thalictroideae. BMC Plant Biol. 2015;15:40.
Weng M-L, Ruhlman TA, Jansen RK. Plastid-nuclear interaction and accelerated coevolution in plastid ribosomal genes in Geraniaceae. Genome Biol Evol. 2016;8(6):1824–38.
McDonald MJ, Wang WC, Huang HD, Leu JY. Clusters of nucleotide substitutions and insertion/deletion mutations are associated with repeat sequences. PLoS Biol. 2011;9(6):e1000622.
Tian DC, Wang Q, Zhang PF, Araki H, Yang SH, Kreitman M, Nagylaki T, Hudson R, Bergelson J, Chen JQ. Single-nucleotide mutation rate increases close to insertions/deletions in eukaryotes. Nature. 2008;455(7209):105–U170.
This work was supported by The Special Fund for Strategic Pilot Technology of Chinese Academy of Sciences (XDA23080204) and Strategic Biological Resources Network Service of Chinese Academy of Sciences.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
. Sample accession numbers of the four species in Calycanthaceae analyzed in this study and their localities with voucher information.
List of specific primers used to amplify and sequence the chloroplast genomes of Calycanthaceae.
List of taxa used for estimating the divergence time of the Calycanthaceae.
. List of genes found in the chloroplast genomes of Calycanthaceae.
Chloroplast genome map of Chimonanthus praecox, a representative of all four newly determined chloroplast genomes. The genes inside and outside the circle are transcribed clockwise and counterclockwise, respectively. Genes that belong to different functional groups are represented by different colors. The thick lines indicate the extent of the inverted repeats (IRa and IRb) that separate the genomes into small (SSC) and large (LSC) single-copy regions. The flower in the center is of Chimonanthus praecox.
. Detailed view of the border regions between the inverted repeats and the single-copy regions of the Calycanthaceae chloroplast genomes. The figure is not to scale.
The codon usage in the Calycanthaceae chloroplast genomes.
Divergence times of the crown groups estimated using BEAST version 1.6.1 under the uncorrelated lognormal (UCLN) model based on 83 chloroplast genes of 26 taxa. Numbers on the nodes are the estimated medium ages (Mya).
About this article
Cite this article
Dong, W., Xu, C., Wen, J. et al. Evolutionary directions of single nucleotide substitutions and structural mutations in the chloroplast genomes of the family Calycanthaceae. BMC Evol Biol 20, 96 (2020). https://doi.org/10.1186/s12862-020-01661-0