Exploring evolution characteristic between cultivated tea and its wild relatives using complete chloroplast genomes

Background: Cultivated tea is one of the most important economic and ecological trees distributed worldwide. It has undergone long-term targeted selection of traits and overexploitation of its habitats by humans, which may have changed its genetic structure. The chloroplast is an organelle with a conserved circular genome structure, and it can help us better understand the evolutionary relationships of Camellia plants. Results: We conducted comparative and evolutionary analyses of cultivated and wild tea and characterized the evolution of cultivated tea. Chloroplast genome sizes of cultivated tea differed slightly, ranging from 157,025 bp to 157,085 bp. The cultivated species were more conserved than the wild species in terms of genome length, gene number, gene arrangement and GC content. However, the IRs of the cultivated species were about 20 bp shorter than those of C. sinensis var. sinensis. We also found that the nucleotide diversity of 14 sequences was higher in cultivated tea than in wild tea. These results provide evidence of variation in the chloroplast genomes of cultivated tea. Detailed analysis of chloroplast genome variation and evolution in cultivated tea revealed 67 SNPs and 46 indels, and 16 protein-coding genes had nucleotide substitutions. The most variable gene was ycf1, which had the largest number of nucleotide substitutions. In addition, five amino acid sites in ycf1 exhibited site-specific selection, and a 9 bp sequence insertion was found in C. sinensis cultivar Anhua. The phylogenetic tree constructed from ycf1 sequences showed that the two cultivated teas did not cluster completely, and that C. sinensis var. sinensis is more closely related to C. sinensis cultivar Longjing than to C. sinensis cultivar Anhua. Conclusions: The cultivated species were more conserved than the wild species in terms of architecture and linear sequence order.
The variation in the chloroplast genome of cultivated tea was mainly manifested as nucleotide polymorphism in some sequences. The ycf1 gene played an important role in the adaptive evolution of cultivated tea. These results provide evidence regarding the influence of human activities on tea.

Abundant cultivar germplasm resources are a fundamental condition for the quality of dark tea products. The quality of dark tea could be more related to cultivar of CSVS and geographical tea are formed [15].
The chloroplast (cp) genome is often used to analyze evolutionary processes and phylogenetic status because of its high conservation and relatively compact gene arrangement. Moreover, cp genome sequences are useful for identifying closely related, breeding-compatible plant species [16]. Although the cp genome is very useful, only a limited number of cp genomes are available for Camellia species. To date, only sixteen complete Camellia cp genomes have been sequenced, including 2 cultivated species and 14 wild species [7,14,17-21]. Human interference has been shown to affect the genetic structure, leaf nutrients and pollen morphology of Camellia [22][23][24]. Owing to human overexploitation of habitats and long-term targeted selection of traits, the genetic diversity of Camellia germplasm resources has been significantly reduced [25]. Nevertheless, the evolutionary mechanism of the cp genome in artificially selected cultivated Camellia remains unclear.
Current research often overlooks the material differences between cultivated and wild species. Building on our previously sequenced complete chloroplast genome of CSCA (MH042531), we set out to explore the evolutionary characteristics of cultivated tea relative to its wild relatives [14]. To assess the variation of the chloroplast genome in wild and cultivated species of Camellia, and to detect the evolutionary characteristics of cultivated tea, we selected previously published Camellia chloroplast genomes and carried out comparative and evolutionary analyses. This work can help us better understand the structure of Camellia chloroplast genomes and the phylogenetic relationships among species, and it provides more information about the influence of human activities on tea.
We believe that this research can encourage other researchers to pay more attention to the source of the tea tree.

Comparison of chloroplast genomes between cultivated tea and wild tea
Chloroplast genomic similarity We used CSVS as the reference sequence and compared it with the cp genomes of two cultivated species and 13 wild relatives (Table 1). Relative to the genome length of CSVS, the average length variation was 62 bp for cultivated species and 186 bp for wild species. Three of the wild species, C. pitardii, C. cuspidata and C. yunnanensis, showed a length variation of 500 bp compared with CSVS. Similarly, the number of genes and the GC content were more stable in cultivated species than in wild species. A comparison of gene and intron deletions between cultivated and wild species showed that the introns of the rps12 gene were deleted in all cultivated species and in some wild species. The orf42 and ycf1 genes were deleted in some wild species, but both genes were present in the cultivated species. The GC content of introns and IGS differed by about 0.01% to 0.03% in cultivated species, whereas in wild species the differences were 0% to 1.5% for introns and 0% to 0.46% for IGS. mVISTA and BRIG were used to compare genomic sequence identity. The results showed that the cp genomes of these 16 species were highly similar at the genomic level, indicating that their genomes are relatively conserved. The regions with relatively low identity were atpH_atpI, trnE-UCC_trnT-GGU, psaA_ycf3, ycf15_trnL-CAA, ycf1_ndhF and ndhG_ndhI (Figs. 2 & 3). In conclusion, at the genomic level, cultivated species were more conserved than wild species.
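The length and GC comparisons against the CSVS reference can be illustrated with a minimal sketch. It assumes each genome is available as a plain nucleotide string; the function names `gc_content` and `compare_to_reference` are hypothetical and are not part of the tools used in this study.

```python
def gc_content(seq: str) -> float:
    """Return GC content in percent, ignoring gaps and ambiguous bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def compare_to_reference(ref: str, other: str) -> dict:
    """Signed differences of a genome relative to the reference (assumed layout)."""
    return {
        "length_diff_bp": len(other) - len(ref),
        "gc_diff_pct": gc_content(other) - gc_content(ref),
    }


if __name__ == "__main__":
    # Toy sequences standing in for a reference and a compared genome.
    print(compare_to_reference("GGCCAATT", "GGCCAATTAT"))
```

Applied to whole genomes, or separately to extracted introns and IGS, the same two quantities give the kind of per-species differences summarized above.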
The expansion and contraction of IR regions The locations of the IR regions were extracted via a self-BLASTN search, and the characteristics of the IR/LSC and IR/SSC boundary regions were analyzed. Comparison of the IR boundary regions of the 16 complete Camellia cp genomes showed slight differences in junction positions (Fig. 4). To detect possible IR border polymorphism, we first compared the four IR-SC boundaries between the cultivated teas and the CSVS cp genome. No difference was found at the LSC-IRb and IRa-LSC borders, and only minor differences were found at the IRb-SSC and SSC-IRa borders. We then compared the cp genome boundaries of wild tea and CSVS. The rps19 gene at the LSC-IRb boundary expanded 52 bp from the LSC region into the IRb side in C. sinensis var. pubilimba, while it stopped at 46 bp from the LSC region in the rest of the species. At the IRa-LSC boundary, the spacer between the IRa-LSC junction and the rpl2 gene (in IRa) was 112 bp long in C. sinensis var. pubilimba and 106 bp long in all other species. In all of the compared cp genomes, the ycf1 gene spanned the SSC-IRb region, and the length of ycf1 in IRa ranged from 963 bp to 1069 bp. Notably, most species carried a ycf1 pseudogene at the IRa-LSC junction, but it was not observed in C. sinensis var. assamica, C. taliensis, C. impressinervis, C. pitardii, C. crapnelliana, C. cuspidata or C. yunnanensis. As in most plants, the ndhF gene, which is involved in photosynthesis, was located in the SSC region. However, in C. reticulata the ndhF gene was located at the IRb-SSC boundary, with a 35 bp overlap between ndhF and ψycf1.

Nucleotide diversity
The Pi values of the sixteen cp genomes were compared across intergenic regions (IGS), protein-coding genes and introns (Table S1, Fig. 5). The Pi values of genes, introns and IGS were higher in wild species than in cultivated species. On average, the values for genes, introns and IGS in wild species were about 16.6, 3.5 and 9.1 times those of cultivated species, respectively.
The Pi values of some genes and IGS, such as psbB, psbD, trnI-CAU_ycf2, trnI-GAU_rrn16 and trnI-CAU_rpl23, were higher in wild species but were 0 in cultivated species. Furthermore, except for ndhD, ndhF, ndhH and psbC, the Pi values of the photosynthetic genes of cultivated tea were 0. The Pi values of these genes were smaller than those of wild species. These results indicate that these genes and IGS were more conserved in cultivated species than in wild species. Although Pi values were generally lower in cultivated species, the Pi values of rps16, rps4, trnL-UAA_intron, rps4_trnT-UGU, ndhC_trnV-UAC, cemA_petA, rpl33_rps18, psbN_psbH, rpl36_infA, rpl14_rpl16, rps7_rps12, ndhG_ndhI, trnV-GAC_rps12 and rps12_rps7 were higher in cultivated species than in wild species, and these sequences were mainly located in the LSC region (Fig. 5).

Phylogenetic analysis of cultivated tea and wild tea
We constructed three phylogenetic trees of cultivated and wild tea: the complete cp genome tree (complete cp-Tree), the tree based on all protein-coding genes shared among all species (SCDS-Tree) and the ycf1 gene tree (ycf1-Tree) (Figs. 6-8). All phylogenetic trees strongly supported the division of the Thea subgenus into two clades: clade I, including CSVS, CSCL, CSCA, CSVA, CGR, CPU and CSVP, and clade II, including CIM and CTA. Clade I was strongly supported, because the posterior probabilities or bootstrap values obtained by neighbor-joining (NJ), maximum parsimony (MP), Bayesian inference (BI) and maximum likelihood (ML) were very high for each lineage. These results suggest that the seven species in clade I share the same evolutionary direction, which differs from that of the two species in clade II. All phylogenetic trees showed that CSVS was the closest relative of CSCA and CSCL, and the three species were on the same branch. In the ycf1-Tree in particular, the posterior probabilities or bootstrap values of CSVS, CSCA and CSCL were lower than those in the complete cp-Tree and SCDS-Tree, and the value for CSCL was less than 50%. These results suggest that the ycf1 gene has evolved in cultivated tea.
However, we found conflicts among the three trees (Figs. 6-8). The topologies comprising the Protocamellia subgenus (CPE and CYU), Camellia subgenus (CPI, CRE, CAZ and CCR), Metacamellia subgenus (CCU) and Thea subgenus (CIM and CTA) were poorly supported by the complete cp-Tree, SCDS-Tree and ycf1-Tree, because most bootstrap values or posterior probabilities were less than 50% for each lineage. These results may be caused by unbalanced sampling.
The complete cp-Tree also reflected some structural variations in Camellia cp genomes (Fig. 6). The clade made up of CSVS, CSCL, CSCA, CSVA, CGR, CPU, CSVP and CPE showed the loss of the rps12 intron, a pseudo ycf1 gene and a pseudo ycf2 gene (except for CSVA). The other species, except for CRE and CAZ, showed the loss of the ycf1 and orf42 genes.

Chloroplast genome variation and evolution in cultivated tea
To explain the changes in the cp genome structure of the cultivated tea group, we examined SNPs and indels in the cp genomes of CSVS, CSCA and CSCL. Comparison of the whole cp genomes of the three species revealed 67 SNPs and 46 indels. The LSC, IRb, SSC and IRa regions contained 43, 3, 13 and 8 SNPs and 37, 2, 5 and 2 indels, respectively (Table S2). Most SNPs and indels were located in non-coding regions (IGS and introns): 39 SNPs and 41 indels were found there, whereas 28 SNPs and 5 indels were found in protein-coding regions. The two ycf1 genes, located at the junction of SSC and IRa, contained the most SNPs and indels, with 6 and 2, respectively.
Among photosynthetic genes, psbC, ndhD, ndhF and ndhH contained SNPs, while the psbI gene contained an indel. Among the 14 sequences with higher Pi values in cultivated species than in wild species, trnV-GAC_rps12 and ndhG_ndhI contained the most SNPs, with 5 and 2, respectively (Fig. 5).
To obtain a clear view of the evolution of the cultivated species, we used their 80 shared protein-coding genes to calculate Ka, Ks and the Ka/Ks ratio. Only 16 protein-coding genes had synonymous or non-synonymous mutations (Fig. 9, Table S3). Non-synonymous mutations were found in matK, rps16, rpoC2, rpoB, accD, clpP, rps8, ycf1, ndhD, ndhH and rps15, with the highest rates in rps16, rps8 and rps15. Synonymous mutations were found in rpoB, psbC, rps4, ycf4, rpoA and ndhF, with the highest rates in rps4, ycf4 and rpoA. Of the 80 genes, 79 had a Ka/Ks value of 0, and only rpoB had a nonzero Ka/Ks value (0.3004, below 0.5), suggesting very strong purifying selective pressure.
Site-specific selection in the 16 genes with synonymous or non-synonymous mutations was analyzed by Bayes Empirical Bayes (BEB), which showed that some amino acid sites of ycf1 and rps15 exhibited site-specific selection (Table S4). In ycf1 there were six sites under positive selection, and in rps15 there was one. For example, in the rps15 gene, the codon ACC (threonine) of CSVS was mutated to AAC (asparagine) in the two cultivated species.
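The rps15 example can be checked with a short sketch that translates both codons and labels the change. It assumes the standard codon table, which assigns the same amino acids to sense codons as the plant plastid code; the function name `classify_substitution` is illustrative and is not part of the original analysis.

```python
# Standard genetic code, codons enumerated in TCAG order (NCBI table layout).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> tuple:
    """Translate two codons and report whether the change alters the amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


if __name__ == "__main__":
    # ACC (threonine, T) in CSVS versus AAC (asparagine, N) in the cultivars.
    print(classify_substitution("ACC", "AAC"))  # ('T', 'N', 'non-synonymous')
```

A change in the second codon position, as here, always alters the encoded amino acid or introduces a stop codon in the standard code, which is why this substitution contributes to Ka rather than Ks.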

Discussion
Understanding the genetic variation between cultivated and wild species is crucial for introducing traits of interest from wild species into cultivars [26]. Organelle genome sequencing has proven to be an effective way to resolve phylogenetic relationships among closely related species [27,28]. Here, we constructed and compared the complete cp genome sequences of two cultivars and fourteen wild species of Camellia. At the genomic level, cultivated species were more conserved than wild species in terms of architecture and linear sequence order (length, gene number, gene arrangement and GC content). As in other land plants, such as peanuts, cherries and radishes, cp genome size and structure, as well as gene content and order, are highly conserved among cultivated and wild species [29][30][31]. We found that the IR regions of the two cultivated teas had expanded or contracted. The IRs of the two cultivated species were about 20 bp shorter than those of CSVS, accounting for 32% of the difference in complete genome length (Fig. 4). Indeed, contraction and expansion of the IRs is considered one of the important causes of cp genome length variation [32]. Further analysis showed that in CSCA the combined length of the two ycf1 genes located at the IRb/SSC and SSC/IRa boundaries was 18 bp less than that in CSVS and CSCL (Fig. 4). According to the SNP and indel analysis, this is mainly because a 9 bp sequence (TCCTTCTTC/GAAGAAGAAGGA) was inserted into the ycf1 gene (Table S2). This suggests that ycf1 is one of the important causes of the expansion or contraction of the IRs in cultivated species. Similar results were reported in Zheng's study [33], which analyzed cp genome length variation in 272 species and found that atpA, accD and ycf1 accounted for 13% of the difference in length. Therefore, ycf1, which is associated with plant survival, may play a key role in the cp genome size variation of cultivated tea.
In addition to variation in genome size, there were also some nucleotide mutations in the cultivated species. In this study, the nucleotide diversity of cultivated tea was lower than that of wild tea (Fig. 5). The unbalanced sampling of wild tea (14) and cultivated tea (2) may contribute to this difference in the nucleotide diversity of cpDNA fragments. A comparison of nucleotide diversity between 358 cultivated and 54 wild rice also gave similar results [34]. Nevertheless, we also found that the nucleotide diversity of 14 sequences was higher in cultivated tea than in wild tea (rps16, rps4, trnL-UAA_intron, rps4_trnT-UGU, ndhC_trnV-UAC, cemA_petA, rpl33_rps18, psbN_psbH, rpl36_infA, rpl14_rpl16, rps7_rps12, ndhG_ndhI, trnV-GAC_rps12 and rps12_rps7) (Fig. 5). These sequences demonstrate variation in the cp genomes of cultivated tea, and they are potential molecular markers for distinguishing Camellia species and for phylogenetic analysis of Camellia.
Previous studies have shown that human interference affects the genetic structure, leaf nutrients and pollen morphology of Camellia. Yan et al. used genome-wide SNPs to analyze the genetic relationships of five semi-wild teas that had lacked human management for a long time. They found that human interference affects the genetic structure of tea: after human interference stopped, the tea from five different geographical regions could be divided into three different groups because of the absence of free pollination [22]. Xiong et al. compared the nutrient content of leaves in cultivated and wild C. nitidissima and found that cultivated C. nitidissima had significantly higher contents of essential amino acids (26.05%) and total amino acids (33.27%) than wild C. nitidissima [23]. Shu et al. showed that there are obvious differences in pollen morphology and exine morphology between cultivated and wild species of Camellia [24]. Therefore, to explore the specific evolutionary characteristics of cultivated tea relative to its wild relatives, we subsequently carried out evolutionary analyses of cultivated tea.
First, to obtain a clear view of the adaptive evolution of the cp genome of cultivated tea, we analyzed the evolution of protein-coding sequences. The Ka/Ks ratio is very useful for measuring selective pressure at the protein level [35]. In this study, the Ka/Ks value of 79 genes was 0, and only rpoB had a value of 0.3004. In addition, some amino acid sites of ycf1 and rps15 exhibited site-specific selection (Tables S3 & S4). rpoB is crucial for the transmission of genetic information, affecting the transcription of DNA into RNA and the translation of RNA into protein. It was also under selective pressure in beverage crops [36]. The rps15 gene functions in chloroplast ribosome subunits [35]. ycf1, which encodes a component of the chloroplast inner envelope membrane protein translocon, is one of the largest plastid genes [36] and is essential in almost all plant lineages [37]. These positively selected genes may have played key roles in the adaptation of cultivated tea to various environments. In general, the deletion or insertion of amino acids or bases in a protein-coding gene affects the structure and function of that gene [38][39][40]. In our study, 16 protein-coding genes had nucleotide substitutions, among which ycf1 had the largest number. In addition, five amino acid sites in ycf1 exhibited site-specific selection, and a 9 bp sequence insertion was found in CSCA (Tables S2-S4, Fig. 9). We therefore hypothesize that the ycf1 gene played an important role in the adaptive evolution of cultivated tea.
ycf1 is an open reading frame of unknown function, but some studies infer that it is very important for plant survival [33,41]. In tobacco, a chimeric aadA gene conferring resistance to aminoglycoside antibiotics was inserted into ycf1 in the cp genome, and the plants were then cultured on regeneration medium containing the antibiotic spectinomycin. A fairly constant ratio of wild-type to transformed genome copies was maintained. However, after culturing on antibiotic-free medium, the wild-type genome was still present in all samples, whereas the transplastomic fragments were missing from several samples. This experiment showed that ycf1 encodes a product that is essential for cell survival. ycf1 is also an important molecular marker in plants [42,43], because it is more variable than other known cp molecular markers (such as the widely used rbcL and matK genes) in both the total number of parsimony-informative characters and percent variability.
Phylogenetic analysis of cultivated and wild tea showed that the two cultivated teas were closely related to CSVS (Figs. 6 & 7), which supports the previous finding that most tea originated directly from CSVS [13]. In the ycf1-Tree, however, the posterior probabilities or bootstrap values of CSVS, CSCA and CSCL were lower than those in the complete cp-Tree and SCDS-Tree, which suggests that the ycf1 gene has evolved in cultivated tea (Figs. 6-8). Similar results have been found in Corylus [44]: the ycf1 genes of Corylus chinensis and Corylus avellana have a similar evolutionary history, which differs from that of Corylus heterophylla. This evolution in cultivated plants may be related to the efficiency of photosynthesis, because photosystem biogenesis regulator 1 (PBR1), an RNA-binding protein encoded by the nuclear genome, can improve the translation efficiency of ycf1 in the Arabidopsis thaliana cp genome and thereby regulates the biogenesis and stability of the three photosynthetic complexes [45]. However, the effect of single amino acid mutations and of short insertions or deletions on the function of ycf1 is still unclear, and cultivated tea may be an ideal material for this kind of research.

Conclusion
In this work, the complete cp genomes of two cultivated and 14 wild species of Camellia were selected, and genomic variation and evolutionary pressure were compared among them. Our results show that the cultivated species were more conserved than the wild species in terms of architecture and linear sequence order. Variation in the chloroplast genome of cultivated tea was mainly manifested as nucleotide polymorphism in some sequences. This nucleotide polymorphism also led to mutations of amino acid sites in some genes, among which ycf1 had the most mutation sites. In addition to amino acid mutations, there was a 9 bp insertion in the ycf1 gene. ycf1 is believed to be a critical gene for plant survival; it might influence photosynthesis and is related to plant adaptation. The two cultivated teas did not cluster completely, and CSVS was more closely related to CSCL than to CSCA. However, the effect of single amino acid mutations and of short insertions or deletions on the function of ycf1 is still unclear, and cultivated tea may be an ideal material for this kind of research.

Genomic materials collection of cultivated tea
The complete cp genome of CSCA was presented and annotated in our previous study [14] under GenBank accession number MH042531. We also searched the National Center for Biotechnology Information (NCBI) database for published complete cp genomes of cultivated tea; only that of CSCL, with accession number KF562708, had been published [46]. Gene maps of cultivated tea were generated using OGDRAW v1.2 [47].

Comparative analysis between cultivated tea and wild tea
The Basic Local Alignment Search Tool (BLAST) was used to find cp genomes similar to that of CSCA in NCBI. After screening the Camellia cp genomes, we retained 16 with sampling information, including 2 cultivated species (CSCA and CSCL only) and 14 wild species (Table 2). Previous studies have shown that both CSCA and CSCL originated directly from CSVS [13,46]. Therefore, we used CSVS as the reference sequence to study genomic variation and evolutionary direction between cultivated tea and wild tea.
The cp genomes were compared following the method of Li [48], using mVISTA in Shuffle-LAGAN mode and the BLAST Ring Image Generator (BRIG). CSVS was set as the reference sequence.
We annotated the IR boundaries of the Camellia cp genomes with Plastid Genome Annotator (PGA) [49]. The locations of the IRs were extracted, and the expansion and contraction of the IR regions were visualized using Visio Professional 2016.
Nucleotide diversity (Pi) was compared according to the method of Njuguna [50]. First, we manually extracted 211 loci shared among the 16 Camellia species, including 80 protein-coding genes, 117 intergenic regions (IGS) and 14 intron regions. After multiple alignment, a sliding window analysis was conducted to compare nucleotide diversity among the cp genomes using DnaSP v6.10.04 [51]. The window length was 600 bp with a 200 bp step size.
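The sliding window calculation can be illustrated with a minimal sketch. It is not a reimplementation of DnaSP: Pi is taken here as the mean proportion of differing sites over all sequence pairs, and sites with gaps or ambiguous bases are skipped pair by pair, which is a simplifying assumption. The function names are hypothetical.

```python
from itertools import combinations

VALID = set("ACGT")


def pairwise_diff(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID and y in VALID:  # skip gaps and ambiguous bases
            compared += 1
            diffs += x != y
    return diffs / compared if compared else 0.0


def nucleotide_diversity(seqs: list) -> float:
    """Pi as the average pairwise difference per site over all pairs."""
    if len(seqs) < 2:
        raise ValueError("at least two aligned sequences are required")
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def sliding_pi(seqs: list, window: int = 600, step: int = 200) -> list:
    """Return (window midpoint, Pi) for each window along the alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT"]
    print(sliding_pi(toy, window=2, step=2))
```

With only two sequences in a group, Pi reduces to the single pairwise difference, so a group of two genomes yields coarser estimates than a group of fourteen.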

Phylogenetic analysis of Camellia
Three datasets were used to construct phylogenetic trees for the 14 wild species and two cultivated species of Camellia: (I) the complete cp genomes, (II) all protein-coding genes shared among all species (SCDS), and (III) the ycf1 gene sequences. All datasets were aligned using MAFFT v7.380 [52] under the FFT-NS-2 default setting, and the alignments were used for phylogenetic analysis. Following the methods described by Xie et al. [53] and Zhang et al. [54], we used four methods to construct phylogenetic trees: NJ, MP, BI and ML. Coffea canephora and Coffea arabica were selected as the outgroup.
The NJ tree was reconstructed in MEGA 7.0 [55] under the default settings with 1000 bootstrap replicates. The MP analysis was performed in PAUP 4.0a167 [56] using heuristic searches with 1000 bootstrap replicates. The BI analysis was performed with MrBayes 3.2.7 [57] under the best substitution models and parameters, which were computed by jModelTest 2.1.7 [58]. Four chains were run simultaneously for 10,000,000 generations or until the average standard deviation of split frequencies fell below 0.01. The ML analysis was carried out in IQ-TREE [59] using the default settings, with 1000 bootstrap replicates for tree evaluation; the best substitution models were computed by IQ-TREE.
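The bootstrap step shared by the NJ, MP and ML analyses can be sketched as resampling alignment columns with replacement. This is only an illustration of the general resampling idea, not the internal procedure of MEGA, PAUP or IQ-TREE; the alignment layout (a dict of name to aligned sequence), the function names and the fixed seed are assumptions.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Build one pseudo-replicate by sampling columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: dict, n: int = 1000, seed: int = 0) -> list:
    """Generate n pseudo-replicate alignments of the same size as the input."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]


if __name__ == "__main__":
    toy = {"CSVS": "ACGT", "CSCA": "ACGA", "CSCL": "TCGA"}
    reps = bootstrap_replicates(toy, n=3, seed=1)
    for rep in reps:
        print(rep)
```

A tree is then inferred from each replicate, and the support value of a branch is the percentage of replicate trees that contain it, which is how values below 50% arise for weakly supported groupings.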

Evolution research on cultivated tea
The number and position of single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) in the genomes of CSVS, CSCA and CSCL were extracted in DnaSP v6.10.04 according to Wu's method [60].
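Counting SNPs and indels from a pairwise alignment can be sketched as follows. The sketch assumes two gap-aligned sequences of equal length and treats each run of consecutive gap columns on one side as a single indel event; DnaSP may apply different rules, and the function name `count_snps_and_indels` is hypothetical.

```python
def count_snps_and_indels(ref: str, query: str) -> tuple:
    """Count substitutions and indel events between two aligned sequences."""
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have the same length")
    snps = indels = 0
    gap_side = None  # which sequence carries the current gap run, if any
    for r, q in zip(ref.upper(), query.upper()):
        if r == "-" and q == "-":
            continue  # column gapped in both sequences carries no signal
        side = "ref" if r == "-" else "query" if q == "-" else None
        if side is None:
            if r != q:
                snps += 1
        elif side != gap_side:
            indels += 1  # a new gap run starts here
        gap_side = side
    return snps, indels


if __name__ == "__main__":
    # One substitution, one 3 bp gap in the reference, one 1 bp gap in the query.
    print(count_snps_and_indels("ACGT---ACGT", "ACCTGGGA-GT"))  # (1, 2)
```

Under this convention a 9 bp insertion, such as the one reported in ycf1, is counted as one indel rather than nine.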

Availability of data and materials
Raw sequence data of CSCA were submitted to the National Center for Biotechnology Information (NCBI) database under accession number MH042531. The other genomic data mentioned in the article can also be accessed from NCBI.
Ethics approval and consent to participate Not applicable.

Figure 3
Alignment visualization of the sixteen Camellia chloroplast genome sequences using C. sinensis var. sinensis as the reference. The vertical scale indicates the percentage of identity, ranging from 50% to 100%. Arrows indicate the annotated genes and their transcriptional direction. The different colored boxes correspond to exons, tRNA or rRNA, and conserved noncoding sequences (CNSs).

Figure 4
Comparison of IR boundary regions among the 16 Camellia chloroplast genomes, using C. sinensis var. sinensis as the reference. Boxes above or below the line are forward strands and reverse strands, respectively.

Figure 5
Comparative analysis of nucleotide variability (Pi) values between the cultivated tea and wild tea cp genome sequences. X-axis: position of the midpoint of a window; Y-axis: nucleotide diversity of each window.

Figure 6
The phylogenetic tree of Camellia species based on the complete cp genomes (complete cp-Tree). Coffea canephora and Coffea arabica were selected as the outgroup. Tree constructed by the neighbor-joining (NJ), the maximum parsimony (MP), the Bayesian inference (BI) and the maximum likelihood (ML) with bootstrap values or posterior probabilities above the branches, respectively. The bootstrap values less than 50% were represented by "-".

Figure 7
The phylogenetic tree of Camellia species based on the all shared coding protein genes among all species (SCDS-Tree). Coffea canephora and Coffea arabica were selected as the outgroup. Tree constructed by the neighbor-joining (NJ), the maximum parsimony (MP), the Bayesian inference (BI) and the maximum likelihood (ML) with bootstrap values or posterior probabilities above the branches, respectively. The bootstrap values less than 50% were represented by "-".

Figure 8
The phylogenetic tree of Camellia species based on the ycf1 gene (ycf1-Tree). Coffea canephora and Coffea arabica were selected as the outgroup. Tree constructed by the neighbor-joining (NJ), the maximum parsimony (MP), the Bayesian inference (BI) and the maximum likelihood (ML) with bootstrap values or posterior probabilities above the branches, respectively. The bootstrap values less than 50% were represented by "-".