Exceptionally high rates of positive selection on the rbcL gene in the genus Ilex (Aquifoliaceae)
BMC Evolutionary Biology volume 19, Article number: 192 (2019)
The genus Ilex (Aquifoliaceae) has a near-cosmopolitan distribution in mesic habitats from tropical to temperate lowlands and in alpine forests. It has a high rate of hybridization and plastid capture, and comprises four geographically structured plastid groups. A previous study showed that the plastid rbcL gene, coding for the large subunit of Rubisco, has a particularly high rate of non-synonymous substitutions in Ilex, when compared with other plant lineages. This suggests a strong positive selection on rbcL, involved in yet unknown adaptations. We therefore investigated positive selection on rbcL in 240 Ilex sequences from across the global range.
The rbcL gene shows a much higher rate of positive selection in Ilex than in any other plant lineage studied so far (> 3000 species) by tests in both PAML and SLR. Most positively selected residues are on the surface of the folded large subunit, suggesting interaction with other subunits and associated chaperones, and coevolution between positively selected residues is prevalent, indicating compensatory mutations to recover molecular stability. Coevolution between positively selected sites to restore global stability is common.
This study has confirmed the predicted high incidence of positively selected residues in rbcL in Ilex, and shown that this is higher than in any other plant lineage studied so far. The causes and consequences of this high incidence are unclear, but it is probably associated with the similarly high incidence of hybridization and introgression in Ilex, even between distantly related lineages, resulting in large cytonuclear discordance in the phylogenies.
The genus Ilex (hollies) constitutes the monogeneric family Aquifoliaceae in the campanulid subclade of the asterids . Ilex species are deciduous or evergreen dioecious trees or shrubs of mesic temperate, subtropical, tropical, and tropical-montane climates. At least 644 species have been described. Ilex has a near-cosmopolitan distribution, with the highest diversities in East-Asia, and in Central and South America, moderate numbers of species in Southeast Asia and North America, and a few in Africa, Australia, and Europe, and on most remote volcanic islands in the Atlantic and Pacific, including the Canary Islands, Madeira, Azores, Caribbean, Polynesia, Hawaii, New Caledonia, and Fiji . The lineage is old, dating from the end of Cretaceous [36, 38], with the oldest accepted fossil record of 69 Million Years Ago (MYA, ). However, there is a large gap between this date and the estimated Miocene age of the most recent common ancestor of all extant Ilex, based on plastid and nuclear markers . Many pre-Miocene fossils have been attributed to Ilex (; ), suggesting extensive lineage extinctions, so that the extant genus is now phylogenetically isolated.
Abundant production of small, fleshy fruits, consumed primarily by birds in most species, can account for its wide dispersal, but the exceptional adaptive capacity in this morphologically relatively uniform genus is surprising. There is evidence for a high frequency of hybridization and introgression in Ilex, leading to a reticulate pattern of evolution. Phylogenetic trees obtained from nuclear and plastid data are very different  and Shi et al.  suggest that many of the 150–200 narrow-range species recognized in China may be a result of hybridization, which may also be true elsewhere. Widespread hybridization is supported by anatomical and morphological studies , and documented by molecular phylogenies [36,37,38, 47, 49].
A feature of the genus Ilex is that the rbcL gene, encoding the large subunit of Rubisco and a widely-used plastid sequence in phylogeny, appears to have a particularly high rate of non-synonymous substitutions compared with other genera . This high rate of non-synonymous substitution suggests a positive selection on the plastid-encoded Rubisco large subunit in Ilex. Rubisco catalyzes the first steps of photosynthetic fixation of carbon and of photorespiration in all photosynthetic organisms (see [4, 10, 44], for recent reviews). The most prevalent form of Rubisco (form I) is an hexadecamer of 8 catalytic large subunits (L, harboring the active site) encoded by the rbcL gene in the plastome, and 8 regulatory small subunits (S), encoded in the nucleosome by a small family of rbcS genes . The assembly and enzymatic function of Rubisco are, in part, guaranteed by specific interactions of the L subunit with the S subunits, assisted by several nuclear-encoded specific chaperones [22, 23].
Several studies provide evidence that positive selection on particular amino-acid residues of the Rubisco large subunit is involved in plant adaptation to various environmental stresses. This was proposed for the endemic Hawaiian genus Schiedea adapting to dry or wet habitats , in the heterophyllous aquatic plant Potamogeton , in switches from C3 to C4 photosynthesis [7, 30], in movements from dry to wet habitats in the Balearic Islands , in adaptation of Cardamine resedifolia to high altitudes , and associated with leaf traits and climate characteristics in the genus Quercus . Moreover, studies based on a large sample of rbcL sequences show that positive selection is widespread in most lineages of land plants [21, 29].
A consequence of rbcL positive selection is a frequently observed pattern of coevolution between positively selected sites. Indeed, the multiple roles of amino acids in enzymes (folding, stability, enzymatic activity) imply that the modification of one parameter can affect other properties. Stability and activity of the enzyme are likely to be negatively correlated, and compensatory mutations are then needed to restore global stability . Thus, when positive selection occurs on a residue, a compensatory co-evolving mutation on another residue of the molecule is likely to be detected. This has been confirmed in land plants [24, 25, 46, 53]. Rubisco has evolved CO2/O2 specificity ratios according to the environmental situation, allowing a perpetual fine-tuning of its performance . Although the plastid rbcL gene sequence is highly conserved, this enzyme has played a role in plant adaptation to changing environments across geological time and continues to do so across recent geographical space.
Another feature of adaptive mutations in rbcL is that during introgression, plastid-encoded maternal L subunits of one species become associated with nuclear-encoded paternal S subunits of another species, potentially creating compatibility problems . Similar compatibility problems could also arise from the numerous nuclear-encoded chaperones intimately linked to Rubisco subunits (; ). The impacts and repair of such incompatibilities on Rubisco function should be most significant in lineages experiencing high rates of hybridization or/and introgression, as it is the case in Ilex .
In this study, we looked for evidence of positive selection on rbcL in the genus Ilex. We first investigate rbcL sequences (truncated after codon position 436) of 116 species from across the global range (“Worldwide dataset”) and then untruncated rbcL sequences of 124 accessions collected in natural habitats in China (“Chinese dataset”).
Evolution of the gene rbcL of Ilex includes all the features of positive selection
Positive selection on the rbcL gene of Ilex
The Worldwide dataset of 116 rbcL sequences (see Materials and Methods and Additional file 1: Table S1) included the first 436 codons. There were 91 variable codons, of which 24 had synonymous substitutions and 67 had non-synonymous substitutions (Fig. 1). In order to speed up the analyses, PAML calculations were conducted on rbcL sequences of the 4 Ilex plastid lineages  separately: the Asian/North-American Group 1 (14 sequences), the Asian/North-American Group 2 (36 sequences), the American Group 3 (31 sequences), and the Eurasian Group 4 (35 sequences). A 5th “20-Ilex” rbcL DNA matrix consisted of 20 selected species of Ilex representing the 4 plastid groups and a wide geographic distribution (see Fig. 3).
Positively selected sites were detected (at a Bayesian posterior probability of 0.95) by the program Codeml of PAML using model M8 . Altogether for the five alignments studied, 32 positively selected residues were detected (Table 1). However, the SLR (Stepwise likelihood Ratio) test only confirmed 17 of these (Table 1). Using PAML, likelihood ratio tests (LRT) were done for each of the 4 alignments representing the 4 previously defined phylogenetic plastid lineages. Likelihood ratio tests of different models (model M2a allowing ω > 1 versus models M1a not allowing ω > 1, as well as model M8 allowing ω > 1 versus models M7 or M8a not allowing ω > 1) indicated that positive selection was highly probable for the 4 plastid groups at a significance level of < 0.001:
Group 1, 2∆l = 42.88 (M2a/M1a), 49.68 (M8/M7), and 40.92 (M8/M8a).
Group 2, 2∆l = 113.94 (M2a/M1a), 111.20 (M8/M7), and 111.04 (M8/M8a).
Group 3, 2∆l = 52.14 (M2a/M1a), 52.16 (M8/M7), and 52.12 (M8/M8a).
Group 4, 2∆l = 175.96 (M2a/M1a), 60.00 (M8/M7), and 42.74 (M8/M8a).
Positively selected sites were mainly found in the Asian/North American Group 2 (25 sites by PAML, 11 confirmed by SLR test), followed by the American Group 3 (12 sites, 2 confirmed), the Asian/North American Group 1 (7 sites, 3 confirmed), and then Eurasian Group 4 (7 sites, 1 confirmed) (Table 1).
A total of 25 positively selected residues were detected by PAML in the rbcL coding sequences of the Chinese dataset of 124 sequences (see materials and methods and Additional file 2: Table S2), six of which were new, of which residues 461 and 470 were unavailable in the Worldwide dataset truncated at residue 436 (Table 2). The SLR test confirmed 9 of these residues as positively selected. The Asian/North-American Group 2 again had the most positively selected residues (Table 2).
Most positively selected sites are located at the surface of the folded rbcL
The 38 positively selected sites in rbcL detected by PAML (32 in the Worldwide dataset and 6 additional ones in the Chinese dataset) were mapped onto the 3-D structure of the modeled Rubisco large subunit of Ilex canariensis (see Materials and Methods). A large proportion of positively selected residues are located at the surface of the folded large subunit, with only 6 residues (53, 84, 196, 221, 317 and 328) being buried (Fig. 2a). Exposed residues could be candidates for interactions with other large subunits (intra-dimer and dimer-dimer interaction), with small subunits, and with Rubisco activase and other chaperones, as suggested by Kapralov and Filatov . Extrapolating from the Spinacia oleracea Rubisco 3D-structure (1RCX, ), potential implications of Ilex positively selected residues in quaternary interactions were examined. Positively selected residues located within 6 Å (the van der Waals contact) of the surface of other subunits were selected using Swiss-Pdb Viewer (http://swissmodel.expasy.org). Some of these (13 of 38) were in van der Waals contact with another large subunit intra-dimer interface, a dimer-dimer interface, and/or with a small subunit (Table 3). Many positively selected residues of the equatorial surface of Rubisco were still visible (and potentially accessible by chaperones) even when the 4 interacting large subunits (B, E, R and V) and the 3 interacting small subunits (s, f and w) were represented in contact with the large L subunit (Fig. 2b). Positively selected residues 219, 225, 226, 429, and 461 of the large subunits were in contact with the small subunits (Table 3). The same was true for the numerous nuclear-encoded Rubisco chaperones that might interact with the many exposed positively selected residues on the equatorial surface of Rubisco (15, 19, 21, 23, 28, 30, 84, 86, 91, 95, 97, 340, 353, 354 and 470, see Fig. 2b), that were not in contact with any other L or S subunits (see for instance [22, 23]).
As expected, positively selected sites of the rbcL gene are extensively coevolving
Selecting only pairwise correlations above 0.5, CAPS detected 17 (out of 32) positively selected residues of the rbcL Worldwide dataset that were involved in coevolution (Table 4). Except for residue 95, all the others were detected as positively selected residues in the Worldwide dataset. However, residue 95 was detected as positively selected in the Chinese dataset (Table 2). The mean pairwise distance of coevolving pairs in the 3-D structure of the Rubisco large subunit was 22.9 Å.
Compared with other lineages, Ilex has the highest rate of positive selection
Using the M8 model of PAML, positive selection on the rbcL gene of Ilex was compared with 8 lineages studied by Kapralov and Filatov  using recalculations based on their data. Eight of their lineages potentially have high rates of positive selection and were chosen for a comparison with Ilex (see their additional file 2). Three (out of 151) lineages (eurosids I-26, euasterids I-9 and Coniferales 6) had dN/dS values (in the “eleventh class” of the M8 model) higher than 20.5 (the value found in Ilex). Five (out of 151) other lineages (commelinids-7, commelinids-15, eurosids I-24, euasterids I-8 and Gnetales-1) had a higher proportion of detected positive selection sites (in the “eleventh class” of the M8 model) than 8.2% (the proportion found in Ilex).
PAML calculations indicated that the rbcL gene of the 20-species dataset of Ilex had 26 positively selected residues, while the largest number in the other 8 lineages was 15 in the commelinids-15 lineage (Table 5). All the others had < 10. The SLR test confirmed 7 positively selected residues in the Ilex 20-species dataset, while the highest number found in the other 8 lineages was 2 in the Coniferales-6 lineage. A comparison with other angiosperm lineages (Table 5), shows that the number of positively selected residues of rbcL in the Worldwide dataset of Ilex (32) (Bayesian posterior probability: 0.95) was never reached in any lineage studied by Kapralov and Filatov . In their 151 studied datasets, the number of positively selected residues never passed 16 (see their additional file 3).
The program Codeml of PAML was used in the above comparisons. It should be noted, however, that using the Site Likelihood-Ratio (SLR) test, a measure of the strength of evidence for selection, the number of positively selected residues remaining significant at 95% after the SLR adjustment was 17 (among the 32 residues found by PAML in the Worldwide dataset of Ilex, see Table 1). A similar decrease was generally observed in other studies when PAML results were adjusted by SLR (see for instance ). SLR adjustment did not change the major results of this study. The 17 positively selected residues in the Rubisco large subunit of Ilex detected by SLR still exceeded the maximum of 16 found in a broad survey of green plants by Kapralov and Filatov  using PAML.
Seven additional lineages (Table 5 and Additional file 3: Table S3) were also compared because they are trees (as are most Ilex) that have experienced frequent hybridization-introgression (Nothofagus, Populus, Quercus, Salix) or because of reported positive selection (Flaveria, Potamogeton, Schiedea). Positively selected sites are much more frequent in Ilex that in any other lineage studied (Table 5).
A very high rate of positive selection on the rbcL gene of Ilex
This study has confirmed that the plastid gene rbcL, encoding the large subunit of Rubisco, has experienced an exceptionally high rate of positive selection in the genus Ilex. Kapralov and Filatov  listed the 20 most often positively selected rbcL residues, which accounted for > 70% of the cases of positive selection in their broad survey of green plants. Twelve positively selected residues detected in Ilex belonged to this top-20 list. Several residues detected in other lineages were also identified as positively selected in Ilex (Table 6). Thus, a large proportion of the rbcL sites positively selected in the green plants as a whole were also detected in the Ilex. This implies that the rbcL gene has a limited number of residue positions that can be positively selected for the fine-tuning of Rubisco , and that many of these have been selected in Ilex.
This particularly high rate of positive selection may affect the molecular clock of rbcL in Ilex, compared with other lineages. A high proportion of non-synonymous substitutions (such as positive selection) is indeed linked to a high rate of nucleotide substitution at the 1st and 2nd codon position compared to the rate of nucleotide substitution at the 3rd position. This last feature explains why the dating of the crown age of the extant genus Ilex using a fossil-calibrated molecular clock was at least two times overestimated using 1st and 2nd codon positions than using the 3rd position of rbcL . As positive selection seems to be widespread in rbcL, a codon effect on age calculation using rbcL may be frequent (; ), and this is particularly true in the genus Ilex.
Positive selection and environmental traits in Ilex
Positive selection on rbcL is frequently observed in terrestrial plants but not in aquatic plants, algae, or bacteria . It has been proposed that the thermal and water regime of the aquatic habitat is more stable, while land plants must adapt to the variability of their habitat, requiring a tuning of the activity of Rubisco by positive selection. The CO2/O2 specificity factor of Rubisco τ (determining the relative rate of photosynthesis and photorespiration) is very sensitive to cellular water availability, CO2 concentration, and temperature . A relation between rbcL positively selected substitution at particular sites and morphological, biological, or climate characteristics of different lineages has been suggested (Table 6).
Ilex has two independent hotspots of speciation, in South America and East Asia, each comprising around 200 Ilex species. Both hotspots show high rates of recent speciation associated with, respectively, the final uplift of the Andes (from 5 to 2 MYA ;,  ;) and the Hengduan Mountains (4 to 3 MYA, ). In both of these hotspots, the recent development of high mountains and deep valleys may have accelerated the diversification of Ilex species through local vicariance, dispersal, isolation, secondary contact, and ecological speciation events [32, 42]. Asian/North-American Group 2 had the highest rate of positive selection on rbcL, but the species of this lineage included in this study do not occupy a wider range of environments than those of other lineages. This suggests that there is no simple relationship between positive selection and adaptability. Indeed, although Ilex occupies a wide range of environments, it is absent from areas without year-round soil moisture or with exposure to prolonged winter cold, which may have reduced the pressure for tuning Rubisco activity.
The rbcL gene has a limited number of sites that can be positively selected. However, they are enough to suggest that cooperation of several different selected substitution sites can mediate a similar effect on Rubisco conformation. Conversely, one unique selected substitution can have different effects on rbcL folding and on Rubisco conformation, depending on the other sites of mutation . Figures 3 shows that a particular selected mutation site of rbcL can be detected in different branches of the Ilex tree, indicating convergence in different clades and reversion in other, making interpretation rather complex. The fine-tuning of Rubisco is the result of a complex synergy of multiple coevolving substitutions sites. For instance, several positively selected residues (P142T, M309I and A328S) that were often associated to C3-C4 transition (Table 6) were also positively selected in Ilex, which is entirely C3. Most studies associating rbcL positive selection with a particularly biological or climatic trait involved only 2 or 3 selected mutations that were often linked by coevolution (for instance the pair M309I and A328S, ). The large number of positively selected rbcL sites found in Ilex suggests that other drivers are involved.
Introgression may play a major role in the high rate of rbcL positive selection
Hybridization and introgression are pervasive in Ilex . Advantageous positively selected residues in rbcL might contribute to the survival and spread of individuals and species with introgression. Adaptive positive selection on the Rubisco large subunit encoded by a non-recombining plastid DNA may promote, among other mechanisms (demographic events, fixation by chance), advantageous plastid captures . This could be the case in Ilex. Huge cytonuclear discordances in the phylogeny of this genus have been reported: the plastid phylogeny is more geographically structured (with 4 geographically-determined phylogenetic clades) than the nuclear phylogeny, which was closer to the morphological classification of the genus . Since the genus Ilex probably experienced extensive adaptation to changing habitats, plastid capture by adaptive introgression could well be promoted by a Darwinian selection on Rubisco.
On the other hand, such adaptive introgressions require that the surface of plastid-encoded Rubisco large subunits of one species fit with the surface of nuclear-encoded small subunits and chaperones of another species. This “molecular adaptation” could be essential for introgression between distantly related species, requiring a subsequent modification of molecular interactions. In Ilex, species from different lineages can hybridize when they come into secondary contact, helped by rather weak reproductive barriers and by many repeated intercontinental dispersals .
In this context, selection pressure on the Rubisco large subunit of one species might be, in part, the consequences such molecular incompatibility with the small subunit of another species. The Rubisco activase chaperone was suggested to interact with the region 89–94 of rbcL . In Ilex, the exposed positively selected residues 91 and 95 may interfere with Rubisco activase (see Fig. 2b). Chaperones rbcX interacted with the C-terminal tail of one L subunit and the N-terminal region of another L subunit, to form 2 clamps that stabilized the dimer L2 . Residues 461 and 470 (C-terminal tail) were positively selected. The same was true for positively selected residues 15 and 19 in the N-terminal region (see Fig. 2b).
It has to be noted that the majority of exposed positively selected residues of the Ilex Rubisco, found on the equatorial surface of the L8S8 hexadecamer shown in Fig. 2b, were not in the list of the 20 most often positively selected residues detected by Kapralov and Filatov . These were residues 15, 19, 21, 23, 30, 84, 91, 340, 353 and 354 (see Fig. 2b and Tables 1, 2). Such uncommon positively selected sites, potentially interacting with chaperones, may represent a particularity of the genus Ilex. Thus, it can be postulated that the rbcL gene of taxa experiencing speciation by adaptive introgressions was expected to be under compensatory selective pressure to make the interface with nuclear-encoded partners compatible. In Flaveria, specific residue substitutions in the small subunit gene (rbcS) were correlated with the kinetic properties of Rubisco , but this observation was not further examined. In addition, Rubisco chaperones might have some effects on evolvability of rbcL .
Cytonuclear phylogenetic discordance is pervasive in Ilex, except in the well-defined Eurasian plastid group 4 which perfectly matches the nuclear clade Aquifolium . Some hybridization events were noticeable between closely related members of this group. However, it appears that, for an unknown reason, introgression never occurred between species of the Eurasian group 4 and distantly related species belonging to the other groups 2, 3 and 4. This may explain the fewer positive selections rbcL in this group compared with other groups (Tables 1 and 2). Thus, in lineages experiencing introgression, a high rate of positive selection on the Rubisco large subunit may be expected. A large part of it would represent molecular adaptations resulting from introgression and plastid capture, a current situation in Ilex. Therefore, in future studies, molecular adaptations need to be discriminated from environmental adaptations of Rubisco (see ).
In Ilex, the plastid rbcL gene coding for the large subunit of Rubisco had a higher rate of positive selection than in other lineages studied so far. Most of these positively selected residues are located at the surface of the large subunit and are likely to be involved in interactions with other plastid-encoded L or nuclear-encoded S subunits, as well as with the numerous nuclear-encoded Rubisco chaperones. Coevolution between positively selected sites to restore global stability is common. The high rate of positive selection in Ilex is probably linked with the pervasive hybridization and introgression in the genus, although the precise mechanisms behind this link are currently unclear.
Materials and methods
We first investigated rbcL sequences of 116 specimens  representing most of the global distribution range of the genus (including 4 continents and 8 Atlantic and Pacific archipelagos); the “Worldwide dataset”. Then we examined rbcL sequences of 124 accessions of Chinese Ilex species recently collected in their natural habitats, mainly in Yunnan, SW China; the “Chinese dataset”. SW China is the center of Ilex diversity in East Asia and species occupy an exceptionally wide range of habitats, from lowland tropical rainforest to subtropical tree lines. Altogether, this study was based on 240 specimens.
The worldwide dataset
For the detection of positively selected sites and site coevolution in rbcL, the data of Manen et al. , comprising 116 specimens and 108 species, was used. These 116 rbcL sequences were truncated after codon position 436. DNA-bank accession numbers are listed in Additional file 1: Table S1. To speed up PAML and SLR calculations (see below), the rbcL data was divided into five DNA alignments. Four of them represent previously recognized plastid groups (; see Additional file 1: Table S1): Asian/North-American Group 1 (14 sequences), Asian/North-American Group 2 (36 sequences), American Group 3 (31 sequences), and Eurasian Group 4 (35 sequences). The fifth consisted of 20 representative Ilex species from the 4 phylogenetic plastid groups (species names in bold in Additional file 1: Table S1). All Ilex trees used in this study had the topology of the plastid tree of Manen et al. . The rbcL sequence of the rather phylogenetically isolated I. canariensis  was chosen as a reference in alignments and trees because it didn’t belong to a particular plastid group. All alignments were done well using MUSCLE  because of the high conservatism in rbcL sequences. The five Ilex alignments (14 to 36 rbcL sequences) were in the size range of alignments used by Kapralov and Filatov  in their broad study based on 151 lineages of photosynthetic organisms, so their results could be compared with the results obtained in this study. For CAPS calculation (detection of coevolving residues, see below), the rbcL alignment of the 116 Ilex specimens was used.
The Chinese dataset
New rbcL sequences from China consist of 124 accessions, including 77 identified and 35 as yet unidentified species. Most rbcL sequences were amplified and sequenced from DNA-extracts of fresh or dried material, using amplifying and sequencing primers of Fay et al. . Some others were extracted from total plastid genomes of Yao et al. . Additional file 2: Table S2 gives information on specimens and the DNA-bank accession numbers of their rbcL sequences. These sequences were not truncated after codon position 436. Several accessions that have not yet been morphologically determined were kept in the analysis because rbcL sequence comparisons with BLAST (Basic Local Alignment Search Tool, www.ncbi.nlm.nih.gov/BLAST) confirmed that they indeed represent Ilex and phylogenetic comparison with the sequences of Manen et al.  allowed the determination of the phylogenetic groups to which they belong: North-America/East-Asian Group 1 (39 sequences), Asian/North-American Group 2 (40 sequences), and Eurasian Group 4 (45 sequences). PAML and SLR calculations (detection of positive selection, see below) were done on these three plastid lineages separately and results compared with calculations on the Worldwide dataset.
Detection of positive selection on rbcL
The nonsynonymous/synonymous rate ratio (ω = dN/dS) is an indicator of selective pressure at the protein level, with ω = 1 meaning neutral mutations, ω < 1 purifying selection, and ω > 1 diversifying positive selection. Amino acid sites in a protein are expected to be under different selective pressures and to have different underlying ω ratios. In the program Codeml of the PAML package (Phylogenetic Analysis by Maximum Likelihood ;), different models that account for heterogeneous ω ratios among amino acid sites were tested by maximum likelihood on aligned protein-coding DNA sequences and the corresponding phylogenetic tree. The program is useful for testing adaptive molecular evolution and identifying amino acid sites under positive selection. The likelihoods obtained by the models M7 and M8a (allowing ω ≤ 1, the null hypothesis) and by the model M8 (allowing ω > 1) were compared by the likelihood ratio test (LRT). For the model M8, a Bayesian empirical procedure was used to estimate the mean value for each codon and the posterior probability that it is under positive selection. LRT of models Ma1 (the null hypothesis) and Ma2 (allowing ω > 1) were also calculated.
The use of PAML allows a comparison with earlier studies on rbcL positive selection, particularly Kapralov and Filatov . The Sitewise Likelihood-Ratio test (SLR test, http://www.ebi.ac.uk/goldman-srv/SLR/#download), a measure of the strength of the evidence for selection (), was used for adjustment. It distinguishes potential false-positive results that have been reported with PAML. Data on eight lineages from Kapralov and Filatov  were re-calculated using both PAML and the SLR test for comparison.
Structural analysis of the large subunit of Rubisco
The protein structure of the large subunit of Rubisco for Ilex canariensis was modeled by homology using the Swiss Model server (http://swissmodel.expasy.org ;). This program looks for the closest sequence from which a 3-D structure is available in the Protein Data Base (PDB) and uses it as model. The “magic fit” option of Swiss-Pdb Viewer shows that the three-dimensional structure of the large subunit of Ilex canariensis fits well with the three-dimensional structure of the large subunit of Spinacia oleracea (Additional file 4: Figure S1). Swiss-Pdb Viewer (http://www.expasy.org/spdbv; ) was used to localize positively selected residues on this modeled Ilex canariensis structure. Using the “Select” commands of Swiss-Pdb Viewer, the vicinity of positively selected residues at a 6 Å distance (the van der Waals contact) can be investigated at the interface between other Rubisco large subunits and between the large subunit and small subunits. As rbcS is not yet available in Ilex, an extrapolation was based on the Spinacia oleracea Rubisco three-dimensional structure (PDB accessions 1RCX, ).
Detecting coevolving sites
CAPS (Coevolution Analysis using Protein Sequences, http://caps.tcd.ie [11, 12];) was used for identifying coevolution between amino acid sites. Blosum-corrected amino acid distances were used to identify amino acid covariation. Phylogenetic relationships were used to remove the phylogenetic and stochastic dependencies between sites. The 3D protein structure was used to identify the nature of the dependencies between coevolving amino acid sites . Times of divergence were estimated as the mean number of substitutions per synonymous site. DNA alignment of rbcL sequences of the worldwide dataset of 116 Ilex specimens, the corresponding plastid ML-optimized tree of Manen et al. , and the previously determined 3-D structure of the large subunit of Ilex canariensis Rubisco, were used as inputs. Parameters were: alpha threshold 0.001, number of simulated alignments 1000, bootstrap value 0.95, await convergence and time correction on.
Availability of data and materials
All data included in the manuscript has been deposited in GenBank and their accession numbers can be found in supplementary files.
Coevolution Analysis using Protein Sequences
Global Biodiversity Information Facility
Million years ago
Phylogenetic analysis by maximum likelihood
Principal component analysis
Protein Data Base
Sitewise Likelihood-Ratio tests
Antonelli A. Higher level phylogeny and evolutionary trends in Campanulaceae subfam. Lobelioideae: Molecular signal overshadows morphology. Molecular phylogenetics and evolution. 2008;46(1):1–18.
APG IV. An update of the angiosperm phylogeny group classification for the orders and families of flowering plants. Bot J Linn Soc. 2016;181:1–20.
Baas P. Inheritance of foliar and nodal anatomical characters in some Ilex hybrids. Bot J Linn Soc. 1978;77:41–52.
Bathellier C, Tcherkez G, Lorimer GH, Farquhar GD. Rubisco is not really so bad. Plant Cell Environ. 2018;41:705–16.
Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, Kiefer F, Gallo Cassarino T, Bertoni M, Bordoli L, Schwede T. SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res. 2014;42:W252–8.
Bracher A, Hauser T, Liu C, Hartl FU, Hayer-Hartl M. Structural analysis of the Rubisco-assembly chaperone RbcX-II from Chlamydomonas reinhardtii. PLoS One. 2015;10:e0135448.
Christin PA, Salamin N, Muasya AM, Roalson EH, Russier F, Besnard G. Evolutionary switch and genetic convergence on rbcL following the evolution of C4 photosynthesis. Mol Biol Evol. 2008;25:2361–8.
Durão P, Aigner H, Nagy P, Mueller-Cajar O, Hartl FU, Hayer-Hartl M. Opposing effects of folding and assembly chaperones on evolvability of Rubisco. Nat Chem Biol. 2015;11:148–55.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7.
Erb JT, Zarzycki J. A short history of RubisCO: the rise and fall (?) of Nature’s predominant CO2 fixing enzyme. Curr Opin Biotechnol. 2018;49:100–7.
Fares MA, McNally D. CAPS: coevolution analysis using protein sequences. Bioinformatics. 2006;22:2821–2.
Fares MA, Travers SA. A novel method for detecting intramolecular coevolution: adding a further dimension to selective constraints analyses. Genetics. 2006;173:9–23.
Fay MF, Bayer C, Alverson WS, de Bruijn AY, Chase MW. Plastid rbcL sequence data indicate a close affinity between Diegodendron and Bixa. Taxon. 1998;47:43–50.
Galmés J, Flexas J, Keys AJ, Cifre J, Mitchell RA, Madgwick PJ, Haslam RP, Medrano H, Parry MA. Rubisco specificity factor tends to be larger in plant species from drier habitats and in species with persistent leaves. Plant Cell Environ. 2005;28:571–9.
Galmés J, Andralojc PJ, Kapralov MV, Flexas J, Keys AJ, Molins A, Parry MAJ, Conesa MA. Environmentally driven evolution of Rubisco and improved photosynthesis and growth within the C3 genus Limonium (Plumbaginaceae). New Phytol. 2014a;203:989–99.
Galmés J, Kapralov MV, Andralojc PJ, Conesa MA, Keys AJ, Martin AJ, Parry MAJ, Flexas J. Expanding knowledge of the Rubisco kinetics variability in plant species: environmental and evolutionary trends. Plant Cell Environ. 2014b;37:1989–2001.
Gao YD, Harris AJ, Zhou SD, He XJ. Evolutionary events in Lilium (including Nomocharis, Liliaceae) are temporally correlated with orogenies of the Q–T plateau and the Hengduan Mountains. Mol Phylogenet Evol. 2013;68:443–60.
Gruber AV, Feiz L. Rubisco assembly in the chloroplast. 2018;Frontiers in Molecular Biosciences. 5:24.
Gregory-Wodzicki KM. Uplift history of the central and northern Andes: a review. Geol Soc Am Bull. 2000;112:1091–105.
Guex N, Peitsch MC. SWISS-MODEL and the Swiss-Pdb viewer: an environment for comparative protein modeling. Electrophoresis. 1997;18:2714–23.
Hao DC, Mu J, Xiao PG. Molecular evolution and positive Darwinian selection of the gymnosperm photosynthetic Rubisco enzyme. Bot Stud. 2010;51:491–510.
Hauser T, Bhat JY, Miličić G, Wendler P, Hartl FU, Bracher A, Hayer-Hartl M. Structure and mechanism of the Rubisco-assembly chaperone Raf1. Nat Struct Mol Biol. 2015a;22:720–8.
Hauser T, Popilka L, Hartl FU, Hayer-Hartl M. Role of auxiliary proteins in Rubisco biogenesis and function. Nat Plants. 2015b;1:15065.
Hermida-Carrera C, Fares MA, Fernández Á, Gil-Pelegrín E, Kapralov MV, Mir A, Molins A, Peguero-Pina JJ, Rocha J, Sancho-Knapik D, Galmés J. Positively selected amino acid replacements within the RuBisCO enzyme of oak trees are associated with ecological adaptations. PLoS One. 2017;12:e0183970.
Hu S, Sablok G, Wang B, Qu D, Barbaro E, Viola R, Li M, Varotto C. Plastome organization and evolution of chloroplast genes in Cardamine species adapted to contrasting habitats. BMC Genomics. 2015;16:306.
Iida S, Miyagi A, Aoki S, Ito M, Kadono Y, Kosuge K. Molecular adaptation of rbcL in the heterophyllous aquatic plant Potamogeton. PLoS One. 2009;4:e4633.
Kapralov MV, Smith JAC, Filatov DA. Rubisco evolution in C4 Eudicots: an analysis of Amaranthaceae sensu lato. PLoS One. 2012;1:e52974.
Kapralov MV, Filatov DA. Molecular adaptation during adaptive radiation in the Hawaiian endemic genus Schiedea. PLoS One. 2006;1:e8.
Kapralov MV, Filatov DA. Widespread positive selection in the photosynthetic Rubisco enzyme. BMC Evol Biol. 2007;7:73.
Kapralov MV, Kubien DS, Andersson I, Filatov DA. Changes in Rubisco kinetics during the evolution of C4 photosynthesis in Flaveria (Asteraceae) are associated with positive selection on genes encoding the enzyme. Mol Biol Evol. 2011;28:1491–503.
Knobloch E, Mai DH. 1986. Monographie der Früchte und Samen in der Kreide von Mitteleuropa. Rozpr. Ustredni Ustav Geologicky. 1986;47:5–219.
Liu J-Q, Wang Y-J, Wang A-L, Hideaki O, Abbott RJ. Radiation and diversifcation within the Ligularia-Cremanthodium-Parasenecio complex (Asteraceae) triggered by uplift of the Qinghai-Tibetan plateau. Mol Phyl Evol. 2006;38:31–49.
Li X, Sun B, Xiao L, Wu J, Lin Z. Leaf macrofossils of Ilex protocornuta sp. nov. (Aquifoliaceae) from the Late Miocene of East China: Implications for palaeoecology. Review of Palaeobotany and Palynology. 2010;161(1-2):87–103.
Loizeau P-A, Barriera G, Manen J-F, Broennimann O. Towards an understanding of the distribution of Ilex L. (Aquifoliaceae) on a world-wide scale. Biol Skr. 2005;55:501–20.
Magalon and Sanderson, 2005" should be "Magallóan SA, Sanderson MJ. Angiosperm divergence times: the effect of genes, codon positions, and time constraints. Evolution. 2005;59(8):1653–1670.
Manen J-F, Barriera G, Loizeau PA, Naciri Y. The history of extant Ilex species (Aquifoliaceae): evidence of hybridization within a Miocene radiation. Mol Phylogenet Evol. 2010;57:961–77.
Manen J-F. Are both sympatric species Ilex perado and Ilex canariensis secretly hybridizing? Indication from nuclear markers collected in Tenerife. BMC Evol Biol. 2004;4:1.
Manen J-F, Boulter MC, Naciri-Graven Y. The complex history of the genus Ilex L.(Aquifoliaceae): evidence from the comparison of plastid and nuclear DNA sequences and from fossil data. Plant Syst Evol. 2002;235:79–98.
Manen J-F, Cuénoud P, Martinez MD. Intralineage variation in the pattern of rbcL nucleotide substitution. Plant Syst Evol. 1998;211:103–12.
Massingham T, Goldman N. Detecting amino acid sites under positive selection and purifying selection. Genetics. 2005;169(3):1753–1762.
Parto S, Lartillot N. Molecular adaptation in Rubisco: discriminating between convergent evolution and positive selection using mechanistic and classical codon models. PLoS One. 2018; https://doi.org/10.1371/journal.pone.0192697.
Pennington RT, Lavin M, Särkinen T, Lewis GP, Klitgaard BB, Hughes C. Contrasting plant diversification histories within the Andean biodiversity hotspot. PNAS. 2010;107:13783–7.
Portis AR, Li C, Wang D, Salvucci ME. Regulation of Rubisco activase and its interaction with Rubisco. J Exp Bot. 2008;59:1597–604.
Pottier M, Gilis D, Boutry M. The hidden face of Rubisco. Trends Ecol Evol. 2018; https://doi.org/10.1016/j.tplants.2018.02.006.
Sanderson MJ, Doyle JA. Sources of error and confidence intervals in estimating the age of angiosperms from rbcL and 18S rDNA data. Am J Bot. 2001;88:1499–516.
Sen L, Fares MA, Liang B, Gao L, Wang B, Wang T, Su YJ. Molecular evolution of rbcL in three gymnosperm families: identifying adaptive and coevolutionary patterns. Biol Direct. 2011;6:29.
Setoguchi H, Watanabe I. Intersectional gene flow between insular endemics of Ilex (Aquifoliaceae) on the Bonin Islands and the Ryukyu Islands. Am J Bot. 2000;87:793–810.
Sharwood RE, von Caemmerer S, Maliga P, Whitney SM. The catalytic properties of hybrid Rubisco comprising tobacco small and sunflower large subunits mirror the kinetically equivalent source Rubiscos and can support tobacco growth. Plant Physiol. 2008;146:83–96.
Shi L, Li N, Wang S, Zhou Y, Huang W, Yang Y, Ma Y, Zhou R. Molecular evidence for the hybrid origin of Ilex dabieshanensis (Aquifoliaceae). PLoS One. 2016;11:e0147825.
Studer RA, Christin PA, Williams MA, Orengo CA. Stability-activity tradeoffs constrain the adaptive evolution of RubisCO. PNAS. 2014;111:2223–8.
Taylor TC, Andersson I. The structure of the complex between rubisco and its natural substrate ribulose 1,5-bisphosphate. J Mol Biol. 1997;265:432–44.
Wachter RM, Salvucci ME, Carmo-Silva AE, Barta C, Genkov T, Spreitzer R. Activation of interspecies-hybrid Rubisco enzymes to assess different models for the Rubisco–Rubisco activase interaction. Photosynth Res. 2013;117:557–66.
Wang M, Kapralov MV, Anisimova M. Coevolution of amino acid residues in the key photosynthetic enzyme Rubisco. BMC Evol Biol. 2011;11:266.
Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24:1586–91.
Yao X, Tan Y-H, Liu Y-Y, Song Y, Yang J-B, Corlett RT. Chloroplast genome structure in Ilex (Aquifoliaceae). Sci Rep. 2016;6:28559.
We thank Juan-hong Zhang, Chun-yan Lin and Ji-xiong Yang from the Germplasm Bank of Wild Species of Kunming Institute of Botany, Chinese Academy of Sciences, for their help with experiments.
This work was supported by grants from the CAS “Light of West China” Program [Y9XB071B01 to X.Y.], the CAS 135 program [2017XTBG-T03 to X.Y.] and the 1000 Talents Program [WQ20110491035 to R.T.C.]. Y.H.T. was supported by grants from the Southeast Asia Biodiversity Research Institute, Chinese Academy of Sciences for collecting some samples.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
List of world-wide Ilex species . In bold, species used in the 20-species analyses.
List of Chinese Ilex species studied here.
List of taxa used in comparison with Ilex.
Comparison of the 3-dimensional structure of the Rubisco large subunit of Spinacia oleracea (in pink) with modeled 3-dimensional structure of the Rubisco large subunit of Ilex canariensis (in orange) using the “magic fit” option of Swiss-Pdb Viewer.
About this article
Cite this article
Yao, X., Tan, Yh., Yang, Jb. et al. Exceptionally high rates of positive selection on the rbcL gene in the genus Ilex (Aquifoliaceae). BMC Evol Biol 19, 192 (2019). https://doi.org/10.1186/s12862-019-1521-1