Skip to main content

Exceptionally high rates of positive selection on the rbcL gene in the genus Ilex (Aquifoliaceae)



The genus Ilex (Aquifoliaceae) has a near-cosmopolitan distribution in mesic habitats from tropical to temperate lowlands and in alpine forests. It has a high rate of hybridization and plastid capture, and comprises four geographically structured plastid groups. A previous study showed that the plastid rbcL gene, coding for the large subunit of Rubisco, has a particularly high rate of non-synonymous substitutions in Ilex, when compared with other plant lineages. This suggests a strong positive selection on rbcL, involved in yet unknown adaptations. We therefore investigated positive selection on rbcL in 240 Ilex sequences from across the global range.


The rbcL gene shows a much higher rate of positive selection in Ilex than in any other plant lineage studied so far (> 3000 species) by tests in both PAML and SLR. Most positively selected residues are on the surface of the folded large subunit, suggesting interaction with other subunits and associated chaperones, and coevolution between positively selected residues is prevalent, indicating compensatory mutations to recover molecular stability. Coevolution between positively selected sites to restore global stability is common.


This study has confirmed the predicted high incidence of positively selected residues in rbcL in Ilex, and shown that this is higher than in any other plant lineage studied so far. The causes and consequences of this high incidence are unclear, but it is probably associated with the similarly high incidence of hybridization and introgression in Ilex, even between distantly related lineages, resulting in large cytonuclear discordance in the phylogenies.


The genus Ilex (hollies) constitutes the monogeneric family Aquifoliaceae in the campanulid subclade of the asterids [2]. Ilex species are deciduous or evergreen dioecious trees or shrubs of mesic temperate, subtropical, tropical, and tropical-montane climates. At least 644 species have been described. Ilex has a near-cosmopolitan distribution, with the highest diversities in East-Asia, and in Central and South America, moderate numbers of species in Southeast Asia and North America, and a few in Africa, Australia, and Europe, and on most remote volcanic islands in the Atlantic and Pacific, including the Canary Islands, Madeira, Azores, Caribbean, Polynesia, Hawaii, New Caledonia, and Fiji [36]. The lineage is old, dating from the end of Cretaceous [36, 38], with the oldest accepted fossil record of 69 Million Years Ago (MYA, [31]). However, there is a large gap between this date and the estimated Miocene age of the most recent common ancestor of all extant Ilex, based on plastid and nuclear markers [36]. Many pre-Miocene fossils have been attributed to Ilex ([34]; [33]), suggesting extensive lineage extinctions, so that the extant genus is now phylogenetically isolated.

Abundant production of small, fleshy fruits, consumed primarily by birds in most species, can account for its wide dispersal, but the exceptional adaptive capacity in this morphologically relatively uniform genus is surprising. There is evidence for a high frequency of hybridization and introgression in Ilex, leading to a reticulate pattern of evolution. Phylogenetic trees obtained from nuclear and plastid data are very different [36] and Shi et al. [49] suggest that many of the 150–200 narrow-range species recognized in China may be a result of hybridization, which may also be true elsewhere. Widespread hybridization is supported by anatomical and morphological studies [3], and documented by molecular phylogenies [36,37,38, 47, 49].

A feature of the genus Ilex is that the rbcL gene, encoding the large subunit of Rubisco and a widely-used plastid sequence in phylogeny, appears to have a particularly high rate of non-synonymous substitutions compared with other genera [39]. This high rate of non-synonymous substitution suggests a positive selection on the plastid-encoded Rubisco large subunit in Ilex. Rubisco catalyzes the first steps of photosynthetic fixation of carbon and of photorespiration in all photosynthetic organisms (see [4, 10, 44], for recent reviews). The most prevalent form of Rubisco (form I) is an hexadecamer of 8 catalytic large subunits (L, harboring the active site) encoded by the rbcL gene in the plastome, and 8 regulatory small subunits (S), encoded in the nucleosome by a small family of rbcS genes [6]. The assembly and enzymatic function of Rubisco are, in part, guaranteed by specific interactions of the L subunit with the S subunits, assisted by several nuclear-encoded specific chaperones [22, 23].

Several studies provide evidence that positive selection on particular amino-acid residues of the Rubisco large subunit is involved in plant adaptation to various environmental stresses. This was proposed for the endemic Hawaiian genus Schiedea adapting to dry or wet habitats [28], in the heterophyllous aquatic plant Potamogeton [26], in switches from C3 to C4 photosynthesis [7, 30], in movements from dry to wet habitats in the Balearic Islands [15], in adaptation of Cardamine resedifolia to high altitudes [25], and associated with leaf traits and climate characteristics in the genus Quercus [24]. Moreover, studies based on a large sample of rbcL sequences show that positive selection is widespread in most lineages of land plants [21, 29].

A consequence of rbcL positive selection is a frequently observed pattern of coevolution between positively selected sites. Indeed, the multiple roles of amino acids in enzymes (folding, stability, enzymatic activity) imply that the modification of one parameter can affect other properties. Stability and activity of the enzyme are likely to be negatively correlated, and compensatory mutations are then needed to restore global stability [50]. Thus, when positive selection occurs on a residue, a compensatory co-evolving mutation on another residue of the molecule is likely to be detected. This has been confirmed in land plants [24, 25, 46, 53]. Rubisco has evolved CO2/O2 specificity ratios according to the environmental situation, allowing a perpetual fine-tuning of its performance [50]. Although the plastid rbcL gene sequence is highly conserved, this enzyme has played a role in plant adaptation to changing environments across geological time and continues to do so across recent geographical space.

Another feature of adaptive mutations in rbcL is that during introgression, plastid-encoded maternal L subunits of one species become associated with nuclear-encoded paternal S subunits of another species, potentially creating compatibility problems [48]. Similar compatibility problems could also arise from the numerous nuclear-encoded chaperones intimately linked to Rubisco subunits ([52]; [18]). The impacts and repair of such incompatibilities on Rubisco function should be most significant in lineages experiencing high rates of hybridization or/and introgression, as it is the case in Ilex [36].

In this study, we looked for evidence of positive selection on rbcL in the genus Ilex. We first investigate rbcL sequences (truncated after codon position 436) of 116 species from across the global range (“Worldwide dataset”) and then untruncated rbcL sequences of 124 accessions collected in natural habitats in China (“Chinese dataset”).


Evolution of the gene rbcL of Ilex includes all the features of positive selection

Positive selection on the rbcL gene of Ilex

The Worldwide dataset of 116 rbcL sequences (see Materials and Methods and Additional file 1: Table S1) included the first 436 codons. There were 91 variable codons, of which 24 had synonymous substitutions and 67 had non-synonymous substitutions (Fig. 1). In order to speed up the analyses, PAML calculations were conducted on rbcL sequences of the 4 Ilex plastid lineages [36] separately: the Asian/North-American Group 1 (14 sequences), the Asian/North-American Group 2 (36 sequences), the American Group 3 (31 sequences), and the Eurasian Group 4 (35 sequences). A 5th “20-Ilex” rbcL DNA matrix consisted of 20 selected species of Ilex representing the 4 plastid groups and a wide geographic distribution (see Fig. 3).

Fig. 1
figure 1

Variable codons (91 to 436) found among 116 Ilex sequences (Worldwide dataset) mapped on the accession I. canariensis_90, as reference. Uncolored codons are synonymous substitutions. Yellow codons are non-synonymous substitutions. # means positively selected codons (Bayesian posterior probability > 0.95 using the M8 model of Codeml, PAML (Table 1). + means positively selected residues found among the 20 most often positively selected residues of [29] (Table 1). § means residues involved in coevolution (Table 4)

Positively selected sites were detected (at a Bayesian posterior probability of 0.95) by the program Codeml of PAML using model M8 [54]. Altogether for the five alignments studied, 32 positively selected residues were detected (Table 1). However, the SLR (Stepwise likelihood Ratio) test only confirmed 17 of these (Table 1). Using PAML, likelihood ratio tests (LRT) were done for each of the 4 alignments representing the 4 previously defined phylogenetic plastid lineages. Likelihood ratio tests of different models (model M2a allowing ω > 1 versus models M1a not allowing ω > 1, as well as model M8 allowing ω > 1 versus models M7 or M8a not allowing ω > 1) indicated that positive selection was highly probable for the 4 plastid groups at a significance level of < 0.001:

  • Group 1, 2∆l = 42.88 (M2a/M1a), 49.68 (M8/M7), and 40.92 (M8/M8a).

  • Group 2, 2∆l = 113.94 (M2a/M1a), 111.20 (M8/M7), and 111.04 (M8/M8a).

  • Group 3, 2∆l = 52.14 (M2a/M1a), 52.16 (M8/M7), and 52.12 (M8/M8a).

  • Group 4, 2∆l = 175.96 (M2a/M1a), 60.00 (M8/M7), and 42.74 (M8/M8a).

Table 1 List of positively selected sites of rbcL of Ilex with posterior probabilities (PAML, Codeml, model M8) > 0.95 in the 5 subsets of the Worldwide dataset. Groups 1, 2, 3 and 4 represent the 4 phylogenetic plastid groups of [36] (Asian/North-American Group 1, Asian/North/American Group 2, American Group 3, and Eurasian Group 4. The 20-species sample represents a selection of the 116 specimens of [36], reflecting the 4 groups and a wide geographic distribution (see Fig. 3). In red, selected Ilex residues belonging to the list of the 20 most often positively selected sites detected in the 151 lineages of Kapralov and Filatov [29]

Positively selected sites were mainly found in the Asian/North American Group 2 (25 sites by PAML, 11 confirmed by SLR test), followed by the American Group 3 (12 sites, 2 confirmed), the Asian/North American Group 1 (7 sites, 3 confirmed), and then Eurasian Group 4 (7 sites, 1 confirmed) (Table 1).

A total of 25 positively selected residues were detected by PAML in the rbcL coding sequences of the Chinese dataset of 124 sequences (see materials and methods and Additional file 2: Table S2), six of which were new, of which residues 461 and 470 were unavailable in the Worldwide dataset truncated at residue 436 (Table 2). The SLR test confirmed 9 of these residues as positively selected. The Asian/North-American Group 2 again had the most positively selected residues (Table 2).

Table 2 List of positively selected residues of rbcL with posterior probabilities (PAML, Codeml, Model M8) > 0.95 in the Chinese dataset. This dataset represents group 1, 2, and 4 (The American Group 3 is not present in China). In red, selected Ilex residues belonging to the list of 20 most often positively selected sites detected in the 151 lineages of Kapralov and Filatov [29]. (*) indicates that the residues have ≥95% SLR support for positive selection. (a) indicates that residues are not positively selected in the Worldwide dataset or that residues (461 and 470) were not available in the truncated Worldwide dataset. (b) Number of residues

Most positively selected sites are located at the surface of the folded rbcL

The 38 positively selected sites in rbcL detected by PAML (32 in the Worldwide dataset and 6 additional ones in the Chinese dataset) were mapped onto the 3-D structure of the modeled Rubisco large subunit of Ilex canariensis (see Materials and Methods). A large proportion of positively selected residues are located at the surface of the folded large subunit, with only 6 residues (53, 84, 196, 221, 317 and 328) being buried (Fig. 2a). Exposed residues could be candidates for interactions with other large subunits (intra-dimer and dimer-dimer interaction), with small subunits, and with Rubisco activase and other chaperones, as suggested by Kapralov and Filatov [29]. Extrapolating from the Spinacia oleracea Rubisco 3D-structure (1RCX, [51]), potential implications of Ilex positively selected residues in quaternary interactions were examined. Positively selected residues located within 6 Å (the van der Waals contact) of the surface of other subunits were selected using Swiss-Pdb Viewer ( Some of these (13 of 38) were in van der Waals contact with another large subunit intra-dimer interface, a dimer-dimer interface, and/or with a small subunit (Table 3). Many positively selected residues of the equatorial surface of Rubisco were still visible (and potentially accessible by chaperones) even when the 4 interacting large subunits (B, E, R and V) and the 3 interacting small subunits (s, f and w) were represented in contact with the large L subunit (Fig. 2b). Positively selected residues 219, 225, 226, 429, and 461 of the large subunits were in contact with the small subunits (Table 3). The same was true for the numerous nuclear-encoded Rubisco chaperones that might interact with the many exposed positively selected residues on the equatorial surface of Rubisco (15, 19, 21, 23, 28, 30, 84, 86, 91, 95, 97, 340, 353, 354 and 470, see Fig. 2b), that were not in contact with any other L or S subunits (see for instance [22, 23]).

Fig. 2
figure 2

Positively selected residues of Ilex represented on the surface of the Rubisco large subunit. a Front and back of folded large subunit of the modeled Ilex canariensis 3-D structure. Visible positively selected residues from the Worldwide dataset are in red, while additional positively selected residues from the Chinese dataset are in blue. b Extrapolation on the Rubisco Spinacia oleracea 3-D structure (1RCX). Presented are the 4 large subunits (E,B,V and R) and 3 small subunits (f, s and w) that are in contact with one large subunit (here L, in orange). Positively selected residues are represented with the same colors as in A. Most of these residues (represented here on the L subunit) are exposed on the equatorial surface the L8S8 Rubisco hexadecamer. Sites 142, 429 and 461 are involved in intra-dimer, dimer-dimer and L/S interactions (see Table 3). All other sites (15, 19, 21, 23, 28, 30, 84, 86, 91, 95, 97, 340, 353, 354 and 470) represent positively selected residues potentially involved in interactions with chaperones

Table 3 Positively selected residues of the large subunit of Ilex Rubisco that are in contact with another L or S subunit, extrapolated from the Spinacia oleracea Rubisco 3D-structure (see Fig. 2). Thirteen positively selected residues of the Rubisco large subunit of Ilex are at the interface (≤ 6 Å) of intra-dimer (ID), of dimer-dimer (DD), and of large/small subunits (SSU). Two of them (219 and 461) share a contact with another large and with a small subunit. In brackets, the total number of residues of the Rubisco large subunit that are in contact with another large subunit or the small subunit in ID, DD and SSU interactions

As expected, positively selected sites of the rbcL gene are extensively coevolving

Selecting only pairwise correlations above 0.5, CAPS detected 17 (out of 32) positively selected residues of the rbcL Worldwide dataset that were involved in coevolution (Table 4). Except for residue 95, all the others were detected as positively selected residues in the Worldwide dataset. However, residue 95 was detected as positively selected in the Chinese dataset (Table 2). The mean pairwise distance of coevolving pairs in the 3-D structure of the Rubisco large subunit was 22.9 Å.

Table 4 Pairs of coevolving residues detected by CAPS (Coevolution Analysis using Protein Sequences) in 116 rbcL sequences of the Ilex Worldwide dataset. Parameters used in CAPS are: alpha threshold 0.001, number of simulated alignments 1000, bootstrap value 0.95, await convergence and time correction on

Compared with other lineages, Ilex has the highest rate of positive selection

Using the M8 model of PAML, positive selection on the rbcL gene of Ilex was compared with 8 lineages studied by Kapralov and Filatov [29] using recalculations based on their data. Eight of their lineages potentially have high rates of positive selection and were chosen for a comparison with Ilex (see their additional file 2). Three (out of 151) lineages (eurosids I-26, euasterids I-9 and Coniferales 6) had dN/dS values (in the “eleventh class” of the M8 model) higher than 20.5 (the value found in Ilex). Five (out of 151) other lineages (commelinids-7, commelinids-15, eurosids I-24, euasterids I-8 and Gnetales-1) had a higher proportion of detected positive selection sites (in the “eleventh class” of the M8 model) than 8.2% (the proportion found in Ilex).

PAML calculations indicated that the rbcL gene of the 20-species dataset of Ilex had 26 positively selected residues, while the largest number in the other 8 lineages was 15 in the commelinids-15 lineage (Table 5). All the others had < 10. The SLR test confirmed 7 positively selected residues in the Ilex 20-species dataset, while the highest number found in the other 8 lineages was 2 in the Coniferales-6 lineage. A comparison with other angiosperm lineages (Table 5), shows that the number of positively selected residues of rbcL in the Worldwide dataset of Ilex (32) (Bayesian posterior probability: 0.95) was never reached in any lineage studied by Kapralov and Filatov [29]. In their 151 studied datasets, the number of positively selected residues never passed 16 (see their additional file 3).

Table 5 Comparison of positive selection in rbcL of different lineages detected by PAML (M8 model) and by SLR (probability ≥0.95)

The program Codeml of PAML was used in the above comparisons. It should be noted, however, that using the Site Likelihood-Ratio (SLR) test, a measure of the strength of evidence for selection, the number of positively selected residues remaining significant at 95% after the SLR adjustment was 17 (among the 32 residues found by PAML in the Worldwide dataset of Ilex, see Table 1). A similar decrease was generally observed in other studies when PAML results were adjusted by SLR (see for instance [27]). SLR adjustment did not change the major results of this study. The 17 positively selected residues in the Rubisco large subunit of Ilex detected by SLR still exceeded the maximum of 16 found in a broad survey of green plants by Kapralov and Filatov [29] using PAML.

Seven additional lineages (Table 5 and Additional file 3: Table S3) were also compared because they are trees (as are most Ilex) that have experienced frequent hybridization-introgression (Nothofagus, Populus, Quercus, Salix) or because of reported positive selection (Flaveria, Potamogeton, Schiedea). Positively selected sites are much more frequent in Ilex that in any other lineage studied (Table 5).


A very high rate of positive selection on the rbcL gene of Ilex

This study has confirmed that the plastid gene rbcL, encoding the large subunit of Rubisco, has experienced an exceptionally high rate of positive selection in the genus Ilex. Kapralov and Filatov [29] listed the 20 most often positively selected rbcL residues, which accounted for > 70% of the cases of positive selection in their broad survey of green plants. Twelve positively selected residues detected in Ilex belonged to this top-20 list. Several residues detected in other lineages were also identified as positively selected in Ilex (Table 6). Thus, a large proportion of the rbcL sites positively selected in the green plants as a whole were also detected in the Ilex. This implies that the rbcL gene has a limited number of residue positions that can be positively selected for the fine-tuning of Rubisco [16], and that many of these have been selected in Ilex.

Table 6 Positively selected residues detected in Ilex (in both the Worldwide and Chinese datasets) also described in other lineages and associated with biological and environment traits

This particularly high rate of positive selection may affect the molecular clock of rbcL in Ilex, compared with other lineages. A high proportion of non-synonymous substitutions (such as positive selection) is indeed linked to a high rate of nucleotide substitution at the 1st and 2nd codon position compared to the rate of nucleotide substitution at the 3rd position. This last feature explains why the dating of the crown age of the extant genus Ilex using a fossil-calibrated molecular clock was at least two times overestimated using 1st and 2nd codon positions than using the 3rd position of rbcL [36]. As positive selection seems to be widespread in rbcL, a codon effect on age calculation using rbcL may be frequent ([45]; [35]), and this is particularly true in the genus Ilex.

Positive selection and environmental traits in Ilex

Positive selection on rbcL is frequently observed in terrestrial plants but not in aquatic plants, algae, or bacteria [29]. It has been proposed that the thermal and water regime of the aquatic habitat is more stable, while land plants must adapt to the variability of their habitat, requiring a tuning of the activity of Rubisco by positive selection. The CO2/O2 specificity factor of Rubisco τ (determining the relative rate of photosynthesis and photorespiration) is very sensitive to cellular water availability, CO2 concentration, and temperature [14]. A relation between rbcL positively selected substitution at particular sites and morphological, biological, or climate characteristics of different lineages has been suggested (Table 6).

Ilex has two independent hotspots of speciation, in South America and East Asia, each comprising around 200 Ilex species. Both hotspots show high rates of recent speciation associated with, respectively, the final uplift of the Andes (from 5 to 2 MYA [19];, [1] [34];) and the Hengduan Mountains (4 to 3 MYA, [17]). In both of these hotspots, the recent development of high mountains and deep valleys may have accelerated the diversification of Ilex species through local vicariance, dispersal, isolation, secondary contact, and ecological speciation events [32, 42]. Asian/North-American Group 2 had the highest rate of positive selection on rbcL, but the species of this lineage included in this study do not occupy a wider range of environments than those of other lineages. This suggests that there is no simple relationship between positive selection and adaptability. Indeed, although Ilex occupies a wide range of environments, it is absent from areas without year-round soil moisture or with exposure to prolonged winter cold, which may have reduced the pressure for tuning Rubisco activity.

The rbcL gene has a limited number of sites that can be positively selected. However, they are enough to suggest that cooperation of several different selected substitution sites can mediate a similar effect on Rubisco conformation. Conversely, one unique selected substitution can have different effects on rbcL folding and on Rubisco conformation, depending on the other sites of mutation [50]. Figures 3 shows that a particular selected mutation site of rbcL can be detected in different branches of the Ilex tree, indicating convergence in different clades and reversion in other, making interpretation rather complex. The fine-tuning of Rubisco is the result of a complex synergy of multiple coevolving substitutions sites. For instance, several positively selected residues (P142T, M309I and A328S) that were often associated to C3-C4 transition (Table 6) were also positively selected in Ilex, which is entirely C3. Most studies associating rbcL positive selection with a particularly biological or climatic trait involved only 2 or 3 selected mutations that were often linked by coevolution (for instance the pair M309I and A328S, [50]). The large number of positively selected rbcL sites found in Ilex suggests that other drivers are involved.

Fig. 3
figure 3

Mapping of positively selected sites in the 20-species tree. Only SLR positively selected sites are included, as well as Ilex positively selected sites recorded in Table 6. Asian/NAm means Asian/North American

Introgression may play a major role in the high rate of rbcL positive selection

Hybridization and introgression are pervasive in Ilex [36]. Advantageous positively selected residues in rbcL might contribute to the survival and spread of individuals and species with introgression. Adaptive positive selection on the Rubisco large subunit encoded by a non-recombining plastid DNA may promote, among other mechanisms (demographic events, fixation by chance), advantageous plastid captures [28]. This could be the case in Ilex. Huge cytonuclear discordances in the phylogeny of this genus have been reported: the plastid phylogeny is more geographically structured (with 4 geographically-determined phylogenetic clades) than the nuclear phylogeny, which was closer to the morphological classification of the genus [36]. Since the genus Ilex probably experienced extensive adaptation to changing habitats, plastid capture by adaptive introgression could well be promoted by a Darwinian selection on Rubisco.

On the other hand, such adaptive introgressions require that the surface of plastid-encoded Rubisco large subunits of one species fit with the surface of nuclear-encoded small subunits and chaperones of another species. This “molecular adaptation” could be essential for introgression between distantly related species, requiring a subsequent modification of molecular interactions. In Ilex, species from different lineages can hybridize when they come into secondary contact, helped by rather weak reproductive barriers and by many repeated intercontinental dispersals [36].

In this context, selection pressure on the Rubisco large subunit of one species might be, in part, the consequences such molecular incompatibility with the small subunit of another species. The Rubisco activase chaperone was suggested to interact with the region 89–94 of rbcL [43]. In Ilex, the exposed positively selected residues 91 and 95 may interfere with Rubisco activase (see Fig. 2b). Chaperones rbcX interacted with the C-terminal tail of one L subunit and the N-terminal region of another L subunit, to form 2 clamps that stabilized the dimer L2 [23]. Residues 461 and 470 (C-terminal tail) were positively selected. The same was true for positively selected residues 15 and 19 in the N-terminal region (see Fig. 2b).

It has to be noted that the majority of exposed positively selected residues of the Ilex Rubisco, found on the equatorial surface of the L8S8 hexadecamer shown in Fig. 2b, were not in the list of the 20 most often positively selected residues detected by Kapralov and Filatov [29]. These were residues 15, 19, 21, 23, 30, 84, 91, 340, 353 and 354 (see Fig. 2b and Tables 1, 2). Such uncommon positively selected sites, potentially interacting with chaperones, may represent a particularity of the genus Ilex. Thus, it can be postulated that the rbcL gene of taxa experiencing speciation by adaptive introgressions was expected to be under compensatory selective pressure to make the interface with nuclear-encoded partners compatible. In Flaveria, specific residue substitutions in the small subunit gene (rbcS) were correlated with the kinetic properties of Rubisco [30], but this observation was not further examined. In addition, Rubisco chaperones might have some effects on evolvability of rbcL [8].

Cytonuclear phylogenetic discordance is pervasive in Ilex, except in the well-defined Eurasian plastid group 4 which perfectly matches the nuclear clade Aquifolium [36]. Some hybridization events were noticeable between closely related members of this group. However, it appears that, for an unknown reason, introgression never occurred between species of the Eurasian group 4 and distantly related species belonging to the other groups 2, 3 and 4. This may explain the fewer positive selections rbcL in this group compared with other groups (Tables 1 and 2). Thus, in lineages experiencing introgression, a high rate of positive selection on the Rubisco large subunit may be expected. A large part of it would represent molecular adaptations resulting from introgression and plastid capture, a current situation in Ilex. Therefore, in future studies, molecular adaptations need to be discriminated from environmental adaptations of Rubisco (see [41]).


In Ilex, the plastid rbcL gene coding for the large subunit of Rubisco had a higher rate of positive selection than in other lineages studied so far. Most of these positively selected residues are located at the surface of the large subunit and are likely to be involved in interactions with other plastid-encoded L or nuclear-encoded S subunits, as well as with the numerous nuclear-encoded Rubisco chaperones. Coevolution between positively selected sites to restore global stability is common. The high rate of positive selection in Ilex is probably linked with the pervasive hybridization and introgression in the genus, although the precise mechanisms behind this link are currently unclear.

Materials and methods

We first investigated rbcL sequences of 116 specimens [36] representing most of the global distribution range of the genus (including 4 continents and 8 Atlantic and Pacific archipelagos); the “Worldwide dataset”. Then we examined rbcL sequences of 124 accessions of Chinese Ilex species recently collected in their natural habitats, mainly in Yunnan, SW China; the “Chinese dataset”. SW China is the center of Ilex diversity in East Asia and species occupy an exceptionally wide range of habitats, from lowland tropical rainforest to subtropical tree lines. Altogether, this study was based on 240 specimens.

The worldwide dataset

For the detection of positively selected sites and site coevolution in rbcL, the data of Manen et al. [36], comprising 116 specimens and 108 species, was used. These 116 rbcL sequences were truncated after codon position 436. DNA-bank accession numbers are listed in Additional file 1: Table S1. To speed up PAML and SLR calculations (see below), the rbcL data was divided into five DNA alignments. Four of them represent previously recognized plastid groups ([36]; see Additional file 1: Table S1): Asian/North-American Group 1 (14 sequences), Asian/North-American Group 2 (36 sequences), American Group 3 (31 sequences), and Eurasian Group 4 (35 sequences). The fifth consisted of 20 representative Ilex species from the 4 phylogenetic plastid groups (species names in bold in Additional file 1: Table S1). All Ilex trees used in this study had the topology of the plastid tree of Manen et al. [36]. The rbcL sequence of the rather phylogenetically isolated I. canariensis [36] was chosen as a reference in alignments and trees because it didn’t belong to a particular plastid group. All alignments were done well using MUSCLE [9] because of the high conservatism in rbcL sequences. The five Ilex alignments (14 to 36 rbcL sequences) were in the size range of alignments used by Kapralov and Filatov [29] in their broad study based on 151 lineages of photosynthetic organisms, so their results could be compared with the results obtained in this study. For CAPS calculation (detection of coevolving residues, see below), the rbcL alignment of the 116 Ilex specimens was used.

The Chinese dataset

New rbcL sequences from China consist of 124 accessions, including 77 identified and 35 as yet unidentified species. Most rbcL sequences were amplified and sequenced from DNA-extracts of fresh or dried material, using amplifying and sequencing primers of Fay et al. [13]. Some others were extracted from total plastid genomes of Yao et al. [55]. Additional file 2: Table S2 gives information on specimens and the DNA-bank accession numbers of their rbcL sequences. These sequences were not truncated after codon position 436. Several accessions that have not yet been morphologically determined were kept in the analysis because rbcL sequence comparisons with BLAST (Basic Local Alignment Search Tool, confirmed that they indeed represent Ilex and phylogenetic comparison with the sequences of Manen et al. [36] allowed the determination of the phylogenetic groups to which they belong: North-America/East-Asian Group 1 (39 sequences), Asian/North-American Group 2 (40 sequences), and Eurasian Group 4 (45 sequences). PAML and SLR calculations (detection of positive selection, see below) were done on these three plastid lineages separately and results compared with calculations on the Worldwide dataset.

Detection of positive selection on rbcL

The nonsynonymous/synonymous rate ratio (ω = dN/dS) is an indicator of selective pressure at the protein level, with ω = 1 meaning neutral mutations, ω < 1 purifying selection, and ω > 1 diversifying positive selection. Amino acid sites in a protein are expected to be under different selective pressures and to have different underlying ω ratios. In the program Codeml of the PAML package (Phylogenetic Analysis by Maximum Likelihood [54];), different models that account for heterogeneous ω ratios among amino acid sites were tested by maximum likelihood on aligned protein-coding DNA sequences and the corresponding phylogenetic tree. The program is useful for testing adaptive molecular evolution and identifying amino acid sites under positive selection. The likelihoods obtained by the models M7 and M8a (allowing ω ≤ 1, the null hypothesis) and by the model M8 (allowing ω > 1) were compared by the likelihood ratio test (LRT). For the model M8, a Bayesian empirical procedure was used to estimate the mean value for each codon and the posterior probability that it is under positive selection. LRT of models Ma1 (the null hypothesis) and Ma2 (allowing ω > 1) were also calculated.

The use of PAML allows a comparison with earlier studies on rbcL positive selection, particularly Kapralov and Filatov [29]. The Sitewise Likelihood-Ratio test (SLR test,, a measure of the strength of the evidence for selection ([40]), was used for adjustment. It distinguishes potential false-positive results that have been reported with PAML. Data on eight lineages from Kapralov and Filatov [29] were re-calculated using both PAML and the SLR test for comparison.

Structural analysis of the large subunit of Rubisco

The protein structure of the large subunit of Rubisco for Ilex canariensis was modeled by homology using the Swiss Model server ( [5];). This program looks for the closest sequence from which a 3-D structure is available in the Protein Data Base (PDB) and uses it as model. The “magic fit” option of Swiss-Pdb Viewer shows that the three-dimensional structure of the large subunit of Ilex canariensis fits well with the three-dimensional structure of the large subunit of Spinacia oleracea (Additional file 4: Figure S1). Swiss-Pdb Viewer (; [20]) was used to localize positively selected residues on this modeled Ilex canariensis structure. Using the “Select” commands of Swiss-Pdb Viewer, the vicinity of positively selected residues at a 6 Å distance (the van der Waals contact) can be investigated at the interface between other Rubisco large subunits and between the large subunit and small subunits. As rbcS is not yet available in Ilex, an extrapolation was based on the Spinacia oleracea Rubisco three-dimensional structure (PDB accessions 1RCX, [51]).

Detecting coevolving sites

CAPS (Coevolution Analysis using Protein Sequences, [11, 12];) was used for identifying coevolution between amino acid sites. Blosum-corrected amino acid distances were used to identify amino acid covariation. Phylogenetic relationships were used to remove the phylogenetic and stochastic dependencies between sites. The 3D protein structure was used to identify the nature of the dependencies between coevolving amino acid sites [11]. Times of divergence were estimated as the mean number of substitutions per synonymous site. DNA alignment of rbcL sequences of the worldwide dataset of 116 Ilex specimens, the corresponding plastid ML-optimized tree of Manen et al. [36], and the previously determined 3-D structure of the large subunit of Ilex canariensis Rubisco, were used as inputs. Parameters were: alpha threshold 0.001, number of simulated alignments 1000, bootstrap value 0.95, await convergence and time correction on.

Availability of data and materials

All data included in the manuscript has been deposited in GenBank and their accession numbers can be found in supplementary files.



Coevolution Analysis using Protein Sequences


Global Biodiversity Information Facility


Million years ago


Phylogenetic analysis by maximum likelihood


Principal component analysis


Protein Data Base


Ribulose-1,5-bisphosphate carboxylase-oxygenase


Sitewise Likelihood-Ratio tests


  1. Antonelli A. Higher level phylogeny and evolutionary trends in Campanulaceae subfam. Lobelioideae: Molecular signal overshadows morphology. Molecular phylogenetics and evolution. 2008;46(1):1–18.

    CAS  PubMed  Article  Google Scholar 

  2. APG IV. An update of the angiosperm phylogeny group classification for the orders and families of flowering plants. Bot J Linn Soc. 2016;181:1–20.

    Article  Google Scholar 

  3. Baas P. Inheritance of foliar and nodal anatomical characters in some Ilex hybrids. Bot J Linn Soc. 1978;77:41–52.

    Article  Google Scholar 

  4. Bathellier C, Tcherkez G, Lorimer GH, Farquhar GD. Rubisco is not really so bad. Plant Cell Environ. 2018;41:705–16.

    CAS  PubMed  Article  Google Scholar 

  5. Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, Kiefer F, Gallo Cassarino T, Bertoni M, Bordoli L, Schwede T. SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res. 2014;42:W252–8.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  6. Bracher A, Hauser T, Liu C, Hartl FU, Hayer-Hartl M. Structural analysis of the Rubisco-assembly chaperone RbcX-II from Chlamydomonas reinhardtii. PLoS One. 2015;10:e0135448.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  7. Christin PA, Salamin N, Muasya AM, Roalson EH, Russier F, Besnard G. Evolutionary switch and genetic convergence on rbcL following the evolution of C4 photosynthesis. Mol Biol Evol. 2008;25:2361–8.

    CAS  PubMed  Article  Google Scholar 

  8. Durão P, Aigner H, Nagy P, Mueller-Cajar O, Hartl FU, Hayer-Hartl M. Opposing effects of folding and assembly chaperones on evolvability of Rubisco. Nat Chem Biol. 2015;11:148–55.

    Article  CAS  PubMed  Google Scholar 

  9. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  10. Erb JT, Zarzycki J. A short history of RubisCO: the rise and fall (?) of Nature’s predominant CO2 fixing enzyme. Curr Opin Biotechnol. 2018;49:100–7.

    CAS  PubMed  Article  Google Scholar 

  11. Fares MA, McNally D. CAPS: coevolution analysis using protein sequences. Bioinformatics. 2006;22:2821–2.

    CAS  PubMed  Article  Google Scholar 

  12. Fares MA, Travers SA. A novel method for detecting intramolecular coevolution: adding a further dimension to selective constraints analyses. Genetics. 2006;173:9–23.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  13. Fay MF, Bayer C, Alverson WS, de Bruijn AY, Chase MW. Plastid rbcL sequence data indicate a close affinity between Diegodendron and Bixa. Taxon. 1998;47:43–50.

    Article  Google Scholar 

  14. Galmés J, Flexas J, Keys AJ, Cifre J, Mitchell RA, Madgwick PJ, Haslam RP, Medrano H, Parry MA. Rubisco specificity factor tends to be larger in plant species from drier habitats and in species with persistent leaves. Plant Cell Environ. 2005;28:571–9.

    Article  Google Scholar 

  15. Galmés J, Andralojc PJ, Kapralov MV, Flexas J, Keys AJ, Molins A, Parry MAJ, Conesa MA. Environmentally driven evolution of Rubisco and improved photosynthesis and growth within the C3 genus Limonium (Plumbaginaceae). New Phytol. 2014a;203:989–99.

    PubMed  Article  CAS  Google Scholar 

  16. Galmés J, Kapralov MV, Andralojc PJ, Conesa MA, Keys AJ, Martin AJ, Parry MAJ, Flexas J. Expanding knowledge of the Rubisco kinetics variability in plant species: environmental and evolutionary trends. Plant Cell Environ. 2014b;37:1989–2001.

    PubMed  Article  CAS  Google Scholar 

  17. Gao YD, Harris AJ, Zhou SD, He XJ. Evolutionary events in Lilium (including Nomocharis, Liliaceae) are temporally correlated with orogenies of the Q–T plateau and the Hengduan Mountains. Mol Phylogenet Evol. 2013;68:443–60.

    PubMed  Article  Google Scholar 

  18. Gruber AV, Feiz L. Rubisco assembly in the chloroplast. 2018;Frontiers in Molecular Biosciences. 5:24.

  19. Gregory-Wodzicki KM. Uplift history of the central and northern Andes: a review. Geol Soc Am Bull. 2000;112:1091–105.

    Article  Google Scholar 

  20. Guex N, Peitsch MC. SWISS-MODEL and the Swiss-Pdb viewer: an environment for comparative protein modeling. Electrophoresis. 1997;18:2714–23.

    CAS  PubMed  Article  Google Scholar 

  21. Hao DC, Mu J, Xiao PG. Molecular evolution and positive Darwinian selection of the gymnosperm photosynthetic Rubisco enzyme. Bot Stud. 2010;51:491–510.

    CAS  Google Scholar 

  22. Hauser T, Bhat JY, Miličić G, Wendler P, Hartl FU, Bracher A, Hayer-Hartl M. Structure and mechanism of the Rubisco-assembly chaperone Raf1. Nat Struct Mol Biol. 2015a;22:720–8.

    CAS  PubMed  Article  Google Scholar 

  23. Hauser T, Popilka L, Hartl FU, Hayer-Hartl M. Role of auxiliary proteins in Rubisco biogenesis and function. Nat Plants. 2015b;1:15065.

    CAS  PubMed  Article  Google Scholar 

  24. Hermida-Carrera C, Fares MA, Fernández Á, Gil-Pelegrín E, Kapralov MV, Mir A, Molins A, Peguero-Pina JJ, Rocha J, Sancho-Knapik D, Galmés J. Positively selected amino acid replacements within the RuBisCO enzyme of oak trees are associated with ecological adaptations. PLoS One. 2017;12:e0183970.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  25. Hu S, Sablok G, Wang B, Qu D, Barbaro E, Viola R, Li M, Varotto C. Plastome organization and evolution of chloroplast genes in Cardamine species adapted to contrasting habitats. BMC Genomics. 2015;16:306.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  26. Iida S, Miyagi A, Aoki S, Ito M, Kadono Y, Kosuge K. Molecular adaptation of rbcL in the heterophyllous aquatic plant Potamogeton. PLoS One. 2009;4:e4633.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  27. Kapralov MV, Smith JAC, Filatov DA. Rubisco evolution in C4 Eudicots: an analysis of Amaranthaceae sensu lato. PLoS One. 2012;1:e52974.

    Article  CAS  Google Scholar 

  28. Kapralov MV, Filatov DA. Molecular adaptation during adaptive radiation in the Hawaiian endemic genus Schiedea. PLoS One. 2006;1:e8.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  29. Kapralov MV, Filatov DA. Widespread positive selection in the photosynthetic Rubisco enzyme. BMC Evol Biol. 2007;7:73.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  30. Kapralov MV, Kubien DS, Andersson I, Filatov DA. Changes in Rubisco kinetics during the evolution of C4 photosynthesis in Flaveria (Asteraceae) are associated with positive selection on genes encoding the enzyme. Mol Biol Evol. 2011;28:1491–503.

    CAS  PubMed  Article  Google Scholar 

  31. Knobloch E, Mai DH. 1986. Monographie der Früchte und Samen in der Kreide von Mitteleuropa. Rozpr. Ustredni Ustav Geologicky. 1986;47:5–219.

    Google Scholar 

  32. Liu J-Q, Wang Y-J, Wang A-L, Hideaki O, Abbott RJ. Radiation and diversifcation within the Ligularia-Cremanthodium-Parasenecio complex (Asteraceae) triggered by uplift of the Qinghai-Tibetan plateau. Mol Phyl Evol. 2006;38:31–49.

    CAS  Article  Google Scholar 

  33. Li X, Sun B, Xiao L, Wu J, Lin Z. Leaf macrofossils of Ilex protocornuta sp. nov. (Aquifoliaceae) from the Late Miocene of East China: Implications for palaeoecology. Review of Palaeobotany and Palynology. 2010;161(1-2):87–103.

    Article  Google Scholar 

  34. Loizeau P-A, Barriera G, Manen J-F, Broennimann O. Towards an understanding of the distribution of Ilex L. (Aquifoliaceae) on a world-wide scale. Biol Skr. 2005;55:501–20.

    Google Scholar 

  35. Magalon and Sanderson, 2005" should be "Magallóan SA, Sanderson MJ. Angiosperm divergence times: the effect of genes, codon positions, and time constraints. Evolution. 2005;59(8):1653–1670.

    Article  Google Scholar 

  36. Manen J-F, Barriera G, Loizeau PA, Naciri Y. The history of extant Ilex species (Aquifoliaceae): evidence of hybridization within a Miocene radiation. Mol Phylogenet Evol. 2010;57:961–77.

    PubMed  Article  Google Scholar 

  37. Manen J-F. Are both sympatric species Ilex perado and Ilex canariensis secretly hybridizing? Indication from nuclear markers collected in Tenerife. BMC Evol Biol. 2004;4:1.

    Article  CAS  Google Scholar 

  38. Manen J-F, Boulter MC, Naciri-Graven Y. The complex history of the genus Ilex L.(Aquifoliaceae): evidence from the comparison of plastid and nuclear DNA sequences and from fossil data. Plant Syst Evol. 2002;235:79–98.

    CAS  Article  Google Scholar 

  39. Manen J-F, Cuénoud P, Martinez MD. Intralineage variation in the pattern of rbcL nucleotide substitution. Plant Syst Evol. 1998;211:103–12.

    Article  Google Scholar 

  40. Massingham T, Goldman N. Detecting amino acid sites under positive selection and purifying selection. Genetics. 2005;169(3):1753–1762.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  41. Parto S, Lartillot N. Molecular adaptation in Rubisco: discriminating between convergent evolution and positive selection using mechanistic and classical codon models. PLoS One. 2018;

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  42. Pennington RT, Lavin M, Särkinen T, Lewis GP, Klitgaard BB, Hughes C. Contrasting plant diversification histories within the Andean biodiversity hotspot. PNAS. 2010;107:13783–7.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  43. Portis AR, Li C, Wang D, Salvucci ME. Regulation of Rubisco activase and its interaction with Rubisco. J Exp Bot. 2008;59:1597–604.

    CAS  PubMed  Article  Google Scholar 

  44. Pottier M, Gilis D, Boutry M. The hidden face of Rubisco. Trends Ecol Evol. 2018;

    CAS  PubMed  Article  Google Scholar 

  45. Sanderson MJ, Doyle JA. Sources of error and confidence intervals in estimating the age of angiosperms from rbcL and 18S rDNA data. Am J Bot. 2001;88:1499–516.

    CAS  PubMed  Article  Google Scholar 

  46. Sen L, Fares MA, Liang B, Gao L, Wang B, Wang T, Su YJ. Molecular evolution of rbcL in three gymnosperm families: identifying adaptive and coevolutionary patterns. Biol Direct. 2011;6:29.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  47. Setoguchi H, Watanabe I. Intersectional gene flow between insular endemics of Ilex (Aquifoliaceae) on the Bonin Islands and the Ryukyu Islands. Am J Bot. 2000;87:793–810.

    CAS  PubMed  Article  Google Scholar 

  48. Sharwood RE, von Caemmerer S, Maliga P, Whitney SM. The catalytic properties of hybrid Rubisco comprising tobacco small and sunflower large subunits mirror the kinetically equivalent source Rubiscos and can support tobacco growth. Plant Physiol. 2008;146:83–96.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  49. Shi L, Li N, Wang S, Zhou Y, Huang W, Yang Y, Ma Y, Zhou R. Molecular evidence for the hybrid origin of Ilex dabieshanensis (Aquifoliaceae). PLoS One. 2016;11:e0147825.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  50. Studer RA, Christin PA, Williams MA, Orengo CA. Stability-activity tradeoffs constrain the adaptive evolution of RubisCO. PNAS. 2014;111:2223–8.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  51. Taylor TC, Andersson I. The structure of the complex between rubisco and its natural substrate ribulose 1,5-bisphosphate. J Mol Biol. 1997;265:432–44.

    CAS  PubMed  Article  Google Scholar 

  52. Wachter RM, Salvucci ME, Carmo-Silva AE, Barta C, Genkov T, Spreitzer R. Activation of interspecies-hybrid Rubisco enzymes to assess different models for the Rubisco–Rubisco activase interaction. Photosynth Res. 2013;117:557–66.

    CAS  PubMed  Article  Google Scholar 

  53. Wang M, Kapralov MV, Anisimova M. Coevolution of amino acid residues in the key photosynthetic enzyme Rubisco. BMC Evol Biol. 2011;11:266.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  54. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24:1586–91.

    CAS  PubMed  Article  Google Scholar 

  55. Yao X, Tan Y-H, Liu Y-Y, Song Y, Yang J-B, Corlett RT. Chloroplast genome structure in Ilex (Aquifoliaceae). Sci Rep. 2016;6:28559.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

Download references


We thank Juan-hong Zhang, Chun-yan Lin and Ji-xiong Yang from the Germplasm Bank of Wild Species of Kunming Institute of Botany, Chinese Academy of Sciences, for their help with experiments.


This work was supported by grants from the CAS “Light of West China” Program [Y9XB071B01 to X.Y.], the CAS 135 program [2017XTBG-T03 to X.Y.] and the 1000 Talents Program [WQ20110491035 to R.T.C.]. Y.H.T. was supported by grants from the Southeast Asia Biodiversity Research Institute, Chinese Academy of Sciences for collecting some samples.

Author information

Authors and Affiliations



XY, RTC and JFM designed the study. XY, YHT and JFM collected the samples. XY, JBY and JFM performed the experiments. XY, YW and JFM analyzed the data. XY, RTC and JFM wrote and edited the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Xin Yao, Richard T. Corlett or Jean-François Manen.

Ethics declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: Table S1.

List of world-wide Ilex species [32]. In bold, species used in the 20-species analyses.

Additional file 2: Table S2.

List of Chinese Ilex species studied here.

Additional file 3: Table S3.

List of taxa used in comparison with Ilex.

Additional file 4: Figure S1.

Comparison of the 3-dimensional structure of the Rubisco large subunit of Spinacia oleracea (in pink) with modeled 3-dimensional structure of the Rubisco large subunit of Ilex canariensis (in orange) using the “magic fit” option of Swiss-Pdb Viewer.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Yao, X., Tan, Yh., Yang, Jb. et al. Exceptionally high rates of positive selection on the rbcL gene in the genus Ilex (Aquifoliaceae). BMC Evol Biol 19, 192 (2019).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI:


  • Rubisco
  • Positive selection
  • PALM
  • Environmental adaptation
  • Molecular adaptation