Nonsynonymous mutations cause amino acid changes, and nonsense mutations introduce premature stop codons into the CDSs. As expected, the derived allele frequency (DAF) spectrum exhibited a trend of DAFsyn > DAFnsy > DAFnonsense (Fig. 1b), reflecting the selective force acting on these slightly deleterious mutations. Next, we divided the synonymous mutations (795,623 sites) into two groups according to whether the mutation increases (415,905 sites) or decreases (379,718 sites) the tAI value (Fig. 1c). Synonymous mutations that increase tAI had significantly higher DAF than those that decrease tAI (Fig. 1b), and the difference remained significant after multiple testing correction [27]. This result indicates that synonymous mutations are not strictly neutral: although the encoded amino acid is unchanged, synonymous mutations can affect translation efficiency and are therefore subject to natural selection. Moreover, if selection on tAI change is indeed the factor that shapes the DAF spectrum, then we should observe optimal tAI changes in more conserved genes and suboptimal or non-optimal tAI changes in less important genes. We used dN/dS values (“Methods”) to measure the conservation level of genes; genes with lower dN/dS are more conserved. We grouped all the synonymous mutations into 50 bins by increasing dN/dS of their host genes and calculated the mean tAI change within each bin (Fig. 1d). The delta tAI values were significantly negatively correlated with the dN/dS of host genes (Fig. 1d), indicating that functionally more important genes tend to have optimal tAI changes. Again, this result supports the beneficial consequences of synonymous mutations that increase tAI. Note that the advantage proposed here should be context-independent because it relies only on the change in tAI caused by the synonymous mutation itself.
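The binning analysis described above can be sketched as follows. This is a minimal illustration and not the pipeline used in this study: equal-count bins, the Spearman rank correlation, the function names, and the toy numbers are all assumptions introduced here, since the text specifies only that 50 bins of increasing dN/dS were used and that the correlation was negative.

```python
from math import sqrt
from statistics import mean


def bin_means(records, n_bins=50):
    """Sort (dn_ds, delta_tai) pairs by dN/dS and average each equal-count bin.

    Equal-count binning is an assumption; the text only states that 50 bins
    of increasing dN/dS were used.
    """
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    out = []
    for i in range(n_bins):
        chunk = ordered[i * n // n_bins:(i + 1) * n // n_bins]
        if chunk:  # a bin can be empty when there are fewer records than bins
            out.append((mean(r[0] for r in chunk), mean(r[1] for r in chunk)))
    return out


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# Made-up toy data: delta tAI falls as the dN/dS of the host gene rises.
toy = [(0.05 * k, 0.01 - 0.001 * k) for k in range(1, 21)]
binned = bin_means(toy, n_bins=5)
rho = spearman([b[0] for b in binned], [b[1] for b in binned])
```

On real data, each record would pair the dN/dS of the host gene with the tAI change of one synonymous mutation, and `n_bins` would be left at 50.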
Classification of isoaccepting and non-isoaccepting mutations
As mentioned above, a mutation in the coding region can only be (1) synonymous, (2) nonsynonymous or (3) nonsense. We further divided the synonymous mutations into two categories: isoaccepting and non-isoaccepting. Isoaccepting mutations do not change the decoding tRNA (anticodon) of the codon, whereas non-isoaccepting mutations do (Fig. 2a). This definition is based on the relationship between the codons before and after mutation. Among the 795,623 synonymous mutations, 648,051 were isoaccepting and 147,572 were non-isoaccepting.
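The classification rule can be written down as a short sketch. The standard genetic code below is factual, but `TOY_DECODER` is a made-up codon-to-tRNA assignment used only for illustration; a real analysis would need the species-specific decoding table described in “Methods”. The function name and the "stop-related" label are likewise assumptions.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical decoding table: codon -> label of the tRNA that reads it.
# These assignments are invented for the example, not taken from A. thaliana.
TOY_DECODER = {"GCT": "tRNA-A1", "GCC": "tRNA-A1",
               "GCA": "tRNA-A2", "GCG": "tRNA-A2"}


def classify_mutation(ref_codon, alt_codon, decoder):
    """Classify a single-codon change by comparing codons before and after."""
    ref_aa, alt_aa = GENETIC_CODE[ref_codon], GENETIC_CODE[alt_codon]
    if ref_aa == "*":
        return "stop-related"      # original codon is already a stop codon
    if alt_aa == "*":
        return "nonsense"          # premature stop codon introduced
    if ref_aa != alt_aa:
        return "nonsynonymous"     # amino acid changed
    if decoder[ref_codon] == decoder[alt_codon]:
        return "isoaccepting"      # same amino acid, same decoding tRNA
    return "non-isoaccepting"      # same amino acid, different decoding tRNA
```

For example, under the toy table `classify_mutation("GCT", "GCC", TOY_DECODER)` is isoaccepting, whereas `classify_mutation("GCT", "GCA", TOY_DECODER)` is non-isoaccepting.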
The codon co-occurrence pattern and the codon context
As mentioned in the previous section, codon co-occurrence is the biased clustering of isoaccepting codons [19], which forms the isoaccepting codon context. Simply put, two or more consecutive isoaccepting codons form a “stretch” (Fig. 2b). The advantage of this co-occurrence pattern, which is connected with rapid tRNA recharging, has also been introduced (Fig. 2c) [19]. This advantage would also enhance the local translation efficiency. Intuitively, mutations that create or maintain the isoaccepting codon context should be selected for.
The codon context is defined as the relationship between two adjacent codons, such as the focal codon and its upstream codon (Fig. 2b and “Methods”). There were 20,881,471 codons in the CDSs of all transcripts annotated in A. thaliana, among which 19,157,956 were in nonsynonymous codon context and 1,626,971 were in synonymous codon context; the remaining 96,544 were discarded because of start or stop codons. The codons in synonymous context were further divided into two groups: 616,844 in isoaccepting context and 1,010,127 in non-isoaccepting context.
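A minimal sketch of this context assignment is given below. It assumes that the first codon of a CDS is skipped because it has no upstream codon and that any pair involving a stop codon is discarded; the exact discarding rule is given in “Methods” and may differ. `TOY_DECODER` and the toy CDS are invented for illustration.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical codon -> tRNA labels (not real A. thaliana assignments).
TOY_DECODER = {"GCT": "tRNA-A1", "GCC": "tRNA-A1",
               "GCA": "tRNA-A2", "GCG": "tRNA-A2"}


def codon_contexts(cds, decoder):
    """Count the context of each focal codon relative to its upstream codon."""
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter()
    # The first codon never appears as a focal codon: it has no upstream codon.
    for upstream, focal in zip(codons, codons[1:]):
        up_aa, focal_aa = GENETIC_CODE[upstream], GENETIC_CODE[focal]
        if "*" in (up_aa, focal_aa):
            counts["discarded"] += 1          # assumption: stop codons dropped
        elif up_aa != focal_aa:
            counts["nonsynonymous"] += 1      # different amino acids
        elif decoder[upstream] == decoder[focal]:
            counts["isoaccepting"] += 1       # same amino acid, same tRNA
        else:
            counts["non-isoaccepting"] += 1   # same amino acid, different tRNA
    return counts


toy_counts = codon_contexts("ATGGCTGCCGCAGATTAA", TOY_DECODER)
```

Because the decoder is only consulted for synonymous pairs, the toy table needs entries only for the alanine codons that occur next to each other in the toy CDS.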
Context-dependent selection on isoaccepting mutations
We have already defined the mutation types according to the relationship between the codons before and after mutation (Fig. 2a and “Methods”), and the codon context according to the relationship between the focal codon and its 5′ codon (Fig. 2b and “Methods”). Our next step was to link mutation type with codon context.
Among the 795,623 synonymous mutations: (1) 648,051 were isoaccepting mutations, among which 22,294 (3.44%) were in isoaccepting context, 30,668 (4.73%) were in non-isoaccepting context, 595,065 (91.8%) were in nonsynonymous context and 24 were related to stop codons; (2) 147,572 were non-isoaccepting mutations, among which 4,057 (2.75%) were in isoaccepting context, 10,093 (6.84%) were in non-isoaccepting context, 133,413 (90.4%) were in nonsynonymous context and 9 were related to stop codons.
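As a consistency check, the counts reported above can be re-added and the percentages recomputed. The snippet uses only numbers stated in the text; the dictionary layout and variable names are illustrative.

```python
# Counts of synonymous mutations by mutation type and codon context,
# copied from the text.
counts = {
    "isoaccepting": {
        "isoaccepting context": 22294,
        "non-isoaccepting context": 30668,
        "nonsynonymous context": 595065,
        "stop-related": 24,
    },
    "non-isoaccepting": {
        "isoaccepting context": 4057,
        "non-isoaccepting context": 10093,
        "nonsynonymous context": 133413,
        "stop-related": 9,
    },
}

# Row totals per mutation type and the grand total.
totals = {mutation: sum(row.values()) for mutation, row in counts.items()}
grand_total = sum(totals.values())

# Percentage of each context within its mutation type.
percent = {
    mutation: {context: 100 * n / totals[mutation] for context, n in row.items()}
    for mutation, row in counts.items()
}

for mutation, row in percent.items():
    for context, value in row.items():
        print(f"{mutation} mutations, {context}: {value:.2f}%")
```

The row totals recover the 648,051 isoaccepting and 147,572 non-isoaccepting mutations, which together give the 795,623 synonymous mutations.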
We illustrated the DAF distributions of the different mutation types in the different codon contexts (Fig. 2d). DAF was higher for isoaccepting mutations in isoaccepting context than for non-isoaccepting mutations in isoaccepting context or isoaccepting mutations in non-isoaccepting context (Fig. 2d). Similarly, higher DAF was observed for non-isoaccepting mutations in non-isoaccepting context (Fig. 2d). However, for synonymous mutations in nonsynonymous context, the DAF distributions of isoaccepting and non-isoaccepting mutations did not differ significantly (Fig. 2d).
For simplicity, these results can be understood as a preference for isoaccepting or non-isoaccepting mutations in their own context. This trend suggests that, apart from context-independent effects such as the change in tAI, synonymous mutations can also be selected in a context-dependent manner.
Testing the results on distantly located mutations to cancel the effect of LD
Linkage disequilibrium (LD) usually makes the frequency spectra of neighboring mutations non-independent, so it is necessary to cancel its effect. LD blocks can differ in size, and it is therefore difficult to split the groups of mutations by a fixed distance.
We first looked at the distance between adjacent mutations, checking the different types of mutations separately. Among the tAI-up, tAI-down, synonymous, nonsynonymous, and nonsense mutations, nonsense mutations were the farthest apart from each other, since each gene has at most one nonsense mutation within an individual (Fig. 3a). The synonymous and nonsynonymous mutations had a median distance of less than 100 bp. Among the isoaccepting and non-isoaccepting mutations in different contexts, the non-isoaccepting mutations in isoaccepting context were the farthest apart (Fig. 3b), presumably because of the limited number of such mutations.
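The distance between adjacent mutations of one type can be computed as sketched below. The input layout of (chromosome, position) pairs, the function name, and the toy coordinates are assumptions for illustration; distances are taken only between neighbors on the same chromosome.

```python
from collections import defaultdict
from statistics import median


def adjacent_distances(mutations):
    """Return distances (bp) between neighboring mutations on each chromosome.

    `mutations` is an iterable of (chromosome, position) pairs for a single
    mutation type.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in mutations:
        by_chrom[chrom].append(pos)
    distances = []
    for positions in by_chrom.values():
        positions.sort()
        # Pair each position with its successor on the same chromosome.
        distances.extend(b - a for a, b in zip(positions, positions[1:]))
    return distances


# Made-up coordinates for one mutation type.
toy = [("Chr1", 400), ("Chr1", 100), ("Chr1", 160), ("Chr2", 50), ("Chr2", 1050)]
toy_distances = adjacent_distances(toy)
toy_median = median(toy_distances)
```

Running the function once per mutation type gives one distance distribution per type, which is the form of comparison shown in Fig. 3a and 3b.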
Next, we needed to re-examine the patterns observed in Figs. 1 and 2 while controlling for distance. However, different types of mutations have very different distributions in the genome, making it difficult to apply a single cutoff to control for distance. Therefore, we regarded each gene as a unit and selected one mutation per gene, which we expect to substantially reduce the effect of LD. For each type of mutation, the one closest to the 5′ end of each gene was selected. The average distance between these per-gene mutations was 91.0 Kb (5% and 95% quantiles: 3.4 Kb and 156.2 Kb). At such distances, LD should have largely decayed. The patterns observed for tAI (Fig. 3c) and isoaccepting context (Fig. 3d) persisted after controlling for distance, which supports the robustness of our conclusion.
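The one-mutation-per-gene selection can be sketched as follows. The sketch assumes that the 5′ end is defined in a strand-aware way, so that the lowest coordinate is chosen for plus-strand genes and the highest for minus-strand genes; the text does not state this detail. The record layout, function name, and gene identifiers are hypothetical.

```python
def most_5prime_per_gene(mutations):
    """Keep, for each gene, the mutation closest to the 5' end of the gene.

    `mutations` is an iterable of (gene_id, strand, position) for one
    mutation type. Strand-aware selection is an assumption of this sketch.
    """
    chosen = {}
    for gene, strand, pos in mutations:
        best = chosen.get(gene)
        if best is None:
            chosen[gene] = pos
        elif strand == "+" and pos < best:
            chosen[gene] = pos     # plus strand: 5' end is the lower coordinate
        elif strand == "-" and pos > best:
            chosen[gene] = pos     # minus strand: 5' end is the higher coordinate
    return chosen


# Made-up records; the gene identifiers are placeholders.
toy = [
    ("geneA", "+", 500),
    ("geneA", "+", 120),
    ("geneB", "-", 900),
    ("geneB", "-", 1500),
]
selected = most_5prime_per_gene(toy)
```

Applying the function separately to each mutation type yields at most one mutation per gene and type, on which the DAF comparisons can then be repeated.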