- Research article
- Open access
- Published:
Aligning functional network constraint to evolutionary outcomes
BMC Evolutionary Biology volume 20, Article number: 58 (2020)
Abstract
Background
Functional constraint through genomic architecture is suggested to be an important dimension of genome evolution, but quantitative evidence for this idea is rare. In this contribution, existing evidence and discussions on genomic architecture as constraint for convergent evolution, rapid adaptation, and genic adaptation are summarized into alternative, testable hypotheses. Network architecture statistics from protein-protein interaction networks are then used to calculate differences in evolutionary outcomes on the example of genomic evolution in yeast, and the results are used to evaluate statistical support for these longstanding hypotheses.
Results
A discriminant function analysis lent statistical support to classifying the yeast interactome into hub, intermediate and peripheral nodes based on network neighborhood connectivity, betweenness centrality, and average shortest path length. Quantitative support for the existence of genomic architecture as a mechanistic basis for evolutionary constraint is then revealed through utilizing these statistical parameters of the protein-protein interaction network in combination with estimators of protein evolution.
Conclusions
As functional genetic networks are becoming increasingly available, it will now be possible to evaluate functional genetic network constraint against variables describing complex phenotypes and environments, for better understanding of commonly observed deterministic patterns of evolution in non-model organisms. The hypothesis framework and methodological approach outlined herein may help to quantify the extrinsic versus intrinsic dimensions of evolutionary constraint, and result in a better understanding of how fast, effectively, or deterministically organisms adapt.
Background
Genetic constraint and evolutionary outcomes
Understanding the genetic dimension of evolutionary constraint is crucial to understanding adaptation and other repeatedly observed outcomes of evolution such as convergent phenotypes, rapid adaptation, or genic evolution (see Glossary in Table 1). For example, divergent genetic populations of the well-studied Caribbean lizard Anolis cybotes [3] have nonetheless evolved convergent phenotypic, ecological, reproductive, and physiological adaptations to high elevations on three separate mountain chains, which is mirrored by genomic adaptations in a subset of genes [4,5,6,7]. These observations made in natural populations suggest that the variants available to mutation and selection may be constrained at the genomic level, enabling faster adaptations and higher rates of convergent evolution than were possible without constraint.
Many studies have shown the non-independence of genes from one another, be it through physical linkage, phylogenetic relationship (e.g., in the case of whole genome duplications), or functional interaction (Fig. 1). Futuyma [8] cited Schluter [9], noting that correlations between genes could reduce the degrees of freedom on which selection can operate. Mayr [10] stated that “coadapted” genes are a result of natural selection, being brought together to form a “balanced system”, but ruled out that such gene complexes would be of any interest to evolutionary biology, as ultimately only the complete phenotype is selected (8, p.184ff). Nonetheless, evolutionary trajectories of complex phenotypes have been extensively studied through the concept of the genetic variance-covariance or G-matrix [11]. Some mechanistic properties of the genome leading to the constraints that can be expressed as a G-matrix are trait polygeny, trait pleiotropy, and linkage (Figs. 1), [12], but the evolutionary constraints of correlated traits implied from a G-matrix can be rapidly overcome in only a few generations [13], hinting at additional genomic properties influencing evolutionary constraint. A more detailed look into the mechanistic basis of constrained phenotypic evolution at the molecular level is therefore necessary, and now made possible through the rapid accumulation of genomic and other molecular -omics data sets in the public domain.
Genetic constraint through gene functional importance
The functional importance of a gene has long been thought to cause such evolutionary constraint at the molecular level: protein-coding genes that are indispensable for the organism should be highly conserved and thus, be constrained through evolution, as most nonsynonymous mutations would be detrimental to protein function and thus would most likely result in non-adaptive phenotypes. Consequently, these genes should have a lower rate of molecular evolution. Such genes have formerly been identified through their “dispensability”. This term describes how essential genes are for organismal function within a certain environmental context, which can be estimated through knockout experiments (Table 1).
Zhang and Yang [14] reviewed evidence from empirical studies, but found that essential genes are not evolving more slowly than nonessential genes. Instead, highly expressed genes seem to have lower rates of protein evolution (dubbed the “E-R anticorrelation” [15, 16], which some authors relate to translational selection on amino acids with different metabolic cost [15]. Many studies have ascribed an important role to gene expression levels in constraining the evolutionary rate of proteins [17,18,19]. But perhaps, functional importance needs to be defined differently than via gene essentiality or dispensability, and expression level may be a correlative variable linked to another cause. In Saccharomyces cerevisiae (in following: yeast), which was used for many studies on protein evolutionary rate and functional importance, essential genes are required for organismal growth and performance under optimal environmental conditions. A gene that renders an organism nonfunctional may thus predominantly be active in genetic pathways related to development and growth. However, in a multicellular organism such as a vertebrate, also the genes that are essential for organismal viability and reproduction are of high functional importance, and potentially could be under evolutionary constraint [20], such as in the example of genes coding for eye color determining mating success in Drosophila melanogaster [21]. A high proportion of the human genome has also been found to be under selective constraint in other mammals, indicating that gene dispensability is not a binary variable [22]. As reviewed by Zhang and Yang [14], Wilson et al. [23] suggested that evolutionary rate may be determined by both functional importance and functional constraint [14, 23]. If functional importance measured as (negative) gene dispensability does not predict variations and constraints of evolutionary rate, perhaps functionally more constrained genes are the ones evolving slowly. Prior studies have attempted to identify functional constraint in terms of which sites within a protein are essential for performing its function, called protein functional site constraint in Fig. 1. The Neutral Theory [24] already identified codon constraint where nonsynonymous mutations are of larger consequence than synonymous ones as being important for evolution (Fig. 1).
Genetic constraint through gene pleiotropy and network architecture
During the recent decades, network thinking has emerged as a powerful approach for better understanding biological realities [25]. The network concept might also have deep implications in evolutionary biology. Gene interaction networks were found to evolve either faster or slower than comparable genes functioning without being connected to others [26,27,28,29], and gene regulatory circuits convergently evolve in the absence of shared ancestry [30]. The overall network architecture or hierarchy of genes within the network is likely to contribute to the speed and mode of evolution and the phenotype components associated with it, regulated through functional constraint of nodes within the network [25]. For example, a study by Jeong and colleagues [31] found that genes with many functional interaction partners are also likely to be essential, which, however, does not provide enough evidence to extrapolate directly from functional constraint to evolutionary outcome.
Functional genetic network structure has been shown to affect evolutionary outcomes through “gene pleiotropy” in yeast: gene products that interact with many others are thought to be involved in many cellular pathways and by that means, to have multiple (pleiotropic) effects on the cellular function [32, 33]. Fitness effects of mutations in pleiotropic genes could be partitioned across several phenotypic components, increasing the likelihood of maladaptive effects, which means that they should be more conserved through evolution and evolve more slowly [32]. However, it is important to note, despite that a connection between pleiotropic gene and pleiotropic phenotype was implied in these studies [32, 33], gene pleiotropy or the number of functional connections a gene has with others should be regarded as a distinct concept from phenotypic pleiotropy, unless such a relationship to pleiotropic phenotype has been demonstrated. Proteins with many interactants may be constrained in the evolution of their functional sites to instances of co-evolution with the interactant genes, in order to maintain their functionality [34]. A corresponding model of evolutionary constraint on evolution through gene pleiotropy that was explicitly based on functional network node hierarchy within an interactome was proposed by Pavlicev and Wagner [35]. They argued that for genetic adaptation in a target gene to happen, selection has to overcome the inertia generated through stabilizing selection of the genes functionally connected with the target [35]. The premise of this model is that any change in genotype-phenotype interaction represents a change in a developmental pathway and, due the position of a gene within a network, will have pleiotropic effects on the phenotype [35]. Empirical research into this topic found that most pleiotropic genes with many interaction partners only had a small pleiotropic effect on the phenotype, but some genes with large phenotypic effect were also more pleiotropic [20]. High gene pleiotropy is assumed to have a cost for adaptation, which was explained as nodes central to a network with many interaction partners evolving slower [20, 36]. This idea, dubbed the “cost of complexity” [37] would lead to faster evolution of organisms with less complex genomic architecture due to this constraint being relaxed [20], and to adaptive selection on standing genetic variation preferentially to occur in genes with low pleiotropic effects [38]. Concerning evolutionary outcomes, gene pleiotropy was suggested to limit events of genomic co-evolution [34], genomic adaptation [38], and convergent evolution [38] in nodes central to a network. Consequently, the properties of nodes within a functional genetic network may be informative to understand their evolutionary constraint. However, gene pleiotropy was defined by most of these authors [32,33,34,35, 37] as synonymous with the number of interactants, and also with centrality in a network - but looking at more recently generated interactomes, nodes topologically central to the network are usually not the nodes with the highest number of connections. Instead, nodes with most connections are located in intermediate positions within a network [39]. The number of edges of a node consequently, may not be sufficient to disentangle the effects of network structure on evolutionary constraint since it only measures one of a network’s many properties. The concept of variable genomic networks existing within populations was first explored by Wagner [40] and was represented through a hypothetical “genotype space” of similar phenotypes that might correspond to the concept of “phenotypic optima”. Selection can cause a population to modify their genotype networks in a way that renders them more robust to changes in the fitness landscape.
While the concept of network architecture influencing evolutionary outcomes is known from the studies outlined above and from others, in many cases this concept has not been sufficiently transformed into testable hypotheses yet, and correspondingly, no straightforward methodology exists for biologists to test them empirically. The first aim of this paper is to deconstruct the abstract concept of gene pleiotropy by setting genomic network architecture in relation to the three evolutionary outcomes: 1) genomic re-use generating convergent phenotypes, 2) the simultaneous occurrence of convergence and divergence within a genome, resulting in genic adaptation, and 3) the speed of adaptation. For this purpose, I propose three categories of nodes with different putative evolutionary trajectories. I set these categories in relation to previously defined hypotheses and expectations aligning functional network constraint to evolutionary outcomes. The second aim is to demonstrate how these hypotheses can be quantitatively tested. First, the nodes of the yeast interaction network are transformed into categories based on network statistical parameters and discriminant function analysis. Secondly, these are then tested for differences of estimators of evolutionary outcomes using data from yeast evolution. For this I use published data, which aligns this study to others with similar questions [15, 41, 42] but utilizes a novel approach.
Results
Aim 1: novel node classification scheme based on network statistics
A testable, hypothetical scenario of how functional genetic network architecture could influence evolutionary outcomes is shown in Fig. 2. The relationship between pre-existing hypotheses and results from the present study is shown in Table 2. Values for three network statistical parameters were obtained from the yeast interactome whose definition corresponds to the above outlined node types. Those parameters were average shortest path length (maximal in peripheral nodes), neighborhood connectivity (maximal in nodes intermediate to the network), and betweenness centrality (maximal in nodes connecting subnetworks). Maximal values for each statistical parameter were used to bin nodes into P, I and H nodes (which stands for peripheral, intermediate, and hub nodes). A Discriminant Function Analysis yielded significant support for the allocation of network statistical parameters to these P, I, and H node categories (Fig. 3, Table 3). To explore the network position of nodes that have undergone convergent adaptation, yeast ORF IDs that were demonstrated experimentally to show convergent genomic adaptation in independent experiments, strains, or species of yeasts were identified from the literature ([38, 48,49,50,51,52], Table 4). These nodes were classified as C-nodes. All network statistical parameters significantly differed between node categories, as shown with Kruskal-Wallis tests: Average shortest path length: KW-H (3,2208) = 1220.590, p < 0.0001; neighborhood connectivity, KW-H(3,2204) = 926.571, p < 0.0001; betweenness centrality KW-H(3,2208) = 293.849, p < 0.0001 (Fig. 4). I-nodes contain the highest number of edges and connect sub-networks, but are not defined with respect to their expression. C-nodes had network statistical parameters most similar to I-nodes (cf. inset network in Fig. 4).
Aim 2: genetic constraint and network architecture influencing evolution
To test the influence of functional network constraint on the evolutionary outcomes rapid adaptation and convergent evolution, as well as gene expression, I rearranged and expanded on this [41] data set (see Methods). I then tested, how network statistical parameters relate to estimators of evolutionary parameters ω, γ, and CAI (Codon Adaptation Index). The amount of mRNA produced by each gene in regular somatic cells can be estimated by CAI which is derived from codon use bias in yeast that correlates with mRNA levels (based on [54, 55]). First, a general linear model was run with evolutionary parameters as dependent variables, and network parameters as predictor variables. All three network statistical parameters were found to significantly predict estimators for evolutionary outcomes (Table 5). All node categories have significantly different values for ω (KW-H(3,2204) = 20.1345, p = 0.0002), CAI (KW-H(3,2195) = 26.1472, p = 0.00001) and γ (KW-H(3,2195) = 36.7936, p = 0.00000), as shown by Kruskal-Wallis tests (Fig. 5). Figure 5 shows that the highest values of ω are found both in P and I-nodes with almost identical median values (0.93 vs. 0.91), while γ is highest in P and I-nodes and H nodes, as expected, had highest values of CAI.
The effect size of network statistical parameters as predictors for variables estimating evolutionary outcomes relative to CAI was then determined. When CAI was incorporated into the analysis to predict values of ω and γ, average shortest path length (P-node classifier) was the predictor with highest power (0.99), followed by its interaction term with CAI (0.97), then CAI itself (0.92), followed by neighborhood connectivity (the I-node classifier, 0.83). Only betweenness centrality (the H-node classifier) was not significantly contributing to this model (Supplementary Tables 1 and 2). As mentioned previously, Fig. 5 shows that the estimator of gene expression levels is highest in H-nodes, which might explain why CAI was seen as a better predictor for ω and γ than betweenness centrality. However, AIC based model selection revealed that a global model of all four variables including CAI and network parameters explains ω and γ better than CAI itself (Supplementary Tables 3 and 4). For ω, the only model with higher likelihood than the global model is that excluding betweenness centrality, whilst the CAI-only model ranks 8th. The rewiring score γ is best explained by the global model and the CAI-only model ranks 5th.
Discussion
Aim 1: novel node classification scheme based on network statistics
One scenario of how functional genetic network architecture could influence evolutionary outcomes (Fig. 2) is the topography of nodes in a functional genomic network. When selection acts upon a population (for example, through a sudden change in climate), advantageous mutations will be selected from standing genetic variation (allele frequencies). A population will have standing genetic variation in different nodes of the network, which is dependent on the topological position of the nodes. Organisms possess a finite subset of biochemical pathways (underlying functional genetic networks) such as those related to temperature homeostasis [56,57,58], and that align to a finite amount of selected phenotype components. Figure 2 shows how the population must adapt to this newly arising selective pressure through selection of advantageous mutations in one of these subnetworks, but not by selecting mutations in any other subnetwork, as these are unrelated to the stimulus or organismal fitness in response to it, and would therefore not result in adaptation. This does not mean that other subnetworks are not under any selection, or under stabilizing selection for other causes, or that selection on one subnetworks does not influence others, but is a simplification here used for the purpose of classifying nodes. This constrains the number of mutations in the genome that selection will operate on, and thus determines the evolutionary response through genetic constraint. Second, and of high importance for the new classification scheme proposed here, node hierarchy within these subnetworks poses an additional level of constraint: and this additional level reduces the “evolutionary search space” for potential beneficial variants.
This can be illustrated through the following hypothetical construct, which reduces network structure to distinct types of nodes. Network nodes, which are functionally important for the operation of the network (hub nodes central to the network, H-nodes), are less likely to harbor significant genetic variation in first- or second-codon positions or regulatory regions because of their high functional constraint. Consequently, genetic variation, as well as adaptation to an environmental selective pressure, should both be more likely to occur within non-hub nodes within the subnetwork. Nodes with the highest number of edges are intermediately positioned within a network (intermediate nodes, I-nodes) and were shown to have weaker selective constraint [44, 45] than centrally positioned nodes, as they have lower functional constraint than H-nodes. Consequently, they should evolve faster. This assumption differs from the gene pleiotropy hypothesis, which places the highest functional constraint on these nodes (Table 2). However, because of their high cost of complexity, adaptation in I-nodes should be highly constrained in terms of which genes can adapt (depending on the nature of their functional interactions) and how (through changing the wiring pattern with other nodes). Because of gene pleiotropy, adaptations that do evolve in these nodes should have a larger phenotypic effect, which combined with the reduced possibilities for adaptation, increases the likelihood for convergent evolution in them. Genes peripheral in the network (peripheral nodes, P-nodes) have higher degrees of freedom due to the lowest degree of gene pleiotropy and should be able to accumulate genetic variation with least cost. Therefore, the population should already harbor more genetic variation within these peripheral genes on which selection can operate. Change in such nodes however, due to lower gene pleiotropic interactions, would result in less phenotypic effect and thus they are less likely to promote large evolutionary changes. In such nodes, divergence is more likely to accumulate than convergence. The expectation is thus that different node types will differ in standing genetic variation due to the different genetic constraints acting upon them. H-nodes will be very strongly constrained and only can accumulate little standing genetic variation, resulting in a low potential for selection to operate on. I-nodes will harbor sufficient standing genetic variation but be under high functional constraint, so that selection can only operate on a limited number of variants that all have multiple phenotypic effects. In different organisms, the same variants can be selected quickly due to this reduced search space, which leads to parallel genomic evolution resulting in convergent phenotypes.
P-nodes will be least constrained, allowing a lot of variation but less sweeping phenotypic effects due to lower gene pleiotropy. Selection can operate on multiple variants in these; selective advantages are more likely due to the lower gene pleiotropy in more genes, so selection will less likely lead to convergence. All three evolutionary outcomes can (among other factors such as gene expression levels) be explained with this mechanism of constraint through functional genetic network structure. As Fraser [59] pointed out, objective classification of nodes into any such type of categories is important, and they should reflect true topological properties of an interactome. In this study, significant support was found that the yeast network can be partitioned into these node categories, based on the network statistical parameters neighborhood connectivity, betweenness centrality, and average shortest path length.
Fraser’s [59] first exploration of the influence of modules within genomes and their hub nodes (called “modularity”) found that the rate of protein evolution is faster in intramodule hubs (nodes that link genes with high co-expression in response to a stimulus) compared to intermodule hubs (linking low genes with low co-expression in response to a stimulus, defined after [60]. The node classification scheme of Fraser [59] was based on gene co-expression and not on node topology, and gene co-expression in response to a stimulus followed a bimodal distribution. In this contribution, I nodes instead have the highest number of edges and connect sub-networks, but are not defined with respect to their expression (Table 2). In line with the expectations outlined above, C-nodes previously identified [38, 48,49,50,51,52] to have undergone convergent genomic adaptation in independent experiments, strains, or species of yeasts were located within the I-nodes category, showing that convergently adapting genes have similar evolutionary rates, expression levels, and degrees, as nodes that are located intermediately in the interactome.
Aim 2: genetic constraint and network architecture influencing evolution
A recent paper published by Schoenrock et al. [41] uses a data set of 4179 protein-coding genes (sourced from 13,40) to investigate the involvement of network structure in protein evolution. This data set was generated for five species of yeasts (Saccharomyces cerevisiae, S. paradoxus, S. bayanus, S. kudriavzevii, and S. mikatae). The study compared a quantitative variable related to network structure (computationally predicted re-wiring of nodes through evolution γ), with an estimator of protein evolutionary rate on nodes (substitution rate ω, measured as dN/dS). The authors found that the degree of rewiring of nodes across the phylogeny was only poorly associated with evolutionary sequence divergence, but nodes with very low evolutionary rate had high variability of rewiring scores, which indicates that changing gene interactions is an important mechanism how functionally constrained genes may evolve. While the study remained somewhat inconclusive about the influence of network structure and node rewiring on protein evolution, the data contained within it, combined with additional data, allowed me to test the hypotheses outlined above using the new node classification scheme (Table 2). In this study, all three estimators of evolutionary parameters (substitution rate, re-wiring score, and gene expression levels) were significantly predicted by the three node categories.
With respect to rapid adaptation, the highest estimated substitution rates are found both in P and I-nodes, which shows that nodes located less centrally in the network evolve faster than other nodes. However, peripheral nodes were not identified as adapting particularly fast. CAI increases towards the center of the network, with mRNA expression level being highest in hub nodes. Network node hierarchy may therefore be able to explain the E-R anticorrelation (gene expression levels being negatively correlated with evolutionary rate [14]. H-nodes connect various subnetworks with one another, and thus are likely to be involved in more diverse functions (which might be partitioned across different tissues, processes or life history phases), than nodes more peripheral in a network (Figs. 5, 54). Such common functions may require a high amount of product, which may translate into high levels of mRNA expression in these nodes. γ is highest in P and I-nodes, indicating that evolutionary rewiring events are more common in less central parts of the networks. An interesting subject for further study may be to compare explicit topologies of nodes that underwent re-wiring through evolution, in order to determine whether they can additionally move between I and P node categories over time. I-nodes harbor the majority of edges within a network - genetically re-wiring these nodes could lead to rapid adaptation [61]. Centrality of H-nodes seems to reduce their adaptability while peripheral and intermediate nodes are less constrained to adapt, and this process may involve rewiring within the network. This demonstrates how functional constraint can explain evolutionary outcomes better than estimators for gene dispensability can. Rapid genomic adaptation within diversifying populations has been shown to occur as a rapid response to selection such as anthropogenic pollution [62]. Such rapid adaptation often occurs in the presence of gene flow [47, 62]. This means that adaptation is constrained to specific genes under selection, which can interrupt their gene flow between populations, while alleles of other genes show uninterrupted gene flow. The speed of such adaptation related to divergence of a subset of genes within the same genome has been dubbed the “genic theory of speciation” [47]. Such genic evolution was shown to occur in Timema stick insects [63]. Future studies could test whether such rapidly adapting loci are preferentially located in P- and I-nodes, and whether this leads to a change in inter-node wiring patterns.
With respect to genic adaptation, Schoenrock et al. [41] could show that some functionally similar nodes experienced lower than expected levels of protein evolution, indicating purifying selection. Nodes that were evolving through fewer re-wiring events than expected, included functions related to phosphorylation, mitochondrial translation, response to pheromone, small GTPase mediated signal transduction, and transport. Nodes that were evolving among the five yeast species with higher than expected degrees of re-wiring, included the functions metabolic process, and various gene ontologies related to transcription and its regulation, as well as the regulation of transposition regulation. As indicated in Fig. 2, these results prove that evolutionary outcomes are different for functionally different subnetworks within an interactome. It might be worth noting that, as outlined above, none of these functions is particularly related to growth but rather to maintaining organismal function, which is why they would be overlooked if conserved genes were only classified by the criterion of dispensability for colony growth. Gresham et al. [51] similarly showed that evolutionary constraint in experimentally evolved yeast populations over 200 generations is dependent on the type of selection (limiting Glucose or Phosphate vs. Sulphur), with convergence being an outcome of the system level organization of the respective metabolic pathway. Additionally, the same differences in evolutionary estimators between node categories that could promote rapid adaptation, would allow genes in different node categories to evolve with differential speed, which would allow for genic adaptation.
Traditionally, convergence has been studied in non-model organisms, and with a focus on adaptive modification of the phenotype (e.g., [64]). More recently, phenotypic convergence has been traced back to in some instances resulting from identical genotypic variants (called “genomic re-use”, reviewed in [38]). These can arise either as new parallel mutations or from parallel selection of the same alleles from standing genetic variation [38] such as in the independent selection of body armor in the ectodysplasin locus of stickleback fish [65]. Other examples have recently been uncovered in skin toxin transport in poison frogs, [66] or in functional genomic adaptation to cold in a range of extant and extinct mammals including the mammoth [67]. However, convergent phenotypic adaptations can alternatively be produced by different genes, and a recent study on convergently evolved Anolis lizard ecomorphs found no convergence at the amino acid level [68]. Convergence events may also be exaptations, where a similar allele evolved due to ancestrally different selective pressures with a subsequent change of function [69]. Genomic re-use in some distantly related lineages but not in others, may indicate that constraint at the genomic level limits the evolutionary search space, but can manifest in different ways at the nucleotide level. In this study, a small number of convergently adapting genes in yeasts were preferentially located within I-nodes, as aligned to this hypothesis. This supports the notion that nodes with the highest number of edges and intermediate network position are constrained to adapt and thus increase the likelihood for convergent evolution. Gresham et al. [51], from which five C-nodes were obtained, also showed that convergent evolution is related to system level organization of the respective metabolic pathway. In summary, these results clearly demonstrate a relationship between network architecture and convergence, and if additional genes will become known to evolve convergently in yeast, this hypothesis can be further tested.
Previous studies had identified gene expression level estimates (using CAI as proxy), not network topographical structure, as the predominant explanatory variable for functional genetic constraint influencing evolutionary outcomes. However, I could show here that in model ranking including CAI as an additional predictor for ω and γ, CAI in itself did not emerge as the best predictor. These results confirm that whilst gene expression levels are an important element of genetic constraint, the position of highly expressed nodes as hub nodes in the network, together with the other network topology parameters, yield better explanatory power for two estimators of evolutionary outcomes. These results further support network topology as an important agent of evolutionary constraint.
Conclusions
Metagenomic resequencing of every 500 generations within a 60,000 generation E. coli long term evolution experiment [70] revealed that certain genes accumulated beneficial mutations through selection significantly more often than expected by chance, and were very often affected by parallel adaptation [70]. These results, together with the incidences of recurrent genomic adaptations reviewed herein, demonstrate that the above-described relationship between network structure and convergent evolution may be expandable to organisms other than yeasts [38]. Apart from the quick assessment performed in this contribution, the influence of network structure in shaping evolutionary outcomes in more complex organisms than yeast such as vertebrates still needs to be comprehensively tested. Additionally, statistics computed on edge distributions in non-model organisms may change over time as more experimental evidence on interactions becomes available, and evolutionary constraint might differ by the type of interactions studied.
As demonstrated above in the yeast example, the impending advent of large-scale functional genomic networks for many new species makes it possible to convert functional genomic network structure of related species into variables describing hierarchical node position within the network. Future tests relating evolution to genomic constraint could include node architecture, and revolve around (i) Comparing standing genetic variation to network node position (while considering the effect of demography, selective sweeps, genetic drift, bottlenecks, and other levels of extrinsic constraint); (ii) Testing whether similar subnetworks/node hierarchies adapt to same selection pressure in different organisms. (iii) Comparing the speed of realized adaptation to a mutation/selection expectation, without considering network constraint. The potential benefits of better understanding genetic constraint leading to deterministic evolution may be wide ranging-- in humans, the use of functional interaction networks is omnipresent in genomic and transcriptomic study of cancer data, and recently, calls have been made for evolutionary methods to be applied to cancer problems [71]. A recent study demonstrates how the early progression of pancreatic cancer is defined through evolutionary constraints resulting from following one of three tumor suppressive pathways, and thus may be predictable [72]. Recognizing network constraint as an evolutionary force, rather than as a juxtaposition of evolution through natural selection [73], would allow us to quantify “background genetic constraint” through functional network structure. The remaining variance could then be better allocated to mutation and selection in directing rapid, convergent, and genic phenotypic evolution.
Methods
To assess evolutionary outcomes rapid adaptation and convergent evolution, as well as to address the important factor of gene expression in shaping protein-coding gene evolution, the data set of Schoenrock et al. including yeast ORF ID, computationally predicted evolutionary PPI re-wiring score (γ), and substitution rate (ω) [41] was downloaded. The re-wiring score was obtained from comparing networks across five species of yeast [41] and was used here to assess whether nodes that change wiring patterns are linked to specific positions within the network. The dataset was then rearranged and integrated with data downloaded from Wall et al. [15] including ORF ID, and CAI (Codon Adaptation Index, a measure of RNA expression levels, based on [54]. When analyzing networks, it is important to do so on exhaustive data sets [74] to avoid experimental bias [48]. Such an exhaustive interactome for yeast generated from the BIOGRID database [75] was obtained from CYTOSCAPE v.3.6.0 [76], which contained 6508 nodes and 340,000 edges, with data curated from 5500 studies. With the goal to calculate a classifier that will aid in describing hierarchical node position within networks, common network statistical parameters were calculated from this exhaustive yeast interactome in CYTOSCAPE v.3.6.0 [76] using the Network Analyzer function. Data for the matching node ORFs were appended to the data set, and variables with non-normal distribution were BoxCox transformed. The final data set contained 2209 ORFs with only a few missing data points per variable. The network statistical parameters obtained from the yeast interactome were average shortest path length (maximal in peripheral nodes), neighborhood connectivity (maximal in nodes intermediate to the network), and betweenness centrality (maximal in nodes connecting subnetworks). Nodes with maximum values for each one of these three statistical parameters, and that were not overlapping with each other (1081 nodes, Fig. 2), were each assigned to a category: P (peripheral nodes), I (intermediate nodes) and H (hub nodes). To assign node categories to the remaining nodes in the network that may be harder to allocate visually, a discriminant function analysis (DFA) was employed in STATISTICA (V13, Tibco). All remaining nodes with significant statistical support could be associated to one of these three categories (Table 3). To explore the network position of nodes that have undergone convergent adaptation, ORF IDs that were demonstrated experimentally to show convergent genomic adaptation in independent experiments, strains, or species of yeasts (C-nodes) were identified from the literature (66, 74–78), Table 4). Out of the 26 obtained C-nodes, 21 nodes were allocated by DFA to the I-category, and five were allocated to the P-category. It was then tested how network statistical parameters relate to the evolutionary parameters ω, γ, and CAI. First, a general linear model was run with evolutionary parameters as dependent variables, and network parameters as predictor variables. Differences in, respectively, network statistical parameters or estimators for evolutionary parameters, and node categories were tested with Kruskal-Wallis tests. Because previous studies [17,18,19] have ascribed gene expression (here measured as CAI) an important role for constraining evolution, it is possible that whilst network statistical parameters do explain evolutionary parameters well, this effect could disappear once CAI itself is considered as a predictor for ω and γ. This assumption was therefore tested through (i) comparing power of predictors in another linear model, including network statistical parameters, CAI, as well as interaction terms as predictors and (ii) comparing Akaike information criteria of models generated from these variables and their interaction terms.
Availability of data and materials
The datasets generated and/or analysed during the current study are included in this published article and available as Supplementary Table 5.
Abbreviations
- CAI:
-
Codon Adaptation Index
- ORF:
-
Open Reading Frame
- P-nodes:
-
Peripheral nodes
- I-nodes:
-
Intermediate nodes
- H-nodes:
-
Hub nodes
- γ:
-
Computationally predicted re-wiring of nodes through evolution
- ω:
-
Substitution rate measured as dN/dS
References
Antonovics J, van Tienderen PH. Ontoecogenophyloconstraints? The chaos of constraint terminology. Trends Ecol Evol. 1991;6(5):166–8.
Conway MS. The runes of evolution: how the universe became self-aware. 1st ed. West Consohocken: Templeton Foundation Press; 2015.
Losos JB. Lizards in an Evolutionary Tree: Ecology and Adaptive Radiation of Anoles. 1st ed. Berkeley: University of California Press; 2009. p. 528.
Hertz PE, Huey RB. Compensation for altitudinal changes in the thermal environment by some Anolis lizards on Hispaniola. Ecology. 1981;62(3):515–21.
Wollenberg KC, Wang IJ, Glor RE, Losos JB. Determinism in the diversification of Hispaniolan trunk-ground anoles (Anolis cybotes species complex). Evolution. 2013;67(11):3175–90.
Munoz MM, Stimola MA, Algar AC, Conover A, Rodriguez AJ, Landestoy MA, et al. Evolutionary stasis and lability in thermal physiology in a group of tropical lizards. Proc R Soc Lond B Biol Sci. 2014;281(1778):20132433.
Rodríguez A, Rusciano T, Hamilton R, Holmes L, Jordan D, Wollenberg Valero KC. Genomic and phenotypic signatures of climate adaptation in an Anolis lizard. Ecol Evol. 2017;7:6390–403.
Futuyma DJ. Evolutionary constraint and ecological consequences. Evolution. 2010;64(7):1865–84.
Schluter D. The ecology of adaptive radiation. Oxford: OUP; 2000. p. 296.
Mayr E. Populations, Species, and Evolution: An Abridgment of Animal Species and Evolution. 1st ed. Cambridge: Harvard University Press; 1970. p. 453.
Arnold SJ, Bürger R, Hohenlohe PA, Ajie BC, Jones AG. Understanding the evolution and stability of the G-matrix. Evolution. 2008 Oct;62(10):2451–61.
Saltz JB, Hessel FC, Kelly MW. Trait correlations in the genomics era. Trends Ecol Evol. 2017;32(4):279–90.
Conner JK, Karoly K, Stewart C, Koelling VA, Sahli HF, Shaw FH. Rapid independent trait evolution despite a strong pleiotropic genetic correlation. Am Nat. 2011;178(4):429–41.
Zhang J, Yang J-R. Determinants of the rate of protein sequence evolution. Nat Rev Genet. 2015;16(7):409–20.
Wall DP, Hirsh AE, Fraser HB, Kumm J, Giaever G, Eisen MB, et al. Functional genomic analysis of the rates of protein evolution. Proc Natl Acad Sci U S A. 2005;102(15):5483–8.
Papp B, Pál C, Hurst LD. Dosage sensitivity and the evolution of gene families in yeast. Nature. 2003;424(6945):194–7.
Bloom JD, Drummond DA, Arnold FH, Wilke CO. Structural determinants of the rate of protein evolution in yeast. Mol Biol Evol. 2006;23(9):1751–61.
Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH. Why highly expressed proteins evolve slowly. Proc Natl Acad Sci U S A. 2005;102(40):14338–43.
Drummond DA, Raval A, Wilke CO. A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol. 2006;23(2):327–37.
Wang Z, Liao B-Y, Zhang J. Genomic patterns of pleiotropy and the evolution of complexity. Proc Natl Acad Sci U S A. 2010;107(42):18034–9.
Stanić S, Pavković-Lucic S. Mating success of wild type and sepia mutants Drosophila melanogaster in different choice. Riv Biol. 2005;98(3):513–24.
Cooper GM, Stone EA, Asimenos G. NISC comparative sequencing program, Green ED, Batzoglou S, et al. distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 2005;15(7):901–13.
Wilson AC, Carlson SS, White TJ. Biochemical evolution. Annu Rev Biochem. 1977;46:573–639.
Kimura M. The neutral theory of molecular evolution. Cambridge: Cambridge University Press; 1983. p. 367.
Proulx SR, Promislow DEL, Phillips PC. Network thinking in ecology and evolution. Trends Ecol Evol. 2005;20(6):345–53.
Davidson EH. Emerging properties of animal gene regulatory networks. Nature. 2010;468(7326):911–20.
Erwin DH, Davidson EH. The evolution of hierarchical gene regulatory networks. Nat Rev Genet. 2009;10(2):141–8.
Camps M, Herman A, Loh E, Loeb LA. Genetic constraints on protein evolution. Crit Rev Biochem Mol Biol. 2007;42(5):313–26.
Chae L, Lee I, Shin J, Rhee SY. Towards understanding how molecular networks evolve in plants. Curr Opin Plant Biol. 2012;15(2):177–84.
Conant GC, Wagner A. Convergent evolution of gene circuits. Nat Genet. 2003;34(3):264–6.
Jeong H, Mason SP, Barabási AL, Oltvai ZN. Lethality and centrality in protein networks. Nature. 2001;411(6833):41–2.
He X, Zhang J. Toward a molecular understanding of pleiotropy. Genetics. 2006;173(4):1885–91.
Promislow DEL. Protein networks, pleiotropy and the evolution of senescence. Proc Biol Sci. 2004;271(1545):1225–34.
Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW. Evolutionary rate in the protein interaction network. Science. 2002;296(5568):750–2.
Pavlicev M, Wagner GP. A model of developmental evolution: selection, pleiotropy and compensation. Trends Ecol Evol. 2012;27(6):316–22.
Wagner A. Robustness against mutations in genetic networks of yeast. Nat Genet. 2000;24(4):355–61.
Wagner GP, Kenney-Hunt JP, Pavlicev M, Peck JR, Waxman D, Cheverud JM. Pleiotropic scaling of gene effects and the “cost of complexity.”. Nature. 2008 Mar 27;452:470.
Stern DL. The genetic causes of convergent evolution. Nat Rev Genet. 2013;14(11):751–64.
Goenawan IH, Bryan K, Lynn DJ. DyNet: visualization and analysis of dynamic molecular interaction networks. Bioinformatics. 2016;32(17):2713–5.
Wagner A. Genotype networks shed light on evolutionary constraints. Trends Ecol Evol. 2011;26(11):577–84.
Schoenrock A, Burnside D, Moteshareie H, Pitre S, Hooshyar M, Green JR, et al. Evolution of protein-protein interaction networks in yeast. PLoS One. 2017;12(3):e0171920.
Scannell DR, Zill OA, Rokas A, Payen C, Dunham MJ, Eisen MB, et al. The awesome power of yeast evolutionary genetics: new Genome sequences and strain resources for the Saccharomyces sensu stricto genus. G3. 2011;1(1):11–25.
Hurst LD, Smith NG. Do essential genes evolve slowly? Curr Biol. 1999;9(14):747–50.
Siegal ML, Promislow DEL, Bergman A. Functional and evolutionary inference in gene networks: does topology matter? Genetica. 2007;129(1):83–103.
Hahn MW, Conant GC, Wagner A. Molecular evolution in large genetic networks: does connectivity equal constraint? J Mol Evol. 2004;58(2):203–11.
Davidson EH, Erwin DH. Gene regulatory networks and the evolution of animal body plans. Science. 2006;311(5762):796–800.
Wu C-I. The genic view of the process of speciation. J Evol Biol. 2001;14(6):851–65.
Bloom JD, Adami C. Apparent dependence of protein evolutionary rate on number of interactions is linked to biases in protein-protein interactions data sets. BMC Evol Biol. 2003;3:21.
Lang GI, Rice DP, Hickman MJ, Sodergren E, Weinstock GM, Botstein D, et al. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature. 2013;500(7464):571–4.
Gerstein AC, Lo DS, Otto SP. Parallel genetic changes and nonparallel gene–environment interactions characterize the evolution of drug resistance in yeast. Genetics. 2012;192(1):241–52.
Gresham D, Desai MM, Tucker CM, Jenq HT, Pai DA, Ward A, et al. The repertoire and dynamics of evolutionary adaptations to controlled nutrient-limited environments in yeast. PLoS Genet. 2008;4(12):e1000303.
Hittinger CT, Rokas A, Carroll SB. Parallel inactivation of multiple GAL pathway genes and ecological diversification in yeasts. Proc Natl Acad Sci U S A. 2004;101(39):14144–9.
Anderson JB, Sirjusingh C, Parsons AB, Boone C, Wickens C, Cowen LE, et al. Mode of selection and experimental evolution of antifungal drug resistance in Saccharomyces cerevisiae. Genetics. 2003;163(4):1287–98.
Coghlan A, Wolfe KH. Relationship of codon bias to mRNA concentration and protein length in Saccharomyces cerevisiae. Yeast. 2000;16(12):1131–45.
Wagner A. Energy constraints on the evolution of gene expression. Mol Biol Evol. 2005;22(6):1365–74.
Storey KB. Reptile freeze tolerance: metabolism and gene expression. Cryobiology. 2006;52(1):1–16.
Wollenberg Valero KC, Pathak R, Prajapati I, Bankston S, Thompson A, Usher J, et al. A candidate multimodal functional genetic network for thermal adaptation. PeerJ. 2014;2:e578.
Porcelli D, Butlin RK, Gaston KJ, Joly D, Snook RR. The environmental genomics of metazoan thermal adaptation. Heredity. 2015;114(5):502–14.
Fraser HB. Modularity and evolutionary constraint on proteins. Nat Genet. 2005;37(4):351–2.
Han J-DJ, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, et al. Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature. 2004;430(6995):88–93.
Roguev A, Bandyopadhyay S, Zofall M, Zhang K, Fischer T, Collins SR, et al. Conservation and rewiring of functional modules revealed by an epistasis map in fission yeast. Science. 2008;322(5900):405–10.
Reid NM, Proestou DA, Clark BW, Warren WC, Colbourne JK, Shaw JR, et al. The genomic landscape of rapid repeated evolutionary adaptation to toxic pollution in wild fish. Science. 2016;354(6317):1305–8.
Riesch R, Muschick M, Lindtke D, Villoutreix R, Comeault AA, Farkas TE, et al. Transitions between phases of genomic differentiation during stick-insect speciation. Nat Ecol Evol. 2017;1(4):82.
Losos JB, Jackman TR, Larson A, Queiroz K, Rodriguez-Schettino L. Contingency and determinism in replicated adaptive radiations of island lizards. Science. 1998;279(5359):2115–8.
Colosimo PF, Hosemann KE, Balabhadra S, Villarreal G Jr, Dickson M, Grimwood J, et al. Widespread parallel evolution in sticklebacks by repeated fixation of Ectodysplasin alleles. Science. 2005;307(5717):1928–33.
Tarvin RD, Santos JC, O’Connell LA, Zakon HH, Cannatella DC. Convergent substitutions in a sodium channel suggest multiple origins of toxin resistance in poison frogs. Mol Biol Evol. 2016;33(4):1068–81.
Yudin NS, Larkin DM, Ignatieva EV. A compendium and functional characterization of mammalian genes involved in adaptation to Arctic or Antarctic environments. BMC Genet. 2017;18(1):111.
Corbett-Detig R, Russell S, Nielsen R, Losos J. Phenotypic convergence is not mirrored at the protein level in a lizard adaptive radiation. Mol Biol Evol. 2020;msaa028. https://doi.org/10.1093/molbev/msaa028.
Losos JB. Convergence, adaptation, and constraint. Evolution. 2011;65(7):1827–40.
Good BH, McDonald MJ, Barrick JE, Lenski RE, Desai MM. The dynamics of molecular evolution over 60,000 generations. Nature. 2017;551(7678):45–50.
von Loga K, Gerlinger M. Cancer (r)evolution. Nat Ecol Evol. 2017;1(8):1051–2.
Mueller S, Engleitner T, Maresch R, Zukowska M, Lange S, Kaltenbacher T, et al. Evolutionary routes and KRAS dosage define pancreatic cancer phenotypes. Nature. 2018;554(7690):62–8.
Coyne JA. Comment on “Gene regulatory networks and the evolution of animal body plans.”. Science. 2006;313(5788):761 author reply 761.
Fraser HB, Wall DP, Hirsh AE. A simple dependence between protein evolution rate and the number of protein-protein interactions. BMC Evol Biol. 2003;3:11.
Chatr-Aryamontri A, Breitkreutz B-J, Heinicke S, Boucher L, Winter A, Stark C, et al. The BioGRID interaction database: 2013 update. Nucleic Acids Res. 2013;41(Database issue):D816–23.
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13(11):2498–504.
Acknowledgements
I would like to thank Alexander P. Turner (Department of Computer Science, University of Hull) for helpful discussions.
Funding
Not applicable.
Author information
Authors and Affiliations
Contributions
KWV conceived the study, analyzed the data and wrote the manuscript. The author(s) read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1: Supplementary Table 1.
Results for linear multiple regression whole model. Test of significance between sum of squares (SS) of whole model vs. SS of residual. Factors were: average shortest path length, neighborhood connectivity, betweenness centrality, and CAI expression level. Significant effects are given in italics. Supplementary Table 2. Linear regression multivariate Wilks tests of significance for predictors and their interactions, effect sizes, and powers. Abbreviations are: ASPL - average shortest path length, NC- neighborhood connectivity, BC- betweenness centrality, CAI: gene expression level. Effect df = 2; error df = 2177. Significant predictors are given in italics. Supplementary Table 3. Generalized linear model (Normal distribution, log link function) model building results for ω as dependent variable. Given are degrees of freedom df, Akaike Information Criterion AIC, delta AIC, Likelihood ratio Chi square test, and resulting error probability. Abbreviations are: ASPL - average shortest path length, NC- neighborhood connectivity, BC- betweenness centrality, CAI: gene expression level. Significant models are given in italics. The full (global) model is highlighted in red. The model with only CAI as predictor is highlighted in gray. Supplementary Table 4. Generalized linear model (Normal distribution, log link function) model building results for γ as dependent variable. Given are degrees of freedom df, Akaike Information Criterion AIC, delta AIC, Likelihood ratio Chi square test, and resulting error probability. Abbreviations are: ASPL - average shortest path length, NC- neighborhood connectivity, BC- betweenness centrality, CAI: gene expression level. Significant models are given in italics.The full (global) model is highlighted in red. The model with only CAI as predictor is highlighted in gray.
Additional file 2.
Data file.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Wollenberg Valero, K.C. Aligning functional network constraint to evolutionary outcomes. BMC Evol Biol 20, 58 (2020). https://doi.org/10.1186/s12862-020-01613-8
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12862-020-01613-8