Pathway model
Here, we use a generic mathematical model of biological pathways, which captures their basic properties. The model has been explained in detail previously [18, 35, 36]. In brief, it considers a pathway as a collection of interacting proteins, each of which can exist either in an active (Pi*) or inactive (Pi) state. Initially, proteins exist in equilibrium between these two states. Proteins that are in the active state can interact with another protein and influence its equilibrium. There are many different biochemical mechanisms that make such influence possible including phosphorylation, methylation, and physical contact. This model does not distinguish among these different mechanisms and uses a simple interaction coefficient to describe activation and deactivation of proteins by other proteins. Thus, for a given protein i the chemical equilibrium between active and inactive states is defined as:
(1)
where [] represents the concentration of active form of protein j and k
ij
and l
ij
represent the strength of the interaction between protein i and j. It is assumed that in case there is an energy requirement for interaction of protein i with j, it is provided from outside sources such as high-energy molecules. The influence of each protein on other proteins can only be activating or deactivating (i.e. k
ij
·l
ij
= 0) and proteins do not influence their own equilibrium (i.e. k
ii
= l
ii
= 0). In other words, processes such as autophosphorylation or intrinsic phosphotase activity are not considered in this model.
In addition to the general reaction scheme shown in (1), some proteins in the pathway can interact with signaling molecules (i.e. ligand), hence acting as receptors. Further, there are proteins – so called effectors – that are assumed to mediate the physiological response of the pathway in their active state. In a natural pathway this response can have various forms depending on the effector and can range from transcriptional control to enzymatic activity. It is assumed that effectors act solely as response regulators and do not influence the equilibrium state of other proteins in the pathway (i.e. they can not act on other proteins). To summarize, this model defines a biological pathway by a given number of proteins and a set of coefficients defining their interactions. Pathway response to one or more changing ligand concentrations can thus be obtained by solving the set of differential equations resulting from the collection of reactions as shown in (1):
(2)
where [L
s
] and a
is
stand for the concentration of ligand s and its effect on protein i respectively. Note, that the total concentration of each protein [] is constant and set to one (i.e. [] = 1 - [P
i
]). Also, the maximum value that ligand concentrations and interaction coefficients can attain is set to one for computational ease.
To assess the response of a given pathway, the model is initiated with equal amounts of active and inactive proteins (i.e. [] = [P
i
] = 0.5). Then, the system is allowed to equilibrate into a steady state in absence of any signal (i.e. the system of differential equations resulting from [2] is integrated, using the 4th order Runga-Kutta algorithm with step size equal to one, until the point where total change in protein concentrations is below 10-15 and an eigenvalue analysis indicates stability). Once the system is stable, the integration is continued from steady state protein concentrations and two separate signals are introduced (a Gaussian curve with a standard deviation of 10). After introduction of the first signal, the system is integrated until it reaches steady state again, before the second signal is introduced (see Figure 1). The integration is stopped after both signals have been introduced and system reached steady state again (or after a total of 10.000 integration steps have passed). The pathway response is then deduced from the active effector concentrations recorded during integration under ligand presence (shown as a gray area in Figure 1) and is used to calculate the fitness of the pathway as explained below. Systems that do not reach steady state before or after the introduction of signals, are considered unstable and receive a fitness of zero. To avoid any effects of numerical artifacts on the integration process concentrations smaller than 10-9 are set to zero.
Evolutionary simulations
In order to study the evolution of modularity, a specific ancestral pathway is defined. It contains two effectors, a receptor and an intermediary protein. The latter two are assumed to act as a "global" activator and deactivator respectively. In other words, both proteins are highly non-specific; the receptor is activated by all present signals and relays this activity to the effectors, and the intermediary protein inhibits both the receptor and the effectors (see Figure 1). This pathway could be thought of as the predecessor of bacterial two-component signaling pathways [37], where the receptor and the intermediary protein would correspond to a non-specific histidine kinase and phosphotase respectively.
An initial homogenous population of 1000 ancestral pathways is evolved for 1000 generations (running simulations up to 2000 generations gave qualitatively similar results to those presented in the main text as shown in Additional File 5). During evolution, pathways are subjected to selection for responding separately to the two signals presented at different times of the integration process as shown in Figure 1. Based on this selection criterion, pathway fitness (F) is defined as:
(3)
where and ( and ) stand for the maximum of the difference in active effector concentrations between their pre- (i.e. steady state) and post-signal values in the time bracket from introduction of signal A (B) until system reaches steady state again (as described above, also see Figure 1). Further, n is the number of proteins in the pathway and c is the fitness cost of each protein. The first part of the fitness function rewards pathways ability to respond to the two signals separately through the two effectors. The second part gives smaller pathways a fitness advantage, the extent of which is controlled by the parameter c (for the reported results c was 0.001). Presented results hold for c values as high as 0.1. At such high fitness cost, protein additions are rarely permitted, keeping pathways from growing in size and limiting the chances for emergence of modularity (see Results and discussion). Note that division by four is only to scale fitness between zero and one, allowing it to be used directly as replication probability in the evolutionary simulations (see below).
The same fitness function is used throughout the total duration of an evolutionary simulation, representing a constant selective pressure on the pathways as they evolve. The specific function shown in equation 3 is an ad-hoc choice that is biologically plausible. There can be many different fitness functions, that are similar to this one and that could lead to modularity. In fact, even simulations with the simplest scheme where fitness equaled pathways ability to respond to the two signals (i.e. F = 0.5·( + )) resulted in emergence of modular pathways. This indicates that for the presented analysis, which explores the effects of mechanistic processes on modularity, the exact choice of the fitness function is not crucial.
Throughout the evolutionary simulation, each generation is created from the previous one by randomly selecting individuals for replication with replacement. Randomly selected pathways replicate with a probability proportional to their fitness, and undergo mutations per protein with a certain probability (for the reported results this probability was 0.05, and simulations with 0.1 and 0.005 produce qualitatively equivalent results). Such mutations can cause one of the following with the given probabilities: loss of an existing interaction (P = 0.4) or protein (P = 0.2) in the pathway, formation of an interaction (P = 0.1), duplication of an existing protein (P = 0.1) or variation in the coefficient of a randomly selected interaction (P = 0.2). These probabilities represent the commonly accepted view that deleterious mutations are more frequent.
All these evolutionary mechanisms are biologically plausible. Except for duplications, we consider all these mechanisms resulting from one or multiple point mutations affecting protein structure and function. For interaction-loss and – adjustment events we assume mutations lead to changes in the binding surface of one protein leading to loss of an interaction or changes in its efficiency. Simulating these events involved randomly selecting two interacting proteins from the pathway, and adjusting the associated coefficient describing their interaction (i.e. the coefficient is set to zero or adjusted by a random percentage). We consider on which protein the simulated mutation has occurred and account for its effects on duplicates of the selected proteins. For example, if protein i lose its ability to interact with protein j because of mutations happened on itself (on protein j), then it (protein j) will also lose its ability to interact with duplicates of protein j (protein i). Duplication events are assumed to result from larger genomic mutations, and are simulated by adding a new protein to the pathway with exactly those interactions as a randomly selected protein from the pathway (the duplicated protein). A duplicate protein is treated as such until it receives another mutation, after which, it is treated as a unique protein.
Mutations resulting in formation of an interaction require special treatment as such an interaction can arise among proteins already participating in a given pathway (i.e. interaction formation), or between a non-participating and participating one (i.e. protein recruitment). To account for these different routes, and to evaluate their effects on the evolution of modularity, we run additional simulations where single interaction additions are simulated as protein recruitment or formation of interaction among existing proteins with a certain probability. As discussed in Results and discussion section the ratio between the relevant rates of these two routes affect modularity but not fitness and pathway size (see Additional File 3). Simulating mutations leading to interaction formation involved randomly selecting two proteins. Both or only one of these proteins are selected from the pathway in case of interaction formation and protein recruitment respectively. For protein recruitment we consider the second protein to be one of the many existing proteins in the cell that are not participating in the pathway until that point. In case of both proteins being selected from the pathway (interaction formation), the selection procedure also included signals. Furthermore, the selection procedure did not allow selection of non-receptor proteins and a signal or selection of two participating proteins that are duplicates of the same protein or of each other (as this would correspond to the formation of a self-interaction). As such, the selection processes allows for the possibility for a receptor to start interacting with a signal that it did not interact with before, but does not allow non-receptor proteins to turn into receptors. In other words, the only way for new receptors to arise in this model is through duplication of existing ones. Once two proteins are selected, an interaction is created among them by randomly selecting a coefficient from the interval [-1,1]. If a selected interaction coefficient for proteins i and j was negative (positive) then l
ij
(k
ij
) is set to the absolute value of this number and k
ij
(l
ij
) is set to zero.
To test for the effects of the initial population composition, we have run additional simulations starting with a randomly generated heterogeneous population. In different runs we initiated the population with pathways composed of two receptors, two effectors, and two, three, or five, intermediate proteins that are randomly connected. The distribution of different pathway types in three sample runs for each condition is shown in Additional File 4. To test for the effects of population size, we have run additional simulations with population size 100 and using different probability for protein recruitment. Results from these runs are shown in Additional File 5 and discussed in the main text.
Throughout the evolutionary simulations pathway fitness, and various pathway properties are recorded. Each simulation is run multiple times to ascertain that the qualitative conclusions made here are robust to stochastic fluctuations inherent in these evolutionary simulations. All simulations are written in C++ and the source code is available from the authors upon request. Sample pathway structures are drawn using Graphviz.