### The equilibrium

This system has a unique equilibrium, as the following shows. The state transitions of this model can be described by the equation \stackrel{\u0304}{w}{\mathbf{x}}^{\prime}=A\mathbf{x} where **x** is a state vector **x** = (*x*_{0},*x*_{1},…,*x*_{
n
})^{T}, and *A* is the matrix whose (*i*,*j*) entry is given by the coefficient of *x*_{
j
} in the expansion of \stackrel{\u0304}{w}{x}_{i}^{\prime} in Equation (2). This is a *non-linear* system, since \stackrel{\u0304}{w}, which is also the mean fitness of the population, depends on the state **x**.

Despite the system’s non-linearity, we can obtain the existence and uniqueness of an equilibrium state through considering the matrix *A* in the light of the *Perron-Frobenius Theorem* for irreducible matrices (see, for example, [22, Section 8.3]). This Theorem states that *a non-negative irreducible matrix has a unique eigenvector that is a scalar multiple of a positive vector*. The associated (positive) eigenvalue is the spectral radius of the matrix and is called the Perron root, and the positive eigenvector is called the Perron vector.

The (*n* + 1)×(*n* + 1)matrix *A* that arises in our model is irreducible because it is regular, meaning that a positive power of *A* has strictly positive entries. This in turn follows because the entries of *A* are all strictly positive on and below the main diagonal, as well as in the first diagonal above the main diagonal (the remaining entries are zero). Hence *A*^{n} will have only positive entries, implying that *A* is regular. The Perron-Frobenius Theorem therefore applies, and so *A* has a unique eigenvector that is a scalar multiple of a positive vector. In other words, there is an eigenvector of *A* that consists of positive entries, and with the additional constraint that the entries sum to 1 this vector (that is, state) is unique. In concrete terms, there is a unique positive vector **v** summing to 1 such that *A* **v** = *λ* **v**, with *λ* > 0. This provides a unique solution to our system’s equilibrium equation A\mathbf{x}=\stackrel{\u0304}{w}\mathbf{x}, namely **x** is the Perron vector **v** and \stackrel{\u0304}{w} is the Perron root *λ*.

An additional consequence of the regularity of *A* is that the dynamics are pushed away from the boundaries of the positive cone of the state space. In other words, there is *protected polymorphism*. Specifically, once the state vector is strictly positive, which occurs after at most *n* steps, no variable *x*_{
i
} can ever again be zero because the transition matrix has non-negative entries.

A consideration of the fitnesses reveals conditions under which the population is closer to being either fixed on the complete pathway (*x*_{
n
} high) or fixed on the empty pathway (*x*_{0} high). The cost *c* favours the empty state while exposure *ε*favours the full pathway, and the balance between these two factors is seen by comparing the fitnesses of the two extreme states. The separation between high *x*_{0}and high *x*_{
n
} should occur when *w*_{
n
}=1=*w*_{0}which holds when

\epsilon ={\epsilon}^{\ast}=\frac{1-{(1-c)}^{n}}{1+b-{(1-c)}^{n}},

(3)

following from the definition of *w*_{
i
} (Equation (1)). That is, under this heuristic, when *ε* > *ε*^{∗} the dynamics should move to high values of *x*_{
n
}at equilibrium. Note that for a given fitness cost *c*, as the pathway length *n* increases the exposure *ε* required in order for the pathway to become fixed in the population also increases.

### Dynamics

We study the transient dynamics of pathway acquisition numerically by setting the parameters according to Table 1 and iterating the process (Equation (2)) until the variables are within a small tolerance of equilibrium. For each set of parameters, the dynamics were examined from two boundary state initial conditions: 1. no individuals in the population carried any genes in the pathway (fixation on the empty pathway, *x*_{0}=1), and 2. All individuals in the population carried all of the pathway (fixation on the complete pathway, *x*_{
n
}=1). We use these initial conditions to understand how genes of a pathway are acquired and lost; other initial conditions would also lead to the unique internal equilibrium point described above.

A convenient way to represent the trajectory of the population through (*n* + 1)-dimensional space is by compressing some coordinates and using a *de Finetti* diagram (see for instance [23]) as follows. A distribution of population proportions is given by a point in the interior of an equilateral triangle whose corners represent *x*_{0}, *x*_{
n
} and 1−*x*_{0}−*x*_{
n
}. The value of each of these three variables is given by the perpendicular distance from the side opposite the vertex labelled by the variable. The quantity

1-{x}_{0}-{x}_{n}=\sum _{i=1}^{n-1}{x}_{i}

gives the proportion of the population with at least one of the genes, but not the complete pathway, and this reflects the extent of variation in gene content in the pathway. Figure 2 illustrates an example of a de Finetti diagram together with population distributions along the trajectory.

To study how the dynamics of this system are affected by horizontal transfer and natural selection, we varied parameters controlling these features of the model (see Table 1). Transfer is controlled by the base rate *γ*_{1} of transfer of a single gene and the extent of clustering *κ*, while selection is controlled by *c*, *b* and *ε*.

Figure 3 shows how varying exposure *ε*for two different values of transfer rate *γ*_{1} influences the dynamics. As exposure increases, the equilibrium shifts from near fixation on the empty pathway to near fixation on the complete pathway. For the parameters used in this figure, the threshold exposure is *ε*^{∗} = 0.198 which lies between the third and fourth columns. Also, the trajectories become flatter (less variation) as exposure increases. The overall appearance of the dynamics is similar for high and low transfer settings, even though the base transfer rates differ by six orders of magnitude. The equilibrium is more polymorphic in the case of the high transfer rate.

We further characterised transfer and selection as follows. As before we examine low and high transfer rates (*γ*_{1}), and low and high clustering (modelled by *κ*), but consider all four scenarios (Figure 4). For each of these we explored the dynamics of the process varying fitness cost *c* and exposure *ε*(recall the value of *b* is fixed at 0.1). We allowed *c* to vary between 0 and 0.01, and *ε* to vary between 0 and 1, dividing each parameter range into nine values (eight non-zero). The output quantities of interest here are the *equilibrium point* and the *extent of variation* in pathway content. The equilibrium point tells us whether the full pathway or the empty pathway is fixed or nearly fixed in the population, or if there is notable variation in the population at equilibrium. The extent of variation in gene content generated in the trajectory is captured by the peak variation (1 − *x*_{0} − *x*_{
n
}) during the dynamics.

### No cost and no benefit

The first question to ask of this model is how it behaves in the absence of any cost to carrying a partial pathway and no benefit from the acquisition of the full pathway. In this way the impact of the transfer dynamics is isolated. Since we have fixed the benefit *b*=0.1, the lack of benefit is modelled by setting exposure to zero, and the cost *c* of carrying a gene is also set to zero. This situation corresponds to the bottom left hand point of each panel in Figure 4. The behaviour of the model depends on the value of *γ*_{1}. If the transfer rate is high (panels A and B), the full pathway is acquired by over 95% of the population (indicated by a black shape). If the transfer rate is low (panels C and D), we do not have 95% of the equilibrium population at either the full or empty pathway (indicated by a greyed shape).

These results reflect the balance between the only forces in play without fitness differences: transfer (pushing towards the pathway) and deletion (pushing away). The deletion rate is fixed at 10^{−9}, and so when the transfer rate is of a similar order of magnitude *γ*_{1} = 10^{−9}(regardless of *κ*), as in the lower panels, equilibrium is near neither “corner” (*x*_{0} = 1or *x*_{
n
} = 1). In the case of a higher transfer rate (*γ*_{1} = 10^{−3}, upper panels, Figure 4) deletion is overpowered and the pathway is acquired.

### Pathway acquisition

The predicted threshold given by Equation (3), shown as a dashed curve on Figure 4, proves to be accurate: instances of fixation or near fixation on complete pathways (represented by black shapes) lie on the right of the dashed curve. Here, the exposure to the selective environment is high enough compared to the cost to allow pathway acquisition. To the left of the dashed curve, where exposure is low compared to cost, are instances of empty pathways (white shapes) or polymorphic equilibria (grey shapes). The latter outcomes, in which the equilibrium is neither close to the complete nor to the empty pathway, indicate a stable state for which there is substantial variation in gene presence. These only occur when exposure to the selective environment is very low, and the fitness cost of carrying a gene is also low or zero, with one exception: when the horizontal transfer parameters favour frequent transfer and high gene clustering (Figure 4B, high *γ*_{1} and high *κ*). In this case, variation at equilibrium appears in a wider range of parameter values.

### Variation in gene presence

As mentioned above, variation in the presence of genes from a pathway means that a significant proportion of the population contains neither the complete pathway nor the empty pathway. In our context, we define this be when at least 5% of the population are carrying neither of these extremes. The maximum value of 1 − *x*_{0} − *x*_{
n
} is an indicator of this variation. On the de Finetti diagram of a given trajectory (e.g. see Figures 2 and 3), this corresponds to the maximum height attained by the trajectory. Such variation can arise in several ways.

Firstly, at equilibrium as noted above, variation in pathway content can occur when the exposure *ε* is sufficiently low compared to cost *c* and the transfer rate *γ*_{1} is high (Figure 3A–C and grey triangles in Figure 4A and B). This high level of variation occurs because the force of horizontal transfer is strong enough to counter the loss of genes.

Variation can also arise in *transition* towards equilibrium, either from the empty or the complete pathway. In the case of the trajectory en route to acquisition, observe that genes are transferred in the model in groups of any number but with decreasing rates for larger numbers. Thus, depending on the balance of parameters, the pathway may be acquired directly by transfers of large numbers of genes, or more gradually with small numbers of genes at a time. Variation en route to acquisition is represented by black triangles in Figure 4, and occurs when the transfer rate is very high, and is more pronounced when clustering (represented by *κ*) is also high (Figure 4B and Figure 3D). In particular, from Figures 4A and 4B we see that variation en route to acquisition occurs when both cost *c* and exposure *ε*are low.

Variation can also arise in transition in the other direction, through the loss of genes starting from a complete pathway. Indeed, when transfer rates are low, the only significant variation in the presence of genes from the pathway arises when gene loss occurs starting from the complete pathway (Figures 4C and D and Figures3E–G). This occurs when exposure to the selective environment is very low. The same phenomenon arises for high transfer: absence of exposure to the selective environment results in slow decay that passes through significant variation.

This evolutionary passage of decay through intermediate numbers of genes occurs when the fitness benefit of acquiring the full pathway does not outweigh the additional cost of the final gene. In this situation, there is no contest between the complete pathway and the empty pathway, unlike the balance leading to the threshold of Equation (3). A second threshold exposure derives from this argument, namely at *w*_{n−1} = *w*_{
n
}, or

{\epsilon}^{\ast \ast}=\frac{c{(1-c)}^{n-1}}{1+b-{(1-c)}^{n}}\le {\epsilon}^{\ast}.

(4)

Roughly speaking, for exposure levels lower than this, large amounts of variation in gene presence appear as pathways decay. When exposure levels are higher than *ε*^{∗∗} but still lower than *ε*^{∗}, selection leads to pathway decay but through a more direct replacement with the empty pathway. The threshold given in Equation (4) corresponds to a very low level of exposure, closer to the far left column of the panels in Figure 4 than to the second column (representing *ε*=0.125). Figure 3 illustrates how de Finetti trajectories change around this threshold, with a value of *ε*^{∗∗} = 0.039 for these parameters lying between the second and third columns.

When there is no exposure to the selective environment at all (*ε* = 0) — the left columns of each panel in Figure 4 — we see the greatest extent of variation (represented by triangles). Most such variation arises from pathway decay, but when the transfer rate is high enough, as in panels A and B of Figure 4, polymorphic equilibria (grey triangles) indicate that variation has accumulated from the empty pathway in the absence of selection. Even in this situation, when the cost is too high little variation accumulates (white triangles at the top of the left axes in panels A and B) reflecting the dynamic balance between transfer and fitness cost.

Flat trajectories that do not reach 0.05 in height – so that 95% of the population mass remains with *x*_{0} and *x*_{
n
} at all times – tend to appear when exposure is high (circles in Figure 4). These points are where the pathway is acquired through transfers of large numbers of genes. For low transfer rates (Figure 4C and D), this is the only way that pathways are acquired. An example of such a trajectory is shown in Figure 3H. When there is a higher rate of transfer (Figure 4A and B), this outcome occurs less readily, especially when clustering is high (Figure 4B). The inverse outcome, in which the pathway is acquired in pieces (black triangles in Figure 4), is only seen when the transfer rate is high, and in particular when the degree of gene clustering *κ* is also high (Figure 4B). Figure 3D shows this scenario on a de Finetti diagram. Even here, however, the trajectory is relatively flat, indicating only a low level of transient variation. Note that gene clustering *κ*increases the transient peak variation, but only when the transfer rate *γ*_{1} is high.

### Effect of pathway length

We varied the length of the pathway to study the effect of this parameter on the model behaviour. To cover realistic pathway lengths we explored a wide range of values (*n* = 1,…,30). In the extreme case, when *n* = 1, there are only two states in the system, presence or absence of the pathway. Consequently there is no “variation” in gene presence in the sense of there being partial pathways. However the *equilibrium* behaviour can be established. Because there are only two coupled states, the system reduces to a quadratic in one variable, namely

(1-{w}_{1}){x}_{1}^{2}-(1-{w}_{1}+{\gamma}_{1}+\delta ){x}_{1}+{\gamma}_{1}=0.

In the light of the discussion of the equilibrium above, exactly one of these gives a positive solution for *x*_{0} and *x*_{1} under realistic parameter choices. In the case that there is no fitness cost and no exposure to the beneficial environment (*c* = *ε* = 0), we have both states equally fit (*w*_{0} = *w*_{1} = 1), and the system becomes linear. The dynamics are then a balance between transfer *γ*_{1} and deletion *δ*, and the equilibrium is simply *x*_{0} = *δ*/(*δ* + *γ*_{1}) and *x*_{1} = *γ*_{1}/(*δ* + *γ*_{1}). If deletion and transfer occur at the same rate, the equilibrium is (*x*_{0},*x*_{1}) = (1/2,1/2).

For *n*>1 pathway acquisition requires more exposure to the beneficial environment, relative to fitness cost. That is, the pathway acquisition threshold *ε*^{∗}(Equation (3)) moves to the right as shown in Figure 5. The effect of pathway length on variation in gene presence is more subtle. Figure 6 shows the effect of increasing length while keeping cost *c* fixed but varying exposure *ε* in both high and low transfer scenarios. The immediate observation is that variation is more common with longer pathways, as might be expected. It is also clear that this variation occurs in the course of pathway loss (indicated by unfilled circles in Figure 6), rather than acquisition, consistent with our observations for 5-gene pathways. There are some differences between low and high transfer rate scenarios (top vs bottom row of Figure 6), chiefly, variation at low *n* and acquisition behaviour as *ε*increases. With low *n* and high transfer (top row, Figure 6), the significant difference between the transfer rate *γ*_{1} and the deletion rate 10^{−9}gives rise to variation en route to an equilibrium with the full pathway or in between. The slight difference in acquisition behaviour between high and low transfer rates over various *n* concurs with the difference seen in Figure 4B and C along the dotted line heuristic threshold.