### Network dynamics

Network behavior is determined (using the 2*N* × *N* interaction matrix *W*) by a Boolean dynamical system of the form *s*_{
i
}(*t* + 1) = *σ*(*u*_{
i
}(*t*) + *v*_{
i
}(*t*)) for the *i* th protein product, where

{u}_{i}(t)=\sigma ({\displaystyle \sum _{j}^{N}{W}_{i,j}{s}_{j}(t))}

{v}_{i}(t)=\sigma ({\displaystyle \sum _{j}^{N}{W}_{i+N,j}{s}_{j}(t))}

and

\sigma (x)=\{\begin{array}{ll}1\hfill & if\phantom{\rule{0.3em}{0ex}}x>0\hfill \\ 0\hfill & otherwise\hfill \end{array}

Starting from an initial state vector **s**(0) ∈ {0, 1}^{N}, successive states **s**(*t*) are generated until we encounter a repeated state **s**(*t*_{
P
}) (*t*_{
P
}> 0), such that **s**(*t*_{
P
}) = **s**(*t*_{
R
}) for some *t*_{
R
}<*t*_{
P
}. The initial state **s**(0) is constant for each simulation and is set by randomly choosing each *s*_{
i
}= 0 or 1.

### Mutation

The probabilities for deletion and addition of a nonzero element (*W*_{i,j}) of the interaction matrix *W*, are

and

respectively, where m=\frac{\mu}{2{N}^{2}} is the mutation rate per element, *b* is a global deletion bias parameter *b* ∈ (-1, 1), and *k* is a normalizing factor to ensure that the mutation rate per genotype (*μ*) remains constant. To find *k*, we proceed as follows. In a matrix with connectivity *c*, the probability of a deletion is *cpd*, and the probability of an addition is (1 - *c*)*p*_{
a
}. To maintain a mutation rate of *m* per element, we require

c{p}_{d}+(1-c){p}_{a}=\frac{\mu}{2{N}^{2}}=m

Substituting, we obtain

m\left[c\frac{(1+b)}{k}+(1-c)\frac{(1-b)}{k}\right]

The term within parentheses must be equal to 1, and therefore

*k* = *c*(1+*b*) + (1-*c*)(1-*b*)

Note that if the deletion bias is set to its highest value *b* = 1, then *p*_{
a
}= 0 and only link deletions occur. Similarly, if it is set to its lowest value *b* = -1, then *p*_{
d
}= 0 and only link additions occur.

### Connectivity and deletion bias

It is convenient to elucidate the effect deletion bias (*b*) has on connectivity (*c*) as the population evolves, and in particular, the effect of *b* on *c*_{
f
}, the final value of *c* at generation 10000. Intuitively, one would expect high *b* values (*b*~1) to reduce connectivity, when compared to lower *b* values (*b*~-1). Simulations were performed across a range of values for *b* (*b* = -1, -0.5, 0, 0.5, 1), initial connectivity (*c*_{
i
}= 0.3, 0.45, 0.6). In all cases we find the relationship between *b* and median *c*_{
f
}to be approximately linear. We also find that, irrespective of *c*_{
i
}, a large range for *c*_{
f
}is possible. Even for relatively high initial connectivity *c*_{
i
}= 0.6 (Supp. Figure 1 in Additional file 1, right) connectivity can be reduced to well below half the initial value (compare *c*_{
i
}/2 = 0.3 with 0.215, the upper bound for 95% confidence interval), a decline beyond that predicted by the DDC model. Note that the possibility of reducing *c*_{
f
}to below *c*_{
i
}/2 suggests that there is redundancy in the founder network, before duplication.

We define the relative change in connectivity, *D* = *c*_{
f
}/*c*_{
i
}- 1. Under the DDC model, we expect a long-term decline in connectivity to *D* = -0.5. We examine the two extremes: *D* = 0 (no change in connectivity), and *D* = -0.5 (elimination of half the interactions). Again a range of conditions are investigated for initial connectivity (*c*_{
i
}= 0.3, 0.45, 0.6). In all cases, the appropriate deletion bias (*b*) is estimated using linear regression results from the relevant dataset: for example, a simulation with initial connectivity, *c*_{
i
}= 0.3 (Supp. Figure 1 in Additional file 1, left) requires a value of *b* ≃ 0.43 to attain *D* = 0 (i.e. *c*_{
f
}= 0.3).

### Measures for paralogous genes

Recall that, in the initial population, all genotypes are identical copies of a 2*N* × *N* matrix *W*, and that this matrix is generated by rowwise duplication of a random *N* × *N* matrix *Q*, such that *W*_{
i,j
}= *W*_{i+N,j}= *Q*_{
i,j
}. We measure regulatory subfunctionalization by comparing paralogous genes in some evolved genotype, by comparing the rows *W*_{
i
}and *W*_{i+N}, with the ancestral row *Q*_{
i
}. We define a simple qualitative measure to detect subfunctionalization. We define *F*_{
i
}as the set of indices *j*_{±}, such that *Q*_{
ij
}≠ 0, representing the original inputs to gene *i*, and distinguishing between positive (*j*_{+}, {Q}_{i{j}_{+}} = +1) and negative (*j*_{-}, {Q}_{i{j}_{-}} = -1) inputs. We define similar sets *A*_{
i
}, *B*_{
i
}for the rows *W*_{
i
}and *W*_{i+N}in the evolved genotype, representing the inputs to the paralogous genes. Subfunctionalization exists if some original inputs have been lost in each of the paralogs, but together they still complement each other to represent the original input set, i.e., if the following three conditions are met:

|*F*_{
i
}| > |*A*_{
i
}∩ *F*_{
i
}| > 0

|*F*_{
i
}| > |*B*_{
i
}∩ *F*_{
i
}| > 0

(*A*_{
i
}∪ *B*_{
i
}) ∩ *F*_{
i
}= *F*_{
i
}

Note that we need |*F*_{
i
}| ≥ 2 for regulatory subfunctionalization to be possible.

Neofunctionalization exists if there are any new inputs in either of the evolved paralogs, i.e.

|(*A*_{
i
}∪ *B*_{
i
}) - *F*_{
i
}| > 0

We define the number of shared links between two paralogs, as *H*_{
i
}= |*A*_{
i
}∩ *B*_{
i
}|.

To measure temporal subfunctionalization, we consider paralogs as subfunctionalized if one is ON and the other is OFF at a particular timepoint, and if the behaviour is reversed (OFF and ON respectively) at some other timepoint. More formally, if we define time courses for the two paralogs as *u*_{
i
}(*t*) and *v*_{
i
}(*t*) (as above, under "Network dynamics"), then the conditions are *u*_{
i
}(*t*_{
X
}) = 1, *v*_{
i
}(*t*_{
X
}) = 0, *u*_{
i
}(*t*_{
Y
}) = 0, *v*_{
i
}(*t*_{
Y
}) = 1. *t*_{
X
}≠ *t*_{
Y
}.

*Parsimonious* founder networks

If there are redundant interactions in the founder network, such interactions can be deleted in both duplicates during evolution with no phenotypic effect. Consequently, regulatory subfunctionalization would not be recognized, in spite of the possibility that the remaining non-redundant interactions may in fact be subfunctionalized. To address this issue, we generate *parsimonious* (i.e. with minimal redundancy) founder networks. We implement the following algorithm to obtain networks with (approximate) initial connectivity *c*_{
i
}:

1. Generate a matrix *Q'* with full connectivity (*c* = 1), and generate **s**(*t*).

2. Delete connections in random order, retaining only those deletions which do not alter the expression pattern, **s**(*t*).

3. Repeat step 2 until all attempted deletions are unsuccessful, i.e. alter **s**(*t*).

4. Accept *Q'* as new founder *Q* if it has connectivity (*c*) between *c*_{i} - Δ_{
p
}and *c*_{
i
}+ Δ_{
p
}, otherwise return to step 1 (Δ_{
p
}= 0.05 was used).

The matrix *Q* is then duplicated to create the matrix *W* in the initial population. The algorithm works because, for a large sample of initial random matrices *Q'*, one observes *c* values (for the founder matrices *Q*) across the entire range [0, 1].

### Analysis of yeast data

We used the program *GenomeHistory* [55] with the same parameters as used for *Saccharomyces cerevisiae* in the original study, resulting in a list of paralogous genes. The program also estimates the number of synonymous (*K*_{
s
}) substitutions per synonymous site, and the number of nonsynonymous (*K*_{
a
}) substitutions per nonsynonymous site. Following Evangelisti and Wagner [12] we retained only gene pairs with *K*_{
a
}< 1 for further analysis. We use the *K*_{
s
}value as a proxy for divergence time. Because we make only broad categorizations based on *K*_{
s
}, we have retained the lower accuracy *K*_{
s
}values labeled as "saturated" by *GenomeHistory*. Cell-cycle synchronized microarray data for yeast was obtained from two sources: three distinct time-courses (labeled as "alpha","cdc15", and "elutriation") were obtained from the first [26], and two time-courses (labeled as "*α* 30" and "*α* 38") from a second, more recent, dataset [27] (the "cdc28" time-course from the first dataset was excluded due to its containing many missing values, and the lower-resolution "*α* 26" time-course was excluded from the second dataset).

Paralogs *A* and *B* are considered to be temporally subfunctionalized if A is ON and B is OFF at some time *t*_{
X
}and A is OFF and B is ON at some other time *t*_{
Y
}. Since the data are continuous, these need to be discretized beforehand. Each gene and time-course [time series *S*(*t*)] were discretized independently by normalizing *S*(*t*) to the interval (0, 1) to give a series *S'*(*t*), then assigning ON values where *S'*(*t*) > *θ*, and OFF values where *S'*(*t*) < 1 - *θ*. To verify that true temporal subfunctionalization has occurred, we used subcellular localization data [56] to exclude paralogs that do not co-localize.

The null distribution (representing the distribution of temporal subfunctionalization that would be expected by chance) was generated by taking the paralogous pairs and randomly shuffing the partners, for example in a dataset with 3 paralogous pairs, (*x*_{1}, *y*_{1}),(*x*_{2}, *y*_{2}), (*x*_{3}, *y*_{3}) → (*x*_{1}, *y*_{3}), (*x*_{2}, *y*_{1}), (*x*_{3}, *y*_{2}). Separate datasets were generated for the "youngest" (i.e. lowest *K*_{
s
}) and "oldest" (i.e. highest *K*_{
s
}) quartiles for *K*_{
s
}in each time-course. Temporal subfunctionalization was then measured for 1000 random shuffes. These measurements were then used to estimate the probability (P-value) that actual temporal subfunctionalization is less than would be expected by chance, defined as the fraction of random shuffes for which temporal subfunctionalization is below the actual value. All results shown use *θ* = 0.8. However, we repeated the analysis using *θ* through the range (0.5, 0.9), and obtained qualitatively equivalent results, as shown in Supp. Table 1 in Additional file 1.