Intraclass Correlation
The final stable output vector describes the gene expression levels of a viable individual. The distribution of output values on (-1, +1) across individuals was indistinguishable from a uniform distribution, as might be expected. We also tested whether the output vectors were correlated across individuals. To test for correlation, we examined the intra-class correlation coefficient (ICC) of the elements of n stable output vectors, each from a distinct viable network with k genes:
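$$\mathrm{ICC} = \frac{k}{k-1}\cdot\frac{\frac{1}{n}\sum_{i=1}^{n}\left(\bar{x}_i-\bar{x}\right)^2}{s^2}-\frac{1}{k-1} \qquad (4)$$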
The ICC tests whether the final output vectors of the n viable individuals are clustered together in small regions of the k-dimensional space [-1, 1]^k. In the equation for ICC, $\bar{x}_i$ is the sample mean of the elements in the i-th individual, $\bar{x}$ is the mean of all output vectors in the population, k is the number of genes in the network, and $s^2$ is the variance of the elements among the n individuals. The ICC can be positive or negative: a value close to zero indicates no correlation, whereas a value close to 1 or -1 indicates high intra-class correlation.
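As an illustration only, the sketch below computes one common form of the ICC (Fisher's formulation for groups of equal size) from the quantities just defined: the per-individual means, the overall mean, k, and the pooled variance. Whether this form matches Equation 4 term for term is an assumption, and the function name `intraclass_correlation` is hypothetical.

```python
from statistics import fmean


def intraclass_correlation(vectors):
    """Fisher-style ICC for n output vectors of k elements each.

    ASSUMPTION: this is one standard form of the ICC, not necessarily the
    exact expression used in the paper.
    """
    n = len(vectors)
    if n == 0:
        raise ValueError("need at least one vector")
    k = len(vectors[0])
    if k < 2 or any(len(v) != k for v in vectors):
        raise ValueError("need equal-length vectors with k >= 2")

    all_elements = [x for v in vectors for x in v]
    grand_mean = fmean(all_elements)              # mean of all elements
    group_means = [fmean(v) for v in vectors]     # mean of each individual

    # Pooled variance of all n*k elements about the grand mean.
    s2 = fmean([(x - grand_mean) ** 2 for x in all_elements])
    if s2 == 0:
        raise ValueError("variance is zero; ICC is undefined")

    # Mean squared deviation of the individual means from the grand mean.
    between = fmean([(m - grand_mean) ** 2 for m in group_means])
    return (k / (k - 1)) * (between / s2) - 1 / (k - 1)
```

With this form, individuals whose elements are identical within each vector but differ between vectors give a value of 1, while values near zero indicate that elements within an individual are no more alike than elements drawn from different individuals.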
For both the discrete and continuous models, and for every number of genes tested, the ICC was very close to zero. Even for networks with few genes, where there is little room for variation, the ICC remained extremely low, as shown in Figure 1 for the continuous model.
Generating Viable Individuals
Individuals were generated at random by drawing networks and initial vectors of gene expression levels from a uniform distribution on [-1, +1]. One possible interpretation of the initial vector is that it is the level of gene products passed by maternal inheritance to the zygote, where development of the embryo would begin.
We generated 12 populations of 1000 viable individuals each (one population for each network from 3-15 genes). Figure 2 shows the likelihood of finding a viable network at random. It is clear that viable networks with many genes are unlikely to occur by chance.
The developmental stage of the simulation starts from a randomly generated initial state vector. It is therefore unclear whether the viability of these individuals is determined by the choice of the initial state vector or by the wiring of the gene network.
To test the impact of the choice of initial state vector, we selected viable individuals and replaced their networks with randomly generated ones while retaining the original initial state vector. We repeated this process 1000 times for each individual, generating a different network each time, and tallied the number of random W matrices that supported development into a viable individual. Analogously, we performed the converse test, keeping the network while randomly changing the initial state vector. These tests enabled an analysis of the relation between stability due to the input vector and stability due to the network.
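The two swap tests can be sketched as follows. Equations 1 and 2 are not reproduced in this section, so the update rule (a tanh of the weighted input) and the stability criterion (successive states agreeing within a tolerance) used in `is_viable` are assumptions made for illustration, as are all function names and default parameters.

```python
import math
import random


def random_network(k, rng):
    """k x k interaction matrix with entries uniform on [-1, +1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(k)]


def random_vector(k, rng):
    """Initial state vector with entries uniform on [-1, +1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(k)]


def is_viable(W, s0, steps=100, tol=1e-4):
    """ASSUMED stand-in for Equations 1 and 2: iterate s <- tanh(W s) and
    call the individual viable if two successive states agree within tol."""
    s = list(s0)
    for _ in range(steps):
        nxt = [math.tanh(sum(w * x for w, x in zip(row, s))) for row in W]
        if max(abs(p - q) for p, q in zip(nxt, s)) < tol:
            return True
        s = nxt
    return False


def swap_test(W, s0, trials, rng):
    """Return (fraction viable with s0 kept and W randomised,
               fraction viable with W kept and s0 randomised)."""
    k = len(s0)
    keep_vector = sum(is_viable(random_network(k, rng), s0) for _ in range(trials))
    keep_network = sum(is_viable(W, random_vector(k, rng)) for _ in range(trials))
    return keep_vector / trials, keep_network / trials
```

Calling `swap_test(W, s0, 1000, random.Random(0))` for each viable individual would mirror the 1000 replacements per individual described above. The fractions it returns depend on the assumed update rule and are not expected to reproduce the reported values.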
As shown in Figure 3, at most 20% of the vectors tested for any number of genes in the network were responsible for stability, yielding numbers very close to those of viabilities of initial state vectors drawn at random; therefore, the initial state vector has little or no effect on generating viable individuals. The network itself, however, is highly correlated with viability. In the discrete output model, viability was determined by the choice of matrix and averaged about 70% of the 12 million tests.
Evolution of Complexity
Given the very low likelihood that a random W matrix with a high number of genes will be viable (that is, will support development of random vectors to a stable state as defined by Equation 2), we tested how easily complexity might evolve by combining ("mating") viable networks to produce a new network that may be interpreted as the "offspring" of two viable networks.
Mating was performed by defining a population of 1000 viable networks and mating two randomly drawn networks at a time. The "offspring" network was then tested for stability by iterating random initial vectors according to Equation 1. This process was repeated 1000 times to generate 1000 new progeny networks.
Haploid Mating
In the process of "haploid mating", a given gene is inherited at random from the network of either parent with equal probability. Accordingly, in the haploid mating process, we randomly selected individual rows from within the paternal or maternal network and copied them to create an offspring network. This process passes on parental genes without modification from one generation to the next. Repeating the selection process for each row yields a new offspring network with a random set of both parents' genes.
The initial state vector of the new offspring is chosen at random to equal a stable state of one of the parents. This procedure reflects the assumption that one of the parents would be passing on the general stable gene-product concentrations to its offspring, analogous to the interaction between an oocyte and its mother during the earliest stages of development.
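A minimal sketch of the haploid mating step, assuming networks are stored as lists of rows (one row per gene). The function names are hypothetical, and the subsequent stability test of the offspring is omitted.

```python
import random


def haploid_offspring(mother, father, rng):
    """Build an offspring network by copying each row (gene) unchanged from
    the maternal or the paternal network with equal probability."""
    if len(mother) != len(father):
        raise ValueError("parents must have the same number of genes")
    return [list(rng.choice((m_row, f_row)))
            for m_row, f_row in zip(mother, father)]


def inherit_initial_state(mother_stable, father_stable, rng):
    """The offspring's initial state equals a stable state of one parent,
    chosen at random."""
    return list(rng.choice((mother_stable, father_stable)))
```

Because rows are copied whole, every gene in the offspring keeps exactly the interaction values it had in the parent it came from.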
When applied to a population of 1000 viable networks (see Figure 4), haploid mating maintained a stability rate above 40% for progeny networks with up to 6 genes. The rate drops, however, to 30-40% for networks with 7 to 10 genes and to 20-30% for networks with more than 10 genes. This result suggests that haploid mating can generate complex networks with a much higher likelihood than random generation. Haploid mating is especially efficient at maintaining stability in networks of lower complexity.
It is interesting to compare haploid mating in the discrete and continuous models. Haploid mating displays the same behavior in both models: it is highly efficient at generating viable networks with a small number of interacting genes, but its efficiency falls off sharply as the number of genes increases. In the discrete model, the efficiency drops to almost zero with 8 or more genes. In contrast, the continuous model shows a slower, more gradual decline as the number of genes increases and never reaches zero, even for networks of size 15.
Diploid Mating
Diploid individuals benefit from heterozygosity to modulate the effects of damage or deleterious mutations as well as from increasing diversity through the recombination events between the parental chromosomes. In the process of "diploid mating," each row in the W matrix of the progeny is calculated as the arithmetic mean of the corresponding rows in the W matrices of the parents. Biologically, this means that the effects on gene expression are additive, and effects due to dominance, overdominance, underdominance, epigenetics, parent of origin, and so forth are ignored. Taking the impact of each gene as the average of the impacts of this same gene in each parent tends to mitigate large negative or positive effects of the parental genes.
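The averaging step can be sketched as below, assuming networks are stored as lists of rows. The function name is hypothetical.

```python
def diploid_offspring(mother, father):
    """Offspring network whose rows are the element-wise arithmetic mean of
    the corresponding parental rows (purely additive gene effects)."""
    if len(mother) != len(father):
        raise ValueError("parents must have the same number of genes")
    return [[(m + f) / 2 for m, f in zip(m_row, f_row)]
            for m_row, f_row in zip(mother, father)]


# A strong positive effect in one parent and a strong negative effect in the
# other cancel out; equal parental values are passed on unchanged.
child = diploid_offspring([[1.0, -1.0], [0.5, 0.0]],
                          [[0.0, 1.0], [0.5, -1.0]])
```

In this example the first row of `child` is `[0.5, 0.0]`, which illustrates how averaging mitigates large parental effects.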
When applied to a set of 1000 viable networks, diploid mating generated viable progeny networks in 19-32% of the iterations for networks of up to 10 interacting genes (Figure 4). This percentage is lower than in the haploid model, but diploid mating performs better as the number of genes increases. For networks with more than 10 genes, between 43% and 48% of the offspring networks are viable. The positive slope of the curve shows that the diploid mechanism with additive gene effects is very efficient in maintaining stability in complex networks.
A randomly generated network with 15 interacting genes has an 8.9% chance of being viable. When two viable individuals mate following the haploid-mating model, the likelihood of generating a viable network rises to 22%, and diploid mating increases it further to 47%. This increase may be due to the fact that the original two networks were already selected from a small pool of viable networks with 15 genes, and diploid mating maintains network stability better than haploid mating. We conclude that, while for any level of complexity (number of genes in the network) it is difficult to generate viable complex individuals at random, mating is relatively efficient in producing viable networks of the same level of complexity as those in the parents.
Random Insertion
The difficulty in finding a viable network with more than 10 interacting genes prompted the question of whether increasing the number of genes of a viable network is more successful than generating a viable network at random. To answer this question we randomly inserted a gene into a viable network and developed stable state vectors to test whether stability was retained.
A gene insertion represents a new gene being fully incorporated into the genome and interacting with the other genes in the network. All interaction values of the inserted gene are chosen at random from the uniform distribution on [-1, +1], and all pre-existing genes receive new randomly generated values for their interaction with the newly inserted gene. The stable vector also receives a new randomly generated value, representing the initial concentration of the product of the new gene. The result is a new individual with an extra transcription factor that may or may not be viable when developed with the augmented network.
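A sketch of a single insertion, assuming the network is a list of rows and the state vector is a list. The function name `insert_gene`, the random choice of position, and the returned position are illustrative choices, and the viability test of the augmented individual is omitted.

```python
import random


def insert_gene(W, s, rng):
    """Insert one new gene at a random position.

    Returns (new_W, new_s, pos). Every pre-existing row gains one random
    entry (its interaction with the new gene), a fully random new row is
    added for the new gene, and the state vector gains one random entry.
    """
    k = len(W)
    pos = rng.randrange(k + 1)  # index of the new gene in the enlarged network

    # New column: each existing gene's interaction with the inserted gene.
    new_W = [row[:pos] + [rng.uniform(-1.0, 1.0)] + row[pos:] for row in W]

    # New row: the inserted gene's interactions with all k + 1 genes.
    new_W.insert(pos, [rng.uniform(-1.0, 1.0) for _ in range(k + 1)])

    # Initial concentration of the new gene's product.
    new_s = s[:pos] + [rng.uniform(-1.0, 1.0)] + s[pos:]
    return new_W, new_s, pos
```

The interaction values among the pre-existing genes are left untouched, so removing row and column `pos` from the result recovers the original network.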
From a population of 1000 viable networks, we selected each network in turn, tried 100 different random insertions, and tested each result for stability. Each insertion adds a new gene at a random place in the network. Figure 5 shows how many of the 1000 networks yielded at least one viable individual after insertion. The number of genes shown in Figure 5 is the original number of genes in the network prior to the insertion. Insertions into 10-gene networks had a 62.6% success rate in generating viable 11-gene networks. The efficiency decreases as the number of genes increases, but insertion still succeeds in 60.0% of the attempts to generate a viable network with 16 interacting genes.
Figure 6 shows the result of duplicating an existing gene at random. In this case the probability of generating a viable network is about 50% independent of the number of genes in the network. Gene duplication therefore affords an efficient mechanism of increasing the dimensionality of viable gene networks.
Random Deletion
Analogously to the test with random insertions, we tested the likelihood of obtaining a viable network after removing a gene by deleting one gene at random from a viable network and developing viable individual state vectors to assess whether the network would remain viable. We performed 100 random deletions in each of the 1000 previously generated viable networks. A gene deletion comprises a row and column deletion in the network, plus an entry deletion for the corresponding gene product in the initial output vector.
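A sketch of the deletion operation, assuming the network is a list of rows and the state vector is a list. The function name is hypothetical and the subsequent viability test is omitted.

```python
def delete_gene(W, s, idx):
    """Remove gene idx: drop its row and its column from the network and its
    entry from the state vector. The inputs are left unmodified."""
    if not 0 <= idx < len(W):
        raise IndexError("gene index out of range")
    new_W = [row[:idx] + row[idx + 1:] for i, row in enumerate(W) if i != idx]
    new_s = s[:idx] + s[idx + 1:]
    return new_W, new_s


# Deleting the middle gene of a 3-gene network leaves a 2-gene network.
reduced_W, reduced_s = delete_gene(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]], [0.1, 0.2, 0.3], 1)
```

A random deletion would draw `idx` uniformly from the gene indices, for example with `random.randrange(len(W))`.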
For networks with few interacting genes, loss of a gene is critical, with very few networks remaining viable after a deletion. This result is compatible with the difficulty in finding viable networks when there are few interacting genes. For more complex networks the rate is high: for example, 67.9% for networks with originally 10 interacting genes, which is significantly greater than the 14% rate for randomly generated networks with 9 interacting genes. Deletion maintains 66.6% of the viable networks with 15 interacting genes.