Difference in gene duplicability may explain the difference in overall structure of protein-protein interaction networks among eukaryotes
BMC Evolutionary Biology volume 10, Article number: 358 (2010)
A protein-protein interaction network (PIN) was suggested to be a disassortative network, in which interactions between high- and low-degree nodes are favored while hub-hub interactions are suppressed. It was postulated that a disassortative structure minimizes unfavorable cross-talk between different hub-centric functional modules and was positively selected in evolution. However, by re-examining yeast PIN data, several researchers reported that the disassortative structure observed in a PIN might be an experimental artifact. Therefore, the existence of a disassortative structure and its possible evolutionary mechanism remain unclear.
In this study, we investigated PINs from the yeast, worm, fly, human, and malaria parasite including four different yeast PIN datasets. The analyses showed that the yeast, worm, fly, and human PINs are disassortative while the malaria parasite PIN is not. By conducting simulation studies on the basis of a duplication-divergence model, we demonstrated that a preferential duplication of low- and high-degree nodes can generate disassortative and non-disassortative networks, respectively. From this observation, we hypothesized that the difference in degree dependence on gene duplications accounts for the difference in assortativity of PINs among species. Comparison of 55 proteomes in eukaryotes revealed that genes with lower degrees showed higher gene duplicabilities in the yeast, worm, and fly, while high-degree genes tend to have high duplicabilities in the malaria parasite, supporting the above hypothesis.
These results suggest that disassortative structures observed in PINs are merely a byproduct of preferential duplications of low-degree genes, which might be caused by an organism's living environment.
Large-scale data of protein-protein interactions have become available from several organisms, including Saccharomyces cerevisiae (yeast; [1–4]), Caenorhabditis elegans (worm; ), Drosophila melanogaster (fly; ), Homo sapiens (human; [7, 8]), and Plasmodium falciparum (malaria parasite; ). In a protein-protein interaction network (PIN), a protein and an interaction between two proteins are represented as a node and a link, respectively. The number of links connected to a node is called its degree. The degree distribution P(k) represents the fraction of nodes with degree k in a network and characterizes the structure of the network. It is well known that various biological, technological, and social networks are scale-free networks, in which P(k) follows a power law, i.e., P(k) ~ k^{-γ} [10–12]. In a scale-free network, therefore, most of the nodes have low degrees, but a small number of high-degree nodes (hubs) also exist. In the case of PINs, P(k) better fits a power law with an exponential cut-off [13, 14].
The correlation between the degrees of two nodes connected by a link is another characteristic feature of a network architecture. A simple way to see the degree correlation is to consider the Pearson correlation coefficient r of the degrees at both ends of a link [12, 15, 16]. A network is called assortative when r > 0 and disassortative when r < 0. In an assortative network, hubs are preferentially connected to other hubs, whereas in a disassortative network, hubs tend to attach to low-degree nodes. It was reported that social networks such as coauthorships of scientific papers or film actor collaborations are assortative, whereas technological and biological networks including the Internet, food webs, neural networks, and PINs are disassortative.
Assortativity of a network can also be evaluated by <Knn(k)>, the mean degree among the neighbors of all k-degree nodes ("nn" in <Knn(k)> represents "nearest neighbors"; [12, 14, 17, 18]). In assortative and disassortative networks, <Knn(k)> is an increasing and a decreasing function of k, respectively. If there are no degree correlations, <Knn(k)> is independent of k, with <Knn(k)> = <k^2>/<k>. Several studies reported that the yeast PIN is a disassortative network showing <Knn(k)> ~ k^{-ν} [12, 14, 17], where ν represents the extent of disassortative structure. In the yeast PIN, therefore, links between a hub and a low-degree node are favored, but those between hubs are suppressed. From this observation, Maslov and Sneppen suggested a picture in which, in the yeast PIN, a hub forms a functional module of the cell together with many low-degree neighbors. They hypothesized that the suppression of interactions between hubs minimizes unfavorable cross-talk between different functional modules and increases the robustness of a network against perturbations. Therefore, it is postulated that the disassortative structure in the yeast PIN has been favored by natural selection. Note that, if this hypothesis is true, a disassortative structure should be a general feature that is commonly observed among PINs in any organism.
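The two measures of degree correlation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the network is given as an undirected edge list without self-interactions; the function names are hypothetical and are not taken from the original analysis.

```python
from collections import defaultdict


def degrees(edges):
    """Return {node: degree} for an undirected edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def assortativity(edges):
    """Pearson correlation r of the degrees at the two ends of a link.

    Each link is counted in both directions, so both ends share the
    same mean and variance. Raises ZeroDivisionError if every node
    has the same degree (r is undefined in that case).
    """
    deg = degrees(edges)
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    n = len(xs)
    mean = sum(xs) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var


def mean_neighbor_degree(edges):
    """Return {k: <Knn(k)>}, the mean neighbor degree of k-degree nodes."""
    deg = degrees(edges)
    neighbor_degree_sum = defaultdict(int)
    for u, v in edges:
        neighbor_degree_sum[u] += deg[v]
        neighbor_degree_sum[v] += deg[u]
    total = defaultdict(float)
    count = defaultdict(int)
    for node, k in deg.items():
        total[k] += neighbor_degree_sum[node] / k
        count[k] += 1
    return {k: total[k] / count[k] for k in sorted(total)}


# A star (one hub, four degree-1 nodes) is the extreme disassortative case.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(assortativity(star))
print(mean_neighbor_degree(star))
```

In the star example, every link joins the hub to a degree-1 node, so r is negative and <Knn(k)> decreases with k, which is the signature of a disassortative network.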
To understand the evolutionary mechanisms shaping PIN architectures, several network growth models have been proposed. Many of them are based on gene duplication and divergence, in which a randomly selected node is duplicated to generate a new node having the same links as the original node, and some links are added or eliminated in a divergence process [19–23]. We have recently proposed a non-uniform heterodimerization (NHD) model. In this model, a new link is preferentially attached between two duplicated nodes to create a cross-interaction when they share many common neighbors. We showed that this model best reproduces the structural features of the yeast PIN, including scale-freeness, a small number of cross-interactions, and a skewed distribution of triangles composed of three nodes and three links. However, this model as well as other duplication-divergence models [21, 22] failed to explain the presence of a disassortative structure in the yeast PIN. Simulation studies showed that these models could generate a decreasing function of <Knn(k)>, yet the value of ν (0.18) in <Knn(k)> ~ k^{-ν} is much smaller than the actual value (0.47; see Tables 1 and 2). Therefore, the origin of a disassortative structure still remains unexplained. We should again note that most of these simulation studies were carried out by using the yeast PIN only, because it is currently the best characterized.
It is well-known that large-scale PIN data contain many false positive interactions . Maslov and Sneppen  used a dataset obtained by high-throughput yeast two-hybrid (Y2H) screens  to show suppression of interactions between high-degree nodes. Aloy and Russell , however, argued that the observed suppression of hub-hub interactions is probably an artifact caused by a systematic error in the Y2H data due to prey-bait asymmetry (see also Maslov and Sneppen ). To circumvent the problem of high false positive rates in high-throughput datasets, Batada et al.  used only interactions that were independently reported at least twice in different datasets, and they found that hub-hub interactions were not suppressed in the multi-validated yeast PIN data. However, Hakes et al.  pointed out that multiple validation introduces another problem: interactions observed at least twice will be biased towards well-studied proteins, such as those from particular cellular environments or highly expressed ones. They showed that assortativity of a PIN drastically changes depending on datasets . A literature-curated yeast PIN dataset , which is expected to be reliable because each of the interaction data was derived from small-scale experiments, showed a disassortative structure; however, when they retained only interactions observed twice or three times, it became rather assortative . Therefore, the presence of a disassortative structure in a PIN itself has now become controversial. These studies suggest that a global structure of a PIN has to be investigated by using various datasets obtained from different methods.
The purpose of this paper is to investigate the presence of disassortative structures in PINs and an evolutionary mechanism shaping disassortative structures, if any. For this purpose, we examined eukaryotic PINs from the yeast, worm, fly, human, and malaria parasite. We analyzed four large-scale yeast PIN datasets (MIPS ; Yu et al. ; Reguly et al. ; Batada et al. ). The datasets include Batada et al.'s updated version of a multi-validated dataset, Reguly et al.'s comprehensive literature-curated dataset, and MIPS , which has been called a "gold standard" of yeast protein interaction dataset generated by manual curations by experts. We also used recently published high-quality protein interaction data by Yu et al. , which were obtained by compiling several Y2H datasets. In addition, we examined two independent human PIN datasets (Rual et al. ; Stelzl et al. ). As a result, we show that the yeast, worm, fly, and human PINs have disassortative structures, while malaria parasite PIN is not disassortative. We then propose a possible evolutionary mechanism causing the difference in assortativity among species.
In this study, we examined nine PIN datasets from yeast, worm, fly, human, and malaria parasite (Table 1). Although the numbers of nodes and links are quite different among the five species, their degree distributions P(k) follow nearly the same curve (Figure 1 and additional file 1: Figure S1). All of the PINs examined are scale-free, suggesting that scale-freeness is a general feature of PINs. These observations are consistent with Suthram et al. .
On the other hand, a disassortative structure was not commonly observed among PINs. Although <Knn(k)> for the yeast, worm, fly, or human PIN is a decreasing function following k^{-ν}, the malaria parasite PIN is not disassortative (Figure 2A and additional file 2: Figure S2). Note that all of the four yeast PIN datasets showed a disassortative structure regardless of the controversy on the presence of hub-hub suppression (see additional file 2: Figure S2; see Discussion). The values of ν for the eight PINs in yeast, worm, fly, and human examined are significantly non-zero (P < 3×10^{-4}), while the value of ν for the malaria parasite PIN is not significantly different from zero (P ~ 0.27). The difference in ν between the malaria parasite PIN and each of the other eight PINs is also significant (P < 1×10^{-3}; analysis of covariance). In agreement with these observations, the correlation coefficient r between degrees of connected nodes in the yeast, worm, fly, or human PIN is negative, while that in the malaria parasite PIN is nearly zero (Table 1).
We next examined a possible evolutionary scenario generating the difference in assortativity of PINs among species on the basis of a duplication-divergence model. Figure 2B (middle) illustrates a simple network containing a low-degree node (e.g., A) and a high-degree node (e.g., C) that are connected to each other. In a duplication process, a randomly selected node is duplicated to generate a new node having the same links as the original node, followed by a divergence process in which some links are eliminated. If a low-degree node A is duplicated to generate a new node A' (Figure 2B, right), the value of ν in the network increases, because the degree of a node (C) connected to a low-degree node increases. On the other hand, duplication of a high-degree node (C) causes the value of ν to decrease, because the degree of a node (A) connected to a high-degree node increases (Figure 2B, left). Therefore, we can hypothesize that duplications of low- and high-degree nodes in a disassortative network make the value of ν larger and smaller, respectively.
To examine this issue in more detail, we developed a new duplication-divergence model named the degree-dependent duplication (DDD) model by modifying the NHD model that we proposed previously . In the DDD model, a duplication of a node occurs depending on its degree. In a duplication process, a randomly selected node is duplicated with a probability proportional to 1 + σk, where k is the degree of the node, and σ is a parameter determining the duplicability of the node (see Methods for details).
As for the divergence process, we examined two different models, the asymmetric divergence and the symmetric divergence (Figure 3). In the former, the removal of links occurs in only one of the duplicated nodes, while in the latter, links are lost from both of the duplicates with an equal probability. In this study, we conducted simulations using four different models: NHD with the asymmetric and symmetric divergence, which are referred to as NHD+A and NHD+S, respectively, and DDD with the asymmetric and symmetric divergence (DDD+A and DDD+S, respectively) (Table 2).
Simulation studies showed that the value of ν increases (the slope becomes steeper) as σ decreases for both DDD+A and DDD+S (Figure 2C). We found that the disassortative structures of the yeast (MIPS), worm, and fly PINs were successfully reproduced by DDD+A and DDD+S when the values of σ are negative (Table 2, additional file 3: Figure S3). The human (Rual et al.) PIN was best regenerated by DDD+S with σ = 0. Note that, although σ = 0 means no degree-dependency of duplicability, where the DDD model becomes identical to the NHD model, the resultant network is still disassortative (Figure 2C). Therefore, in order to generate a network similar to the malaria parasite PIN, the value of σ has to be positive, i.e., high-degree nodes should be duplicated more preferentially than low-degree nodes. In fact, our analysis showed that the assortativity of the malaria parasite PIN was reproduced by the DDD model with a positive σ (see Table 2 and additional file 3: Figure S3E).
The effect of link gains after gene duplication was also investigated. However, random attachments of links to duplicated nodes do not essentially affect the assortativity of resultant networks (additional file 4: Figure S4).
We also examined the average shortest path length, <L> and the extent of modularity, M in PINs (Table 1) and simulation-generated networks (Table 2). In agreement with our previous study , the values of <L> in the networks by NHD+A are larger than the actual values in PINs for all species. DDD+A gave the <L> values that are slightly closer to the actual values than NHD+A. On the other hand, for both NHD and DDD models, the symmetric divergence generated networks having larger values of <L>. It was reported that PINs are highly modular , but simulation-generated networks showed even higher values of M than the PINs (Table 2). Moreover, when we compare four networks generated by different models for each species, the value of M is positively correlated with that of <L>, which is consistent with Zhang and Zhang .
To see whether the difference in duplicability dependent on degrees accounts for the difference in assortativity, we analyzed orthologous relationships using proteomes in 55 eukaryote species. Wapinski et al.  provided data of orthologous relationships among 19 Ascomycota fungi including S. cerevisiae. In their dataset, all proteins in these 19 species are classified into ortholog groups, each of which consists of the proteins descended from a single ancestral protein in their most recent common ancestor. To evaluate the duplicability of a given gene in S. cerevisiae, we examined orthologous relationships between S. cerevisiae and each of the other 18 Ascomycota fungi. A phylogenetic tree was constructed using orthologous genes from the two species, and the number of gene duplication events observed in the phylogenetic tree was regarded as a duplicability of the gene (see Methods). In the same manner, we also evaluated gene duplicability in C. elegans, D. melanogaster, H. sapiens, and P. falciparum using other databases (see Methods).
Figure 4 and additional file 5: Figure S5 indicate the relationships between the degree and the duplicability. We classified all proteins in each PIN into three categories containing similar numbers of proteins: low- (k = 1), middle- (k = 2 - 6), and high- (k > 6) degree proteins. The results showed that the duplicability of low- and middle-degree proteins is significantly higher than that of high-degree proteins in the yeast and worm PINs (Figure 4 and additional file 5: Figure S5). The same trend was also observed in the fly PIN. In contrast, the duplicability of low- and middle-degree proteins is significantly lower than that of high-degree proteins in the malaria parasite PIN, while no clear trends were observed in the human PIN (Figure 4). These observations are consistent with the above hypothesis; i.e., the differences in degree-dependent duplicability of genes account for the difference in assortativity among species.
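The degree binning used in this comparison can be sketched as follows. This is only an illustration of the three categories (k = 1, k = 2 to 6, k > 6); the function names and the toy values are hypothetical, and the significance testing reported in the text is not reproduced here.

```python
from collections import defaultdict


def degree_class(k):
    """Assign a degree to the low / middle / high category used in the text."""
    if k == 1:
        return "low"
    if 2 <= k <= 6:
        return "middle"
    if k > 6:
        return "high"
    raise ValueError("proteins in the PIN must have degree >= 1")


def mean_duplicability_by_class(degree, duplicability):
    """Return {category: mean duplicability}.

    degree:        {protein: degree in the PIN}
    duplicability: {protein: number of inferred duplication events}
    Proteins without a duplicability value are skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for protein, k in degree.items():
        if protein not in duplicability:
            continue
        category = degree_class(k)
        sums[category] += duplicability[protein]
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


# Toy values only, not taken from any dataset.
toy_degree = {"p1": 1, "p2": 1, "p3": 4, "p4": 12}
toy_duplicability = {"p1": 2, "p2": 0, "p3": 1, "p4": 0}
print(mean_duplicability_by_class(toy_degree, toy_duplicability))
```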
We also investigated the differences in degrees and duplicabilities among different functional categories in yeast and malaria parasite proteins. Table 3 shows the mean degree and the mean duplicability of yeast proteins belonging to each category obtained from the GO (gene ontology) slim database in the Saccharomyces Genome Database . Interestingly, genes in several categories with significantly higher (lower) degrees on average showed significantly lower (higher) duplicabilities. A similar analysis was conducted for malaria parasite proteins using the GO in the PlasmoDraft database  (Table 4). In this case, functional categories with high (low) degrees tend to show high (low) duplicabilities (additional file 6: Figure S6), which is an opposite trend to that observed in yeast proteins. The slopes in the degree-duplicability relationships are significantly different between the yeast and malaria parasite PINs (P < 0.01; analysis of covariance).
Disassortative structures in PINs
In this paper, we showed that the yeast, worm, fly, and human PINs are disassortative, while the malaria parasite PIN is not. Therefore, a disassortative structure is not a common feature of PINs. By comparing proteomes and conducting simulations, we demonstrated that the difference in assortativity can be well explained by assuming that the duplicability of a protein depends on its degree and that this dependency differs among species. If low-degree proteins have preferentially duplicated in evolution as in yeast, worm, and fly, or there is no trend in the duplicability between low- and high-degree proteins as in the human, the PIN becomes disassortative. On the other hand, a PIN without a disassortative structure could be generated if high-degree proteins have preferentially duplicated as in the malaria parasite. Therefore, the "selectionist view" proposed by Maslov and Sneppen is not necessary to explain the presence of a disassortative structure in PINs. It is rather likely that a disassortative structure observed in PINs is merely a byproduct of preferential duplications of low-degree proteins.
Although several authors [25, 27] claimed that the suppression of hub-hub interactions may be an artifact, our analyses using four recently published high-quality yeast PIN datasets demonstrated that all of the four PINs are in fact disassortative. In Batada et al. , they mentioned that the interactions between hubs are not suppressed, where a hub was defined as a node with k > 21 (top 10% of the nodes). However, the same data showed that the interactions between nodes with relatively high degrees (20 < k < 30) and those with very high degrees (k > 50) are suppressed and interactions between low-degree nodes (k < 3) and high-degree nodes (k > 50) are favored. Therefore, Batada et al.'s data  is not inconsistent with the presence of a disassortative structure. Moreover, the updated version  of their multi-validated yeast PIN data clearly showed disassortativity (see additional file 2: Figure S2A). These results suggest that a disassortative structure in the yeast PIN is not an artifact.
Fernández  classified yeast proteins into several categories on the basis of the existence of orthologous proteins in other genomes, e.g., the proteins that are present in eukaryotes, eubacteria, and archaebacteria, or those present in other fungi. He found that an "ancient" network consisting of proteins that are present in diverse organisms tends to be assortative and the assortative ancient network evolved into the disassortative PIN in yeast at the present time. To explain this evolutionary trend, Fernández  hypothesized a model in which an attachment of new links between similar-degree nodes is disfavored. Note that our DDD model is also consistent with the evolutionary trend toward higher disassortativity (see additional file 7: Figure S7).
PIN data include binary interaction information that is directly obtained from experiments such as Y2H and indirectly inferred from protein complex data. Wang and Zhang pointed out that these two types of data may give quite different images of PINs . We therefore excluded protein complex data from the MIPS database and reexamined the yeast PIN. The result, however, showed no significant differences in disassortativity between PINs with and without complex data (additional file 8: Figure S8). We should also note that PINs are a collection of potential interactions that occur at different times in different cells or subcellular locations, but we treated all interactions simultaneously. To see how such treatment affects our results, we examined yeast subnetworks constructed from the proteins in each subcellular localization separately. As shown in additional file 9: Figure S9, although the extent of disassortativity varies among different subcellular locations due to smaller sample sizes, in general such subnetworks also show disassortative structures.
Neofunctionalization and subfunctionalization
It is generally thought that gene duplication is a primary source for generating organismal complexity. Neofunctionalization and subfunctionalization are proposed as a fate of duplicated genes. Neofunctionalization hypothesizes that the presence of redundant copies of genes allows one duplicate to be free from selective pressure, and thus one of the duplicates can accumulate random mutations and potentially acquire novel functions . Subfunctionalization argues that each of the duplicates accumulates degenerative mutations, resulting in the division of ancestral functions into complementary subsets . Both neofunctionalization and subfunctionalization contribute to protein evolution [39–42].
In the duplication-divergence model, neofunctionalization and subfunctionalization are modeled as a random attachment of new links and a random loss of links to duplicated nodes, respectively. Our simulation studies showed a high rate of link losses (α > 0.5; see Table 2), suggesting the importance of subfunctionalization. On the other hand, link gains were shown to have only minor effects on the structure of PINs (additional file 4: Figure S4). Altogether, our study supports the view that subfunctionalization plays a significant role in shaping the structures of PINs, which is consistent with a recent study by Gibson and Goldberg.
As for subfunctionalization, it has been reported that the number of links retained after gene duplication is considerably different between two duplicates . For this reason, several previous studies used the asymmetric divergence model [14, 45–48]. However, "complete" asymmetric divergence in which links are eliminated from only one of the duplicates is unrealistic, and the actual situation should be between asymmetric divergence and symmetric divergence. We have therefore conducted simulation studies using both symmetric and asymmetric divergence models. The results, however, did not show essential differences (Table 2).
In this study, we found that lower-degree proteins tend to duplicate more frequently in the yeast, worm, and fly PINs (Figure 4). One caveat of this analysis is that the degrees of proteins used here are present-day degrees and thus might be different from those prior to duplication. Because the number of interactions often changes greatly after duplications [19, 41], the observed degree-duplicability correlation could alternatively mean that degrees decrease through divergence after duplication, rather than that duplicability itself depends on degree. However, under this interpretation, it is difficult to explain the difference in the trend of degree-duplicability correlations among different species (Figure 4). Moreover, as mentioned above, the duplication-divergence model without degree-dependent duplicability is insufficient to explain the extent of disassortativity in the yeast, worm, and fly PINs.
Prachumwat and Li  found a positive correlation between degree and the proportion of unduplicated proteins in the yeast proteome, which is consistent with our results. Liang et al.  showed that the extent of protein under-wrapping, which indicates the solvent accessibility to backbone hydrogen bonds, is negatively correlated with gene duplicability in Escherichia coli, yeast, worm, fly, human, and Arabidopsis thaliana. They also found that the correlation becomes weaker for more complex organisms. It was reported that the extent of protein under-wrapping is strongly positively correlated with the degree of proteins in yeast ; therefore, their results are also consistent with ours (Figure 4). In Liang et al. , gene duplicability was defined as a protein family size. In this study, we evaluated gene duplicability by directly counting the number of gene duplication events using orthologous genes in closely related species. Therefore, we considered more recent gene duplications than Prachumwat and Li  and Liang et al. . He and Zhang showed that low-degree nodes are less important  and less important genes tend to duplicate more frequently . Their results are also consistent with ours.
Why do low-degree proteins tend to be duplicated frequently in the evolution of the yeast PIN? The actual reason is currently unclear. Yet, as indicated in Table 3, some functional categories showed low degrees but high duplicabilities on average, while others showed high degrees and low duplicabilities. The former includes metabolic processes for carbohydrates or vitamins. Marland et al. reported that the duplicability of genes involved in metabolism, especially in central metabolism, is significantly higher than that of non-metabolic genes in both yeast and E. coli. Moreover, most of the enzymes involved in these metabolic processes bind only to a specific substrate, and probably for this reason, their degrees are relatively low. The categories showing a high degree and a low duplicability are exemplified by organelle organization and biogenesis, RNA metabolic process, and transcription (see Table 3). The category "organelle organization and biogenesis" contains many proteins involved in the organization of actin filaments or cytoskeletons. Actin and actin-related proteins are known to bind many partner proteins. At the same time, they are highly conserved from yeasts to humans, and therefore duplications of these genes are apparently rare.
Why, then, are high-degree proteins duplicated preferentially in the evolution of the malaria parasite PIN? Table 4 indicates that genes belonging to the categories pathogenesis and interaction with host tend to have high degrees and high duplicability, though the numbers of genes in these categories are not large. These categories include many proteins of Pf erythrocyte membrane protein 1 (PfEMP1) family. PfEMP1 proteins interact with receptors in the host and change the morphology of the host cell ; therefore, the duplications of these genes would be beneficial to malaria parasites. Moreover, a PfEMP1 protein has a feature of an adhesive molecule  and can bind many partner proteins. However, the actual reason for the opposite trend of gene duplicability in the entire PIN of malaria parasite to that of other eukaryotes is currently unclear. It would be intriguing to investigate the PINs of other parasitic organisms.
These observations suggest that the duplicability of proteins having a given function can differ among organisms and may be determined by each organism's living environment. The duplicability of genes for each species would, in turn, determine the overall structure of a PIN. The availability of high-quality interaction data from various species including parasitic organisms will help us to clarify, in greater detail, the relationships between the environments that organisms inhabit and the evolution of their PINs.
In this study, we showed that disassortative structures are not common features among eukaryotes by examining nine different PINs from five eukaryote species. We found that low-degree proteins tend to show high duplicabilities for the PIN with a disassortative structure (e.g. yeast), while an opposite trend was observed for the PIN without disassortativity (e.g. malaria parasite). Simulation studies on the basis of gene duplication and divergence also supported these observations. Therefore, selective forces acting on the entire structure of PINs need not be invoked to explain the presence of a disassortative structure. Our results indicate that the overall structure of PINs is primarily determined by local processes in the course of evolution.
PIN and GO data
The datasets of the yeast PIN were obtained from the MIPS (Munich Information Center for Protein Sequences) database http://mips.gsf.de (18 May 2006) , Batada et al. , Reguly et al. , and Yu et al. . Worm and Fly PIN data were obtained from Li et al.  and IM Browser http://proteome.wayne.edu/PIMdb.html, respectively. The datasets of the human PIN were from Rual et al.  and Stelzl et al. , and Malaria parasite PIN was from LaCount et al. . Some of these datasets contain components that are not connected to each other. In these cases, we used the largest component for the analysis. All self-interactions were removed. The yeast GO slim dataset was downloaded from the ftp site of Saccharomyces Genome Database ftp://genome-ftp.stanford.edu/pub/yeast/literature_curation/. The GO dataset for P. falciparum was obtained from PlasmoDraft . The yeast PIN excluding protein complex data was obtained from http://www.umich.edu/~zhanglab/download.htm.
PINs have a modular structure, in which interactions between proteins are much denser within a module than between modules. The modularity m for a particular separation of a network is calculated by $m = \sum_{s=1}^{N} \left[ \frac{l_s}{L} - \left( \frac{k_s}{2L} \right)^2 \right]$, where N is the number of modules, L is the number of links in the network, l_s is the number of links within module s, and k_s is the sum of the degrees of the nodes in module s. The separation that maximizes m is considered to be optimal. The maximum m among all possible separations of a given network is referred to as the modularity of the network and denoted as M. We used the method by Vincent et al. for searching the optimal separation, since the method gives excellent accuracy for module separation and outperforms other methods in terms of computational time.
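As an illustration of the quantity defined above, the following sketch evaluates the modularity m of one given separation, using the standard form m = sum over modules of [l_s/L - (k_s/(2L))^2]. It does not implement the search for the optimal separation, and the function and variable names are hypothetical.

```python
from collections import defaultdict


def modularity(edges, module_of):
    """Modularity m of a given separation of an undirected network.

    edges:     list of (u, v) links, without self-interactions
    module_of: {node: module label}
    """
    n_links = len(edges)                 # L
    links_within = defaultdict(int)      # l_s
    degree_sum = defaultdict(int)        # k_s
    for u, v in edges:
        su, sv = module_of[u], module_of[v]
        degree_sum[su] += 1
        degree_sum[sv] += 1
        if su == sv:
            links_within[su] += 1
    return sum(
        links_within[s] / n_links - (degree_sum[s] / (2 * n_links)) ** 2
        for s in degree_sum
    )


# Two triangles joined by a single link, separated into the two triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
separation = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, separation))
```

Placing all nodes in a single module gives m = 0, so a positive value indicates that links are denser within modules than expected from the degrees alone.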
The simulation studies were conducted using a duplication-divergence model in a similar manner to Hase et al., with a modification. In the DDD model, a new node and new links are added to the network according to the following rules at each time step of a simulation. (1) A node in the network is randomly selected (A). Node A is duplicated to generate a new node (A') with a probability (1 + σk)/1,000 (when 1 + σk > 0), where k is the degree of node A, and σ is a parameter determining the duplicability of a node for each species. The probability is defined to be 0 when 1 + σk is lower than 0. The interacting pattern of node A' is identical to that of node A. (2) For the divergence process, two different models were examined: the asymmetric divergence and the symmetric divergence (Figure 3). In the former, each link to node A' is removed with a uniform probability α. In the latter, for each of the nodes connected to A and A' (e.g. node B), one of the two links (either the A-B link or the A'-B link) is randomly chosen and is removed with a probability α (Figure 3). (3) A new link between node A and node A' is created with a probability βn_N (when βn_N ≤ 1), where n_N is the number of common neighbors shared by these two nodes. The probability is defined to be one when βn_N is greater than 1. If there are no links to node A' after these processes (all links to node A' were removed and no links were generated), node A' is not added to the network.
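Rules (1)-(3) can be sketched as a single time step in Python. This is a minimal illustration rather than the original simulation code: the function names, the two-node seed network, the step limit, and the choice to count common neighbors after the divergence step are all assumptions made here.

```python
import random


def ddd_step(adj, sigma, alpha, beta, symmetric, rng, new_id):
    """One time step of the DDD model on adj = {node: set(neighbors)}.

    Returns True if a duplicated node was added to the network.
    """
    # (1) Pick a node A at random; duplicate it with probability (1 + sigma*k)/1000.
    a = rng.choice(list(adj))
    k = len(adj[a])
    p_dup = max(0.0, 1.0 + sigma * k) / 1000.0
    if rng.random() >= p_dup:
        return False
    keep_a = set(adj[a])      # links retained by A
    keep_dup = set(adj[a])    # links inherited by A'
    # (2) Divergence: for each neighbor B, remove one link with probability alpha.
    for b in sorted(adj[a]):  # assumes sortable node ids (integers here)
        if rng.random() < alpha:
            if symmetric and rng.random() < 0.5:
                keep_a.discard(b)     # symmetric only: the A-B link is lost
            else:
                keep_dup.discard(b)   # the A'-B link is lost
    # (3) Cross-interaction A-A' with probability min(1, beta * n_N).
    n_common = len(keep_a & keep_dup)  # assumption: counted after divergence
    cross = rng.random() < min(1.0, beta * n_common)
    if not keep_dup and not cross:
        return False                   # A' has no links, so it is not added
    for b in adj[a] - keep_a:
        adj[b].discard(a)
    adj[a] = keep_a
    adj[new_id] = keep_dup
    for b in keep_dup:
        adj[b].add(new_id)
    if cross:
        adj[a].add(new_id)
        adj[new_id].add(a)
    return True


def grow(n_nodes, sigma, alpha, beta, symmetric, seed=0, max_steps=10_000_000):
    """Repeat ddd_step until the network has n_nodes nodes (or max_steps is hit)."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}    # assumed seed network: two linked nodes
    for _ in range(max_steps):
        if len(adj) >= n_nodes:
            break
        ddd_step(adj, sigma, alpha, beta, symmetric, rng, new_id=len(adj))
    return adj


network = grow(30, sigma=-0.05, alpha=0.6, beta=0.05, symmetric=False, seed=1)
print(len(network), sum(len(nb) for nb in network.values()) // 2)
```

A negative sigma makes high-degree nodes less likely to be duplicated and a positive sigma makes them more likely; setting sigma = 0 removes the degree dependence. The parameter values in the example call are arbitrary and are not the fitted values of Table 2.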
Processes (1)-(3) were repeated until the number of nodes in the network became the same as that in the PIN of a given species. We performed simulations with various values of σ, α, and β. The value of σ was varied from -0.05 to 0 in steps of 0.01 and from 0 to 10.0 in steps of 0.1, and the values of α and β were varied from 0 to 1 in steps of 0.01 and 0.001, respectively. For a given set of σ, α, and β, we conducted 100 simulations. We then calculated the mean of <k>, the mean of <C>, and the mean of <Knn(k)> from the 100 networks. The value of ν represents the slope of the regression line of the mean of <Knn(k)>. Table 2 shows the values of σ, α, and β that reproduced <k>, <C>, and ν of each PIN.
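The calculation of <Knn(k)> and ν for a single network can be sketched as below. The text states only that ν is the slope of the regression line. This sketch assumes a least-squares fit in log-log space with the sign convention <Knn(k)> ∝ k^(-ν), so that a disassortative network gives a positive ν. The function names and the star-shaped toy network are hypothetical, and averaging over 100 simulated networks is not shown.

```python
import math
from collections import defaultdict

def knn_by_degree(adj):
    """<Knn(k)>: mean degree of neighbours, averaged over nodes of degree k."""
    per_degree = defaultdict(list)
    for node, nbrs in adj.items():
        if nbrs:  # isolated nodes have no neighbours to average over
            mean_nbr_degree = sum(len(adj[b]) for b in nbrs) / len(nbrs)
            per_degree[len(nbrs)].append(mean_nbr_degree)
    return {k: sum(v) / len(v) for k, v in per_degree.items()}

def nu(knn):
    """Negative least-squares slope of log <Knn(k)> against log k."""
    xs = [math.log(k) for k in knn]
    ys = [math.log(knn[k]) for k in knn]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope

# Toy network (hypothetical): a star, the extreme disassortative case.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
knn = knn_by_degree(star)
print(knn, nu(knn))
```

In the star, the hub has only degree-1 neighbours and every leaf has the hub as its only neighbour, so <Knn(k)> falls steeply with k.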
We also examined a model that considers link gains. In this model, the following process was added after process (3) of the DDD model: a link is attached between each of the two duplicated nodes (A and A') and a randomly selected node with a probability ε. The value of ε was varied from 0.01 to 0.1. σ = -0.05 was used for both asymmetric and symmetric divergence. The values of α and β were determined in the same way as for the DDD model.
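The link-gain process could be sketched as follows. The text says only that the target is "a randomly selected node". This sketch additionally assumes that the target is drawn from nodes that are neither the duplicated node itself nor already linked to it. The function name `add_link_gains` and the toy network are hypothetical.

```python
import random

def add_link_gains(adj, a, a_new, epsilon, rng):
    """With probability epsilon, attach one new link to each of A and A'.

    adj maps node -> set of neighbours and is modified in place.
    """
    for node in (a, a_new):
        if rng.random() < epsilon:
            # Assumption: no self-links and no duplicate links are created.
            candidates = [n for n in adj if n != node and n not in adj[node]]
            if candidates:
                target = rng.choice(candidates)
                adj[node].add(target)
                adj[target].add(node)

# Toy network (hypothetical): nodes 0 and 1 play the roles of A and A'.
toy = {0: {1}, 1: {0}, 2: {3}, 3: {2}}
add_link_gains(toy, 0, 1, epsilon=1.0, rng=random.Random(0))
print(toy)
```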
We examined the duplicability of genes in yeast, worm, fly, human, and malaria parasite by using orthologous relationships among closely related species. For yeast genes, we used the dataset of ortholog groups for 19 Ascomycota fungi including S. cerevisiae, downloaded from the Fungal Orthogroups Repository (http://www.broad.mit.edu/regev/orthogroups/). This dataset provides ortholog groups, each of which consists of genes descended from a single gene in the last common ancestor of the 19 Ascomycota fungi. The duplicability of genes in the yeast PIN was evaluated by considering orthologous relationships between S. cerevisiae and each of the other 18 fungal species. Consider, for instance, the comparison between S. cerevisiae and S. paradoxus. Because some ortholog groups do not contain genes from all 19 species, we considered only ortholog groups containing at least one gene from both S. cerevisiae and S. paradoxus. Suppose that a given ortholog group contains two genes from S. cerevisiae and three genes from S. paradoxus (and more from other species). We constructed a phylogenetic tree from these five genes by the neighbor-joining (NJ) method using ClustalW. We then counted the number of duplication events in the tree using Notung (ver. 2.5). This number was regarded as the duplicability of both S. cerevisiae genes. In this way, a value of duplicability was assigned to each protein in the yeast PIN. Similarly, we calculated the duplicability of genes contained in the worm, fly, human, and malaria parasite PINs. For worm and malaria parasite genes, we used OrthoMCL-DB version 2 (http://orthomcl.cbil.upenn.edu), which contains ortholog groups of three nematode species including C. elegans and those of six Haemosporidian species including P. falciparum. For fly and human genes, we used ortholog groups of 12 Drosophila species and those of 11 vertebrate species including seven mammals, respectively, downloaded from OrthoDB (http://cegg.unige.ch/orthodb).
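To make the counting step concrete, the sketch below counts duplication events in a small rooted gene tree using the species-overlap rule: an internal node is treated as a duplication when its two subtrees share a species. This is a simplified stand-in and not Notung's reconciliation procedure. It also assumes that the tree is already rooted, whereas an NJ tree is unrooted. The tree topology, the species labels `cer` and `par`, and the function name are hypothetical.

```python
def count_duplications(tree):
    """Return (species_set, n_duplications) for a rooted binary gene tree.

    A leaf is a species label (str); an internal node is a 2-tuple of subtrees.
    A node counts as a duplication when its two subtrees share a species.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    species_left, dups_left = count_duplications(left)
    species_right, dups_right = count_duplications(right)
    is_duplication = bool(species_left & species_right)
    return (species_left | species_right,
            dups_left + dups_right + int(is_duplication))

# Hypothetical ortholog group: two S. cerevisiae ("cer") genes and
# three S. paradoxus ("par") genes.
tree = ((("cer", "par"), ("cer", "par")), "par")
species, n_dup = count_duplications(tree)
print(n_dup)  # this count would be assigned to both S. cerevisiae genes
```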
PIN: protein-protein interaction network
Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, Qureshi-Emili A, Li Y, Godwin B, Conover D, Kalbfleisch T, Vijayadamodar G, Yang M, Johnston M, Fields S, Rothberg JM: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000, 403: 623-627. 10.1038/35001009.
Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA. 2001, 98: 4569-4574. 10.1073/pnas.061034498.
Guldener U, Munsterkotter M, Oesterheld M, Pagel P, Ruepp A, Mewes HW, Stumpflen V: Mpact: The MIPS protein interaction resource on yeast. Nucleic Acids Res. 2006, 34: 436-441. 10.1093/nar/gkj003.
Yu H, Braun P, Yildirim MA, Lemmens I, Venkatesan K, Sahalie J, Hirozane-Kishikawa T, Gebreab F, Li N, Simonis N, Hao T, Rual JF, Dricot A, Vazquez A, Murray RR, Simon C, Tardivo L, Tam S, Svrzikapa N, Fan C, de Smet AS, Motyl A, Hudson ME, Park J, Xin X, Cusick ME, Moore T, Boone C, Snyder M, Roth FP, Barabási AL, Tavernier J, Hill DE, Vidal M: High-quality binary protein interaction map of the yeast interactome network. Science. 2008, 322: 104-110. 10.1126/science.1158684.
Li S, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain PO, Han JD, Chesneau A, Hao T, Goldberg DS, Li N, Martinez M, Rual JF, Lamesch P, Xu L, Tewari M, Wong SL, Zhang LV, Berriz GF, Jacotot L, Vaglio P, Reboul J, Hirozane-Kishikawa T, Li Q, Gabel HW, Elewa A, Baumgartner B, Rose DJ, Yu H, Bosak S, Sequerra R, Fraser A, Mango SE, Saxton WM, Strome S, Van Den Heuvel S, Piano F, Vandenhaute J, Sardet C, Gerstein M, Doucette-Stamm L, Gunsalus KC, Harper JW, Cusick ME, Roth FP, Hill DE, Vidal M: A map of the interactome network of the metazoan C. elegans. Science. 2004, 303: 540-543. 10.1126/science.1091403.
Giot L, Bader JD, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, Vijayadamodar G, Pochart P, Machineni H, Welsh M, Kong Y, Zerhusen B, Malcolm R, Varrone Z, Collis A, Minto M, Burgess S, McDaniel L, Stimpson E, Spriggs F, Williams J, Neurath K, Ioime N, Agee M, Voss E, Furtak K, Renzulli R, Aanensen N, Carrolla S, Bickelhaupt E, Lazovatsky Y, DaSilva A, Zhong J, Stanyon CA, Finley RL, White KP, Braverman M, Jarvie T, Gold S, Leach M, Knight J, Shimkets RA, McKenna MP, Chant J, Rothberg JM: A protein interaction map of Drosophila melanogaster. Science. 2003, 302: 1727-1736. 10.1126/science.1090289.
Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, Klitgord N, Simon C, Boxem M, Milstein S, Rosenberg J, Goldberg DS, Zhang LV, Wong SL, Franklin G, Li S, Albala JS, Lim J, Fraughton C, Llamosas E, Cevik S, Bex C, Lamesch P, Sikorski RS, Vandenhaute J, Zoghbi HY, Smolyar A, Bosak S, Sequerra R, Doucette-Stamm L, Cusick ME, Hill DE, Roth FP, Vidal M: Towards a proteome-scale map of the human protein-protein interaction network. Nature. 2005, 437: 1173-1178. 10.1038/nature04209.
Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S, Timm J, Mintzlaff S, Abraham C, Bock N, Kietzmann S, Goedde A, Toksöz E, Droege A, Krobitsch S, Korn B, Birchmeier W, Lehrach H, Wanker EE: A human protein-protein interaction network: a resource for annotating proteome. Cell. 2005, 122: 957-968. 10.1016/j.cell.2005.08.029.
LaCount DJ, Vignali M, Chettier R, Phansalkar A, Bell R, Hesselberth JR, Schoenfeld LW, Ota I, Sahasrabudhe S, Kurschner C, Fields S, Hughes RE: A protein interaction network of malaria parasite Plasmodium falciparum. Nature. 2005, 438: 103-107. 10.1038/nature04104.
Barabási AL, Albert R: Emergence of scaling in random networks. Science. 1999, 286: 509-512. 10.1126/science.286.5439.509.
Barabási AL, Oltvai ZN: Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004, 5: 101-113. 10.1038/nrg1272.
Costa LF, Rodrigues FA, Travieso G, Boas V: Characterization of complex networks: A survey of measurements. Adv Phys. 2007, 56: 167-242. 10.1080/00018730601170527.
Jeong H, Mason SP, Barabási AL, Oltvai ZN: Lethality and centrality in protein networks. Nature. 2001, 411: 41-42. 10.1038/35075138.
Hase T, Niimura Y, Kaminuma T, Tanaka H: Non-uniform survival rate of heterodimerization links in the evolution of the yeast protein-protein interaction network. PLoS ONE. 2008, 3: e1667-10.1371/journal.pone.0001667.
Callaway DS, Hopcroft JE, Kleinberg JM, Newman MEJ, Strogatz SH: Are randomly grown graphs really random?. Phys Rev E. 2001, 64: 041902-10.1103/PhysRevE.64.041902.
Newman ME: Assortative mixing in networks. Phys Rev Lett. 2002, 89: 208701-10.1103/PhysRevLett.89.208701.
Maslov S, Sneppen K: Specificity and stability in topology of protein networks. Science. 2002, 296: 910-913. 10.1126/science.1065103.
Pastor-Satorras R, Vazquez A, Vespignani A: Dynamical and correlation properties of the internet. Phys Rev Lett. 2001, 87: 258701-10.1103/PhysRevLett.87.258701.
Wagner A: The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Mol Biol Evol. 2001, 18: 1283-1292.
Solé RV, Pastor-Satorras R, Smith ED, Kepler T: A model of large-scale proteome evolution. Adv Complex Syst. 2002, 5: 43-54. 10.1142/S021952590200047X.
Pastor-Satorras R, Smith E, Solé RV: Evolving protein interaction networks through gene duplication. J Theor Biol. 2003, 222: 199-210. 10.1016/S0022-5193(03)00028-6.
Vazquez A: Growing networks with local rules: preferential attachment, clustering hierarchy and degree correlations. Phys Rev E. 2003, 67: 056104-10.1103/PhysRevE.67.056104.
Ispolatov I, Krapivsky PL, Mazo I, Yuryev A: Cliques and duplication-divergence network growth. New Journal of Physics. 2005, 7: 145-10.1088/1367-2630/7/1/145.
von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions. Nature. 2002, 417: 399-403. 10.1038/nature750.
Aloy P, Russell RB: Potential artefacts in protein-interaction networks. FEBS Lett. 2002, 530: 253-254. 10.1016/S0014-5793(02)03427-0.
Maslov S, Sneppen K: Protein interaction networks beyond artifacts. FEBS Lett. 2002, 530: 255-256. 10.1016/S0014-5793(02)03428-2.
Batada NN, Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Hurst LD, Tyers M: Stratus not altocumulus: a new view of the yeast protein interaction network. PLoS Biol. 2006, 4: e317-10.1371/journal.pbio.0040317.
Hakes L, Pinney JW, Robertson DL, Lovell SC: Protein-protein interaction networks and biology-what's the connection?. Nature Biotechnol. 2008, 26: 69-72. 10.1038/nbt0108-69.
Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Hon GC, Myers CL, Parsons A, Friesen H, Oughtred R, Tong A, Stark C, Ho Y, Botstein D, Andrews B, Boone C, Troyanskaya OG, Ideker T, Dolinski K, Batada NN, Tyers M: Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. J Biol. 2006, 5: 11-10.1186/jbiol36.
Batada NN, Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Hurst LD, Tyers M: Still stratus not altocumulus: further evidence against the Date/Party hub distinction. PLoS Biol. 2007, 5: e154-10.1371/journal.pbio.0050154.
Suthram S, Sittler T, Ideker T: The Plasmodium protein network diverges from those of other eukaryotes. Nature. 2005, 438: 108-112. 10.1038/nature04135.
Wang Z, Zhang J: In search of the biological significance of modular structures in protein networks. PLoS Comput Biol. 2007, 3: e107-10.1371/journal.pcbi.0030107.
Zhang Z, Zhang J: A big world inside small-world networks. PLoS ONE. 2009, 4: e5686-10.1371/journal.pone.0005686.
Wapinski I, Pfeffer A, Friedman N, Regev A: Natural history and evolutionary principles of gene duplication in fungi. Nature. 2007, 449: 54-61. 10.1038/nature06107.
Brehelin L, Dufayard JF, Gascuel O: PlasmoDraft: a database of Plasmodium falciparum gene function prediction based on postgenomic data. BMC Bioinformatics. 2008, 9: 440-10.1186/1471-2105-9-440.
Fernández A: Molecular basis for evolving modularity in the yeast protein interaction network. PLoS Comput Biol. 2007, 3: e226-10.1371/journal.pcbi.0030226.
Ohno S: Evolution by gene duplication. 1970, New York: Springer
Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J: Preservation of duplicate genes by complementary, degenerative mutations. Genetics. 1999, 151: 1531-1545.
Lynch M, Force A: The probability of duplicate gene preservation by subfunctionalization. Genetics. 2000, 154: 459-473.
Blanc G, Wolfe KH: Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell. 2004, 16: 1679-1691. 10.1105/tpc.021410.
He X, Zhang J: Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics. 2005, 169: 1157-1164. 10.1534/genetics.104.037051.
Freilich S, Massingham T, Blanc E, Goldovsky L, Thornton JM: Relating tissue specialization to the differentiation of expression of singleton and duplicate mouse protein. Genome Biol. 2006, 7: R89-10.1186/gb-2006-7-10-r89.
Gibson TA, Goldberg DS: Questioning the ubiquity of neofunctionalization. PLoS Comput Biol. 2009, 5: e1000252-10.1371/journal.pcbi.1000252.
Wagner A: Asymmetric functional divergence of duplicate genes in yeast. Mol Biol Evol. 2002, 19: 1760-1768.
Wagner A: How the global structure of protein interaction networks evolves. Proc R Soc Lond B. 2003, 270: 457-466. 10.1098/rspb.2002.2269.
Kim J, Krapivsky PL, Kahng B, Redner S: Infinite-order percolation and giant fluctuations in a protein interaction network. Phys Rev E. 2002, 66: 055101-10.1103/PhysRevE.66.055101.
Chung F, Lu L, Dewey TG, Galas DJ: Duplication models for biological networks. J Comput Biol. 2003, 10: 677-687. 10.1089/106652703322539024.
Ispolatov I, Krapivsky PL, Yuryev A: Duplication-divergence model of protein interaction network. Phys Rev E. 2005, 71: 061911-10.1103/PhysRevE.71.061911.
Prachumwat A, Li WH: Protein function, connectivity, and duplicability in yeast. Mol Biol Evol. 2006, 23: 30-39. 10.1093/molbev/msi249.
Liang H, Plazonic KR, Chen J, Li WH, Fernández A: Protein under-wrapping causes dosage sensitivity and decreases gene duplicability. PLoS Genet. 2008, 4: e11-10.1371/journal.pgen.0040011.
Fernández A, Scott R, Berry RS: The nonconserved wrapping of conserved protein folds reveals a trend toward increasing connectivity in proteomic networks. Proc Natl Acad Sci USA. 2004, 101: 2823-2827. 10.1073/pnas.0308295100.
He X, Zhang J: Why hubs tend to be essential in protein networks?. PLoS Genet. 2006, 2: e88-10.1371/journal.pgen.0020088.
He X, Zhang J: Higher duplicability of less important genes in yeast genomes. Mol Biol Evol. 2006, 23: 144-151. 10.1093/molbev/msj015.
Marland E, Prachumwat A, Maltsev N, Gu Z, Li WH: Higher gene duplicabilities for metabolic proteins than for nonmetabolic proteins in yeast and E. coli. J Mol Evol. 2004, 59: 806-814. 10.1007/s00239-004-0068-x.
Remedios CGD, Chhabra D, Kekic M, Dedova IV, Tsubakihara M, Berry DA, Nosworthy NJ: Actin binding proteins: regulation of cytoskeletal microfilaments. Physiol Rev. 2003, 83: 433-473.
Goodson HV, Hawse WF: Molecular evolution of the actin family. J Cell Sci. 2002, 115: 2619-2622.
Pasternak ND, Dzikowski R: PfEMP1: An antigen that plays a key role in the pathogenicity and immune evasion of the malaria parasite Plasmodium falciparum. Int J Biochem Cell Biol. 2009, 41: 1463-1466. 10.1016/j.biocel.2008.12.012.
Chen BQ, Barragan A, Fernández V, Sundstrom A, Schlichtherle M, Sahlen A, Carlson J, Datta S, Wahlgren M: Identification of Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) as the rosetting ligand of the malaria parasite P. falciparum. J Exp Med. 1998, 187: 15-23. 10.1084/jem.187.1.15.
Pacifico S, Liu G, Guest S, Parrish JR, Fotouhi F, Finley RL: A database and tool, IM Browser, for exploring and integrating emerging gene and protein interaction data for Drosophila. BMC Bioinformatics. 2006, 7: 195-10.1186/1471-2105-7-195.
Guimerá R, Amaral LAN: Functional cartography of complex metabolic networks. Nature. 2005, 433: 895-900. 10.1038/nature03288.
Vincent DB, Guillaume JL, Lambiotte R, Lefebvre E: Fast unfolding of communities in large networks. J Stat Mech. 2008, 10: P10008.
Saitou N, Nei M: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987, 4: 406-425.
Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG: ClustalW and ClustalX version 2. Bioinformatics. 2007, 23: 2947-2948. 10.1093/bioinformatics/btm404.
Chen K, Durand D, Farach-Colton M: NOTUNG: a program for dating gene duplications and optimizing gene family trees. J Comput Biol. 2000, 7: 429-447. 10.1089/106652700750050871.
Chen F, Mackey AJ, Stoeckert CJ, Roos DS: OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. 2006, 34: 363-368. 10.1093/nar/gkj123.
Kriventseva EV, Rahman N, Espinosa O, Zdobnov EM: OrthoDB: the hierarchical catalog of eukaryotic orthologs. Nucleic Acids Res. 2008, 36: 271-275. 10.1093/nar/gkm845.
Watts DJ, Strogatz SH: Collective dynamics of 'small-world' networks. Nature. 1998, 393: 440-442. 10.1038/30918.
The authors thank T. Masuda, Y. Fukuoka, T. Kaminuma, K. Mogushi, S. Nagaie, and S. Nakagawa for their useful comments and discussion. This study was supported by the Ministry of Education, Culture, Sports, Science and Technology, Japan, grant 20770192 to YN.
TH, YN, and HT designed the study; TH analyzed data and performed simulation studies; TH and YN wrote the paper. All authors read and approved the final manuscript.
Electronic supplementary material
Additional file 1: Figure S1: Degree distribution in the yeast and human PINs. (A) Degree distribution P(k) in the yeast PIN for four different datasets. A dashed line is the same as in Figure 1. (B) Degree distribution P(k) in the human PIN for two datasets. A dashed line is the same as in Figure 1. (TIFF 128 KB)
Additional file 2: Figure S2: <Knn(k)> in the yeast and human PINs. (A) <Knn(k)> in the yeast PIN for four different datasets. Dashed lines in black, blue, green, and red represent $k^{-0.47}$, $k^{-0.33}$, $k^{-0.33}$, and $k^{-0.25}$, respectively. (B) <Knn(k)> in the human PIN for two datasets. Dashed lines in black and red represent $k^{-0.26}$ and $k^{-0.27}$, respectively. (TIFF 122 KB)
Additional file 3: Figure S3: Distribution of <Knn(k)> in the PINs and the networks generated by the NHD and DDD models. Distribution of <Knn(k)> in the PIN (black square) and the networks generated by DDD+A (red diamond), DDD+S (blue triangle), NHD+A (green cross), and NHD+S (purple plus) for (A) yeast, (B) worm, (C) fly, (D) human, and (E) malaria parasite. The results for the NHD and DDD models were obtained by taking the mean among 100 networks generated by simulations. A dashed line represents a regression line. The slope (ν) of each regression line is shown in Table 2. (TIFF 1 MB)
Additional file 4: Figure S4: Distribution of <Knn(k)> in the networks generated by simulations with link gains for (A) the DDD+A and (B) DDD+S models. ε is the probability of a link gain (see Methods). The results were obtained by taking the mean among 100 networks generated by simulations. A dashed line represents a regression line (ν = 0.51 and 0.48 for the asymmetric and symmetric divergence, respectively). (TIFF 509 KB)
Additional file 5: Figure S5: Gene duplicability dependent on degree in the yeast and human PINs. Duplicability of genes in the yeast and human PINs for (A) Batada et al., (B) Reguly et al., (C) Yu et al., and (D) Stelzl et al. (TIFF 614 KB)
Additional file 6: Figure S6: Relationships between mean degrees and mean duplicabilities for different functional categories in (A) yeast and (B) malaria parasite. A dot indicates each functional category, and its size represents the number of proteins in the category. A dashed line indicates a regression line. (TIFF 240 KB)
Additional file 7: Figure S7: Evolutionary trend toward higher disassortativity in the networks generated by the DDD model. Fernández categorized yeast proteins into six classes: proteins that are present in all organisms (3.5% of the yeast proteome), in eubacteria (9.5%), in archaebacteria but not in eubacteria (8%), in eukaryotes diverging earlier than fungi (19%), in other fungi (36%), and exclusively in yeast (24%). By using these fractions, we calculated the numbers of nodes contained in ancient networks as 136, 505, 1,556, and 3,268. We generated networks by the DDD model (asymmetric divergence) with σ = -0.05, α = 0.50, and β = 0.019, which were used for regenerating the yeast PIN (see Table 1). For each ancient network, we calculated the mean value of ν from 100 simulation-generated networks. (TIFF 48 KB)
Additional file 8: Figure S8: Disassortative structure in the yeast PIN with and without protein complex data. Distribution of <Knn(k)> in the yeast PIN with (black square) and without protein complex data (red triangle). (TIFF 127 KB)
Additional file 9: Figure S9: Disassortative structures of the yeast sub-PINs constructed from proteins in different subcellular localizations. ν = 0.40, 0.48, 0.29, 0.17, and 0.10 for cytoplasm, cell periphery, punctate composite, nucleolus, and nucleus, respectively. The subcellular localization data were downloaded from http://www.umich.edu/~zhanglab/download/Wang_PLoSCB_Suppl/description.htm. Only subcellular localizations containing >100 proteins and >30 interactions are shown. (TIFF 141 KB)
Hase, T., Niimura, Y. & Tanaka, H. Difference in gene duplicability may explain the difference in overall structure of protein-protein interaction networks among eukaryotes. BMC Evol Biol 10, 358 (2010). https://doi.org/10.1186/1471-2148-10-358
- Malaria Parasite
- Ortholog Group
- Symmetric Divergence
- Duplicate Node
- Asymmetric Divergence