Analysis of gene co-expression networks derived from transcriptomes of different developmental stages of drosophila melanogaster
Transcriptomes from early developmental stages and diverse adult tissues in Drosophila melanogaster studied in [26] were used for identifying new transcripts and calculating the expression values for genes (see Materials and Methods, Additional file 1). Genes involved in the same biological process tend to have expression levels that are highly correlated across samples. Identification of groups of co-regulated genes, or “modules”, should be informative for identifying specific features of developmental stages/tissues/cells. We implemented a weighted gene co-expression network analysis (WGCNA) [27] to identify modules of co-expressed genes during different stages of development in Drosophila melanogaster. This approach has been successfully employed to deduce regulatory networks in the developing human brain [28],[29]. Here, a total of 46 coexpression modules (labeled numerically, e.g., M29, and by color, e.g., black, see Additional file 2) were obtained for Drosophila development. Many of the modules were developmental stage-specific (Additional file 3), for example, M2 is correlated, with high significance, with the 2-4 hr embryonic developmental state (P = 5e-42), and M39 is strongly correlated, with high significance, with adult females 30 days after eclosion (P = 2e-10).
The Bristle Screen online database [30] was used to search for essential genes (here, a gene was deemed as essential for survival if it was completely lethal with constitutive RNAi). This database contained 20,262 transgenic RNAi lines that are predicted to target 11,619 of the 14,139 protein-coding genes (82.2%) in release 5.7 of the Drosophila genome. Lethality described in this database included several categories: completely lethal (no adults of the desired genotype, 1803 lines); some lethality (adult “escapers” had no obvious phenotype, 506 of lines); and lines that showed some phenotype with the accompanying lethality (1,230 lines, 1,069 genes). Of the 15,379 Drosophila reference genes present in our modules, 10,753 were included in the 18,663 lines with knock-down data in the Bristle Screen database, with 1,237 genes (in 1,655 lines) showing completely lethality (~8.868%). The analysis was then expanded to include all of the genes (i.e., not just reference Drosophila genes) represented in our modules. A total of 11,632 genes, represented by 20,262 RNAi lines, from the Bristle Screen database, were represented in our modules, with 1,339 of the genes (from 1,803 lines) showing complete lethality (~8.898%). From this analysis, three modules, M17, M6 and M28, were identified as being significantly enriched with lethal genes (P = 1.84E-22, P = 3.61E-62 and P = 1.03E-22, respectively, by the χ 2 test, after FDR correction) (Figure 1A). Expression of genes in module M17 are significantly associated with the white prepupae (WPP) stages (P = 6e − 04, Figure 1B, Additional file 4), genes in module M6 are significantly correlated with the 0-2 hr embryo stage (P = 4e − 04, Figure 1C, Additional file 5) and genes in module M28 are significantly associated with the 10-12 hr embryo development stage (P = 0.01, Figure 1D, Additional file 6), revealing essential roles for these highly expressed genes in these developmental stages. The 0-2 hr embryo stage, i.e. the specific stage of module M6, is the maternal to zygotic transition period in Drosophila [31]. Consistent with this period in development, gene ontology analysis revealed that genes in module M6 are enriched in biological categories associated with chromatin organization (P = 2.52e-37, after FDR correction), transcription (P = 7.40E-32, after FDR correction), cell cycle (P = 1.07E-29, after FDR correction), development (P = 2.88e-04, after FDR correction) and morphogenesis (P = 8.79e-04, after FDR correction) (Additional file 7). These data suggest an essential function for these genes in the maternal to zygotic transition period.
Developmental stage-specific enrichment of new genes revealed by gene co-expression networks in Drosophila melanogaster
We identified a total of 221 new genes including newly duplicated and young orphan genes (detailed in Material and Methods), specific to Drosophila melanogaster. We then identified which modules, and developmental stages, were enriched with new genes in Drosophila melanogaster. Significant enrichment in new genes was found in 6 modules, M41, M24, M37, M13, M36 and M23 (P = 7.34E-14, P = 1.56E-11, P = 3.52E-09, P = 5.56E-08, P = 8.18E-07, and P = 2.39E-04, respectively, by the χ 2 test, after FDR correction) (Additional file 8), where these modules are significantly correlated with WPP + 2 days Fat (P = 9e-37), stage L1 larvae (P = 1e-06, Figure 2B, Additional file 9), dark blue gut PS(1-2) in stage L3 larvae (P = 3e-15, Figure 2C, Additional file 10), 4 day post-eclosion testes in adult males (P = 3e-15), 12 hr post-molt stage L3 larvae (P = 4e-27) and 20-22 hr embryo stages (P = 1e-05), respectively (Additional file 3). Gene expression analysis of the new genes enriched in these six modules also showed consistent specific high levels of expression in their associated developmental stages, supporting potential roles of these new genes in these tissues/developmental stages (Additional file 11). PCA analysis (Principal Component Analysis) also found an inseparable relationship of the new genes with other genes in these modules, supporting the potential gene co-expression network (Additional file 12).
When we considered young duplicated genes, we found that young duplicated genes are significantly enriched in 7 modules, M37, M13, M24, M36, M23, M12 and M26 (P = 2.91E-06, P = 7.70E-06, P = 5.17E-05, P = 3.04E-04, P = 0.0013, P = 0.030 and P = 0.031, respectively, by the χ 2 test, after FDR correction) (Figure 2A) (Additional file 13), which are significantly correlated with the dark blue gut PS(1-2) in stage L3 larvae (P = 3e-15, Figure 2C, Additional file 10), 4 day post-eclosion testes in adult males (P = 3e-15), stage L1 larvae (P = 1e-06, Figure 2B, Additional file 9), 12 hr post-molt stage L3 larvae (P = 4e-27), 20-22 hr embryos (P = 1e-05), 4 days post-eclosion accessory glands in adult males (P = 4E-32), and 18-20 hr embryo stages (P = 2e-17), respectively (Additional file 3). These observations support the previously reported testis-bias in the expression of some new genes, and that the function of these new genes in the evolution of phenotypes is determined by their expression in specific developmental stages. However, since as more tissues (59 development stages and tissues) and unique highly expressed genes were taken into account, only 40 genes among the 137 young genes are enriched in the M13 module correlated significantly with the testes. In addition, the significant clustering of the young genes with different co-expression modules that correlate with different tissues/developmental stages also suggests that the young genes might operate in many different functional classes, and not only in the testis. For example, gene ontology analysis of these 5 modules with young genes revealed that genes in the module M23 are enriched in functions that are integral component of membranes (intrinsic to membrane) (P = 0.0038, after FDR correction) and that genes in the module M24 are enriched in the aromatic amino acid family metabolic process (P = 0.035, after FDR correction) and the oxidation reduction process (P = 2.24E-07, after FDR correction), revealing potential functions for these new genes in these processes (Additional file 14).
Our searches identified 84 young orphan genes that show no homology with proteins in other species (and were not duplicated in the D. melanogaster lineage). 83 orphan genes were significantly enriched in 6 modules, M13, M37, M24, M36, M41 and M40 (P = 0.021, P = 0.0077, P = 3.02E-07, P = 0.0096, P = 2.88E-22 and P = 0.0052, respectively, by the χ 2 test, after FDR correction) (Additional file 15, Figure 2A). Among these six modules, M13 is correlated with 4 day post-eclosion testes in adult males (P = 3e-15), M37 is correlated with dark blue gut PS(1-2) in stage L3 larvae (P = 3e-15, Figure 2C, Additional file 10), M24 is correlated with stage L1 larvae (P = 1e-06, Figure 2B, Additional file 9), M36 was correlated with 12 hr post-molt stage L3 larvae (P = 4e-27), M41 is corrected with WPP + 2 days Fat (P = 9e-37), and M40 is corrected with Mixed Adult Male and Female Carcass 4 days Post-eclosion (P = 2e-06) (Additional file 3), suggesting that these new genes have roles in these stages of development.
The young genes were then queried against the Bristle Screen online database [30] where we found that among the 111 young duplicated genes (from 181 lines) having knock-down data, 4 genes (from 6 lines) were completely lethal (Additional file 13). The proportion of young genes showing lethality (3.3%) was slightly lower than the genome wide value (8.9%, http://bristlescreen.imba.oeaw.ac.at/data_summary.php). Among the orphan genes, 52 (from 84 lines) have knock-down data in the Bristle Screen database, with only 1 gene (from 1 line) being completely lethal (Additional file 15). Our results, of a low proportion of lethality in young genes, contrasts with a previous conclusion that young genes have a similar proportion of lethality as old genes [21], a result that might have been attributable to the off target effects of RNAi on duplicated genes, producing false positives for the essentiality of the young duplicated genes. More accurate experimental methods for knocking down/out genes in Drosophila are necessary to evaluate the essentiality of young genes.
Roles for specific lncRNAs in development revealed from the gene co-expression network analysis
We assembled 18,385-26,178 new transcripts from the transcriptome data from 7 Drosophila species (Additional file 16). In Drosophila melanogaster, 145 of the newly assembled transcripts, from 133 genes, were identified as linage-specific lncRNAs that have no homology with transcripts in any other species. Among these new lncRNA genes, 129 were found to be significantly associated with 25 modules, with 6 of the modules, M14, M20, M42, M18, M3, and M46, significantly enriched with Drosophila melanogaster specific lncRNAs (P = 6.10E-31, P = 4.57E-04, P = 8.15E-12, P = 8.75E-04, P = 3.25E-07 and P = 3.82E-06, respectively, by the χ 2 test, after FDR correction) (Figure 3A, Additional file 17). Of these six modules, M14 is correlated with the adult male 30 days after eclosion stage (Figure 3D, Additional file 18), M20 was correlated with the pupae 12 hr after WPP (P = 1e-06, Figure 3C, s Additional file 19), M18 is correlated with pupae 2 days after WPP (P = 3e-24), M42 is corrected with stage L3 CNS (P = 9e-20), M3 is corrected with stage L3 Imaginal Discs (P = 3e-28), and M46 is corrected with heads of mated adult males 20 days post-eclosion (P = 8e-04, Figure 3B, Additional files 3 and 20). We then used the genes in each of the modules where lncRNAs are enriched to conduct a gene ontology analysis, which may suggest potential functions for the lncRNAs. For example, genes in module M14 are enriched in odorant binding (P = 2.98e-24, after FDR correction), sensory perception of smell (P = 1.03E-17, after FDR correction), olfactory receptor activity (P = 7.73E-15, after FDR correction) and neurological system process (P = 3.57E-06, after FDR correction), suggesting a potential function of some of the lncRNAs in regulating expression of sensory perception genes. In addition, genes in module M20 are enriched in the defense response (P = 0.047, after FDR correction), genes in module M18 are enriched in proteolysis (P = 0.035, after FDR correction), and genes in module M46 are enriched in photo transduction (P = 1.06e-23, after FDR correction) (Additional file 21), suggesting that lncRNAs may have regulatory roles in these processes. Although the functions of the lncRNAs remain unclear, co-expression data provides information that could be examined in future studies.
Essentiality of positively selected genes in Drosophila
9,746 one-to-one orthologs among Drosophila melanogaster, D. annanassae, D. sechellia, D. yakuba and D. erecta were retrieved from the EnsemblMetazoa database (http://metazoa.ensembl.org). After alignment and trimming, 9,531 orthologs remained for use in the detection of positive selection employing the branch site model of PAML [32]. A total of 263 and 383 Positive Selected Genes (PSGs) were identified in the Drosophila melanogaster and D. sechellia lineages, respectively. Of the one-to-one orthologs, 7,939 (from 13,784 lines) have knock-down data in the Bristle Screen database, with 929 (~11%) genes (from 1,291 lines) being completely lethal. Among the 263 PSGs, 216 genes (from 413 lines) with knock-down data in Drosophila melanogaster, with 36 of these genes (from 54 lines) being completely lethal (~13.08%) (Additional file 22). In Drosophila sechellia, among the 383 PSGs, 1 gene did not have detectable expression in the Drosophila melanogaster transcriptome datasets and 1 gene was placed in the non-sense module. The remaining 381 genes were distributed into 38 modules, with 40 (from 49 lines) of the 304 PSGs (from 553 lines) being completely lethal (Additional file 23).
Excluding one gene that had no detectable expression in the transcriptome datasets, 262 PSGs in Drosophila melanogaster were distributed into 36 modules and were significantly enriched in M6 (P = 0.002196, by the χ 2 test, after FDR correction) (Additional file 22). Module M6 is significantly correlated with the 0-2 hr embryo stage (P = 4e-04), and this module is also significantly enriched with lethal genes (Figure 1C, Additional file 5). Enrichment of lethal genes also points to the importance of module M6. Among the 36 PSGs with completely lethality in Drosophila melanogaster, 18 are distributed to module M6. These data demonstrate that PSGs have essential functions, although they show rapid evolution with amino acid sequence changes driven by positive selection.