Hymenoptera results
The species tree based on concatenated gene regions is discussed in [26] and is presented in Fig. 2. We calculated ICA scores on the species tree given the set of homolog trees. To explore the impact of missing taxa on the ICA measurements, we examined simulated data with missing taxa (Additional file 1: Figure S1). These results suggested that the ICA is generally conservative when gene trees have missing data, although uncertainty and noise increased as the amount of missing data increased. For the Hymenoptera, ICA values ranged from 0.03 to 0.81 (Fig. 3). ICA values were lower along the backbone, ranging from 0.03 to 0.06, and higher in many of the nested clades, ranging from 0.08 to 0.81. The highest values were found within Apoidea, and the clade uniting Apis and Sceliphron had the highest value (0.81). The original analyses of [5] recovered support values between 56 % and 100 % using the species tree methods PhyloNet [58] and STAR [59], and analyses by [26] recovered similar jackknife support values. The ICA values calculated here are notably lower, indicating a great deal of underlying gene tree conflict.
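The following Python sketch makes the ICA scores above more concrete. It computes an internode certainty all value for a single node from bipartition frequencies, using a common formulation: ICA = 1 + Σ X_i log_n(X_i). Here the X_i are the frequencies of the focal bipartition and of each conflicting bipartition, normalized to sum to one, and n is the number of bipartitions considered. The function name `ica` and its input format are hypothetical. Counts are supplied directly rather than extracted from trees, and no minimum-frequency cutoff is applied to conflicting bipartitions, so values may differ from those produced by published implementations.

```python
import math
from typing import Sequence


def ica(focal_count: int, conflicting_counts: Sequence[int]) -> float:
    """Sketch of internode certainty all (ICA) for one species-tree node.

    focal_count: number of gene trees containing the focal bipartition
        (assumed positive).
    conflicting_counts: number of gene trees containing each distinct
        bipartition that conflicts with the focal one (hypothetical input;
        no minimum-frequency cutoff is applied).
    """
    counts = [focal_count] + [c for c in conflicting_counts if c > 0]
    n = len(counts)  # number of bipartitions considered
    if n == 1:
        return 1.0  # no conflicting bipartitions observed
    total = sum(counts)
    freqs = [c / total for c in counts]
    # ICA = 1 + sum_i X_i * log_n(X_i); zero-frequency terms contribute 0
    value = 1.0 + sum(x * math.log(x, n) for x in freqs if x > 0)
    # Report a negative value when a conflicting bipartition is more frequent
    if max(counts[1:]) > focal_count:
        value = -value
    return value
```

Under this formulation, a bipartition with no conflicting alternatives scores 1. Two equally frequent conflicting bipartitions score 0. The score becomes negative when a conflicting bipartition is more frequent than the focal one.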
For mapping the statistics presented below, we used the species tree and the 5,863 homolog group dataset. The numbers of bipartitions were 90,354 (no bootstrap filter), 65,758 (bootstrap filter = 20 %), 38,625 (bootstrap filter = 50 %), and 19,891 (bootstrap filter = 80 %). Although these bipartitions can be mapped to any topology, we calculated their concordance and conflict with the species tree topology using a bootstrap filter of 50 % (see Fig. 2).
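The concordance/conflict comparison can be sketched in Python for a single species-tree bipartition. In this hypothetical representation, each gene tree is a list of (taxa on one side, bootstrap support) pairs over its own taxon set. The sketch proceeds in four steps:

- It discards bipartitions below the filter. We assume support at or above the threshold is retained.
- It restricts the species-tree bipartition to the taxa present in the gene tree.
- It labels the gene tree concordant if the gene tree contains that bipartition.
- Otherwise, it labels the gene tree conflicting if any retained bipartition is incompatible with it, and uninformative if none is.

This illustrates the general idea only. It is not the code used for these analyses, and the function names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# (taxa on one side of the split, bootstrap support in percent)
Bipartition = Tuple[FrozenSet[str], float]


def split(side: FrozenSet[str], taxa: FrozenSet[str]) -> FrozenSet[FrozenSet[str]]:
    """Both sides of a bipartition as an unordered pair."""
    return frozenset([side & taxa, taxa - side])


def compatible(a: FrozenSet[str], b: FrozenSet[str], taxa: FrozenSet[str]) -> bool:
    """Two splits on one taxon set are compatible if some pair of sides is disjoint."""
    a2, b2 = taxa - a, taxa - b
    return any(not (x & y) for x in (a, a2) for y in (b, b2))


def classify(gene_biparts: List[Bipartition], gene_taxa: FrozenSet[str],
             species_side: FrozenSet[str], min_support: float = 50.0) -> str:
    """Label one gene tree relative to one species-tree bipartition."""
    target = species_side & gene_taxa  # drop taxa missing from the gene tree
    kept = [side for side, support in gene_biparts if support >= min_support]
    if any(split(side, gene_taxa) == split(target, gene_taxa) for side in kept):
        return "concordant"
    if any(not compatible(side, target, gene_taxa) for side in kept):
        return "conflicting"
    return "uninformative"
```

Because the species-tree bipartition is restricted to the gene tree's taxa, a gene tree lacking some taxa can still be scored as concordant or conflicting.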
The number of homolog groups concordant with each clade in the species tree varied considerably (see Fig. 2). Nodes 2, 7-9, and 11-13 each had more than 2,000 concordant homologs, with as many as 4,295. The remaining nodes had fewer concordant homologs, ranging from 151 to 744. No node had an alternative bipartition supported by more homologs than the bipartition in the species tree. However, nodes 3 and 4 both had alternative bipartitions supported by many homologs relative to the number supporting the species tree resolution. The major alternative topology for node 3 included a clade with Vespidae wasps and Argochrysis but not ants, with 123 homologs supporting the alternative and 151 supporting the species tree resolution. Node 4 had 147 homologs supporting an alternative clade that excluded ants and included wasps, compared with 246 homologs supporting the species tree resolution. By contrast, node 7 (monophyly of ants) and node 13 (uniting Apis and Bombus) showed very little conflict relative to the number of homologs supporting the species tree resolution.
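Identifying the major alternative at a node, as done above for nodes 3 and 4, amounts to tallying the resolutions supported by individual homologs. The following is a minimal Python sketch. It assumes each homolog has already been assigned a hashable label for the resolution it supports at that node. The function name `major_alternative` and the labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, Tuple


def major_alternative(resolutions: Iterable[Hashable],
                      species_key: Hashable) -> Tuple[int, Optional[Hashable], int]:
    """Return (species-tree count, most common alternative, its count)."""
    counts = Counter(resolutions)
    species_count = counts.pop(species_key, 0)  # homologs agreeing with the species tree
    if not counts:
        return species_count, None, 0  # no conflicting resolutions at this node
    alternative, alternative_count = counts.most_common(1)[0]
    return species_count, alternative, alternative_count
```

For node 3 above, the species tree resolution would be counted 151 times and the Vespidae+Argochrysis alternative 123 times, a ratio of about 0.81. For node 4, the corresponding ratio is 147/246, or about 0.60.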
The distributions of alternative topologies supported by conflicting homologs are presented in Additional file 2: Figure S5, with three cases shown in Fig. 2. Distributions from gene trees generated by coalescent simulations were plotted for comparison. The proportions of total homologs supporting each conflicting alternative resolution are sorted from largest to smallest, with the grey lines representing distributions based on coalescent simulations. Distributions of conflicting homologs for nodes 2, 7, 8, 10, 11, 12, and 13 fell within the coalescent simulations. Those for nodes 5, 9, and 14-16 fell just outside the coalescent distributions. Nodes 1, 3, 4, and 6 fell far outside the simulated distributions and/or had distributions with different shapes from those of the coalescent gene tree simulations. For every node, concordant homologs had higher average bootstrap support and higher mean proportions of informative clades than discordant homologs (Additional file 3: Figure S2 and Additional file 4: Figure S3).
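The comparison with coalescent simulations is described above qualitatively. One hypothetical way to make "within the coalescent simulations" explicit is an envelope check:

- Sort the observed proportions.
- Pad all curves to a common length.
- Test whether each ranked value lies between the minimum and maximum of the simulated curves at that rank.

The Python sketch below assumes the function names and the zero-padding rule. It does not distinguish "just outside" from "far outside" as the text does.

```python
from typing import List, Sequence


def sorted_proportions(alt_counts: Sequence[int], total_homologs: int) -> List[float]:
    """Proportion of all homologs supporting each conflicting resolution, largest first."""
    return sorted((c / total_homologs for c in alt_counts), reverse=True)


def within_envelope(observed: Sequence[float],
                    simulated: Sequence[Sequence[float]]) -> bool:
    """True if each ranked observed proportion lies within the min-max range
    of the simulated curves at the same rank (shorter curves padded with 0)."""
    if not simulated:
        raise ValueError("need at least one simulated distribution")
    length = max([len(observed)] + [len(s) for s in simulated])

    def padded(curve: Sequence[float]) -> List[float]:
        return list(curve) + [0.0] * (length - len(curve))

    obs = padded(observed)
    sims = [padded(s) for s in simulated]
    for rank in range(length):
        column = [s[rank] for s in sims]
        if not min(column) <= obs[rank] <= max(column):
            return False
    return True
```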
At nodes 3-6, 10-12, and 14-16, homologs concordant with the species tree had higher average rates than homologs in conflict with the species tree (Additional file 5: Figure S4). At nodes 1, 8-9, and 13, by contrast, concordant homologs had lower rates than conflicting ones.
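At each node, the concordant-versus-conflicting comparisons of bootstrap support and rates reduce to group means of a per-homolog statistic. The Python sketch below assumes each homolog already has a precomputed value, such as its mean bootstrap support or rate, however defined. It also assumes a label from the concordance classification. The function name `compare_means` is hypothetical. The comparison is purely descriptive and includes no statistical test.

```python
from statistics import mean
from typing import Dict, List, Sequence


def compare_means(values: Sequence[float], labels: Sequence[str]) -> Dict[str, float]:
    """Mean of a per-homolog statistic for concordant and conflicting homologs
    at one node; uninformative homologs are ignored."""
    groups: Dict[str, List[float]] = {"concordant": [], "conflicting": []}
    for value, label in zip(values, labels):
        if label in groups:
            groups[label].append(value)
    # Only report groups that contain at least one homolog
    return {label: mean(vals) for label, vals in groups.items() if vals}
```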
Using a bootstrap filter of 50 %, we detected 175 gene duplications across 133 homologs. Of these, 113 duplications from 81 homologs could be mapped to clades in the concatenated species tree (Fig. 3). The edge with the most gene duplications subtended the ant clade (node 7). A number of duplications were also found among the bees and Sphecidae wasps (nodes 10-13) and toward the root of the tree.
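Gene duplications in homolog trees are often inferred with a species-overlap criterion. A gene-tree node is called a duplication when its two child clades share at least one taxon. The duplication is then mapped to the smallest species-tree clade containing all taxa beneath it. The Python sketch below illustrates that criterion on rooted binary trees written as nested tuples. The tree representation and function names are hypothetical, and the bootstrap filter is omitted for brevity. We do not claim the sketch reproduces the exact procedure used here.

```python
from typing import FrozenSet, List, Optional, Tuple, Union

# Rooted binary tree: a leaf is a taxon name, an internal node is a 2-tuple.
Tree = Union[str, Tuple["Tree", "Tree"]]


def taxa(tree: Tree) -> FrozenSet[str]:
    """All taxa at the tips below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return taxa(left) | taxa(right)


def duplications(gene_tree: Tree) -> List[FrozenSet[str]]:
    """Taxon sets below gene-tree nodes whose two child clades share a taxon."""
    if isinstance(gene_tree, str):
        return []
    left, right = gene_tree
    found = [taxa(gene_tree)] if taxa(left) & taxa(right) else []
    return found + duplications(left) + duplications(right)


def clades(species_tree: Tree) -> List[FrozenSet[str]]:
    """Taxon sets of every clade (including tips) in a rooted species tree."""
    if isinstance(species_tree, str):
        return [frozenset([species_tree])]
    left, right = species_tree
    return [taxa(species_tree)] + clades(left) + clades(right)


def map_duplication(dup: FrozenSet[str],
                    species_clades: List[FrozenSet[str]]) -> Optional[FrozenSet[str]]:
    """Smallest species-tree clade containing every taxon below the duplication."""
    containing = [c for c in species_clades if dup <= c]
    return min(containing, key=len) if containing else None
```

The clades containing a given taxon set are nested, so the smallest one is unique. In this sketch, a duplication involving a taxon absent from the species tree cannot be mapped and returns `None`.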
The distributions of GO terms for genes that were concordant or conflicting with each clade in the species tree topology did not differ. All GO term distributions are presented in Additional file 6: Figure S6.
Caryophyllales results
The species tree based on concatenated gene regions was discussed in [42] and is presented in Fig. 4. Bootstrap support ranged from 88 % to 100 % across the tree, but ICA values varied widely, from 0.08 to 0.97 (Fig. 5).
For example, the placement of Sarcobatus had 89 % bootstrap support but an ICA of only 0.13. Along the backbone, the ICA was 0.62 for the node separating Microteaceae from the remaining core Caryophyllales, but only 0.12, 0.08, and 0.10 at the other backbone nodes. Values also varied greatly within major clades; in Amaranthaceae, for example, they ranged from 0.10 to 0.97.
We used the species tree described above and the 4,550 homolog groups that contained at least 60 taxa to calculate the bipartition information (Fig. 4). The total numbers of bipartitions were 336,018 (no bootstrap filter), 287,971 (bootstrap filter = 20 %), 205,498 (bootstrap filter = 50 %), and 124,020 (bootstrap filter = 80 %). As with Hymenoptera, we calculated the concordance and conflict of the bipartition sets against the species tree topology using a bootstrap filter of 50 % (Fig. 4).
The numbers of concordant and conflicting gene regions varied greatly across the species tree. After the split from Microtea, the number of supporting homologs for the three backbone nodes of core Caryophyllales ranged from 502 to 817. The number of conflicting homologs for the same nodes ranged from 657 to 992. These three backbone nodes had the lowest numbers of total informative homologs (i.e., concordant + conflicting homologs). So did the split between Phytolaccaceae and Nyctaginaceae, and the split between Molluginaceae and Portulacaceae+Cactaceae+Talinaceae+Basellaceae. The highest numbers of informative homologs were found nested within Amaranthaceae, Portulacaceae, Aizoaceae, Phytolaccaceae, and Nyctaginaceae. The distribution of genes concordant with alternative topologies is presented in Additional file 7: Figure S10, with specific distributions highlighted in Fig. 4. The proportions of total homologs supporting each conflicting alternative resolution are sorted from largest to smallest, with the grey lines representing distributions based on coalescent simulations. With the exception of node 20 (the placement of Sarcobatus), no alternative topology had more concordant genes than the bipartition found in the species tree from the concatenated analyses. At node 20, the alternative resolution is supported by 340 homologs and places Phytolacca species sister to Nyctaginaceae. Distributions of conflicting homologs fell within or near the coalescent simulations for all nodes except 1, 2, 4-6, 8, 20, 25-27, 37, 38, 43, 51, 55, 59, and 61-65 (Additional file 7: Figure S10).
Average bootstrap values were higher for homologs concordant with the species tree than for conflicting homologs at every node except nodes 19-21, 26, and 50. The proportion of informative clades in the gene trees was higher for concordant homologs at every node except nodes 26 and 50. Average rates were higher for concordant homologs at nodes 4-8, 10, 30, 38, 59, and 65. They were lower at nodes 9, 15, 21, 23, 28, 31, 40, 51, 53, and 56. For details on each of these results, see Additional file 8: Figure S7, Additional file 9: Figure S8, and Additional file 10: Figure S9.
A much higher number of gene duplications (including repeated duplications) was detected in the Caryophyllales than in the Hymenoptera. Using a bootstrap filter of 50 %, we found 2,390 duplications across 1,532 homologs. This is an average of about 0.5 duplications per homolog tree across all 4,550 homologs analyzed (2,390/4,550 ≈ 0.53). Of these, 2,359 duplications from 1,515 homologs could be mapped to clades in the concatenated species tree (Fig. 5). The most gene duplications were found within Nyctaginaceae and Amaranthaceae. High numbers of duplications were also found at three other places: other clades within the Sarcobatus+Phytolaccaceae+Nyctaginaceae clade, the base of Portulacaceae, and the split between Microteaceae and the core Caryophyllales.
As with Hymenoptera, the distributions of GO terms for genes that were concordant or conflicting with each clade in the species tree topology did not differ substantially. All GO term distributions are presented in Additional file 11: Figure S11.