In this section we first review the implications of our findings for the analysis of molecular data on deep-level lepidopteran relationships. We then review the bearing of our results on current understanding of those relationships themselves.
Heuristic search efficiency and computational effort
In preliminary analyses prior to those described here we had repeated each GARLI run 20 times, a typical number in applications of this program so far [14, 29, 30]. Surprising discrepancies between the initial tree estimates for different character sets prompted us to wonder if these were really the best available estimates. The ensuing, much more extensive tree searches reported here revealed, first, better trees for all data sets. Moreover, for all data sets except nt3, the same best topology was found many times (114-925 times, or 1.1-9.3%, in 10,000 searches), making it plausible, though not provable, that we actually found the global ML topology. (For nt3, for which the best topology appeared only twice, we are less likely to have found the global ML tree.) Also revealed by these searches were (to us) surprisingly large sets of near-best trees, with only slightly lower likelihood scores, yet including topologies strikingly different from the best tree, underscoring the limited resolving power of our data.
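As a rough illustration of what these recovery frequencies imply for search effort, the sketch below treats each search as an independent trial that finds the best topology with a fixed probability, estimated from the counts reported above. Both the independence assumption and the function names are ours for illustration; this calculation is not part of the original analysis.

```python
import math

def prob_at_least_one_hit(p: float, n_searches: int) -> float:
    """Probability that at least one of n independent searches finds the best topology."""
    return 1.0 - (1.0 - p) ** n_searches

def searches_needed(p: float, target: float = 0.95) -> int:
    """Smallest number of searches giving at least `target` probability of one hit."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))

# Per-search recovery rates taken from the counts reported in the text
# (hits out of 10,000 searches). Treating them as fixed probabilities is an assumption.
rates = {
    "lowest non-nt3 rate (114 hits)": 114 / 10_000,
    "highest non-nt3 rate (925 hits)": 925 / 10_000,
    "nt3 (2 hits)": 2 / 10_000,
}

for label, p in rates.items():
    print(
        f"{label}: P(at least one hit in 20 searches) = "
        f"{prob_at_least_one_hit(p, 20):.3f}; "
        f"searches for 95% chance = {searches_needed(p)}"
    )
```

Under these assumptions, 20 searches have only a modest chance of hitting the best topology at the lower recovery rates, whereas a few hundred searches make a hit likely for every data set except nt3.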
We tentatively conclude from these results that GARLI analyses of data sets the size of ours should routinely include not just tens of searches but hundreds, if one wants to be confident of having the best feasibly obtainable tree. But, given that the differences between our initial trees and the better ones found subsequently are very weakly supported by any measure (see below), is the improvement worth the large additional computational effort? The answer is often likely to be yes, for at least two reasons. First, some types of hypothesis tests for which tree estimates are used do not explicitly take topological uncertainty into account, thereby placing a premium on the accuracy of the input tree. For example, in our study, the improved tree estimates generally raised the P values of the significance tests for non-monophyly of previously-proposed groups, as compared to those based on the initial trees found using just 20 GARLI searches (data not shown). In 16 total comparisons, the new P value was equal to the initial one in two, moderately to substantially larger in ten, and smaller (slightly) in just four. These differences were caused specifically by change in the unconstrained tree estimates.
A second reason to value even small, hard-won improvements in likelihood score is that some problems, including ditrysian phylogeny, may be difficult enough to resist strong resolution by any one data source for a long time to come. In the interim, the most convincing means of favoring one hypothesis over another may be congruence among multiple data sets, each providing only weak support by itself. Thus, credibility is lent to the phylogeny estimates presented here, despite their low support levels, by the fact that very similar relationships among lepidopteran superfamilies are emerging from an independent molecular study using a different but comparable gene sample, and a larger taxon sample (M. Mutanen, L. Kaila, N. Wahlberg, personal communication).
Support levels and possible reasons for low bootstrap support at deeper levels
The overall pattern of bootstrap support in our ML analyses is that families and divergences within them are generally strongly supported; superfamilies and divergences within them are only sometimes strongly supported; and relationships among superfamilies almost always have very weak support, with bootstraps often < 20% (see Additional files 4 and 7).
Why is support along the "backbone" so low? There are several possibilities. First, given the extensive search needed to find the best feasibly-obtainable ML trees, low ML bootstrap support at deeper levels might be thought to reflect insufficient effort - a single GARLI search - on each pseudo-replicate. Search effort surely has some effect on bootstrap values, but we doubt that it is the main explanation. The literature on per-replicate search effort required for accurate bootstrap percentage estimation [55, 56], while limited thus far to parsimony analyses, suggests that a plateau in mean BP is quickly reached as one increases heuristic search effort from simplified fast methods to somewhat more elaborate methods (e.g. limited branch swapping) to full standard search methods (e.g., those incorporating extensive branch swapping). It further suggests that the plausible prediction of increased BP with more thorough searches is realized mainly for BP values which were low to begin with. Preliminary experiments included in our initial analyses (data not shown) point in the same direction: very low initial BP values sometimes increased substantially (up to 20-30% in absolute value) with increased search effort per pseudo-replicate, but never reached moderate or strong levels (≥ 70%); initially high BP values (≥ 80%) changed little with increased pseudo-replicate search effort.
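The role of per-replicate search effort can be made concrete with a minimal sketch of the bootstrap procedure: columns of the alignment are resampled with replacement, a tree is estimated from each pseudo-replicate, and the bootstrap percentage of a group is the share of replicate trees that contain it. The tree-estimation step is passed in as a callable, so its thoroughness is exactly what "search effort" varies. The toy `closest_pair_clades` function is a hypothetical stand-in for a real tree search (such as a GARLI run) and is not a phylogenetic method.

```python
import random
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Set

Alignment = Dict[str, str]
Clade = FrozenSet[str]

def resample_columns(alignment: Alignment, rng: random.Random) -> Alignment:
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}

def bootstrap_percentage(
    alignment: Alignment,
    clade: Clade,
    infer_clades: Callable[[Alignment], Set[Clade]],
    n_reps: int = 100,
    seed: int = 0,
) -> float:
    """Percentage of pseudo-replicate trees that contain `clade` exactly."""
    rng = random.Random(seed)
    hits = sum(
        clade in infer_clades(resample_columns(alignment, rng)) for _ in range(n_reps)
    )
    return 100.0 * hits / n_reps

def closest_pair_clades(alignment: Alignment) -> Set[Clade]:
    """Toy stand-in for a tree search: group the two least-different sequences."""
    def distance(a: str, b: str) -> int:
        return sum(x != y for x, y in zip(a, b))
    pair = min(
        combinations(sorted(alignment), 2),
        key=lambda p: distance(alignment[p[0]], alignment[p[1]]),
    )
    return {frozenset(pair)}

# Hypothetical four-taxon alignment, for illustration only.
toy = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAA", "C": "CCCCCCCCCC", "D": "GGGGGGGGGG"}
print(bootstrap_percentage(toy, frozenset({"A", "B"}), closest_pair_clades))
```

Replacing `closest_pair_clades` with a more or less thorough search is the software analogue of the comparison described above.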
Strong conflict among genes is a second possible explanation for low BP values at deeper levels, but in our data set, the rare instances of such conflict are restricted to within-family relationships (see Additional file 3). Support might have been higher had our ML analysis modeled synonymous and non-synonymous character sets separately, but the near-identical topologies produced by our unpartitioned ML and partitioned Bayesian analyses for all nucleotides suggest that the effect would not be dramatic. The most plausible explanation for low support is that the branches along the "backbone" are very short, as evidenced in the phylogram of Fig. 3, in contrast to the very long branches subtending some superfamilies (e.g., Tortricoidea, Lasiocampoidea, Hesperioidea) and/or subgroups therein. Short internodes along the "backbone", which may reflect rapid radiation, suggest that very large amounts of sequence, as well as more accurate modeling of character change, will be needed to firmly resolve these nodes.
In contrast to low support under ML, posterior probabilities along the "backbone" in the Bayesian analyses were very high, mostly 1.0. The sole purpose of our Bayesian analysis was to examine the effect of partitioned analysis on tree topology. The interpretation of the associated posterior probabilities is problematic, due to their often-reported tendency toward "overcredibility" (e.g. [41, 57, 58]). Lewis et al. attribute "overcredibility" to the failure of the tree-proposal step in current Bayesian phylogenetic algorithms to allow the possibility of polytomies. In the absence of true signal, this restriction can artificially confer very high posterior probabilities on arbitrary resolutions. This explanation seems quite consistent with our findings.
Differing properties among character sets and their implications
Several potential benefits motivate our focus on separating and independently analyzing sites undergoing synonymous versus non-synonymous change, as exemplified by our distinction between the character sets "noLRall2 + nt2" and "LRall2 + nt3". These categories can be defined in all protein-coding sequences, and the substitutions they undergo are known to follow markedly different rules. To some extent, then, they provide independent lines of phylogenetic evidence, thereby boosting our confidence in groupings that they separately recover. Separate analysis also allows us to discover the evolutionary properties peculiar to each, and to account for these when considering the two character sets together. Thus, it is reassuring that, although there are numerous differences in detail involving nodes with little support, particularly at deeper levels, trees based on noLRall2 + nt2, nt12 and nt123, which, though not fully independent, span a gradient from entirely non-synonymous to predominantly synonymous evolution, are quite similar overall. They are generally concordant for nodes with modest to strong BP support. Even moderately strong conflict - reciprocal BP ≥ 70% support for incompatible groupings - is essentially absent.
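The conflict criterion used here (reciprocal BP ≥ 70% for incompatible groupings) can be sketched as a simple check. The sketch assumes groupings are represented as rooted clades, that is, sets of taxon names; two such sets are incompatible when they overlap without one containing the other. The taxon names and bootstrap values below are hypothetical, not taken from our results.

```python
from typing import Dict, FrozenSet, List, Tuple

Clade = FrozenSet[str]

def incompatible(a: Clade, b: Clade) -> bool:
    """Two rooted groupings conflict if they overlap but neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)

def strong_conflicts(
    bp_x: Dict[Clade, float], bp_y: Dict[Clade, float], threshold: float = 70.0
) -> List[Tuple[Clade, Clade]]:
    """Pairs of incompatible groupings, one per analysis, each with BP >= threshold."""
    return [
        (cx, cy)
        for cx, sx in bp_x.items()
        if sx >= threshold
        for cy, sy in bp_y.items()
        if sy >= threshold and incompatible(cx, cy)
    ]

# Hypothetical bootstrap values from two analyses of the same taxa.
bp_first = {frozenset({"T1", "T2"}): 85.0, frozenset({"T3", "T4"}): 40.0}
bp_second = {frozenset({"T2", "T3"}): 75.0, frozenset({"T1", "T2", "T3"}): 90.0}

print(strong_conflicts(bp_first, bp_second))
```

In this toy case only the pair (T1 + T2) versus (T2 + T3) is reported; the nested grouping T1 + T2 + T3 is compatible with T1 + T2, and the weakly supported T3 + T4 falls below the threshold.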
Analysis of nt3 alone, however, complicates the picture. Despite contributing about 90% of the total evolutionary change for nt123, nt3 by itself provides relatively weak resolution, and fails to recover many well-established nodes. Yet when added to nt1 + 2, it can often greatly increase support for those nodes (e.g. Geometridae, Pyraloidea, Yponomeutidae), particularly at shallower levels, providing dramatic examples of "hidden support". On the other hand, adding nt3 sometimes markedly decreases support for deeper nodes, e.g. Gelechioidea, "core" Zygaenoidea, Yponomeutoidea + Gracillarioidea. Nt3 seems to contain a complex mixture of true phylogenetic signal and conflicting signal from non-phylogenetic sources. The latter undoubtedly stems in part from non-homogeneity of base composition. It appears that for shallower divergences the non-phylogenetic signal in nt3 is relatively easily overcome by the addition of non-synonymous signal. For deeper divergences, in contrast, it appears that either true phylogenetic signal at nt3 is weakened by saturation, or non-phylogenetic signal becomes relatively stronger, or both, leading typically to less resolving power. And yet, nt3 does carry some true signal for deeper divergences, possibly because it undergoes at least some non-synonymous change. For example, it is probably not coincidence that only with nt3 included do we completely recover, albeit with weak support, both Zygaenoidea and Sesioidea + Cossoidea + Zygaenoidea. Our analyses of nt3 alone provide one of the first examples of an influence of compositional heterogeneity on estimated phylogeny at relatively low taxonomic levels; previous demonstrations have mostly involved much deeper divergences (but see Gruber et al.). One might take comfort in the disappearance of the obvious effects of composition on topology when nt3 is combined with other character sets. Compositional heterogeneity remains a likely contributor, however, to the instability and lower support that inclusion of nt3 brings to some deeper-level groupings. The problem cannot be easily dismissed.
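To make the notion of compositional heterogeneity at nt3 concrete, the following sketch extracts third codon positions from in-frame coding sequences and compares base composition among taxa. It assumes each sequence starts at codon position 1 and contains no frame-shifting gaps; the taxon names and sequences are hypothetical, and the simple GC range is only a descriptive summary, not the test used in our analyses.

```python
from collections import Counter
from typing import Dict

def third_positions(cds: str) -> str:
    """Return the nt3 character set of an in-frame coding sequence."""
    return cds[2::3]

def base_frequencies(seq: str) -> Dict[str, float]:
    """Proportions of A, C, G and T, ignoring gaps and ambiguity codes."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in "ACGT"}

def gc_range_at_nt3(alignment: Dict[str, str]) -> float:
    """Difference between the highest and lowest nt3 GC proportion among taxa."""
    gc = []
    for seq in alignment.values():
        freqs = base_frequencies(third_positions(seq))
        gc.append(freqs["G"] + freqs["C"])
    return max(gc) - min(gc)

# Hypothetical in-frame sequences, for illustration only.
toy = {"taxonA": "ATGGCCAAAGGC", "taxonB": "ATAGCTAAAGGT"}

for taxon, seq in toy.items():
    print(taxon, third_positions(seq), base_frequencies(third_positions(seq)))
print("GC range at nt3:", gc_range_at_nt3(toy))
```

A wide range among taxa signals the kind of non-homogeneity that can draw compositionally similar but unrelated taxa together.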
Given that different character sets can differentially support, and/or obscure, each individual node, treating all character sets as belonging to a single population of characters (as in our nt123 ML analysis) is clearly not the most effective way of extracting phylogenetic information from the data set. Ideally, one would analyze all character sets simultaneously, using a model that fully accounted for the differences in evolutionary behavior among them. The widely-available methods for partitioned analyses, however, do not yet include correction for heterogeneity of nucleotide composition, a key point of difference between mainly synonymous and mainly non-synonymous character sets. In the review below, we therefore adopt an interim strategy for assessing progress on ditrysian phylogeny: a group is considered to be supported by the data set as a whole to the degree that it is (a) strongly supported by one or more character sets, and (b) at most weakly contradicted by others. At present, no single analysis can tell the whole story.
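The interim criterion can be written down as a small decision rule. The source describes support as a matter of degree, so the binary version below is a simplification, and the numerical cutoffs (80% for "strong", 50% for "weak" contradiction) are assumptions chosen for illustration rather than thresholds stated in the text. The input values are likewise hypothetical.

```python
from typing import Dict

def supported_overall(
    support_bp: Dict[str, float],
    conflict_bp: Dict[str, float],
    strong: float = 80.0,
    weak: float = 50.0,
) -> bool:
    """Apply criteria (a) and (b) to one candidate group.

    support_bp:  bootstrap support for the group in each character set
                 (0 if the group is not recovered).
    conflict_bp: highest bootstrap support, in each character set, for any
                 grouping incompatible with the candidate group.
    """
    strongly_supported = any(bp >= strong for bp in support_bp.values())  # (a)
    weakly_contradicted = all(bp < weak for bp in conflict_bp.values())   # (b)
    return strongly_supported and weakly_contradicted

# Hypothetical values for one candidate group across three character sets.
support = {"nt12": 82.0, "nt123": 60.0, "nt3": 0.0}
conflict = {"nt12": 5.0, "nt123": 20.0, "nt3": 35.0}

print(supported_overall(support, conflict))
```

Raising the nt3 contradiction in this example above the assumed 50% cutoff would cause the group to fail criterion (b), however strong its support elsewhere.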
Current understanding of ditrysian phylogeny
In this section we ask how much progress this exploratory study yielded toward a robust phylogeny estimate for Ditrysia, and toward testing the working hypotheses compiled by Kristensen.
The near-total lack of strong support for nodes subtending multiple superfamilies, with especially low bootstraps along the "backbone", is sobering. We expected more from 6.7 kb of sequence data chosen specifically for their suitability for addressing this problem. It appears that robust node-by-node resolution of among-superfamily ditrysian relationships will require several to many times as much sequence as analyzed here, in addition to expanded taxon sampling, particularly among the non-obtectomeran lineages. Fortunately, two independent efforts to provide such additional data are underway (see http://www.Leptree.net).
Low bootstraps for deep nodes notwithstanding, however, our current data do provide important first steps toward resolving ditrysian phylogeny. How can this be? Conventional bootstrap values can greatly underestimate the amount of structure present in a large, noisy data set, because they take into account only nodes that agree completely between pseudo-replicate trees, ignoring partial agreement on those groupings. Thus, given the taxon sample size, we think that the approximate overall concordance of our trees with the "backbone" hypothesis in Fig. 2A is unlikely to be accidental, despite the lack of bootstrap support. Our results provide some of the first quantitative phylogenetic evidence for broad subdivisions of Ditrysia resembling those of Minet, albeit with important differences. The clearest point of correspondence is that all analyses apart from nt3 yield a clade consisting of most Macrolepidoptera plus one or more non-macrolepidopteran Obtectomera, and excluding all non-obtectomerans.
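The point about complete versus partial agreement can be illustrated with a toy calculation. Each pseudo-replicate tree is represented as a set of clades; the conventional bootstrap counts only exact matches to the target group, whereas the second measure averages the best Jaccard overlap found in each replicate. That overlap measure is a hypothetical device introduced here for illustration; it is not the method of the work cited above, and the replicate trees are invented.

```python
from typing import FrozenSet, List, Set

Clade = FrozenSet[str]

def exact_frequency(target: Clade, replicates: List[Set[Clade]]) -> float:
    """Conventional bootstrap: percentage of replicates containing the exact group."""
    return 100.0 * sum(target in clades for clades in replicates) / len(replicates)

def mean_best_overlap(target: Clade, replicates: List[Set[Clade]]) -> float:
    """Mean, over replicates, of the best Jaccard overlap with the target (as %)."""
    def jaccard(a: Clade, b: Clade) -> float:
        return len(a & b) / len(a | b)
    best = [max(jaccard(target, c) for c in clades) for clades in replicates]
    return 100.0 * sum(best) / len(best)

# Hypothetical replicates: taxon E joins the group in only one of four trees,
# while A + B + C + D is recovered every time.
target = frozenset("ABCDE")
core = frozenset("ABCD")
replicates = [
    {target, core},
    {core, frozenset("EF")},
    {core, frozenset("EG")},
    {core, frozenset("EF")},
]

print("exact-match BP:", exact_frequency(target, replicates))
print("mean best overlap:", mean_best_overlap(target, replicates))
```

Here a single unstable taxon holds the exact-match value at 25% even though four of the five members group together in every replicate.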
This mostly macrolepidopteran clade, however, also harbors one of the strongest departures from the working hypothesis. The Pyraloidea, traditionally considered non-macrolepidopterans, invariably group with the "core" Macrolepidoptera identified here (which exclude butterflies), while the butterflies sensu Scoble, always traditionally considered macrolepidopterans, never do so. Despite weak support for individual nodes, the Approximately Unbiased test provides statistical evidence against monophyly of the Macrolepidoptera as previously defined (Fig. 2A), significantly rejecting it for nt12 (P = 0.02) although not for the other character sets. Minet's exclusion of Pyraloidea from Macrolepidoptera was based on their supposed lack of his synapomorphy 17, a feature of the base of the forewing. Recent unpublished observations by one of us (MAS), however, strongly suggest that this feature is in fact characteristic of pyraloids. The distribution of this trait deserves further study in other superfamilies as well.
The existence of a clade comprising "core Macrolepidoptera" plus Pyraloidea, which we predict that future work will confirm, is likely to prompt re-examination of hypotheses about the evolution of the thoracic or abdominal ultrasound-detecting "ears" that characterize the superfamilies Noctuoidea, Geometroidea, Drepanoidea and Pyraloidea. These together contain over 90% of species in the putative clade. The "ear" found in each superfamily shows a unique location and anatomy, and has been thought to represent an independent origin. Our result prompts contemplation, at least, of the possibility of fewer origins, conceivably even a single origin in the common ancestor of the proposed clade (though there is disagreement among the authors of this work regarding the plausibility of this hypothesis). This alternative hypothesis would require only a few independent losses of the "ear," in the ancestors of Bombycoidea + Lasiocampidae, Cimeliidae (= Axiidae), and Sematuridae + Epicopeiidae. The observation that the anatomy of the "ear" and/or the location of its opening can vary between sister families (Pyralidae versus Crambidae) or between sexes within the same family (Uraniidae) suggests that transformation among widely differing types of "ear" is at least plausible.
The unexpected position of the butterflies is also reflected in the strongest result from our tests for non-monophyly, namely the decisive rejection (P < 0.005, all character sets) of the proposed clade aligning butterflies and allies with Geometroidea, Drepanoidea, Cimelioidea (= Axioidea) and Calliduloidea. In our trees, no two of these taxa consistently group with or even near each other. It seems safe to abandon this conjecture. An alternative hypothesis about phylogeny of the major macrolepidopteran groups, grouping Geometroidea with Noctuoidea and these together with Bombycoidea + Lasiocampoidea, is worthy of contemplation because it is supported, albeit weakly, in all our analyses. In contrast, placement of the several small, morphologically isolated and highly divergent macrolepidopteran superfamilies may be very difficult. The problem is illustrated by the unstable position of Mimallonidae and our inability to significantly reject their alliance with Bombycoidea and Lasiocampoidea, despite their failure to ever group near these superfamilies.
Within several large superfamilies of the "core Macrolepidoptera + Pyraloidea" clade, our results provide some of the first strong tests of hypothesized relationships (Figs. 2, 3). In Geometroidea, our findings corroborate, albeit with weak support, the grouping of Sematuridae, which lack abdominal tympanal organs, with Uraniidae and Geometridae, which possess them. A novel result is that all data sets place Epicopeiidae, included in Drepanoidea by the working hypothesis, either next to or within Geometroidea. The strongest signal comes from nt123, which resolves Epicopeiidae as sister group to Sematuridae (BP = 67%), with which they share the lack of tympanal organs. Within Geometridae, one of the largest families of Lepidoptera, we sampled all subfamilies, finding moderate to very strong support for nearly all relationships among these (Fig. 3), and strong agreement with groupings seen in other recent molecular studies of this family [21, 63].
In Noctuoidea, our findings very strongly (BP = 94-100%) corroborate previous morphological and/or molecular evidence for monophyly of: (a) the quadrifid forewing clade of families; (b) within this, the clade of "trifine" hindwing subfamilies; and (c) a clade comprising most quadrifine hindwing Noctuidae plus Arctiidae and Lymantriidae, excluding Nolidae (the "LAQ" clade of Mitchell et al.), contra recent morphology-based hypotheses. The recently erected family Micronoctuidae also appears to fall in the "LAQ" clade. The most surprising result is the failure of Doa (Doidae) to group with the remaining Noctuoidea, despite its possession of the very distinctive morphological synapomorphies of the superfamily, including a metathoracic tympanal organ and two MD setae on larval T3. No position for Doa is strongly supported, however, and noctuoid monophyly is not significantly rejected by the Approximately Unbiased test. Thus, it remains possible that Doa will group with other noctuoids upon further gene and taxon sampling. The postulated sister group relationship between Doidae and Notodontidae, on the other hand, now seems very unlikely, given the strong support (BP = 83%, nt123) for a clade comprising all sampled Noctuoidea except Doa.
Pyraloidea are recovered by all data sets (except nt3), albeit with low support (BP = 65%, nt123). Though our sampling of subfamilies is incomplete, the five genes appear to offer strong resolution of relationships within this superfamily, including very strong bootstrap support (BP ≥ 99%) for monophyly of both Pyralidae and Crambidae as sampled. Divergences among all exemplars of Pyralidae, representing all five subfamilies, were strongly resolved (BP 80-100%). Relationships among subfamilies nearly match a previous morphology-based tree, requiring only a trade in position of the subfamilies Pyralinae and Phycitinae. In Crambidae, relationships of the five (of 14) subfamilies sampled were also strongly resolved, corresponding well to the morphological hypothesis of Yoshiyasu, less well to that of Solis and Maes.
In contrast to their success in the foregoing clades, our data do not strongly resolve relationships of the butterflies sensu Scoble. The three superfamilies do form a clade, but only in analyses excluding nt3, and neither the monophyly of Papilionoidea, nor any of the accepted relationships among the families thereof except the basal position of Papilionidae, is supported by any analysis. On the other hand, the Approximately Unbiased test (Table 1) does not reject monophyly for Papilionoidea (P > 0.38), and bootstrap supports are mostly low for groupings contradicting expectation, the highest being 68% for the unexpected pairings of Hedylidae with Hesperiidae and Pieridae with Nymphalidae. Thus, apart from qualitatively corroborating Scoble's grouping of Macrosoma (Hedylidae) with butterflies rather than Geometridae, we are unable to strongly confirm or refute any previous hypothesis about butterfly relationships. Our results raise the possibility, however, that butterfly relationships will undergo significant revision as more data accumulate.
Evidence both for and against predicted relationships is less strong in the lower ditrysian lineages than in Obtectomera, which were much more extensively sampled. Nonetheless, some tentative conclusions emerge. Two non-apoditrysian lineages are always nested well inside the Apoditrysia as currently defined, the monophyly of which is consequently never supported. The pairing of Yponomeutoidea and Gracillarioidea, a novel grouping so far as we are aware, is the most strongly supported among-superfamily relationship in our study (BP = 79%, nt12).
Within the lower Apoditrysia ("A-O" in Fig. 3), one of the few previous postulates of among-superfamily relationships groups Zygaenoidea with Sesioidea + Cossoidea. Agreement between our analyses and previous hypotheses is ambiguous throughout this putative clade, suggesting signal too weak to be decisive. The proposed groupings are only sometimes monophyletic, and yet never strongly contradicted. The trio of superfamilies is fully recovered by nt123 (Fig. 2B), albeit with very weak support, and is at least somewhat coherent in the other analyses. In the nt12 tree (Fig. 2E), for example, it is basal and paraphyletic with respect to all other taxa except Tineidae. Thus, the proposed grouping Sesioidea + Cossoidea is never monophyletic, but in nt123 it is at least paraphyletic, comprising the two lineages most closely related to Zygaenoidea. Within this assemblage, however, neither Sesioidea (Sesiidae + Castniidae) nor Cossidae are monophyletic in any analysis.
Our data provide similarly qualified support for Zygaenoidea, the monophyly of which is uncertainly established by morphology. The eight families sampled (of 12) were grouped together by one data set, nt123, but with weak support. We did however find a consistently monophyletic core group of six zygaenoid families, with bootstrap support as high as 82% (nt12, Fig. 2E). Excluded from the "core Zygaenoidea" were the parasitic families Epipyropidae and Cyclotornidae. The weakly-supported sister group relationship between these families seen in nt123 (Fig. 2B) seems credible, despite the exceptionally long branches subtending both, because of their bizarre shared larval habit of ectoparasitism on Auchenorrhyncha. Within the "core Zygaenoidea" we find qualified support for the morphologically-defined "limacodid group" of families, here represented by Limacodidae, Dalceridae, Megalopygidae and Aididae. These form a clade, weakly supported, in the noLRall2 + nt2 tree, and a paraphyletic group in other analyses. Relationships within the "limacodid group" agree partially with the morphological cladogram of Epstein, in that Dalceridae are most often grouped with Limacodidae (nt123, noLRall2 + nt2), albeit with weak support. The only strongly supported node, however, unites Aididae with Megalopygidae (BP ≥ 99%), in which they were formerly included, contra Epstein's hypothesis grouping them with Dalceridae + Limacodidae. Within Zygaenidae our data moderately to strongly support the relationships reported by Niehuis et al. for the three subfamilies sampled.
In Tortricoidea, finally, though our sampling is limited to two of three subfamilies and six of 21 tribes, the five genes appear able to offer decisive resolution, as all nodes are strongly supported. Support is 99-100% for monophyly of the two subfamilies as thus far sampled. The representative of Cochylini, in the past treated as a separate family [71, 72], was strongly placed as sister group to the two other sampled tribes of Tortricinae, consistent with the proposal of Kuznetsov and Stekolnikov [73, 74].