Trait data and character matrix
To test our hypotheses, we gathered data on decay mode and host associations for Agaricomycetes. For decay mode, we used the “decay.type” as published in Tedersoo et al. , which is available on the genus level, and we also conducted a literature search for additional genera. Tedersoo et al. (2014) investigated lifestyle-dependent global fungal diversity and therefore coded the trophic status (six states, e.g., biotroph), the lifestyle (17 states, e.g., ectomycorrhizal) and decay type (four states, e.g. white rot, brown rot) for more than 10,000 genera. We used only species with either white or brown rot in our analysis and excluded other lifestyles (e.g., mycorrhizal). This gave us the decay mode of particular species in the genera. We then extrapolated this decay mode to the remaining species of a genus (with one exception, see below). Our justification is that decay mode has often been a focus of taxonomists and thus was widely used to distinguish genera such as Antrodia and Antrodiella , Lentinus and Neolentinus-Heliocybe , and Daedalea and Daedaleopsis . We found only three genera where more than one decay mode has been reported: Clitocybula [38, 39], Hyphoderma , and Mucronella [40, 41]. Clitocybula and Mucronella were deleted from the dataset because no host data were available. For Hyphoderma we used only the two species where decay mode references were found. To estimate how this strategy might affect our interpretations, we re-sampled a single species per genus from our final dataset (hereafter “one-genus-subset”) and repeated the analyses described below 100 times.
To gather data on host associations, we used the R package “rusda”, written for this study, as an interface to the USDA Fungus-Host Distribution Database (FHDD) . The FHDD contains fungus-host combinations, but does not provide information on the occurrence frequencies on a particular host (other than the number of published records on each host). The “rusda” package makes it possible to retrieve (“query”) host data for fungal species, and vice versa. For a detailed description, basic usage and evaluation of the R package “rusda”, see Additional file 1 and for repositories refer to Availability of data and material.
To retrieve host associations from the FHDD we used the function associations, which takes an input of species names and provides an output list of fungus-host combinations. As input we used the NCBI taxonomy for fungi and re-classified the order level where necessary (Additional file 1: Table S3). We then produced a dataset of plant phyla by matching host genera to the Spermatophyta taxonomy downloaded from NCBI taxonomy using the R package “megaptera” . Thus, we retrieved the phylum information “Acrogymnosperma” and “Magnoliophyta” for each host species. We refer to “Magnoliophyta” as angiosperm (A) and “Acrogymnosperma” as gymnosperm (G) and stored the number of gymnosperm and angiosperm associations for each fungus species in a table. Species which did not belong to either Acrogymnosperma or Magnoliophyta were deleted from the dataset. We further deleted all non-woody plants based on the woodiness dataset which classified more than 35,000 plants into woody and non-woody . Thus, the final host dataset included only woody plants from Acrogymnosperma or Magnoliophyta; seedless vascular plants, bryophytes, algae, and non-plant hosts were excluded. The FHDD covers mainly temperate North America and Europe .
The host association data were used to calculate the number of angiosperm and gymnosperm host species for each fungus species. We defined the “gymnosperm association” by dividing the number of gymnosperm host tree species (NG) by the sum of the number of angiosperm (NA) and gymnosperm host tree species: gymnosperm associations [%] = NG/(NG + NA). Thus, a gymnosperm association of 100% means that a fungus is reported exclusively on gymnosperm hosts in the Fungus-Host database, whereas 0% means only angiosperm hosts are reported. We classified host preferences into three states: (1) generalism, (2) angiosperm specialization, or (3) gymnosperm specialization. Based on the distribution of gymnosperm association [%] (Fig. 4c), we defined specialization based on the gymnosperm association [%] with a threshold of ≥90% for gymnosperm specialization and a threshold of ≤10% for angiosperm specialization (hereafter “90–10 specialization”). Previous studies used exclusivity as a measure of host association , but missing or incorrect data for a single fungus observation may then lead to misclassification of a species. Nonetheless, we also inferred our final model (see Statistics and models of host specialization) using the exclusivity coding (hereafter “100–0 exclusivity”). However, in the exclusivity coding, generalists and non-exclusive specialists are coded in one state (“generalists”) and thus results might be hard to interpret.
To test dynamics of host switching, we used phylogenetic comparative methods (PCMs). For this purpose, we applied a mega-phylogeny approach using the R package “megaptera”, a pipeline for large-scale automated sequence-retrieval and alignment  (version available on https://github.com/heibl/megaptera). The mega-phylogeny approach aims at maximising taxon sampling integrating previous knowledge (e.g. taxonomic information, backbone trees) into the tree inference . For our mega-phylogeny approach, we used a backbone guide tree based on phylogenomic analyses [24, 26, 29] to provide information for deep splits (order level), as resolving such ancient divergences can be difficult due to sequence saturation . Further, mega-phylogeny approaches often lead to a high number of gaps or missing data, often more than 90% (e.g. Smith et al. ). To reduce the bias of missing data, we computed a reliability measure for each column of the alignment, which is then supplied to the tree inference program. In this way, uncertain regions in the alignment are down-weighted in the phylogeny inference step.
First, we used the R package “megaptera” to download all sequences for the species with decay mode and host association information from GenBank  (queried February 2017). We selected seven DNA regions: 18S, 28S and 5.8S rRNA (nuclear ribosomal RNA genes), genes encoding RNA polymerase b (rpb1, rpb2), translation elongation factor 1 (tef1), and ATP synthetase (atp6). We chose the rRNA regions to obtain high species numbers and the other regions for resolution of deeper nodes . Only sequences of samples identified to species level were accepted.
We used single sequences where only one sequence for a particular species and DNA region was available. If multiple sequences were available, all sequences of the same DNA region and organism (putatively conspecific sequences) were aligned and a majority rule consensus sequence was calculated. In the next step, all sequences were compared to three to six Agaricomycotina reference sequences for each DNA region as a quality check (Additional file 1: Table S4). We used the R package “megaptera” to calculate the identity (proportion of nucleotides identical) and coverage (proportion of nucleotide positions in common) with the reference. Based on the coverage and identity values, thresholds can be adjusted aiming to maximize both quality and number of taxa. The default values are 0.75 for identity and 0.5 for coverage. Based on visual inspection of the alignments, we chose identity thresholds between 0.5 and 0.75 and coverage thresholds between 0.25 and 0.5 for the seven gene regions. All sequences outside these limits were discarded.
We aligned the remaining sequences for each gene region separately, using GUIDANCE2 [48, 49] with the multiple sequence alignment program MAFFT . GUIDANCE2 computes a reliability score for each column based on alternative alignments produced by bootstrap guide trees and four co-optimal alignments based on each bootstrap alignment, created by the heads or tails algorithm . We passed the resulting column score as character weights to the phylogeny inference program RAxML (flag -a; see additional details on phylogenetic inference below) rather than filtering the alignment using the column score, which is not recommended . We used IQ-TREE version 1.5.3 with specification “-TESTMERGEONLY” [53, 54] to select a partition scheme among the gene regions. IQ-TREE found six blocks as the best partitioning scheme (merging the 5.8S rRNA and 28S rRNA into one partition; Additional file 1: Table S1). The final alignment had 37,466 sites and the proportion of gaps was 92.07% with 16,814 distinct alignment patterns.
We produced a comprehensive backbone guide tree by first assembling an order-level “genomic” based backbone tree (Additional file 1: Figure S1 A) from the literature [21, 26, 29] and then attaching all species on the order-level tips of the genomic backbone tree (Additional file 1: Figure S1 B). We performed maximum likelihood estimation, using the concatenated supermatrix of the seven DNA regions, with RAxML  on the CIPRES Science Gateway v.3.3 (RAxML -HPC2 on XSEDE 8.1.11) [56, 57] under the GTRGAMMA model with partitioning as described above, the GUIDANCE2 column score (flag –a) and the comprehensive backbone tree (flag –g). We subsequently conducted 1000 approximate Shimodaira–Hasegawa likelihood ratio tests (SH-aLRT branch support). SH-aLRT which are fast, accurate and robust even for larger phylogenies .
We estimated divergence times of the resulting phylogeny using penalized likelihood as implemented in the R function chronos from the R package “ape” . We used two calibration points, a Late Cretaceous mushroom fossil Archaeomarasmius legetti , which bears a strong resemblance to extant Agaricales (particularly Marasmiaceae), and a Middle Eocene ectomycorrhizal fossil, which has been interpreted as a representative of Boletales . We followed the strategy of Kohler et al.  and used the ectomycorrhizal fossil to calibrate Boletales with a stem age of 40–60 MYA and A. legetti to date Agaricales with a stem age range of 70–110 MYA. We also tried the approach of  and used 50 and 90–94 MYA as age priors, which yielded almost identical divergence time estimates (results not shown).
We applied chronos with three different models of substitution rate variation among branches: “relaxed”, “correlated” and “strict” and compared the model fits using ɸIC . The “correlated” model had lowest ɸIC values and thus was used for further analysis. We are aware that penalized likelihood does not make use of the sequence data and does not incorporate phylogenetic uncertainty. However, algorithms that perform joint inferences of the tree and divergence times currently do not implement an option for character weights, e.g. BEAST  or character weights and guide tree, e.g. ExaBayes .
To account for phylogenetic uncertainty at nodes with low support values, we produced alternative trees based on the maximum likelihood phylogeny (Additional file 1: Figure S2). We created hard polytomies on nodes with SH-like support values < 80 based on the non-ultrametric ML tree (Additional file 1: Figure S3). We then used the function multi2di from the R package “ape”  and resolved the polytomies randomly and used chronos (as described above) to estimate divergence times. We repeated this 100 times and summarized the dated trees using TreeAnnotator  to calculate a maximum clade credibility tree (MCCT) with the node option “Common ancestor heights” (because the nodes did not share the same ancestors since polytomies were created at random). We displayed confidence intervals of the divergence time estimates as HPD (highest posterior density) for the brown rot clades and the root. Furthermore, we use the 100 ultrametric trees as input for the transition rates estimation to measure robustness of the results against phylogenetic uncertainty.
Statistics and models of host specialization
We first tested preferences of host species among extant fungi of the two decay modes using a phylogenetic linear model in the R package “phylolm” . We tested whether the number of host species (host range) differed between decay modes as a binary predictor variable. As an evolutionary model for the residual variance-covariance matrix we used the lambda model . The number of host tree species was log10-transformed.
We modelled dynamics and pattern of host specialization evolution in white and brown rot lineages using multistate likelihood-based models. We used the function rayDISC from the R package “corHMM” , which implements a multi-state version of a continuous-time Markov model, where the Markov process is characterized by a Q-matrix. The Q matrix specifies the transition rates between the character states and hence the model of discrete character evolution. All models were based on our six-state character coding and the transition rate matrix was a 6 × 6 matrix: (1) white rot/angiosperm specialist, (2) brown rot/angiosperm specialist, (3) white rot/gymnosperm specialist, (4) brown rot/ gymnosperm specialist, (5) white rot/generalist, and (6) brown rot/generalist.
The first model allows for all transitions to occur in single steps, e.g. an angiosperm specialist can switch directly to a gymnosperm specialist without first passing through a generalist state. Further, in this model transitions between white rot and brown rot are allowed in both directions. All models allow white rot to brown rot transitions. We call this the “Uncorrelated” model, because switches between the states are not conditioned on previous states. This model may not be biologically realistic. Transitions from an angiosperm specialist to a gymnosperm specialist may require a transition first through a generalist, before passing to a gymnosperm specialist, and thus could require two “steps”. Thus, we coded further models implementing correlated (dependent) character evolution. In the second model, we prohibited transitions leading directly from one specialist to another by setting the direct transition parameters to zero. We call this the “Correlated hosts” model. Both the “Uncorrelated” and the “Correlated hosts” model allow for brown rot to white rot reversals. However, brown rot evolution is correlated with complete losses of genes encoding ligninolytic class II peroxidases (AA2) and reductions in other decay enzymes, making reversals to white rot unlikely . Accordingly, we constructed a third model where we further disallowed transitions from brown rot states to white rot states. We call this the “Correlated hosts – norev” model. For the coding of the Q matrices, see Additional file 1: Figure S4.
We fitted the three models with equal rates (ER) and all rates different (ARD) and compared the fit of the models by Akaike’s information criterion (AIC)  from the log-likelihoods. For model selection we applied a simple root state with equal weights among the six character states (root.p = NULL). Brown rot has been shown to evolve repeatedly from white rot ancestors [70, 71], so we applied an additional root state treatment which only allows white rot as root state. Thus, after model selection we ran the final (best) model using an additional root state coding, which assumed zero probability for brown rot and equal probabilities for each of the three white rot states, and compared the models.
Another framework to estimate pattern of host evolution is the coding as three independent binary states: white rot – brown rot; angiosperm – no angiosperm; gymnosperm – no gymnosperm (e.g. using the function corDISC, from the R package “corHMM”). However, this model requires unobserved states (no angiosperm and no gymnosperm host). Such unobserved states may yield high rates as a methodological artefact . Thus, we decided to use the multi-state implementation in the function rayDISC.
We computed phylogenetic signal in decay mode, gymnosperm association, and the six-state character coding (as defined above). For the decay mode (binary state) we used the phylogenetic D statistic, which is calculated as the sum of sister-clade differences based on reconstructed values on all nodes of the tree . The observed D is then compared against (1) a random expectation (random shuffling of trait values along the tips), and (2) a trait simulated according to a Brownian motion model of character evolution along the tree, after the values were converted to a binary according to a threshold. For the computation we used the function phylo.d in the R package “caper”  with 1000 permutations.
For the gymnosperm association we calculated two measures of phylogenetic signal: Pagel’s lambda  using the function phylosig from the R package “phytools” , and phylogenetic correlograms using the function phyloCorrelogram from the R package “phylosingal” . Lambda measures the phylogenetic dependence of a trait under the assumption of a pure Brownian motion model of evolution. Lambda is a transformation (weight) of the variance-covariance matrix, if other factors than the phylogenetic history had an effect on the trait. If lambda equals 1 the model fits a Brownian motion model of evolution. Phylogenetic correlograms measure phylogenetic signal in dependence of the phylogenetic distance (that is distance in branch lengths). For a single trait, phylogenetic signal is measured as the autocorrelation (Moran’s I) based on a sequence of phylogenetic weights matrices differing in their mean (phylogenetic distance if method = “lag-norm”). We conducted 100 bootstraps for 100 points to generate a confidence interval. If the confidence interval falls below or above 0 the signal becomes significant. We rescaled the phylogeny to a tree height of 1 for this analysis.
For the six state character coding we calculated the phylogenetic signal following the method described in Bush et al.  (function phylo.signal.disc, the script is available at: https://github.com/juliema/publications/blob/master/BrueeliaMS/Maddison.Slatkin.R). A parsimony score of the discrete trait along the tree is compared to a randomized parsimony score inferred by randomizing tip states. If the parsimony score falls outside the random distribution, this indicates a higher conservation than under a random expectation.