Yeast strains
We used 74 meiotic segregants previously generated in [37] from a cross between the prototrophic yeast laboratory strain BY (MATa, derived from a cross between BY4716 and BY4700) and the prototrophic vineyard strain RM (MATα hoΔ::hphMX4 flo8Δ::natMX4 AMN1-BY; derived from RM11-1a), as well as the parental strains BY and RM.
Yeast growth and preparation for metabolite extraction
All segregants and parental strains were inoculated from glycerol stocks into 2 mL of standard synthetic complete media (SCM) (Yeast Nitrogen Base with Ammonium Sulfate (Fisher Cat. #50-489-163), Dropout Mix Complete (US Biological Cat. #D9515), and 2% glucose) in 14 mL Falcon™ round-bottom culture tubes (Falcon™ 352059) and grown overnight, shaking at 30 °C.
All overnight cultures were measured for their optical density (OD) and recorded. These measurements were used to determine the volume of overnight culture needed to inoculate a 30 ml flask of SCM to a starting OD of approximately 0.3. Flasks were then incubated, shaking, at 30 °C for 3–4 h until their OD reached approximately 0.6–0.65.
Flasks were then removed, and their OD was measured and recorded. 21 mL of culture from each flask was transferred to 50 mL Falcon® conical tubes (Corning 352070), and centrifuged at 3000 rpm at 4 °C. Supernatant was discarded and pellet was resuspended in 2 mL of MS-grade H2O, mixed by gentle pipetting, and 1 mL was distributed to each of two screw-cap tubes (Bio Plas 4202SLS). Tubes were then centrifuged at 3000 rpm at 4 °C to pellet yeast.
Supernatant was discarded, and pellet was left as dry as possible. Pellets were immediately frozen at −80 °C in preparation for metabolite extraction. Information on strain growth was recorded (Additional file 11).
Metabolite extraction
Chilled Lysing Matrix C beads (MP Biomedicals) were added to frozen cell pellets on dry ice. Chilled methanol was then added to the tube containing beads and pellets. Samples were disrupted on FastPrep-24™ Benchtop Homogenizer (MP Biomedicals) and centrifuged at maximum speed for 10 min. Supernatant was transferred into an Eppendorf® Protein LoBind Tube and dried using a TurboVap® Evaporator (Biotage). The evaporated samples were reconstituted in methanol, vortexed briefly, centrifuged, and the supernatant was collected in an amber screw-top vial (Waters). The metabolite extract was analyzed immediately on LC–MS or stored temporarily at −20 °C prior to analysis.
LC–MS analytical methods
Targeted metabolite quantification was performed on a 1260 Infinity UHPLC coupled with a G6538A UHD Accurate Mass Q-TOF Mass Spectrometer (Agilent). UHPLC-MS conditions were optimized in terms of peak shape, reproducibility and retention times of different metabolites analyzed.
Chromatography was performed using an Acquity UPLC BEH Phenyl Column (Waters, 130 Å, 1.7 µm, 2.1 mm × 50 mm) kept at 60 °C. Separation was performed using gradient elution with 0.05% (v/v) acetic acid in 50%/50% methanol/water (A) and 0.05% (v/v) acetic acid in 100%/0% methanol/water (B) at a flow rate of 0.5 mL/min. Starting conditions were 100% A and 0% B for 1 min, changing non-linearly to 95% B over the next 15 min, followed by re-equilibration for 4 min prior to the next injection. Mass spectrometry analysis was performed in positive atmospheric pressure chemical ionization (APCI+) mode. Gas temperature was 350 °C, vaporizer temperature was 450 °C, capillary voltage was set at 3.5 kV, and drying gas flow rate was 9 L/min.
For each yeast strain, two biological replicates were analyzed. QC samples were also analyzed per 15 injections. For each injection, 5 µL of sample was injected into LC–MS. Data was collected in centroid mode with a scan range of 50–1000 m/z and acquisition rate of 1.5 spectra/s. Reference Mass Solution (Agilent) was injected at a flow rate of 0.4 mL/min and reference mass correction was enabled to perform mass correction.
LC–MS data processing
LC–MS data was converted into the mzXML format and was processed using MZmine 2.33 [45], employing targeted peak picking and aligning. The ion intensities for each targeted peak were then normalized within each sample to the sum of all the peak intensities in that sample. The generated peak tables were exported for further analysis (Additional files 8, 9, 12).
QTL mapping
QTL were mapped using normalized peak intensities for each segregant. Segregating genetic markers coded as −1 and 1 between the set of segregants used in this study were determined using genetic marker data on these strains from Albert et al. [33] (Additional file 7). Normalized metabolite peak levels for the segregants were regressed on each segregating genetic marker, and their log-odds (LOD) score was determined using the formula -number of segregants*log(1−r2)/(2*log(10)), where r is the Pearson correlation between the marker and the metabolite level. In addition, all unique pairwise ratios of metabolites were determined, and the log2ratio values were regressed upon segregating genetic markers, and LOD scores were determined as described above. To determine significance, each metabolite level or ratio was permuted 2000 times with respect to the genetic markers, and LOD scores were calculated on this permuted data. For genome-wide error rate control, the second highest LOD score among each permutation was taken, and the 90th percentile of these values was taken to provide a cutoff LOD score for a GWER1 < 0.10 [36, 38]. In each round of QTL mapping, only the maximum LOD score marker was taken, and kept as a QTL if it passed the LOD threshold, to prevent the possibility of shadow QTL [46]. To map additional QTL, the significant QTL from the previous round of regression was regressed out of the metabolite levels, and the residuals were used for mapping using the same protocol as described above. This was repeated, regressing out all QTL from previous rounds, until no additional QTL were mapped in a given round for a given metabolite. Boundaries of QTL were defined by a 1.5 LOD drop from the peak LOD, and expanded to all perfectly linked markers with these boundaries. For one segregant, one technical replicate had values far outside of the range of the rest of the strains for squalene and epoxysqualene (Additional file 1: Fig S1). For this segregant, the technical replicate whose values fell within the range of the rest of the segregants was used for QTL mapping for these metabolites. In addition, there were three segregants (including the one mentioned above) whose CDO levels were off diagonal on the technical replicate correlation plots. For each of these segregants, one replicate was chosen to use for the QTL mapping, based on having higher average correlation between metabolites.
Permutation test for metabolite QTL and eQTL overlap
Metabolite QTL were collapsed such that any overlapping QTL were treated as a single QTL, with the narrowest possible QTL boundaries, due to the assumption that they represented the same causal locus (maximum lower confidence interval position, minimum higher confidence interval position). This yielded sixteen total metabolite QTL. Metabolite QTL lengths were kept constant, and their positions along the genome were permuted 10,000 times, ensuring that none of the permuted QTL spanned multiple chromosomes, and none of the permuted QTL overlapped. All eQTL from [33] for the 24 genes within the GO term GO:0006696: ergosterol biosynthetic process [47, 48] were obtained. If the peak marker for any of these eQTL was within the boundaries of a permuted metabolite QTL range, this was treated as an overlap. The number of permuted metabolite QTL containing a peak marker for any eQTL was recorded for each permutation. The permutation p-value was obtained by calculating the fraction of permutations with as many or more metabolite QTL overlaps than in the unpermuted data.
V-test
The v-test was performed as described in Eq. 2 of Fraser [39] for all of the traits. The heritability (H2) of the segregants was calculated as the difference between the variance of the F2 segregants and the variance of the parents, scaled by the variance F2 segregants. Notably, the technical and environmental variation in the parental zymosterol measurements was larger than the biological variation, making it impossible to correct for the parental heritability, and so only the heritability within the F2 segregants was corrected for, which makes the test more conservative in the case of stabilizing selection. The DCD and epoxysqualene metabolite level F2 distributions were tested for normality using the Shapiro-Wilks test, and both deviated significantly from normality (DCD p = 3.73 × 10–5, epoxysqualene p = 0.025). Due to these violations of the v-test assumptions, we developed a non-parametric version of the test. For one segregant, one technical replicate had values far outside of the range of the rest of the strains for squalene and epoxysqualene (Additional file 1: Fig S1). For this segregant, the technical replicate whose values fell within the range of the rest of the segregants was used for the v-test for these metabolites. In addition, there were three segregants (including the one mentioned above) whose CDO levels were off diagonal on the technical replicate correlation plots. For each of these segregants, one replicate was chosen to use for the v-test, based on having higher average correlation between metabolites.
Non-parametric metabolite level selection test
We first calculated the absolute difference between each segregant’s metabolite level and the mean level of the F2 segregants for each trait to determine the F2 segregants’ spread. Next, to determine the parental dispersion for each metabolite, we calculated the absolute difference in metabolite level between each BY biological replicate and the mean of that BY replicate and an RM replicate, using the mean of the two technical replicates for each of these measurements as we did for the segregants. This led to three independent measurements of dispersion for the parents from the three biological replicates each of BY and RM with no replicate being used in more than one comparison. Since there were multiple ways to pair the three BY and RM replicates, we repeated this procedure with three possible pairings of BY and RM biological replicates with no pairings repeated in separate tests (set 1: BY1/RM1, BY2/RM2, BY3/RM3; set 2: BY1/RM2, BY2/RM3, BY3/RM1; set3: BY1/RM3, BY2/RM1, BY3/RM2—numbering arbitrary) to ensure the results were consistent regardless of the choice of replicate pairings. We then compared the segregant and parental distributions for each metabolite using the Kruskal–Wallis test, a non-parametric test which allows us to determine whether parental spread values are significantly higher or lower than the segregant spread values (Additional file 5: Fig S5). The p-values from the Kruskal–Wallis test were not well-calibrated for many of the metabolites based on permutations (Additional files 4, 5: Fig S4, S5), possibly due to the unequal sample sizes of the parent sets (three) and the segregants (74). Thus, to assess the significance of this test, we used a permutation approach, wherein we randomly selected 6 strains’ metabolite levels from the combined set of parents and segregants, split them into two groups of three to represent the three parental replicates, and then obtained the Kruskal–Wallis test statistic as described above, using the six randomly chosen strains instead of the true parental strains. We repeated this procedure 20,000 times to get a permutation test statistic distribution. We then calculated the adjusted permutation p-values for each trait by counting the number of permutation test statistics greater than or equal to the actual test statisic for the parent-segregant comparison, dividing by 20,000, and multiplying by eight to perform Bonferroni multiple-test correction for tests on the eight metabolites. The highest corrected p-value is reported in the text, and for all three traits identified as being under selection, all of the parent sets showed significant corrected p-values at alpha = 0.10 (zymosterol p = 0.0048, 0.0272, 0.006; DCD p = 0.082, 0.0684, 0.096; epoxysqualene p = 0.0124, 0.0048, 0.0048). For one segregant, one technical replicate had values far outside of the range of the rest of the strains for squalene and epoxysqualene (Additional file 1: Fig S1). For this segregant, the technical replicate whose values fell within the range of the rest of the segregants was used for selection tests for these metabolites. In addition, there were three segregants (including the one mentioned above) whose CDO levels were off diagonal on the technical replicate correlation plots. For each of these segregants, one replicate was chosen to use for selection tests, based on having higher average correlation between metabolites.
Epistasis test
The epistasis test was performed as described in [41]. Briefly, the Δ- statistic was calculated as \(\Delta = \mu_{F2} - \frac{{\mu_{BY} + \mu_{RM} }}{2}\), where μF2 is the mean of the F2 segregants, μBY is the mean of the BY replicates, and μRM is the mean of the RM replicates. The squared standard error of the mean (SSEM) for this statistic was calculated as \(SSEM = \frac{{var\left( {F2} \right)}}{{n_{F2} }} + \frac{{\frac{{var\left( {BY} \right)}}{{n_{BY} }} + \frac{{var\left( {RM} \right)}}{{n_{RM} }}}}{4}\), where var(F2) is the variance of the F2 segregants’ metabolite levels, var(BY) is the variance of the BY replicates, var(RM) is the variance of the RM replicates, nF2 is the number of F2 segregants, and nBY and nRM are the number of replicates for the respective parents. Dividing the Δ- statistic by the square root of the SSEM yielded t-values for each of the metabolites. To assess significance, the parental and segregant levels were appended, the choice of “parent” measurements was randomized 1000 times, and the t-value was calculated. The Bonferroni-corrected threshold for significance was 0.05/8 due to the 8 metabolites tested, and so real t-values greater than the 99.375% of the absolute values of the permuted t-values were called significant.