Dated tribe-wide whole chloroplast genome phylogeny indicates recurrent hybridizations within Triticeae

Bernhardt, Nadine; Brassac, Jonathan; Kilian, Benjamin; Blattner, Frank R.

doi:10.1186/s12862-017-0989-9

Research article
Open access
Published: 16 June 2017

Dated tribe-wide whole chloroplast genome phylogeny indicates recurrent hybridizations within Triticeae

Nadine Bernhardt ORCID: orcid.org/0000-0003-0716-6284¹,
Jonathan Brassac¹,
Benjamin Kilian^1,2 &
…
Frank R. Blattner^1,3

BMC Evolutionary Biology volume 17, Article number: 141 (2017) Cite this article

7098 Accesses
67 Citations
1 Altmetric
Metrics details

Abstract

Background

Triticeae, the tribe of wheat grasses, harbours the cereals barley, rye and wheat and their wild relatives. Although economically important, relationships within the tribe are still not understood. We analysed the phylogeny of chloroplast lineages among nearly all monogenomic Triticeae taxa and polyploid wheat species aiming at a deeper understanding of the tribe’s evolution. We used on- and off-target reads of a target-enrichment experiment followed by Illumina sequencing.

Results

The read data was used to assemble the plastid locus ndhF for 194 individuals and the whole chloroplast genome for 183 individuals, representing 53 Triticeae species and 15 genera. We conducted Bayesian and multispecies coalescent analyses to infer relationships and estimate divergence times of the taxa. We present the most comprehensive dated Triticeae chloroplast phylogeny and review previous hypotheses in the framework of our results. Monophyly of Triticeae chloroplasts could not be confirmed, as either Bromus or Psathyrostachys captured a chloroplast from a lineage closely related to a Bromus-Triticeae ancestor. The most recent common ancestor of Triticeae occurred approximately between ten and 19 million years ago.

Conclusions

The comparison of the chloroplast phylogeny with available nuclear data in several cases revealed incongruences indicating past hybridizations. Recent events of chloroplast capture were detected as individuals grouped apart from con-specific accessions in otherwise monopyhletic groups.

Background

The economically important grass tribe Triticeae Dumort. consists of approximately 360 species and several subspecies in 20-30 genera. Triticeae taxa occur in temperate and dry regions of the World and harbour the important cereals bread wheat (Triticum aestivum), barley (Hordeum vulgare), rye (Secale cereale) and their wild relatives [1, 2]. Yet there is no good understanding of the relationships among Triticeae taxa, although a multitude of molecular phylogenies have been produced [3,4,5,6,7,8,9,10,11]. The acceptance levels of taxa vary greatly among authors on the genus-level and below (for recent reviews see [1, 12, 13]). One important reason is the complex mode of evolution within Triticeae. The majority of species are allopolyploids and many of them likely have originated repeatedly, involving genetically different parent species [14,15,16,17,18,19]. Bread wheat is the most prominent polyploid and evolved via consecutive hybridizations of three diploids and thereby combines three related genomes (named A, B and D) [7, 20]. In Triticeae and many other crops such genomes were defined through cytogenetic characterization of chromosomes together with the analysis of their pairing behaviour in interspecific and intergeneric crosses (for reviews see [1, 12, 21]). It is assumed that diploid species and monogenomic taxa are the basic units within Triticeae and that the heterogenomic polyploids form a second level of taxonomic entities [22, 23].

Triticeae are known to have low barriers against hybridization, which result in mixed or even recombinant phylogenetic signals from nuclear data [10, 20]. In contrast, phylogenetic analyses of plastid sequences provide clear information on maternal lineages, as organelles are mostly uniparentally inherited and non-recombining in angiosperms [24], although chloroplast capture [25] can result in deviating phylogenetic hypotheses. Yet, plastid sequence data is limited for Triticeae. Studies based on a tribe-wide taxon sampling are rare and focused on single or few plastid markers [9, 26,27,28]. To date, the number of whole plastid genome sequences is increasing [29,30,31,32,33,34], however, entire chloroplast genomes are mainly available for the domesticated taxa and their closest relatives. These previous studies provide only limited insight in the maternal phylogeny of Triticeae, as only one to few accessions per taxon were included and often support values for the taxonomic units are low [26, 28, 35].

Here we present phylogenetic analyses of chloroplast sequences based on a comprehensive set of monogenomic Triticeae species plus allopolyploid representatives of the wheat group (i.e. taxa belonging to the genera Aegilops and Triticum). For each species we included multiple individuals to sample part of the intraspecific variation. We performed a target-enrichment and next-generation sequencing (NGS) approach that, among nuclear loci (which will be published elsewhere), targeted the chloroplast ndhF gene. Since chloroplasts occur in high copy number in the plant cell, they represent a large fraction of the off-target reads when sequencing reduced complexity libraries, which can be used to assemble almost complete chloroplast genomes [36]. Our dataset was complemented by chloroplast genomes stored in the GenBank database. Multispecies coalescent (MSC) analyses based on trnK-matK, rbcL and ndhF were used for dating of the major splits within the evolution of the tribe and to reconsider the monophyly of the Triticeae chloroplast lineages.

Methods

Plant materials

Aiming at a good representation of taxa for phylogenetic inference we analysed 194 individuals representing approximately 53 species belonging to 15 genera (depending on taxonomic treatment applied) of the grass tribe Triticeae and included Bromus and Brachypodium accessions as outgroup taxa (Table 1, Additional file 1: Table S1). The accessions were acquired from the International Center for Agricultural Research in the Dry Areas (ICARDA), the seed bank of the Leibniz Institute of Plant Genetics and Crop Research (IPK), the National Small Grain Collection of the US Department of Agriculture (USDA), the Czech Crop Research Institute, and the Laboratory of Plant Genetics (Kyoto University). Additional seed material was collected during field trips. Multiple accessions per species and intra-specific entities were selected if possible. All materials were grown from seed and identified based on morphological characters if an inflorescence was produced. Plant material obtained from germplasm repositories that was found to be in conflict with its taxonomic affiliation was only included in the analyses if the taxon could be unequivocally determined. Vouchers of the morphologically identified materials (Additional file 1: Table S1) were deposited in the herbarium of IPK (GAT).

Table 1 Overview of Triticeae and outgroup taxa considered

Full size table

Laboratory work

Flow-cytometric measurements were conducted to determine the ploidy level for all accessions. All analyses followed the protocol of Doležel et al. [37] on a CyFlow Space flow cytometer (Partec). At least 7500 nuclei were counted. Only measurements with a coefficient of variation (CV) for sample and standard peak <4% were accepted. Samples that recurrently produced CV values >4% were repeated in Galbraith’s buffer containing 1% polyvinylpyrrolidon (vol/vol) and 0.1% Triton X-100 (vol/vol). At least three measurements per species were carried out. If only a single accession of a species could be retrieved from a seed bank, its ploidy level was estimated three times. Samples of the same species were processed on at least two different days to account for instrument drifts.

Genomic DNA was extracted either from 10 mg silica-dried leaves using the DNeasy Plant Mini Kit (Qiagen) or from 5 g of freeze-dried leaves using the cetyltrimethyl-ammonium bromide (CTAB) method [38]. DNA quantifications were done using the Qubit dsDNA BR Assay (Life Technologies) or the Quant-iT PicoGreen dsDNA Assay Kit (Invitrogen) on a Tecan Infinite 200 microplate reader according to the manufactures instructions. The LE220 Focused-Ultrasonicator (Covaris) was used to shear 3 μg genomic DNA in 130 μl TE buffer for every sample into fragments having an average length of 400 bp with the following settings: instantaneous ultrasonic power (PIP) 450 W, duty factor (df) 30%, cycles per burst (cpb) 200. The treatment was applied for 100 s. The sheared DNA was used in a sequence-capture approach (SureSelect^XT Target Enrichment for Illumina Paired-End Sequencing, Agilent Technologies) targeting at 450 nuclear single-copy loci aiming for 0.01–0.02% of a Triticeae genome. Baits complementary to chloroplast ndhF based on 628 bp of available Hordeum vulgare, Aegilops tauschii, Pseudoroegneria spicata, Triticum urartu (identical to EF115541.1, JQ754651.1, KJ174105.1 and AF056180.1, respectively) and 2073 bp of Brachypodium distachyon (identical to AF251451.1) sequences were designed as well. The pairwise sequence identity was larger then 99% among Triticeae taxa and 96% when comparing the Triticeae taxa with Brachypodium. Baits were designed to cover the entire 2073 bp of ndhF as well as each polymorphism between the reference sequences at least five times. After the enrichment procedure all samples were barcoded and pooled (following [39, 40]) at equimolar ratios. Capture libraries were sequenced on the Illumina HiSeq 2000 or MiSeq. The flowcells were loaded aiming for a sequencing coverage of at least 40X.

Data assembly

We used the captured ndhF and the off-target read fraction (i.e. reads for which no capture probes were designed in the target-enrichment experiment) to assemble whole chloroplast genomes. Geneious versions R8–R10 (Biomatters Ltd.) were used for quality control and downstream analyses. Read pairs were set with an average insert size of 300 bp and bases with an error probability above 5% were trimmed. Chloroplast genomes were assembled in a two-step procedure consisting of (1) the generating of a species-specific reference sequences followed by (2) the creation of individual-based chloroplast assemblies.

In the first step we assembled species-specific chloroplast sequences by combining reads of multiple accessions of a single species. This increased the coverage for a species-specific chloroplast genome compared to the usage of data of an individual sample only. In a few cases single accessions were found to contribute a large amount of variation to these assemblies. These accessions were excluded from species-specific assemblies (Additional file 1: Table S1). The reads were either mapped to GenBank sequences of conspecific or closely related taxa (for Aegilops, Hordeum and Triticum species), or to Hordeum vulgare (EF115541), a well-studied basal organism in Triticeae, for taxa lacking conspecific chloroplast genomes in GenBank. One inverted repeat was cleaved off the GenBank sequences as no sequence variation has been found between the inverted repeats of the same chloroplast genome. A careful comparison of Triticeae chloroplast genomes available in GenBank showed a large amount of insertions and deletions (indels) among the sequences from single species. In case several chloroplast genomes per species were retrieved from GenBank, those were aligned and an annotated consensus was created as reference to check for intraspecific indels. Then a stringent read mapping approach was used that considered only reads with mates mapping in proper distance according to the insert size (±50%). This was done to avoid the inclusion of chimerical Illumina reads, which have been reported to occur frequently [41]. All read mappings were performed using the Geneious mapper with five iterations, allowing at maximum 15% of mismatches per read and a maximum gap size of 1000 bp to encompass large deletions. The assembly results were compared and manually checked for inconsistencies (i.e. indels the assembler was unable to resolve). Consensus sequences were called using the 50% majority rule. Up to five rounds of mapping and inspection were performed, each time using the contig obtained previously.

In the second step, for each sequenced individual chloroplast sequences were assembled by mapping all reads to their species-specific consensus sequence generated in step (1). Read mappings were performed using the Geneious mapper with five iterations, allowing at maximum 10% of mismatches per read and a maximum gap size of 100 bp. The assembly results were manually checked for inconsistencies. No global coverage threshold was applied as the read coverage for single accessions were relatively low. However, single nucleotide polymorphisms (SNPs) compared to the reference covered by a single read were masked. Finally consensus sequences were called using the ‘Highest Quality’ option, which is able to resolve conflicts between reads because it takes the relative residue quality into account. ‘N’ were called for positions without coverage. Whole chloroplast sequences with more than 50% missing data were excluded from further analyses.

A multiple sequence alignment of the whole chloroplast genomes generated in step (2) plus a set of GenBank-derived sequences (Additional file 1: Table S1) was generated using Mafft 7 (http://mafft.cbrc.jp/alignment/software; accessed in November 2016; [42]) applying the auto algorithm in combination with the ‘nwildcard’ option. The alignment was manually curated. The sequences generated in the scope of this study were annotated by comparing them to the annotations of the GenBank accession number KJ592713 [43] in Geneious. All sequences were submitted to GenBank (accession numbers KX591961-KX592154 and KY635999-KY636181). The number of parsimony-informative positions was inferred using PAUP* 4.0b10 [44].

Phylogenetic analyses

We performed a Bayesian phylogenetic analysis for ndhF, as the sequence of this locus could be retrieved for all individuals without any missing data. First, unique ndhF haplotypes were identified using TCS 1.2.1 [45]. The best-suited model of sequence evolution was identified on the data matrix of unique haplotypes with jModelTest 2.1.4 [46] using the default parameters. The Bayesian information criterion (BIC; [47]) was selected for model choice because of its high accuracy [46] and its tendency to favour simpler models than the Akaike information criterion (AIC; [48]). Bayesian inference (BI) was performed in MrBayes 3.2.6 [49] using the model inferred by jModelTest. BI consisted of four independent analyses each running for 20 million generations and sampling a tree every 1000 generations.

BI of the whole chloroplast genome alignment were run with MrBayes 3.2.6 on the CIPRES (Cyberinfrastructure for Phylogenetic Research) Science Gateway 3.3 [50] for two datasets: (1) the complete alignment and (2) one alignment with positions having more than 50% of missing data being masked in Geneious version R10. The best-fitting models of sequence evolution were estimated by making the Monte Carlo Markov chain (MCMC) sampling across all substitution models ([51]; ‘lset nst = mixed’). For each dataset we performed three analyses, testing the impact of different rate settings, i.e. (1) a gamma-distributed rate variation, (2) a proportion of invariable sites and (3) with both combined to be able to identify the best-suited substitution model by comparing the posterior probabilities with AIC through MCMC (AICM; [52]), which is less computing intensive though not as accurate as the application of Bayes factors [53], in Tracer. Each analysis was performed with two independent Metropolis coupled MCMC analyses each with four sequentially heated chains (temperature set to 0.05) until the standard deviation of split frequencies reached 0.009, a maximum of 10 million generations or the maximum runtime of CIPRES. Trees were sampled every 500 generations. For all Bayesian analyses conducted Brachypodium distachyon (EU325680) was set as outgroup and the convergence of the runs was assessed in Tracer v. 1.6 [54]. A consensus tree was computed after deleting a burn-in of the first 25% of trees.

Additionally, a Bayes factor (BF; [55]) analysis was carried out for the ndhF dataset to further evaluate the monophyly of Triticeae chloroplasts. Mean marginal log-likelihoods were computed using the stepping-stone sampling [56] in MrBayes 3.2.6 [49] for monophyletic and polyphyletic relationships of Triticeae and the substitution model as identified in jModelTest. Each analysis consisted of two million generations with four independent runs of four parallel chains. The BF was evaluated using ten as a cut-off value [57].

Estimating divergence times using trnK-matK, rbcL and ndhF

We inferred a calibrated phylogeny for the three plastid loci trnK-matK, rbcL and ndhF. First, we tested the robustness of the calibration of the most recent common ancestor (MRCA) of Brachypodium and Triticeae when increasing the sampling for Triticeae from 12 to 37 species compared to Marcussen et al. [20]. For this a Bayesian coalescence analysis based on trnK-matK, rbcL and ndhF for the subfamily Pooideae was performed. The same GenBank sequences were assembled to form a contiguous sequence as described and used in Marcussen et al. [20]. This set of GenBank accessions was complemented with sequences assembled in this study whenever additional taxa or more sequence information from a certain taxon could be added. Following Marcussen et al. [20] we restricted ourselves to one sequence per species. We used the species-specific sequences from step (1) of the sequence assembly procedure, over the selection of a single accession per taxon, comparable to Pelser et al. [58]. This allowed us to employ all phylogenetic information available for a taxon and to overcome stretches of missing data. Conspecific sequences used for consensus inference showed 99.96 – 100% of identical sites. The best partitioning schemes and DNA substitutions models were inferred with PartitionFinder [59, 60] comparing all possible partitioning schemes. The analysis was carried out using the combination of age priors for analyses 2, 4, 6, 10 and 17 as published in Marcussen et al. [20] in Beast 2.4.1 [61]. For each setting one replicate was performed. Priors on the root age were estimated as stem node (‘use originate’). We found the divergence time of Brachypodium and Triticeae as inferred by Marcussen et al. [20] to be robust. Second, we performed a multispecies coalescent (MSC) analysis using it as the secondary calibration point in million years ago (Ma) as normally distributed priors for the root of Brachypodium-Triticeae (mean 44.44 Ma ± 3.53) on trnK-matK, rbcL and ndhF for all Triticeae accessions. We excluded gene sequences of trnK-matK and rbcL if they showed more than 50% of missing data and sequences of all polyploid wheat accessions. Sequences of Zea mays, Oryza sativa, Brachypodium distachyon and two Bromus species were included as outgroup taxa. The taxa Triticum monococcum and T. boeoticum, Secale cereale and S. vavilovii, Pseudoroegneria tauri and Ps. libanotica, Taeniatherum caput-medusae and Tae. crinitum, Agropyron cristatum and Agr. cimmericum were each subsumed under the same species name (Additional file 1: Table S1), as no pronounced genetic variation were detected in the analysis of whole chloroplast sequences. Hereby we were following existing taxonomic treatments, which already unify these taxa under a single species name (see, e.g. [62]).

We performed MSC analyses for a dataset including Psathyrostachys and another one without it to evaluate the impact of this taxon on divergence times. Monophyly of Triticeae was not enforced for either analysis as suggested by the Bayes Factor analysis. For each dataset, first, the best partitioning schemes and DNA substitution models were inferred with PartitionFinder searching all partitioning schemes. The analysis was run with the substitution models being linked, the Yule species tree prior, as well as the piecewise linear and constant root population model. Since the rate constancy was systematically rejected for all loci by the likelihood-ratio test [63], an uncorrelated lognormal clock model ([64]; uniform ucld.mean: min 0, max 0.01) was used. Trees were sampled every 5000 generations. Four independent analyses were performed and each was run for 600 million generations. All MSC analyses were run using the Beagle library [65]. Effective sample sizes (ESS) and convergence of the analyses were assessed using Tracer v. 1.6 [54]. An appropriate burn-in was estimated from each trace file, and the four analyses were combined with LogCombiner as part of the Beast package. Then a maximum clade credibility (MCC) tree was summarised with TreeAnnotator and visualized with FigTree 1.4.2 [66].

Results

Ploidy levels

Flow cytometric measurements were performed for all accessions to be able to distinguish between different ploidy levels for accessions of the same species (Additional file 1: Table S1). We identified di- and tetraploid accession for Agropyron cristatum, Eremopyrum bonaepartis, Pseudoroegneria stipifolia and Ps. strigosa, and detected tetra- and hexaploid cytotypes for Aegilops crassa and Ae. neglecta. Flow cytometric measurements were used as additional information to confirm species affiliations [67]. For example, comparing of the genome sizes measured for the diploid species Thinopyrum bessarabicum and Th. elongatum to the data from the Kew Angiosperm DNA C-values database revealed that the analysed accessions actually represent polyploids instead of diploids. For more information on problematic material from seed banks see Additional file 1: Table S1.

Sequence assembly

The target-enrichment protocol and Illumina sequencing were applied to 194 accessions, covering 53 species of 15 genera (dependent on the applied classificatory system) and three outgroup species (i.e. Bromus and Brachypodium, Table 1, Additional file 1: Table S1). Whole chloroplast genomes were assembled in a two-step procedure via (1) an intermediate step of generating a species-specific reference if there was none available in GenBank and (2) the assembly of the chloroplast of each accession via read mapping to sequences from step (1).

The average coverage of the chloroplast genome varies largely between single samples and depends mainly on the actual sequencing depth. Between approximately 50% and 90% of the reads mapping to the chloroplast mapped to ndhF (Additional file 2: Table S2), which was included in the bait design. Thus, the ndhF gene could be assembled for all accession without missing data. We identified 64 unique haplotypes when comparing the ndhF gene data plus the sequences downloaded from GenBank (Additional file 1: Table S1). The alignment of these 64 haplotypes had a total length of 2232 bp with 186 (8.3%) parsimony-informative sites.

The entire alignment of whole chloroplast genome sequences comprised 222 sequences, 39 of them were downloaded from GenBank. This alignment ranged from psbA in the large single-copy region to partial ndhH in the small single-copy region and had a total length of 123,531 bp. It had 9064 (7.3%) parsimony-informative positions. The data matrix included 9.3% of missing data (‘N’). These randomly distributed stretches of missing data occur in the alignment in regions where the sequencing coverage was insufficient. Additionally the matrix revealed 7.5% of gaps due to length variation between taxa. In several cases taxa showed long indels in intergenic regions, thus the same 900 bp deletion was found between rpl23 and ndhB in Pseudoroegneria, Thinopyrum and Dasypyrum. Many short indels (3–40 bp) were found in introns of coding genes (e.g. ysf3) and intergenic spacers. A variant of this alignment, having regions with 50% of missing data being removed, had a total length of 114,788 bp. In this case 8717 (7.6%) positions of the alignment were parsimony informative, while 9.2% of the characters were constituted by N’s and 0.8% of gaps. Alignment masking mainly excluded regions of length variation due to short repeat motives in intergenic regions. With only few substitutions per chloroplast intraspecific variation was generally very low.

The alignment revealed insertions unique to some GenBank sequences whose true occurrence could not be confirmed by our data: no reads from our analysed individuals mapped to these insertions. Moreover, BLAST searches of these regions returned mitochondrial and/or nuclear genomic data as best hit (e.g. KC912690, KC912692, KC912693, KC912694) indicating assembly artefacts. Those GenBank sequences were excluded from further analyses (Additional file 1: Table S1).

Phylogenetic analyses

We performed a BI analysis on the set of 64 unique ndhF haplotypes with the model of sequences substitution set to GTR + G [68, 69], as identified by jModelTest. The phylogenetic tree obtained from ndhF (Fig. 1) shows Triticeae to be paraphyletic, as the lineage of Psathyrostachys appears to have diverged before the lineage of Bromus, although the position of Bromus is with a posterior probability (pp) of 0.88 not well supported. The branch lengths for the Bromus group are considerably longer compared to Psathyrostachys. The topology shows that individuals of most species and/or genera form monophyletic groups. However, Eremopyrum bonaepartis is polyphyletic, as the diploid plastid type of E. bonaepartis groups as sister to Henrardia persica, while the haplotypes of all tetraploid E. bonaepartis and diploid E. triticeum form a clade with Agropyron cristatum. A common maternal ancestor can be hypothesized for Agropyron, Australopyrum, Eremopyrum and Henrardia as these taxa form a well supported clade, which is sister to the clade of Hordeum species. The clades formed by the genera Heteranthelium, Secale and Taeniatherum are placed on a polytomy together with a clade formed by taxa having a B, G or S genome [i.e. Aegilops speltoides (S) and all polyploid Triticum taxa (B/G)], the clade of taxa with an E, St or V genome (i.e. Thinopyrum, Pseudoroegneria and Dasypyrum), and the clade of all remaining Aegilops, Amblyopyrum and diploid Triticum taxa. Pseudoroegneria appears paraphyletic, as Dasypyrum and Thinopyrum haplotypes group within this clade. The backbone of this clade represents a polytomy. Notably the placement of the otherwise monophyletic Dasypyrum is not supported. Several different haplotypes can be distinguished for various species of Pseudoroegneria itself (e.g. Ps. spicata, Ps. strigosa, Ps. tauri, Ps. stipifolia). Furthermore, the two A-genome species Triticum urartu and T. monococcum are monophyletic. Also all D-genome species (i.e. Ae. tauschii, Ae. cylindrica and Ae. ventricosa) form a clade. Both genomic groups are located on a polytomy together with the remaining Aegilops species and Amblyopyrum. Aegilops crassa and Ae. juvenalis (D’) group apart from the other D taxa and show a ndhF haplotype with less nucleotide differences to S* than to D chloroplast lineages (i.e. one SNP difference to S* vs. three and five SNPs to D). All diploid and polyploid S* species sequenced in the scope of this study share the same ndhF haplotype. Aegilops comosa (M) and Ae. uniaristata (N) are sister species. All U-genome taxa fall into the same clade together with Aegilops geniculata (M°) and Amblyopyrum muticum (T). Aegilops triuncialis accessions possess U as well as C haplotypes.

Sometimes, single accessions of a species group within the otherwise monophyletic clade of another species. Thus, the accession AE_1831 of Aegilops markgrafii (C) falls into the clade of Amblyopyrum muticum (T) while KP_2012_119 of Aegilops biuncialis (U) falls within Ae. geniculata (M°). The accession AE_586 of Aegilops neglecta (U) groups together with Ae. markgrafii (C). Further, intraspecific variation within ndhF was found in several cases, for example, for Aegilops comosa, Ae. speltoides, Amblyopyrum muticum, and Dasypyrum villosum. With a score of 36.4, BF strongly favours Triticeae chloroplasts as paraphyletic (Additional file 3: Table S3) when Psathyrostachys is included in the analysis.

As the resolution of the phylogenetic tree from the ndhF dataset is not sufficient to distinguish between more recently diverged taxa, the whole chloroplast genome dataset was phylogenetically analysed by BI using an alignment of the entire chloroplast genomes and a variant of it were positions having more then 50% of missing data have been masked. In both cases MrBayes revealed a gtrsubmodel in combination with gamma-distributed rate variations as best-suited substitution model. The topologies (Fig. 2, Additional file 4: Figure S1) returned from both analyses are mainly congruent to each other and to the ndhF tree. However, nodes of deep splits supported moderately for the complete plastid data matrix show higher support in the dataset where low-covered regions have been masked. This is, for example, the case for the split of the ancestor of Bromus. The branch length differences between Bromus and Psathyrostachys are in agreement with the ndhF tree. In contrast to the ndhF dataset, the whole chloroplast phylogenies are able to provide a hypothesis of the relationships between all major genomic groups. They suggest that the E, St, and V clade (i.e. Thinopyrum, Pseudoroegneria and Dasypyrum) diverged before Heteranthelium, which in turn split before Secale and Taeniatherum. Pseudoroegneria spicata forms its own clade that diverged first from Dasypyrum and the remaining taxa within this clade. However, the Dasypyrum chloroplast genomes are characterized by rather long branches compared to other taxa in this clade. Furthermore, Dasypyrum comprises two well-differentiated haplotypes. Aegilops speltoides and the polyploid wheat species form three groups: (1) most Ae. speltoides accession form a clade of their own (S), (2) some Ae. speltoides accessions group together with Triticum timopheevii, T. zhukovskyi and the artificially synthesized wheat T. kiharae (G) and (3) all accessions of T. turgidum and T. aestivum share the same haplotype (B). The not supported placement of one Ae. speltoides accession (PI_48721) close to the S group, shifts to a supported position in the G group when regions with an high extent of missing data were masked. Additionally, the usage of entire chloroplast genomes resolves that diploid Triticum species (A) diverged before the D-genome taxa and the remaining Aegilops species and Amblyopyrum. The phylogeny also indicates that D’ is closely related but distinct from D. Further, M°, T and U taxa form a clade, that diverged before the split of taxa having a C, N, M or S* genome. Within this clade the sister species relationship of Aegilops comosa and Ae. uniaristata is confirmed. Aegilops comosa (M) groups distinct from the other M° plastid type. The species Ae. searsii, Ae. bicornis, Ae. longissima, Ae. sharonensis form a clade together with the polyploid Ae. kotschyi and Ae. peregrina (S*) indicating only very little sequence variation. Concordant to the ndhF tree, one sequence each of Ae. markgrafii (AE_1831), Ae. biuncialis (KP_2012_119) and Ae. neglecta (AE_586) group apart from the other sequences of their respective taxon.

Ages of clades

Divergence times were estimated based of trnK-matK, rbcL and ndhF sequences for each accession included in the study and using an uncorrelated lognormal clock model and a secondary calibration on the MRCA of Brachypodium distachyon and Triticeae in *BEAST. Different ages for the split of Triticeae and Bromus were obtained depending on the in- or exclusion of the genus Psathyrostachys. Including Psathyrostachys, Triticeae are paraphyletic and the ages are slightly older but with larger and overlapping 95% highest-posterior densities (HPD) compared to the dataset that does not comprise Psathyrostachys (Additional file 5: Figure S2, Additional file 6: Figure S3). In the analysis including Psathyrostachys the most recent common ancestor (MRCA) of Triticeae and Bromus occurred approximately 19.44 Ma (95% HPD = 12.66-27.20). The split of Bromus and the remaining Triticeae (termed “core Triticeae”) occurred approximately 15.77 Ma (95% HPD = 9.38-22.75). The age of this split does not seem affected by the absence of Psathyrostachys (15.41 Ma, 95% HPD = 10.72-20.83). However, the MRCA of the core Triticeae occurred approximately 12.17 Ma (95% HPD = 7.65-17.44) including Psathyrostachys and nearly 2.5 million years later (9.68 Ma, 95% HPD = 7.42-12.21) in the analysis omitting this early diverging lineage. The MRCA of Aegilops, Triticum and Amblyopyrum (plus Taeniatherum) occurred around 4.14 Ma (95% HPD = 2.48-6.44) including Psathyrostachys and 3.38 Ma (95% HPD = 2.35-4.47), when omitting it.