A multi gene sequence-based phylogeny of the Musaceae (banana) family
BMC Evolutionary Biology volume 11, Article number: 103 (2011)
The classification of the Musaceae (banana) family species and their phylogenetic inter-relationships remain controversial, in part due to limited nucleotide information to complement the morphological and physiological characters. In this work the evolutionary relationships within the Musaceae family were studied using 13 species and DNA sequences obtained from a set of 19 unlinked nuclear genes.
The 19 gene sequences represented a sample of ~16 kb of genome sequence (~73% intronic). The sequence data were also used to obtain estimates for the divergence times of the Musaceae genera and Musa sections. Nucleotide variation within the sample confirmed the close relationship of Australimusa and Callimusa sections and showed that Eumusa and Rhodochlamys sections are not reciprocally monophyletic, which supports the previous claims for the merger between the two latter sections. Divergence time analysis supported the previous dating of the Musaceae crown age to the Cretaceous/Tertiary boundary (~ 69 Mya), and the evolution of Musa to ~50 Mya. The first estimates for the divergence times of the four Musa sections were also obtained.
The gene sequence-based phylogeny presented here provides a substantial insight into the course of speciation within the Musaceae. An understanding of the main phylogenetic relationships between banana species will help to fine-tune the taxonomy of Musaceae.
The global annual production of bananas and plantains (Musa spp.) amounts to > 120 Mt, making them among the world's most important fruit crops. As well as their prominence as a dessert fruit, they provide a vital source of carbohydrates to many inhabitants of the humid tropics. Musa production, like that of all crop species, is endangered by a range of pests and diseases, affecting both the yield and quality of the fruit. While the large-scale commercial plantations can secure production by frequent applications of fungicide and pesticide, this form of crop management is increasingly recognized as environmentally irresponsible. Meanwhile, smallholders, who together account for at least 85% of world production, can seldom afford the expense of chemical control, and their crop remains vulnerable to diseases and pests. Improvement of cultivated banana via breeding is hampered by the absence of sexual reproduction and a narrow genetic base. As a result, attention has turned to non-cultivated wild relatives as sources of new genes for banana improvement. This underlines a renewed interest in analyzing and conserving genetic diversity within Musa spp., which in turn has raised a number of questions related to their taxonomy.
The banana family (Musaceae) has been assigned to the order Zingiberales in the clade commelinids in the monocots  and has been conventionally divided into the three genera Musa, Ensete and Musella. The genus Musa is characterized by a set of morphological descriptors, and has a basic chromosome number (x) of 9, 10 or 11. The genus has been sub-divided into the four sections Eumusa (x = 11; comprising most of the cultivated species), Rhodochlamys (x = 11), Australimusa (x = 10) and Callimusa (x = 9, 10) [3, 4]. More recently, Argent  added a fifth section, Ingentimusa (x = 7), containing just a single species M. ingens. However, since this one species (x = 7) grows within the Australimusa region (New Guinea), its section-status is not evident when compared to M. beccarii (x = 9), which grows in the Callimusa region (Borneo) and remains classified as a Callimusa.
With the application of DNA-based tools, this conventionally-based taxonomy has become increasingly difficult to justify. Thus, based on RFLP genotyping, Gawel et al.  proposed a merger between Eumusa and Rhodochlamys, a suggestion consistent with nuclear genome sizes and the distribution of rDNA loci , as well as with the phylogenetic analysis based on the ITS and organellar DNA . Jarret and Gawel  further proposed combining Australimusa and Callimusa into a single section, a suggestion supported by AFLP genotypes acquired by Wong et al. . However, the results of AFLP genotyping led Ude et al.  to argue that the conventional taxonomy of Musa was in fact tenable.
The ease of DNA sequencing has revolutionized phylogenetic methodology. The most frequent targets for this type of analysis have been extra-nuclear DNA i.e. chloroplast and mitochondrial genes [12–16] and the internal transcribed spacers (ITS) separating the tandem organized ribosomal genes in the 45S rDNA locus [17–19]. The prevalently uniparental mode of inheritance of the chloroplast and mitochondrion limits to some extent the usefulness of extra-nuclear sequences, and moreover, it has been established that this DNA tends to evolve more slowly than do the nuclear genes, which presents difficulties in employing it for phylogenetic purposes . Concerted evolution , a bias due to analyzing a single locus and hidden paralogy all militate against relying solely on ITS variation for molecular systematics and evolutionary analysis [22, 23].
Single and low copy nuclear gene sequences are thought to provide a higher level of discrimination than either extra-nuclear genes or ribosomal spacers [24–26]. The lower frequency of informative sites within these sequences can, however, prevent their use for the resolution of phylogeny both at lower taxonomic levels and among rapidly diversifying lineages. The greater resolving power of low copy nuclear sequence has been recently demonstrated in rice . Low copy nuclear genes also suffer less homoplasy than does ITS  and are seldom subjected to concerted evolution. Intronic sequence is particularly useful, since the level of selection pressure on its non-coding DNA is relaxed . The major drawback to the use of low copy sequence is the need to distinguish between paralogs and orthologs. As yet in the Musaceae family, however, all published sequence-based phylogenetic studies have targeted extra-nuclear and/or ribosomal DNA sequence.
The phylogeny of the Musaceae remains controversial. Typing via organellar and ribosomal DNA has been employed by Boonruangrod et al. [29, 30]. Li et al. and Liu et al. applied sequence analysis of ribosomal ITS coupled with chloroplast gene evidence. More generally, studies of evolutionary relationships within the monocotyledonous species [32–34], and in the Zingiberales in particular [35, 36], have produced date estimates for the divergence of the Musaceae (61-110 Mya) and the genus Musa (51 Mya). Based on a study of genome duplication, Paterson et al. suggested that the divergence of Musa occurred 142 Mya, although this estimate was conceded to require further sequence information before it could be accepted. Clearly, a more robust picture of banana phylogeny and divergence time requires a systematic sampling of gene sequences distributed throughout the genome. Thus, we set out to clarify the main framework of evolutionary relationships within the Musaceae, and to date the divergence of particular Musa sections, using a set of single or low copy nuclear gene sequences.
The sample of Musaceae species included representatives of Musella, Ensete and each of the four Musa sections (Table 1). Strelitzia nicolai Regel et Koern (family Strelitziaceae, order Zingiberales) was chosen to serve as an outgroup due to its relatively close relationship to the Musaceae family and because it gave the highest amplification efficiency for the selected gene markers. Sampling of additional outgroup species was abandoned after a series of preliminary tests, which revealed major difficulties with the amplification of the selected genes (data not shown). In vitro rooted M. balbisiana 'PKW' plants were donated by François Côte (CIRAD, Guadeloupe, French West Indies) and Musella lasiocarpa plants were purchased from a commercial nursery. The other entries were obtained from the International Transit Centre (ITC, Catholic University, Leuven, Belgium) in the form of in vitro rooted plants. All plant materials were maintained in a greenhouse after their transfer to soil. Leaf tissue of S. nicolai was provided by Dr. M. Dančák (Palacký University, Olomouc, Czech Republic). Genomic DNA was extracted from young leaf tissues using the Invisorb® Spin Plant Mini kit (Invitek, Berlin, Germany), following the manufacturer's instructions.
Target gene selection and primer design
The gene sequences targeted for phylogenetic analysis were selected from the collection of banana ESTs deposited in GenBank as of March 30, 2009. The threefold basis for the choice of genes was that they were single copy, that their genomic locations spanned the entire genome and that they contained at least one intron. Genes belonging to the same gene family were avoided. These criteria were applied by reference to their rice orthologs, which were identified by BLAST analysis , using a threshold of e-10. To maximize dispersion across the banana genome, we chose genes whose rice orthologs mapped to different chromosome arms. Gene structure in banana was assumed to be identical to that in rice. Primers (see Additional File 1) were designed to amplify intron-spanning gene fragments in the panel of Musaceae species and S. nicolai, following Lessa . Primers which either failed to amplify or amplified multiple fragments from any one of the 13 Musaceae entries were discarded. The final set comprised 19 genes (Table 2), sampling each of the rice chromosome arms except the long arms of chromosomes 4, 5, 11 and 12, and the short arm of chromosome 12. Nine of the 19 primer pairs (Table 2) amplified successfully from S. nicolai template.
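The selection criteria described above can be expressed as a simple filter. The following is a minimal sketch, not the authors' actual pipeline: the `Candidate` record, its field names and the example values are hypothetical, the BLAST threshold "e-10" is read here as an E-value of at most 1e-10, and keeping only the best hit per rice chromosome arm is an assumption made for illustration.

```python
from typing import Iterable, List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one banana EST and its best rice BLAST hit."""
    est_id: str
    rice_evalue: float   # E-value of the best rice hit
    rice_arm: str        # rice chromosome arm of the ortholog, e.g. "1S"
    single_copy: bool    # True if the rice ortholog is single copy
    n_introns: int       # number of introns in the rice gene model


def select_candidates(candidates: Iterable[Candidate],
                      max_evalue: float = 1e-10) -> List[Candidate]:
    """Keep single-copy, intron-containing genes, one per rice chromosome arm."""
    best_per_arm = {}
    for cand in candidates:
        if cand.rice_evalue > max_evalue:
            continue  # BLAST hit too weak to call an ortholog
        if not cand.single_copy or cand.n_introns < 1:
            continue  # fails the copy-number or intron criterion
        current = best_per_arm.get(cand.rice_arm)
        if current is None or cand.rice_evalue < current.rice_evalue:
            best_per_arm[cand.rice_arm] = cand
    return sorted(best_per_arm.values(), key=lambda c: c.rice_arm)


# Invented example records, for illustration only.
examples = [
    Candidate("est_a", 1e-40, "1S", True, 3),
    Candidate("est_b", 1e-25, "1S", True, 2),   # same arm, weaker hit
    Candidate("est_c", 1e-5, "2L", True, 4),    # fails the E-value threshold
    Candidate("est_d", 1e-30, "3L", False, 2),  # not single copy
    Candidate("est_e", 1e-12, "4S", True, 0),   # no intron
    Candidate("est_f", 1e-18, "6L", True, 1),
]
selected_ids = [c.est_id for c in select_candidates(examples)]
print(selected_ids)  # ['est_a', 'est_f']
```

In practice the final filter in the study was empirical: primer pairs that failed or gave multiple fragments in any Musaceae entry were discarded, a step that cannot be reproduced in silico.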
Gene fragment amplification, cloning and sequencing
A standard amplification protocol was applied to each of the 19 primer pairs. Each reaction contained 40 ng template, with the PCR program composed of an initial denaturation step (94°C/5 min), followed by 35 cycles of 94°C/30 s, 57°C/30 s and 72°C/35 s, and ending with an extension step of 72°C/10 min. Amplicons were treated with exonuclease/alkaline phosphatase (ExoSAP-IT®, USB, Cleveland, OH, USA) and then either sequenced directly, or first cloned into the TOPO vector (Invitrogen, Carlsbad, USA) before sequencing. Cycle sequencing was performed on three independent amplicons per gene target, using a BigDye® Terminator v3.1 Cycle Sequencing kit (Applied Biosystems, Foster City, USA), following the manufacturer's instructions. Sequencing reaction products were purified using a CleanSEQ kit (Agencourt Bioscience Corp., Beckman Coulter, Beverly, USA), and then separated on an ABI 3730Xl DNA analyzer (Applied Biosystems). All the resulting sequences have been deposited within GenBank [GenBank: HM118565-HM118820]. Raw sequence data were assembled and edited using DNA Baser v2 software . Consensus sequences were aligned by ClustalW  using default parameters, as implemented in the MEGA4 software package . Multiple DNA sequence alignments were inspected and any ambiguously aligned segments were removed prior to phylogenetic analysis.
Maximum likelihood (ML), maximum parsimony (MP) and Bayesian inference (BI) methods were applied to infer phylogenetic relationships. Sequence gaps were treated as missing data. Two datasets were considered - the first (dataset A) consisted of all 19 gene fragments across the 13 Musaceae entries, but not S. nicolai, and the second (dataset B) comprised nine gene fragments across all the entries. Exonic and intronic sequences were analyzed separately in a similar fashion. MP and ML analyses were performed using PAUP* v4.0b software . The most parsimonious tree for each dataset was found by a heuristic search of 1,000 random sequence-addition replicates by means of a tree-bisection-reconnection (TBR) branch swapping algorithm. The strict consensus tree was rooted by S. nicolai as an outgroup or, where no sequence was obtainable from this species, by E. ventricosum. Statistical support for individual nodes was estimated from 1,000 bootstrap replicates. The best model, as suggested by MrModeltest v2.3 software , based on the Akaike information criterion (AIC, see Table 3) was implemented in the ML and BI parameter settings for each target gene fragment, as well as for the full datasets. The ML-based optimal tree was derived from 100 simple sequence-addition replicates using TBR branch swapping, and bootstrap support values were calculated from 100 replicates. BI analysis was conducted in BEAST v1.4.8  using four independent Markov Chain Monte Carlo (MCMC) runs, starting from a randomly chosen topology, and run for 1,000,000 generations, with sampling every 1,000 generations. Logfile outputs were inspected in Tracer  software to confirm convergence. Treefiles from individual runs were combined by LogCombiner  software. The maximum clade credibility tree and corresponding posterior probabilities were calculated using TreeAnnotator  software, after removal of the 25% burn-in samples. The phylogenetic trees generated were graphically adjusted in FigTree v1.3.1 software .
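Model choice by the Akaike information criterion, as described above, amounts to minimizing AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood. The sketch below illustrates only this comparison; it does not call MrModeltest, and the log-likelihood values are invented for the example (the parameter counts are the usual free-parameter counts of the named substitution models, ignoring branch lengths, which are shared across models on a fixed tree).

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_model_by_aic(fits: Dict[str, Tuple[float, int]]) -> str:
    """Return the model name with the smallest AIC.

    `fits` maps a model name to (log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Invented log-likelihoods, for illustration only.
example_fits = {
    "JC": (-1000.0, 0),     # no free substitution parameters
    "HKY": (-990.0, 4),     # kappa + 3 base frequencies
    "GTR+G": (-988.0, 9),   # 5 rates + 3 base frequencies + gamma shape
}
scores = {name: aic(*fit) for name, fit in example_fits.items()}
print(scores)
print(best_model_by_aic(example_fits))  # HKY
```

In this invented case the richer GTR+G model fits slightly better but is penalized for its extra parameters, so HKY is preferred.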
Systematic bias and congruence testing
The incongruence length difference (ILD) test  (implemented in PAUP* v4.0b as the partition homogeneity test) was applied to estimate the level of potential incongruence in the data. The data set was partitioned into individual genes and analyzed under heuristic search with 1000 replicates. A χ2 test for base composition homogeneity across taxa was conducted in TREE-PUZZLE v5.2  software. The level of nucleotide substitution saturation was evaluated in DAMBE  software by plotting transitions and transversions against pairwise genetic distance. ML mapping using the quartet puzzling method  was applied to investigate whether the phylogenetic information content of the data was sufficient for inference purposes. ML mapping was also performed within TREE-PUZZLE v5.2 software with all possible quartets, applying the corresponding evolutionary model and exact model parameter estimation settings.
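The quantities behind a saturation plot can be sketched as follows. This is an illustrative stand-in for what DAMBE computes, not its implementation: it counts transitions and transversions for one pair of aligned sequences and uses the uncorrected p-distance as the genetic distance, which is an assumption made here (a model-corrected distance would normally be used on the x-axis). Function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_counts(seq1: str, seq2: str):
    """Count transitions and transversions between two aligned sequences.

    Returns (transitions, transversions, compared_sites). Positions with a
    gap or an ambiguous base in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; all other changes are transversions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions, compared


def p_distance(seq1: str, seq2: str) -> float:
    """Uncorrected proportion of differing sites among compared sites."""
    ts, tv, compared = ts_tv_counts(seq1, seq2)
    if compared == 0:
        raise ValueError("no comparable sites")
    return (ts + tv) / compared


print(ts_tv_counts("ACGT", "GCTT"))  # (1, 1, 4)
print(p_distance("ACGT", "GCTT"))    # 0.5
```

Plotting the two counts against distance for every pair of taxa gives the saturation plot; a flattening of the transition curve at larger distances would suggest saturation.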
Dating of nodes
BEAST v1.4.8 software was used to estimate the divergence times for the major Musaceae clades. This approach has the advantage of simultaneous estimation of substitution model parameters, topology, branch lengths and fossil-based date calibration, using Bayesian inference and the MCMC method. Calibration was based on the carbon dating of Ensete oregonense fossil seeds, given as 43 Mya according to Manchester and Kress. The analysis was conducted over four independent MCMC runs, each consisting of 1,000,000 generations under the relaxed clock model, with an uncorrelated lognormal distribution. The fossil calibration was set as the most recent common ancestor (tMRCA) parametric tree prior. The results were retrieved after combining the tree files of the individual MCMC runs, and the maximum clade credibility tree was constructed after the initial 25% burn-in generations were discarded.
Results and Discussion
Taxon and gene sampling
The amount of sequence information available for Musa species is limited at present, and hence the development of low-copy gene markers for phylogenetic studies in these species has been laborious and time consuming. Despite this, we were able to develop 19 markers from gene regions. Only single- or low-copy genes expected to be randomly distributed across the Musa genome were selected, to ensure that unlinked loci were compared. As the genome sequence of Musa is not yet available, the selection of randomly distributed loci assumed colinearity with the rice genome [52, 53].
The 19 gene-based markers [GenBank: HM118565-HM118820] developed and used in the present study represent by far the largest set of gene markers used to date in the Musaceae. Ideally, a phylogenetic study should comprise all taxa and a high number of unlinked DNA markers. However, for practical reasons these numbers are usually reduced, and full sampling may in fact not be necessary. While some authors argue that incomplete taxon sampling has a negative impact on phylogenetic accuracy [54, 55], others do not support this view and prefer to increase the number of nucleotide characters sampled rather than the number of taxa, in order to reveal the correct phylogeny without a major loss of accuracy in the main evolutionary relationships [56–58]. Here, we favored the latter approach, with partial taxon sampling of representatives (stratified sampling), rather than analyzing a few genomic loci in a large set of species. However, if necessary, the marker set developed in this work can be easily applied to other species and subspecies of Musaceae.
Sequence data characterization and systematic bias testing
The 19 gene fragments covered a length of 16,012 bp, of which 26.9% was exonic. The genic sequences were analyzed both individually, as single-gene data, and as two combined matrices defined by whether the genes could be amplified from the outgroup species S. nicolai (see Table 2 for details). Dataset A (all 19 gene fragments from the 13 Musaceae entries, excluding S. nicolai) was based on 16,012 bp of sequence, of which 1,056 bases were informative, while dataset B (nine gene fragments from all 14 entries, including the outgroup S. nicolai) was based on 7,404 bp of sequence, which included 492 informative sites. The χ2 test used to detect heterogeneity in base composition indicated that there was no significant variation in the AT/GC content among species for individual genes (P = 0.382-1.000). The overall reduced proportion of GC in most of the sequences (see Table 2) may be an artifact of the deliberate maximization of intronic sequence in the sample, since plant intronic sequence has an AT bias. The GC content of the intronic fraction was 34.6%, compared to 45.0% in the exonic fraction.
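The figures quoted above can be cross-checked with a few lines of arithmetic. The snippet is only a sanity check under stated assumptions: the length-weighted GC value assumes that the exonic fraction of 26.9% and the two GC percentages refer to the same pooled sample, and the resulting overall figure is a derived estimate, not a value reported in the text.

```python
def percent(part: float, whole: float) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Proportion of informative sites in each combined dataset.
informative_a = percent(1056, 16012)   # dataset A
informative_b = percent(492, 7404)     # dataset B

# Length-weighted GC content from the exonic and intronic fractions
# (assumes both percentages describe the same pooled 16,012 bp sample).
exonic_fraction = 0.269
gc_overall = exonic_fraction * 45.0 + (1 - exonic_fraction) * 34.6

print(round(informative_a, 1), round(informative_b, 1), round(gc_overall, 1))
# 6.6 6.6 37.4
```

Both datasets therefore carry a similar density of informative sites (about 6.6%), with dataset A contributing roughly twice as many in absolute terms.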
Nucleotide sequences are considered to be phylogenetically informative until they reach the substitution saturation. At this point, it is no longer possible to deduce whether an observed similarity between a pair of sequences results from their common ancestry or whether this has occurred by chance . To avoid the inclusion of non-informative sequence, the level of substitution saturation was evaluated by plotting transitions and transversions against the genetic distance for both datasets A and B, as well as for the exonic and intronic sequence separately. This procedure showed that the frequency of both transitions and transversions increased linearly along with divergence (Figure 1) with transitions outnumbering transversions. This indicates that the saturation plateau was not reached, and the data still retained sufficient phylogenetic signal.
The constancy of the evolutionary rate was verified using a relative rate test, which revealed some heterogeneity in the sequences (data not shown). However, after a re-analysis based on RY-coded (purines/pyrimidines) sequence, which ignores transitions by focusing on the slower evolving transversions , the topologies generated were similar to those obtained from the full nucleotide sequence data. This implied that the rate heterogeneity was not large enough to significantly bias the deduced phylogenies.
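RY coding, as used in the re-analysis above, replaces each purine (A, G) with R and each pyrimidine (C, T) with Y, so that transitions become invisible and only transversions remain as differences. A minimal sketch follows; the function name is hypothetical, and leaving gaps and ambiguity codes unchanged is a choice made here for illustration.

```python
# Purines (A, G) -> R; pyrimidines (C, T) -> Y.
RY_MAP = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y"})


def ry_code(sequence: str) -> str:
    """Recode a nucleotide sequence into purine/pyrimidine (R/Y) states.

    Gaps and ambiguity codes are left unchanged.
    """
    return sequence.upper().translate(RY_MAP)


print(ry_code("ACGT-N"))  # RYRY-N

# A transition (A <-> G) disappears after recoding...
print(ry_code("A") == ry_code("G"))  # True
# ...while a transversion (A <-> C) is still visible.
print(ry_code("A") == ry_code("C"))  # False
```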
Phylogenetic reconstruction based on individual gene fragments
The reconstruction of phylogenetic relationships between the selected taxa representing the Musaceae family was performed by two different criterion-based methods (maximum parsimony; MP and maximum likelihood; ML) and by a third complementary approach based on the Bayesian inference method (BI). Data were first executed in MrModeltest v.2.3  in order to select the most appropriate model of evolution to be used for phylogenetic analyses. The Akaike Information Criterion was chosen  to be implemented in maximum likelihood and Bayesian analysis, as it was reported to have preferable performance in model selection compared to likelihood ratio tests . The evolutionary models selected for the phylogenetic reconstruction are detailed in Table 3. The MP analysis based on the individual gene fragment sequences produced more than one most parsimonious tree for eight of the 19 sequences (Additional File 2). In 15 of the 19 phylogenies there were unresolved polytomies. Clades I (Eumusa + Rhodochlamys) and II (Australimusa + Callimusa) were fully recovered (Figure 2), except for gene fragment g-4, the sequence of which comprised one of the shortest intron sequences and the lowest proportion of phylogenetically informative positions. A similar result was obtained by ML analysis, in which partially resolved phylogenies applied to 15 of the 19 sequences, with an altered topology appearing within either clade I or II for gene fragments g-5, g-12, g-17 and g-19 (Additional File 2).
The BI analysis generated fully resolved phylogenies, albeit with topology alterations within clades I and II. The level of internal resolution within clades I and II varied according to the phylogenetic informativeness of the sequences. Unresolved relationships emerged within both clades I (between M. acuminata, M. mannii and M. ornata), and II (between M. textilis/M. maclayi/Fe'i and M. beccarii/M. coccinea). When the phylogenetic content of the sequences was evaluated by the likelihood-mapping approach, it was clear that each of the single gene fragment-based phylogenies contained a significant fraction of unresolved quartets (Table 4), showing that a single sequence is insufficient for making inference regarding evolutionary relationships. However, for both of the combined datasets A and B, there were no unresolved or partially resolved quartets and thus we investigated a possibility of combining individual gene data into a single data set for the phylogenetic reconstruction.
Based on the ILD analysis, the individual gene fragment partitions were highly incongruent (P < 0.001) and thus not directly combinable. However, it has been suggested that the ILD test should not be used as an exclusive measure of data partition combinability, as it is known to be susceptible to both type I (false positive) and type II (false negative) errors. When Rokas et al. combined sequence data derived from a set of different genes, conflicting signals from individual gene sequences were resolved and the resulting phylogeny was strongly supported. The joint use of a set of gene sequences for phylogenetic inference depends largely on nucleotide composition bias and substitution saturation. Since the χ2 test applied to the Musaceae sequence data indicated the absence of any base composition bias, and substitution saturation of the aligned sequences could be excluded (Figure 1), the combined set of gene fragment sequences was then used for phylogenetic reconstruction.
Phylogenetic reconstruction based on the combined sequence data
MP analysis of dataset A yielded a single fully resolved most parsimonious tree (length = 2333; CI = 0.8678 excluding non-informative characters; RI = 0.9337; RC = 0.8648) with a high level of bootstrap support for each of the individual branches (Figure 2). The internal branches among the M. acuminata accessions and the Rhodochlamys species, as well as within the Australimusa/Callimusa clade, were dichotomous. The ML analysis supported an identical tree topology with high bootstrap support values. Although the BI analysis also produced a fully resolved tree with a high posterior probability for all nodes (Additional File 3), the monophyly of Ensete and Musella at the genus level was not supported. Due to the lack of an outgroup for dataset A, E. ventricosum was used as a surrogate, a choice which probably accounted for the MP and ML-based phylogenies. The fact that these phylogenies were likely artefactual was confirmed by the use of the midpoint rooting method, which generated the same topology as emerged from the BI analysis and from dataset B (see below).
The MP analysis of dataset B also produced a single most parsimonious tree (length = 2253; CI = 0.7536 excluding non-informative characters; RI = 0.8483; RC = 0.7779) with high bootstrap support for all nodes. The same topology was supported by both the ML and BI analyses (Figure 3), and was the same as emerged from the BI analysis of dataset A (Additional File 3). A similar phylogeny was suggested when the individual gene fragments were analyzed separately with the S. nicolai sequence as the outgroup (Additional File 2). Thus the choice of outgroup was clearly responsible for the conflicting phylogenies. Various Zingiberales (Strelitziaceae, Heliconiaceae, Zingiberaceae) species have been selected as outgroups in other taxonomic studies of the Musaceae [8, 31, 69, 70], and some of these have questioned the position of Musella as a separate genus. Nevertheless, the evolutionary relationships within Musa (clades I + II, Figure 2 and 3) were not affected in either dataset by the choice of either outgroup or rooting method.
In order to assess how much phylogenetic information was contributed by the coding and non-coding fractions, the exonic and intronic sequences were analyzed separately. This was possible given that substitution saturation was not reached in either partition (Figure 1). As expected, the intronic sequence outnumbered the exonic, both in terms of the frequency of variable bases (15.2% vs 7.1%) and of parsimony informativeness (7.9% vs 3.3%). The phylogenies reconstructed by ML, MP and BI analysis consisted of a single tree with strong statistical branch support. The trees' topology was identical to that of combined dataset. Thus, the inclusion of non-coding sequence did not introduce erroneous phylogenetic signals, but rather enhanced the robustness of the phylogenetic reconstruction.
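The two measures compared above, variable sites and parsimony-informative sites, can be counted directly from an alignment. The sketch below uses the standard definition of a parsimony-informative site (at least two different states, each present in at least two sequences) and treats gaps and ambiguous bases as missing data, in line with the gap treatment stated in the methods. The function name and the toy alignment are hypothetical; this is not the PAUP* routine used in the study.

```python
from collections import Counter
from typing import Sequence, Tuple


def site_summary(alignment: Sequence[str]) -> Tuple[int, int]:
    """Return (variable_sites, parsimony_informative_sites).

    A site is variable if at least two different bases occur in the column,
    and parsimony-informative if at least two bases each occur in at least
    two sequences. Gaps and ambiguous characters are ignored.
    """
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all sequences must have the same aligned length")
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(base for base in (c.upper() for c in column)
                         if base in "ACGT")
        if len(counts) >= 2:
            variable += 1
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative


# Toy alignment, for illustration only.
toy = [
    "AAGT",
    "AAGC",
    "ACTT",
    "ACT-",
]
print(site_summary(toy))  # (3, 2)
```

In the toy alignment the last column is variable but not informative, because the C occurs in only one sequence; dividing either count by the alignment length gives percentages of the kind reported for the exonic and intronic partitions.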
Taxonomic implications of the sequence-based phylogeny
The final topology (Figure 3) confirmed the Musaceae family in general, and the Musa genus in particular, to be monophyletic. The monotypic genus Musella appeared as a sister species to E. ventricosum. The validity of Musella as a genus has been questioned in previous studies and a merger between Musella and Ensete species has been suggested. In contrast, the recent study of Li et al., based on ITS and chloroplast loci, did not reach a similarly definite conclusion and underlined the need to sample more molecular markers in order to provide the answer. Although more representatives of both genera would be necessary to elucidate this issue, the large set of phylogenetic markers presented here provides an excellent tool for addressing this question in future studies.
For many years, Musa has been divided into four sections, on the basis of morphological descriptors and basic chromosome number . However, it is important to quote Cheesman's flexible view: "The groups have deliberately been called sections rather than subgenera in an attempt to avoid the implication that they are of equal rank. I am inclined to regard the division between Eumusa and Rhodochlamys as unessential, though it is convenient to maintain as long as it remains as well marked in the field as it is at present. On the other hand the seed of Callimusa almost justifies its segregation as a distinct genus, and would do so were not Australimusa intermediate in some characters between it and Eumusa" . Recently, several DNA sequence-based analyses have indeed questioned the validity of some of the four sections. In particular, Eumusa and Rhodochlamys representatives have been in some cases demonstrated to be more closely related to one another than to their sectional relatives, as was shown for some Australimusa and Callimusa species [6, 7, 9, 10].
The present data indicate a close relationship between the species of Rhodochlamys and M. acuminata (Eumusa). The position of M. ornata within the A-genome group of Eumusa section (Figure 3) agrees with the findings of other authors [7, 10, 31, 70], and indicates that Rhodochlamys and Eumusa are not reciprocally monophyletic. Various Eumusa × Rhodochlamys hybrids have been observed, and are likely to be numerous in the monsoon region of SE Asia . Although the current molecular data in relation to the morphological observation indicate that the claims for merging of Rhodochlamys and Eumusa [6, 8, 10] were justified, final resolution of this issue will require a better representation of species within both sections. The new set of phylogenetic markers developed in this study can be applied easily in future to analyze in detail phylogenetic relationships between and within Musaceae taxa.
In contrast to the clustering of M. balbisiana with M. textilis (section Australimusa), as reported by Liu et al., the present data identified a clearly separated group of M. balbisiana entries within clade I, suggesting that this species is phylogenetically quite distinct from other Eumusa species. The distance between M. acuminata and M. balbisiana appears to be greater than that between M. acuminata and the Rhodochlamys species (Figure 3), as has also been noted by others [8, 11, 31]; these relationships are consistent with conclusions based on cytogenetic and hybridization studies [72, 73]. The clear separation between M. balbisiana and M. acuminata is particularly interesting given that almost all varieties of edible (polyploid) banana are thought to have evolved from natural hybrids between these two species.
Based on the gene fragment sequences, M. textilis fell, as expected, into the Australimusa section within Clade II (Figure 3), which also includes the Callimusa species. The two representatives of the section Callimusa included in this study differ in the basic chromosome number (Table 1), reflecting the noted controversy of Callimusa as a natural section [9, 10, 74]. M. beccarii and M. coccinea did not form a strictly separated Callimusa cluster; instead, their close relationship to Australimusa species was apparent (Figure 3). The only representative of Fe'i bananas (parthenocarpic edible types distributed throughout Pacific islands) in this study appears to be most closely related to M. maclayi, in line with Simmonds , who considered M. maclayi to be a wild progenitor of the Fe'i banana.
Estimation of time of divergence
The reconstructed phylogeny emerging from dataset A was used to estimate the times of divergence of the major Musaceae clades (Table 5). When the dating was solely constrained by the minimum age of the Ensete fossil record, the crown node age of the Musaceae family could be placed in the early Paleocene (69.1 Mya), consistent with the rapid radiation of the Zingiberales in the early Tertiary. A better supported estimate of this time required the inclusion of a relevant outgroup-calibration point within dataset B, but the lack of fossil records forced us to use an external calibration point. Estimates for the age of the Zingiberales vary between 88 and 124 Mya [32–34, 36]. Here we have adopted the most distant of these dates for the age of the most recent common ancestor of the Musaceae and Strelitziaceae. When this two-point calibration (Ensete fossil record and the external calibration with the Zingiberales age) was applied to dataset B, which included the outgroup species, the divergence time of the Musaceae was placed at 61.5 Mya. The divergence date estimates from both datasets A and B lie within the Musaceae crown-stem age interval of 61-87 Mya made by Janssen and Bremer. Thus, the estimates (Table 5) emerging from dataset A, which comprises nearly double the amount of phylogenetic information, were considered strongly supported. Based on these data, the rapid diversification of the Zingiberales probably occurred at the Cretaceous/Tertiary boundary (> 65 Mya).
Although the estimated age of the Musaceae family (69 Mya) is much younger than the 110 Mya postulated by Kress and Specht, the two estimates for the age of the Musa genus (50.7 Mya and 51.4 Mya) are indistinguishable. As the Musaceae are over-represented in our sample (as compared to other Zingiberales families), the current estimate probably represents a minimal age for the radiation of the Musaceae. The present data can be used to date the speciation events within both Australimusa/Callimusa and Rhodochlamys/Eumusa to some 28 Mya (Figure 3, nodes C, D). Within Clade I, the B genome lineage (M. balbisiana species) was the first to diverge, followed by the M. mannii lineage, representing the Rhodochlamys section, at 20 Mya. Speciation within the A genome lineage (M. acuminata species) began 11.4 Mya. The minimum age of M. ornata, which appears to belong to the A genome group within the Eumusa section, is estimated to be 8.7 Mya (Figure 3; node I).
Although M. mannii is an "imperfectly understood small species up to 1.3 m high with purplish-red bracts that do not curl back", it undoubtedly belongs to section Rhodochlamys, which is confined to the monsoon-affected areas of Southeast Asia. Its characteristic dry-season die-back is presumably an adaptation to drought, and contrasts with the behavior of the Eumusa species endemic to the same geographical region, which survive the dry season, although often in poor condition. The monsoon regime was established following the formation of the Himalayas and the Tibetan plateau, and is thought to have stabilized in its current form around 20-25 Mya. The estimated divergence date of M. mannii (20 Mya, Table 5) could therefore reflect an adaptation to climate change. The later divergence time of the other Rhodochlamys member, M. ornata, could be explained by its probable derivation from a hybrid between M. velutina (section Rhodochlamys) and M. flaviflora, belonging to a taxon intermediate between Rhodochlamys and Eumusa.
The speciation of the Callimusa species can be dated between 8.8 and 28.7 Mya, while the divergence of the Australimusa species occurred ~5 Mya (Figure 3, nodes H, J). The relatively recent emergence of the section Australimusa is consistent with its perception as an evolutionarily rather young group. Shepherd determined that the "species" within this section behave genetically as a single species, which he therefore designated Musa textilis Née. The current phylogeny (Figure 3) supports this view, implying that M. textilis could well be the founding species of the entire section. Numerical taxonomy has placed M. textilis equidistant from the four Musa sections. In this context it is worth noting that robust and sterile diploid hybrids ('Canton') between M. textilis (x = 10) and 'Pacol' (a form of M. balbisiana, x = 11) are common in the Philippines.
The divergence of M. coccinea appears to be rather older than that of the members of the Australimusa section (Table 5). Unsuccessful attempts to cross two Callimusa species, M. coccinea and M. borneensis, led Shepherd to suggest that they differentiated from one another long before the evolution of the Australimusa species. The seeds of Callimusa species are very different from those of any of the other Musa sections, being cylindrical, barrel- or top-shaped, and marked externally by a transverse line or groove. When ripe, they develop a large, empty chalazal (perisperm) chamber above the groove [10, 77]. Although the molecular data alone indicate a paraphyletic position of Callimusa relative to the Australimusa entries (Figure 3), given the above-mentioned morphological aspects and the flexibility of the term "section" as used by Cheesman, we believe that merging the two Musa sections with x = 10, as proposed by Wong et al. and indicated by Li et al., is not tenable.
The gene sequence-based phylogeny presented here provides a substantial insight into the course of speciation within the Musaceae. The data tend to sustain the close relationship of Rhodochlamys and Eumusa species, supporting the possibility of merging the two sections into a single one. Sampling a greater number of species could generate an improved classification, help to clarify the relationship between the Rhodochlamys species and M. acuminata, and confirm the generic status of Musella and Ensete. Based on the largest number of nucleotide characters obtained for the Musaceae to date, this study provides the first estimates of divergence times for individual Musa sections and genome groups within the Musaceae. Although limited by the number of species sampled from individual sections and subgroups, we provide a plausible reconstruction of speciation events within the Musaceae, a family which has given rise to one of mankind's major crops.
Angiosperm Phylogeny Group: An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Bot J Linn Soc. 2009, 161: 105-121. 10.1111/j.1095-8339.2009.00996.x.
Cheesman EE: Classification of the bananas II. The genus Musa L. Kew Bulletin. 1947, 2: 106-117. 10.2307/4109207.
Simmonds NW, Shepherd K: The taxonomy and origins of cultivated bananas. Bot J Linn Soc. 1955, 55: 302-312. 10.1111/j.1095-8339.1955.tb00015.x.
Argent GCG: The wild bananas of Papua New Guinea. Notes R Bot Gard Edinburgh. 1976, 35: 77-114.
Gawel NJ, Jarret RL, Whittemore AP: Restriction fragment length polymorphism (RFLP)-based phylogenetic analysis of Musa. Theor Appl Genet. 1992, 84: 286-290. 10.1007/BF00229484.
Bartoš J, Alkhimova O, Doleželová M, De Langhe E, Doležel J: Nuclear genome size and genomic distribution of ribosomal DNA in Musa and Ensete (Musaceae): taxonomic implications. Cytogenet Genome Res. 2005, 109: 50-57.
Li LF, Häkkinen M, Yuan YM, Hao G, Ge XJ: Molecular phylogeny and systematics of the banana family (Musaceae) inferred from multiple nuclear and chloroplast DNA fragments, with a special reference to the genus Musa. Mol Phylogenet Evol. 2010, 57: 1-10. 10.1016/j.ympev.2010.06.021.
Jarret RL, Gawel NJ: Molecular markers, genetic diversity and systematics. Bananas and plantains. Edited by: Gowen S. 1995, London: Chapman and Hall, 67-83.
Wong C, Kiew R, Argent G, Set O, Lee SK, Gan YY: Assessment of the validity of the sections in Musa (Musaceae) using AFLP. Ann Bot. 2002, 90: 231-238. 10.1093/aob/mcf170.
Ude G, Pillay M, Nwakanma D, Tenkouano A: Analysis of genetic diversity and sectional relationships in Musa using AFLP markers. Theor Appl Genet. 2002, 104: 1239-1245. 10.1007/s00122-001-0802-3.
Olmstead RG, Palmer JD: A chloroplast DNA phylogeny of the Solanaceae: Subfamilial relationships and character evolution. Ann Missouri Bot Gard. 1992, 79: 346-360. 10.2307/2399773.
Doyle JJ, Doyle JL, Ballenger JA, Dickson EE, Kajita T, Ohashi H: A phylogeny of the chloroplast gene rbcL in the Leguminosae: Taxonomic correlations and insights into the evolution of nodulation. Am J Bot. 1997, 84: 541-554. 10.2307/2446030.
Beckert S, Steinhauser S, Muhle H, Knoop V: A molecular phylogeny of bryophytes based on nucleotide sequence of the mitochondrial nad5 gene. Plant Sys Evol. 1999, 218: 179-192. 10.1007/BF01089226.
Graham SW, Olmstead RG: Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Am J Bot. 2000, 87: 1712-1730. 10.2307/2656749.
Swangpol S, Volkaert H, Sotto RC, Seelanan T: Utility of selected non-coding chloroplast DNA sequences for lineage assessment of Musa interspecific hybrids. J Biochem Mol Biol. 2007, 40: 577-587.
Baldwin BG: Phylogenetic utility of the internal transcribed spacers of nuclear ribosomal DNA in plants: An example from the Compositae. Mol Phylogenet Evol. 1992, 1: 3-16. 10.1016/1055-7903(92)90030-K.
Compton JA, Culham A, Gibbings JG, Jury SL: Phylogeny of Actaea including Cimicifuga (Ranunculaceae) inferred from nrDNA ITS sequence variation. Biochem Syst Ecol. 1998, 26: 185-197. 10.1016/S0305-1978(97)00102-6.
Kress WJ, Prince LM, Williams KJ: The phylogeny and a new classification of the gingers (Zingiberaceae): Evidence from molecular data. Am J Bot. 2002, 89: 1682-1696. 10.3732/ajb.89.10.1682.
Eyre-Walker A, Gaut BS: Correlated rates of synonymous site evolution across plant genomes. Mol Biol Evol. 1997, 14: 455-460.
Dover G: Concerted evolution, molecular drive and natural selection. Curr Biol. 1994, 4: 1165-1166. 10.1016/S0960-9822(00)00265-7.
Álvarez I, Wendel JF: Ribosomal ITS sequences and plant phylogenetic inference. Mol Phylogenet Evol. 2003, 29: 417-434.
Feliner GN, Roselló JA: Better the devil we know? Guidelines for insightful utilization of nrDNA ITS species-level evolutionary studies in plants. Mol Phylogenet Evol. 2007, 44: 911-919. 10.1016/j.ympev.2007.01.013.
Small RL, Ryburn JA, Cronn RC, Seelanan T, Wendel JF: The tortoise and the hare: Choosing between noncoding plastome and nuclear ADH sequences for phylogeny reconstruction in a recently diverged plant group. Am J Bot. 1998, 85: 1301-1315. 10.2307/2446640.
Bailey CD, Doyle JJ: Potential phylogenetic utility of the low-copy nuclear gene pistillata in dicotyledonous plants: Comparison to nrDNA ITS and trnL intron in Sphaerocardamum and other Brassicaceae. Mol Phylogenet Evol. 1999, 13: 20-30. 10.1006/mpev.1999.0627.
Schulte K, Barfuss MHJ, Zizka G: Phylogeny of Bromelioideae (Bromeliaceae) inferred from nuclear and plastid DNA loci reveals the evolution of the tank habit within the subfamily. Mol Phylogenet Evol. 2009, 51: 327-339. 10.1016/j.ympev.2009.02.003.
Zou XH, Zhang FM, Zhang JG, Zang LL, Tang L, Wang J, Sang T, Ge S: Analysis of 142 genes resolves the rapid diversification of the rice genus. Genome Biol. 2008, 9: R49-10.1186/gb-2008-9-3-r49.
Whittall JB, Medina-Marino A, Zimmer EA, Hodges SA: Generating single-copy nuclear gene data for a recent adaptive radiation. Mol Phylogenet Evol. 2006, 39: 124-134. 10.1016/j.ympev.2005.10.010.
Boonruangrod R, Desai D, Fluch S, Berenyi M, Burg K: Identification of cytoplasmic ancestor gene-pools of Musa acuminata Colla and Musa balbisiana Colla and their hybrids by chloroplast and mitochondrial haplotyping. Theor Appl Genet. 2008, 118: 43-55. 10.1007/s00122-008-0875-3.
Boonruangrod R, Fluch S, Burg K: Elucidation of origin of the present day hybrid banana cultivars using the 5'ETS rDNA sequence information. Mol Breeding. 2009, 24: 77-91. 10.1007/s11032-009-9273-z.
Liu AZ, Kress WJ, Li DZ: Phylogenetic analyses of the banana family (Musaceae) based on nuclear ribosomal (ITS) and chloroplast (trnL-F) evidence. Taxon. 2010, 59: 20-28.
Bremer K: Early Cretaceous lineages of monocot flowering plants. Proc Natl Acad Sci USA. 2000, 97: 4707-4711. 10.1073/pnas.080421597.
Janssen T, Bremer K: The age of major monocot groups inferred from 800+ rbcL sequences. Bot J Linn Soc. 2004, 146: 385-398. 10.1111/j.1095-8339.2004.00345.x.
Anderson CL, Janssen T: Monocots. The timetree of life. Edited by: Hedges SB, Kumar S. 2009, New York: Oxford University Press, 203-212.
Kress WJ, Prince LM, Hahn WJ, Zimmer E: Unraveling the evolutionary radiation of the families of the Zingiberales using morphological and molecular evidence. Syst Biol. 2001, 50: 926-944. 10.1080/106351501753462885.
Kress WJ, Specht CD: The evolutionary and biogeographic origin and diversification of the tropical monocot order Zingiberales. Aliso. 2006, 22: 619-630.
Paterson AH, Bowers JE, Chapman BA: Ancient polyploidization predating the divergence of the cereals, and its consequences for comparative genomics. Proc Natl Acad Sci USA. 2004, 101: 9903-9908. 10.1073/pnas.0307901101.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.
Lessa EP: Rapid surveying of DNA sequence variation in natural populations. Mol Biol Evol. 1992, 9: 323-330.
DNA Baser sequence assembly software. [http://www.dnabaser.com/]
Thompson JD, Higgins DG, Gibson TJ: Clustal W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673.
Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 2007, 24: 1596-1599. 10.1093/molbev/msm092.
Swofford DL: PAUP*, Phylogenetic Analysis Using Parsimony (*and Other Methods) v4.0b10. 2003, Sunderland: Sinauer Associates
Nylander JAA: MrModeltest v2. Program distributed by the author. 2004, Uppsala: Evolutionary Biology Centre
Drummond AJ, Rambaut A: BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007, 7: 214-10.1186/1471-2148-7-214.
Molecular evolution, phylogenetics and epidemiology software. [http://tree.bio.ed.ac.uk/software/figtree/]
Farris JS, Källersjö M, Kluge AG, Bult C: Testing significance of incongruence. Cladistics. 1994, 10: 315-319. 10.1111/j.1096-0031.1994.tb00181.x.
Schmidt HA, Strimmer K, Vingron M, von Haeseler A: TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics. 2002, 18: 502-504. 10.1093/bioinformatics/18.3.502.
Xia X, Xie Z: DAMBE: Software package for data analysis in molecular biology and evolution. J Hered. 2001, 92: 371-373. 10.1093/jhered/92.4.371.
Strimmer K, Von Haeseler A: Likelihood mapping: A simple method to visualize phylogenetic content of a sequence alignment. Proc Natl Acad Sci USA. 1997, 94: 6815-6819. 10.1073/pnas.94.13.6815.
Manchester SR, Kress WJ: Fossil bananas (Musaceae): Ensete oregonense sp. nov. from the Eocene of western North America and its phytogeographic significance. Am J Bot. 1993, 80: 1264-1272. 10.2307/2445709.
Cheung F, Town CD: A BAC end view of the Musa acuminata genome. BMC Plant Biol. 2007, 7: 29-10.1186/1471-2229-7-29.
Lescot M, Piffanelli P, Ciampi AY, Ruiz M, Blanc G, Leebens-Mack J, Da Silva FR, Santos CMR, D'Hont A, Garsmeur O, Vilarinhos AD, Kanamori H, Matsumoto T, Ronning CM, Cheung F, Haas BJ, Althoff R, Arbogast T, Hine E, Pappas GJ, Sasaki T, Souza MT, Miller RNG, Glaszmann JC, Town CD: Insights into the Musa genome: Syntenic relationships to rice and between Musa species. BMC Genomics. 2008, 9: 58-10.1186/1471-2164-9-58.
Zwickl DJ, Hillis DM: Increased taxon sampling greatly reduces phylogenetic error. Syst Biol. 2002, 51: 588-598. 10.1080/10635150290102339.
Hillis DM, Pollock DD, McGuire JA, Zwickl DJ: Is sparse taxon sampling a problem for phylogenetic inference?. Syst Biol. 2003, 52: 124-126. 10.1080/10635150390132911.
Poe S, Swofford DL: Taxon sampling revisited. Nature. 1999, 398: 299-300. 10.1038/18592.
Rosenberg MS, Kumar S: Incomplete taxon sampling is not a problem for phylogenetic inference. Proc Natl Acad Sci USA. 2001, 98: 10751-10756. 10.1073/pnas.191248498.
Rosenberg MS, Kumar S: Taxon sampling, bioinformatics and phylogenomics. Syst Biol. 2003, 52: 119-124. 10.1080/10635150390132894.
Hillis DM: Taxonomic sampling, phylogenetic accuracy, and investigator bias. Syst Biol. 1998, 47: 3-8. 10.1080/106351598260987.
Lorković ZJ, Wieczorek DA, Lambermon MHL, Filipowicz W: Pre-mRNA splicing in higher plants. Trends Plant Sci. 2000, 5: 160-167.
Jeffroy O, Brinkmann H, Delsuc F, Philippe H: Phylogenomics: the beginning of incongruence?. Trends Genet. 2006, 22: 225-231. 10.1016/j.tig.2006.02.003.
Phillips MJ, Delsuc F, Penny D: Genome-scale phylogeny and the detection of systematic biases. Mol Biol Evol. 2004, 21: 1455-1458. 10.1093/molbev/msh137.
Akaike H: A new look at the statistical model identification. IEEE Trans Autom Control. 1974, 19: 716-723. 10.1109/TAC.1974.1100705.
Posada D, Buckley TR: Model selection and model averaging in phylogenetics: Advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Syst Biol. 2004, 53: 793-808. 10.1080/10635150490522304.
Yoder AD, Irwin JA, Payseur BA: Failure of the ILD to determine data combinability for slow loris phylogeny. Syst Biol. 2001, 50: 408-424. 10.1080/106351501300318003.
Planet PJ: Tree disagreement: Measuring and testing incongruence in phylogenies. J Biomed Inform. 2006, 39: 86-102. 10.1016/j.jbi.2005.08.008.
Ramírez MJ: Further problems with the incongruence length difference test: ''hypercongruence'' effect and multiple comparisons. Cladistics. 2006, 22: 289-295.
Rokas A, Williams BL, King N, Carroll SB: Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature. 2003, 425: 798-804. 10.1038/nature02053.
Ruangsuttapha S, Eimert K, Schröder MB, Silayoi B, Denduangboripant J, Kanchanapoom K: Molecular phylogeny of banana cultivars from Thailand based on HAT-RAPD markers. Genet Resour Crop Evol. 2007, 54: 1565-1572. 10.1007/s10722-006-9169-2.
Hřibová E, Čížková J, Christelová P, Taudien S, De Langhe E, Doležel J: The ITS1-5.8S-ITS2 sequence region in the Musaceae: structure, diversity and use in molecular phylogeny. Plos One. 6: e17863-
Simmonds NW: Botanical results of the banana collecting expeditions, 1954-5. Kew Bulletin. 1956, 11: 463-489. 10.2307/4109131.
Simmonds NW: Isolation in Musa, sections Eumusa and Rhodochlamys. Evolution. 1954, 8: 65-74. 10.2307/2405666.
Shepherd K: Cytogenetics of the genus Musa. 1999, Montpellier: INIBAP
Wong C, Kiew R, Ohn S, Lamb A, Lee SK, Gan LH, Gan YY: Sectional placement of three Bornean species of Musa (Musaceae) based on AFLP. Gardens' Bulletin Singapore. 2001, 53: 327-341.
Argent GCG: Musaceae. The European Garden Flora. Volume II. Monocotyledons (Part 2): Juncaceae to Orchidaceae. Edited by: Walters SM, Brady A, Brickell CD, Cullen J, Green PS, Lewis J, Matthews VA, Webb, DA, Yeo PF, Alexander JCM. 1984, New York: Cambridge University Press, 117-119.
Harris N: The elevation history of the Tibetan Plateau and its implications for the Asian monsoon. Palaeogeogr Palaeoclimatol Palaeoecol. 2006, 241: 4-15. 10.1016/j.palaeo.2006.07.009.
Simmonds NW: The evolution of the bananas. 1962, London: Longman
De Langhe E: Diversity in the genus Musa: its significance and its potential. Acta Hort. 2000, 540: 81-88.
Shepherd K: Observations on Musa taxonomy. Identification of genetic diversity in the genus Musa: Proceedings of an international workshop held at los Baños, Philippines, 5-10 September 1988. Edited by: Jarret RL. 1990, Montpellier: INIBAP, 158-165.
Tamura K, Nei M: Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol. 1993, 10: 512-526.
We thank our colleagues Marie Seifertová, MSc. and Ms. Radka Tušková for their excellent technical assistance. We are grateful to Ir. Ines Van den Houwe for providing much of the plant material, François Côte for the gift of in vitro rooted plants of M. balbisiana 'PKW' and Dr. Martin Dančák for supplying leaves of S. nicolai. This research was jointly supported by the Academy of Sciences of the Czech Republic (grant award IAA600380703), Internal Grant Agency of Palacký University (grant award no. Prf-2010-001) and by the Ministry of Education, Youth and Sports of the Czech Republic and the European Regional Development Fund (Operational Programme Research and Development for Innovations No. CZ.1.05/2.1.00/01.0007).
PC carried out the experimental work, participated in the sequence alignment and phylogenetic analysis, and drafted the manuscript. MV participated in the design and coordination of the study and in the phylogenetic analysis, and helped to compile the manuscript. EH participated in the sequence alignment and phylogenetic analysis. JD participated in the design and coordination of the study. EDL contributed taxonomic expertise. JD and EDL revised the manuscript critically for important intellectual content. All authors read and approved the final manuscript.
Cite this article
Christelová, P., Valárik, M., Hřibová, E. et al. A multi gene sequence-based phylogeny of the Musaceae (banana) family. BMC Evol Biol 11, 103 (2011). https://doi.org/10.1186/1471-2148-11-103