Whole genome based insights into the phylogeny and evolution of the Juglandaceae

Background The walnut family (Juglandaceae) contains commercially important woody trees commonly called walnut, wingnut, pecan and hickory. Phylogenetic relationships and diversification within the Juglandaceae are long-standing and active research topics that have been informed by recent fossil, morphological, molecular, and (paleo) environmental data. Further resolution of relationships among and within genera is still needed and can be achieved by analysis of the variation of chloroplast, mtDNA, and nuclear genomes. Results We reconstructed the backbone phylogenetic relationships of Juglandaceae using organelle and nuclear genome data from 27 species. The divergence time of Juglandaceae was estimated to be 78.7 Mya. The major lineages diversified in warm and dry habitats during the mid-Paleocene and early Eocene. The plastid, mitochondrial, and nuclear phylogenetic analyses all revealed three subfamilies, i.e., Juglandoideae, Engelhardioideae, and Rhoipteleoideae. Five genera of Juglandoideae were strongly supported. Juglandaceae were estimated to have originated during the late Cretaceous, while Juglandoideae were estimated to have originated during the Paleocene, with evidence for rapid diversification events during several glacial and geological periods. The phylogenetic analyses of organelle sequences and the nuclear genome yielded highly supported but incongruent positions for J. cinerea, J. hopeiensis, and Platycarya strobilacea. Winged fruits were the ancestral condition in the Juglandoideae, but adaptation to novel dispersal and regeneration regimes after the Cretaceous-Paleogene boundary led to the independent evolution of zoochory among several genera of the Juglandaceae. Conclusions A fully resolved, strongly supported, time-calibrated phylogenetic tree of Juglandaceae can provide an important framework for studying classification, diversification, biogeography, and comparative genomics of plant lineages.
Our addition of new, annotated whole chloroplast genomic sequences and identification of their variability informs the study of their evolution in walnuts (Juglandaceae). Supplementary Information The online version contains supplementary material available at 10.1186/s12862-021-01917-3.

recognition within Carya might be appropriate for this taxon, but they also found that it shares a number of characteristics with the walnuts (genus Juglans).
The evolution of the Juglandaceae also remains a difficult problem. Although the family is hypothesized to have undergone both ancient and recent extinctions and radiations [21,49,50], it is species poor. The species that remain, however, are highly divergent in their ecology (wind- versus animal-dispersed fruit) [30,43] and flower development [22,40].
The primary goal of this study was to increase the resolution of the molecular phylogeny of the Juglandaceae by maximizing the number of taxa sampled and the number of genetic markers [22,27,30]. We selected 27 Juglandaceae taxa, slightly more than half of the ~ 50 recognized species, from three subfamilies (Engelhardioideae, Juglandoideae, and Rhoipteleoideae) and from seven of the nine worldwide genera, making this the most comprehensive study to date. We used sequence data from matrilineally inherited DNA (chloroplast genomes and mitochondrial protein-coding genes) and biparentally inherited DNA (nuclear genome SNPs from whole genome resequencing) to illuminate the evolutionary history of the Juglandaceae. We also reanalyzed phylogenetic relationships of 55 species using ITS (internal transcribed spacer) sequences. Our goals were to (1) reconstruct the phylogenetic relationships of the family Juglandaceae based on whole chloroplast genomes, nuclear genome SNPs from whole genome resequencing (nrSNPs), ITS, and sixteen mitochondrial protein-coding genes (mtCDS), with an eye toward the major unresolved systematic questions in this family; (2) compare the plastid genomes of Juglandaceae and identify the location and extent of genetic variation in these genomes across the Juglandaceae; (3) reconstruct a time-calibrated phylogeny of the Juglandaceae based on whole chloroplast genomes; and (4) reveal the timing of diversification for important nodes within the family.

Materials And Methods
Taxon Sampling, Genomic DNA Extractions, Library, and Sequencing
We analyzed twenty-seven species of Juglandaceae from seven genera that span the taxonomic, geographic, and morphological range of the family. These were contextualized using published plastomes of nine species of Fagales (including four species of Betulaceae and five species of Fagaceae), three species of Cucurbitales, and four species of Rosales (Table S1). The voucher specimens were deposited in the herbarium of the Key Laboratory of Resource Biology and Biotechnology in Western China (Ministry of Education), Northwest University (Table 1). We collected fresh leaf samples in the field and stored them in airtight bags filled with silica gel desiccant for later DNA extraction.

Plastomes Assembly and Annotation
Sequencing reads for the plastomes were quality controlled with the trim tool of the NGSQC toolkit v2.3.3, which removed low-quality reads, unknown bases, adapter sequences, and sequencing errors [53]. Short reads were assembled into long contigs using SPAdes Genomic Assembler v3.6.0 [54], followed by manual checking and finishing. The contigs were assembled in Geneious v8.0.2 [55]. To exclude nuclear DNA, we used BLAST to remove contigs that did not align to a reference plastome from J. regia (GenBank accession number KT963008) [56]. A reference-based assembly allowed us to reconstruct the plastome of each of the other species [13].
After we identified the boundaries between the inverted repeats (IR) and the single copy regions, i.e., the Large Single Copy (LSC) and Small Single Copy (SSC) regions, the completed plastomes were annotated using the online software DOGMA based on the J. regia reference [56,57]. We manually annotated start and stop codons and other regions of interest using Geneious v8.0.2 [55]. A circular representation of each plastome was visualized in OGDraw [58]. Finally, gene content, order, and variability were analyzed in Geneious and R [59]. The plastid genome data were deposited in the National Center for Biotechnology Information (NCBI) database under accession numbers KX703001 to KX703038.

In this study, we called the nuclear SNPs from all samples of Juglandaceae (Table S3). The Illumina paired-end reads from each sample were first processed to remove adaptor and low-quality sequences using Trimmomatic http://sourceforge.net/projects/picard/). We used GATK's HaplotypeCaller (local haplotype assembly) algorithm to call SNPs and InDels for each sample.
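The contig-filtering step described above can be illustrated with a minimal sketch. It assumes that BLAST was run with tabular output (-outfmt 6) against the reference plastome; the function name and the identity and length thresholds are hypothetical and are not values reported in this study.

```python
# Hypothetical sketch: keep only contigs with a BLAST hit to the reference plastome.
# The thresholds below are illustrative assumptions, not values from the study.

def plastid_contigs(blast_tab_lines, min_identity=90.0, min_length=200):
    """Return the set of contig IDs with at least one qualifying hit.

    Each line is one row of BLAST tabular output (-outfmt 6), whose first
    four columns are: qseqid, sseqid, pident, length.
    """
    keep = set()
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        contig_id = fields[0]
        identity = float(fields[2])
        aln_length = int(fields[3])
        if identity >= min_identity and aln_length >= min_length:
            keep.add(contig_id)
    return keep


# Two made-up hits against the J. regia reference (KT963008).
example = [
    "contig_1\tKT963008\t99.2\t15000\t100\t5\t1\t15000\t200\t15199\t0.0\t27000",
    "contig_2\tKT963008\t78.0\t150\t30\t3\t1\t150\t5000\t5149\t1e-20\t100",
]
print(sorted(plastid_contigs(example)))
```

Contigs absent from the returned set would be treated as putatively non-plastid and excluded before the reference-based assembly.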

Partition Strategy and Phylogenetic Analysis
To infer the evolutionary relationships among the 27 Juglandaceae plastomes and to test the phylogenetic signal from different regions of the plastomes, we reconstructed the Juglandaceae phylogeny using the following four datasets based on the exons of protein-coding genes. To avoid large amounts of missing data in the phylogenetic analyses, sixty-one protein-coding genes that were shared by all 44 taxa were extracted and aligned (Table S4). Best-fit partitioning schemes and models were selected using the greedy search mode implemented in PartitionFinder v2. [71]. Bayesian inference (BI) trees were produced with MrBayes v3.2.6 run for 10,000,000 generations. Two independent Markov chain Monte Carlo (MCMC) runs were performed, each with one cold chain and three incrementally heated chains. Trees were sampled every 10,000 generations, with the first 25 % of the trees discarded as burn-in. Stationarity was considered reached when the average standard deviation of split frequencies was < 0.01. The Maximum Likelihood (ML) trees were generated using RAxML v8.1.24 with a GTRGAMMA model [71]. For the ML analyses, different general time reversible models were applied to all data sets. For all analyses, 10 independent ML searches were conducted, bootstrap support was estimated with 1,000 bootstrap replicates, and bootstrap (BS) proportions were drawn on the tree with the highest likelihood score from the 10 independent searches. We generated multiple mtCDS sequence alignments using ClustalX with default parameters [72]. The phylogenetic tree analysis was performed using MEGA7 [73].
For the phylogenetic analysis based on nuclear genome data, we selected a total of 1,161,468 SNPs with minor allele frequency (MAF) ≥ 5 % and missing rate per site ≤ 10 %. A Maximum Likelihood (ML) tree was constructed using RAxML v8.1.24 with 1,000 bootstrap replicates [71]. In order to gain a better understanding of the species relationships, we selected 55 species to represent all extant genera in the Juglandaceae for which internal transcribed spacer (ITS) sequence data are available in NCBI (Table S6). We generated multiple ITS sequence alignments using ClustalX with default parameters [72] and inferred a phylogenetic tree by Maximum Likelihood analysis [71].
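The convergence criterion used for the Bayesian analyses (average standard deviation of split frequencies < 0.01) can be illustrated with a minimal sketch. It assumes that the post-burn-in split frequencies of each run are already available as dictionaries; the function name, the 0.10 minimum-frequency cutoff, and the use of the sample standard deviation are assumptions made for illustration and may differ in detail from the MrBayes implementation.

```python
from statistics import stdev


def asdsf(run_split_freqs, min_freq=0.10):
    """Average standard deviation of split frequencies across runs.

    run_split_freqs: list of dicts, one per independent run, mapping a
    split label to its post-burn-in frequency in that run.
    Splits whose frequency is below min_freq in every run are ignored
    (assumed cutoff).
    """
    if len(run_split_freqs) < 2:
        raise ValueError("at least two independent runs are required")
    splits = set().union(*run_split_freqs)
    deviations = []
    for split in splits:
        freqs = [run.get(split, 0.0) for run in run_split_freqs]
        if max(freqs) >= min_freq:
            deviations.append(stdev(freqs))  # sample SD (assumption)
    return sum(deviations) / len(deviations) if deviations else 0.0


# Made-up split frequencies from two runs on a four-taxon problem.
run1 = {"AB|CD": 0.98, "AC|BD": 0.02}
run2 = {"AB|CD": 0.99, "AC|BD": 0.01}
value = asdsf([run1, run2])
print(round(value, 5), value < 0.01)
```

With the sampling scheme described above (10,000,000 generations sampled every 10,000 generations, 25 % burn-in), roughly 750 sampled trees per run would contribute to these frequencies.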
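The site filter applied to the nuclear SNPs (MAF ≥ 5 %, missing rate per site ≤ 10 %) can be sketched as follows. The sketch assumes diploid genotypes encoded as alternative-allele counts (0, 1, 2) with None for missing calls; this encoding and the function name are hypothetical and do not describe the software actually used.

```python
def filter_sites(genotypes, min_maf=0.05, max_missing=0.10):
    """Return the indices of sites that pass the MAF and missing-rate filters.

    genotypes: list of sites; each site is a list with one entry per sample,
    holding the alternative-allele count (0, 1, or 2) or None if missing.
    Diploid genotypes are assumed.
    """
    kept = []
    for i, site in enumerate(genotypes):
        called = [g for g in site if g is not None]
        missing_rate = (len(site) - len(called)) / len(site)
        if not called or missing_rate > max_missing:
            continue  # too many missing genotypes at this site
        alt_freq = sum(called) / (2 * len(called))
        maf = min(alt_freq, 1 - alt_freq)
        if maf >= min_maf:
            kept.append(i)
    return kept


# Four made-up sites genotyped in ten samples.
sites = [
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],        # MAF 0.10, no missing data: kept
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],        # monomorphic: dropped
    [None, None, 0, 1, 2, 0, 1, 2, 0, 1],  # 20 % missing: dropped
    [None, 2, 2, 2, 2, 2, 2, 2, 2, 1],     # 10 % missing, MAF about 0.056: kept
]
print(filter_sites(sites))
```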

Divergence-time Estimation and Fossil Calibration
Penalized likelihood (PL) dating analyses were conducted using the treePL v1.0 program [74]. This program allows for better optimization with large trees by combining stochastic optimization with hill-climbing gradient-based methods. To identify the appropriate level of rate heterogeneity in the phylogram, a data-driven cross-validation analysis was conducted with treePL v1.0. One thousand bootstrap replicates with branch lengths were also generated using RAxML v8.

Table S1). The coding regions contained 137 genes, including 81 protein-coding genes (eight duplicated in the IRs), 33 tRNA genes (seven duplicated in the IRs) and four rRNA genes (four duplicated in the IRs) (Fig. 2). Four introns (rpl2, rpl16, rps16, and rpoC1) were located in the IR regions and 13 introns in the LSC region of each plastome (Fig. S1). Seven tRNA genes, trnI-CAU, trnL-CAA, trnV-GAC, trnI-GAU, trnA-UGC, trnR-ACG, and trnN-GUU, were duplicated and scattered in the inverted repeats (Fig. 2). We aligned each of the protein-coding genes (CDS) of all species. Three potential pseudogenes (infA, rpl22, and ycf15) were identified and their sequences verified using Sanger sequencing (Sangon Biotech, Shanghai, China) (Fig. S2; Table S8).

Variant Analysis
Comparison of the whole chloroplast genome sequences revealed a total of 18,050 SNPs and 2,496 Indels (insertions and deletions), for a total of 6,594 high-quality non-redundant variant positions, or approximately 5.66 SNPs/kb.

Note: cp-SNPs = the number of SNPs in chloroplast genomes; nr-SNPs = the number of SNPs from whole genome resequencing; cp-Indels = the number of Indels in chloroplast genomes; Ts/Tv ratio = the ratio of transitions to transversions, based on chloroplast genomes and whole genome resequencing data, respectively; Mapped = the proportion of whole genome resequencing reads mapped to the common walnut genome sequence; Het-ratio = the heterozygosity ratio of each sample based on the whole genome resequencing data.
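The Ts/Tv ratio defined in the note can be computed from a list of reference and alternative alleles. The following is a minimal sketch with hypothetical function names and made-up input; it is not the pipeline used to produce the reported values.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True if the substitution is purine<->purine or pyrimidine<->pyrimidine."""
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions for (ref, alt) allele pairs."""
    ts = tv = 0
    for ref, alt in snps:
        if ref.upper() == alt.upper():
            continue  # identical alleles are not a substitution
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ts / tv


# Made-up SNPs: three transitions (A>G, C>T, G>A) and two transversions (A>C, G>T).
snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(snps))  # 3 transitions / 2 transversions = 1.5
```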

Phylogenetic Analysis
Based on best-fit partitioning schemes and models, the RAxML and MrBayes analyses of sixty-one chloroplast protein-coding genes returned phylogenies with all branches highly supported (Fig. 3a). At the family level, there are six well-supported major clades comprising all species of the Juglandaceae, Myricaceae, Betulaceae, Fagaceae, and Cucurbitaceae families (Fig. 3b). Within the Fagales, members of the Juglandaceae were closest to the Myricaceae and Betulaceae (Fig. 3b). Species within the Juglandaceae divided into three groups corresponding to the three previously described subfamilies (Juglandoideae, Engelhardioideae, and Rhoipteleoideae) with 100 % bootstrap (BS) support based on mtCDS and chloroplast genomes by maximum likelihood (ML) analysis (Fig. 3a, b).
Within the Juglandoideae subfamily, the species divided into five groups, corresponding to the five genera Carya, Platycarya, Cyclocarya, Pterocarya, and Juglans, each of which was strongly supported as monophyletic (Fig. 3b). The genus Pterocarya was most closely related to Juglans (Fig. 3). The wheel wingnut (Cyclocarya paliurus) is the sole member of its genus in Juglandaceae. It was monophyletic and most closely related to Pterocarya based on chloroplast genomes (Fig. 3b; Fig. 4). In Carya, pecan (C. illinoinensis, a North American species) grouped with the other four species of Carya (Asian hickories) with 100 % BS.
Based on 1,161,468 nuclear SNPs, the phylogenetic analysis showed a generally well-supported clustering topology with high bootstrap values when rooted with Populus trichocarpa as the outgroup (Fig. 4). The resulting phylogeny identified, with 100 % support, the three subfamilies that we observed in the plastome-based phylogeny of the Juglandaceae (Fig. 3; Fig. 4): clade I (Rhoipteleoideae), clade II (Engelhardioideae), and clade III (Juglandoideae). Clade III (Juglandoideae) contained five genera, Platycarya, Carya, Cyclocarya, Pterocarya, and Juglans. However, the relative placement of three genera, Carya, Platycarya, and Cyclocarya, was not consistent between the phylogenies based on the combined chloroplast (cp) and mitochondrial genomes and the phylogeny based on the nuclear data. Although we only used one species in Platycarya, our results strongly supported the model that Cyclocarya and Platycarya are monophyletic with long branches and taxon-specific SNPs (Fig. 3; Table S3). Based on nuclear SNPs we found a strong sister relationship of Cyclocarya to Pterocarya and, secondarily, to Juglans (Fig. 4), as suggested by Manos et al. (2007) [16] and Larson-Johnson (2016) [34].
We reconstructed the Bayesian and ML trees based on ITS sequences of 55 Juglandaceae species (Fig. S4). The resulting phylogenetic tree showed that the three subfamilies, Juglandoideae, Engelhardioideae, and Rhoipteleoideae, cluster as monophyletic branches; however, support for the genera within the Juglandoideae was weak (< 50 %) (Fig. S4). ITS alone produced cladograms markedly different from accepted topologies. (Fig. 5).
Previous studies suggested that the genus Cyclocarya is sister to the genus Platycarya [16] based on fossil, chloroplast DNA fragment, and morphological data. Our data also confirm this relationship (Fig. 5). Alternatively, Xiang et al. (2014) suggested, based on five chloroplast markers, that Platycarya is sister to Juglans [30] and that Carya and Platycarya are sister groups [30]. Others considered Cyclocarya and Juglans to be sister groups [28]. Using criteria based on fruit morphology, however, Carya and Juglans are sister groups [34], a relationship that was not confirmed by our DNA-based analysis (Fig. 5), and Cyclocarya and Pterocarya are sister groups, a relationship that was supported by our data (Fig. 3, Fig. 5) [34,43]. Previously, Smith and Doyle (1995) [81], based on chloroplast DNA and morphological data, concluded that Platycarya evolved earlier than Carya; our results based on nuclear resequencing (Fig. 4) supported this conclusion. Our results based on sequencing the entire chloroplast genomes, however, indicated that the differentiation of Carya preceded that of Platycarya (Fig. 3, Fig. 5; Fig. S4), as suggested by Zhang et al. (2013), although their differentiation, about 57 Mya, was roughly simultaneous.

Phylogenetic Relationships within the Genera of Juglandaceae
Our analyses fully resolved some previously unresolved intergeneric relationships and added evidence supporting some of the recently altered generic circumscriptions, based on analyses with much more appropriate representation at the species level. The species C. sinensis (Chinese hickory, beaked walnut, or beaked hickory; syn. Annamocarya sinensis) was resolved within Carya [82]. The generic circumscription of Annamocarya (also C. sinensis) has frequently been altered, and many genera have been segregated from or merged with Carya [26,79,83].
The previously unresolved intrageneric relationships of Pterocarya were also resolved with high support. P. stenoptera var. zhijiangensis and P. hupehensis clustered together (Fig. 3). These two taxa are sympatric, and P. stenoptera var. zhijiangensis may be a subspecies of P. hupehensis (Fig. 3, Fig. 5, but see Fig. 4). The taxonomy of P. stenoptera var. zhijiangensis and P. macroptera var. insignis conflicted with the previous study of Wu and Raven (1999) [82]. We consider these taxa subspecies based on our data (Fig. 3; Fig. 4, but see Fig. 5); however, we did not complete a detailed phylogeny of Pterocarya because our sample pool was too small.
Based on sequence data from 16 mtCDS and 61 chloroplast protein-coding genes, our results supported the unification of J. mandshurica, J. ailantifolia, and J. cathayensis within sect. Cardiocaryon (Fig. 3; Fig. S4), consistent with a previous conclusion based on genotyping-by-sequencing data [22,86]. We also confirmed that the Ma walnut (J. hopeiensis) arose from the recent hybridization of J. regia and J. mandshurica, based on both matrilineal and biparental inheritance data (Fig. 3; Fig. 4) [12,86]. The placement of J. cinerea in Rhysocaryon (black walnuts) based on plastome sequence was clear (Fig. 3); however, it belongs to Cardiocaryon (Asian butternuts) based on nuclear sequences (Fig. 4), and its morphology is consistent with Cardiocaryon [12,15]. In addition, J. cinerea can hybridize with members of Cardiocaryon and even Dioscaryon, but not with Rhysocaryon [87]. All other North American Rhysocaryon freely hybridize. The discordance between the J. cinerea nuclear genome and its plastome is almost certainly the result of a chloroplast capture [15,31]. It is notable that the chloroplast of J. cinerea is not an ancient one (ancestral to the Rhysocaryon) but is instead most like that of J. nigra (Fig. 5).
Our results indicated that the capture of a Rhysocaryon chloroplast by J. cinerea was relatively recent (Fig. 5 76.58-80.50 Mya). The major diversification of the family is recorded in the pollen and megafossil record of the early Tertiary (~ 65 Mya) at the K-T boundary [24]. The three subfamilies diverged during the Late Cretaceous to Early Palaeocene [30,34], and ~ 18 Mya between Pterocarya and Cyclocarya [34]. At the end of the Eocene, Cyclocarya and Platycarya became extinct in North America but survived in Eurasia [24]. Our results indicated that Carya emerged as an animal-dispersed genus about 58 Mya, considerably earlier than the estimate (44 Mya) of Larson-Johnson (2016) [34], although we agree that the overwhelming majority of winged and wingless fruited genera diverged or diversified during the Paleogene, probably reflecting adaptation to changing regeneration regimes [92].
From the early Tertiary to the Neogene there was likely extensive migration and exchange among the North Atlantic, North America, western Europe, and Asia [24]. Interestingly, most species within the extant genera diversified between 18.54 and 8.52 Mya in warm and dry environments of the Early Miocene (Fig. 5), a period of especially rapid speciation within Juglans and Pterocarya. Some closely related species pairs within Juglans appear to have diverged relatively recently, under the influence of climate change during the Quaternary glacial period (Fig. 5; Bai et al. 2017). Examples include J. regia and J. sigillata, J. mandshurica and J. hopeiensis, and also Carya hunanensis and C. kweichowensis (Fig. 5). Overall, the Juglandaceae reflect a complex evolutionary history and diversification affected by changes in geography, distinctive distributions, climate changes, and coevolution with animals. Biotic interactions (e.g., pathogens) no doubt also had a role in driving species abundance and distribution [93], but biotic interactions of that type are difficult to detect from current data [35-39].

Conclusion
Our results are a first attempt to use whole genomes to characterize sequence divergence and elucidate evolutionary history in the Juglandaceae. Evidence of early lineage diversification, hybridization and extinction leads us to predict complex evolutionary histories for the extant species in the Juglandaceae. A fully resolved, strongly supported, time-calibrated phylogenetic tree of Juglandaceae will provide an important framework for studying classification, diversification, biogeography, phenotypic evolution, gene function and comparative genomics of this important family. Our results supported some recently clarified circumscriptions of controversial genera, although our taxonomic sampling is insufficient to stand alone as definitive. Wider plastid phylogenomics, whole genomes (nuclear data), a more complete fossil record, better dating of the fossil record and more studies of morphology will all be needed to fully reconstruct the phylogeny of woody plant families such as the Juglandaceae and other families of Fagales.

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The newly annotated chloroplast genome sequences were deposited in GenBank (MH188288-MH188304, MH189594-MH189595; for details see Table 1).

Figure 3
The Maximum Likelihood (ML) phylogenetic trees of Juglandaceae. Trees are based on sixteen mtCDS (a) and sixty-one chloroplast protein-coding genes (b). For both trees, the best model combinations selected with the PartitionFinder method (Table S5) were used for inference in RAxML. Numbers at nodes correspond to ML bootstrap percentages (10,000 replicates). The three subfamilies are indicated with colored shading: Rhoipteleoideae (grey), Engelhardioideae (pink), and Juglandoideae (blue). Fruit morphology is shown using one species from each genus; black solid circles indicate wingless fruits and hollow circles indicate winged fruits. Details for the outgroups (orange bar) are in Table S1.

Figure 4
The Maximum Likelihood (ML) phylogenetic tree of Juglandaceae plus the outgroup taxon (Populus trichocarpa) based on nuclear SNPs from whole genome resequencing data. Numbers at nodes correspond to ML bootstrap percentages (10,000 replicates). The three subfamilies are indicated with shading: Rhoipteleoideae (grey), Engelhardioideae (red), and Juglandoideae (blue). The fruit of one species from each genus is shown. The triangles indicate taxa with discordance between the nuclear and chloroplast phylogenies.

Supplementary Files
This is a list of supplementary files associated with this preprint. SupplementaryMaterial.rar