Genome-wide characterization of PEBP family genes in nine Rosaceae tree species and their expression analysis in P. mume

Background Phosphatidylethanolamine-binding proteins (PEBPs) constitute a common gene family found among animals, plants and microbes. Plant PEBP proteins play an important role in regulating flowering time, plant architecture as well as seed dormancy. Though PEBP family genes have been well studied in Arabidopsis and other model species, less is known about these genes in perennial trees. Results To understand the evolution of PEBP genes and their functional roles in flowering control, we identified 56 PEBP members belonging to three gene clades (MFT-like, FT-like, and TFL1-like) and five lineages (FT, BFT, CEN, TFL1, and MFT) across nine Rosaceae perennial species. Structural analysis revealed highly conserved gene structure and protein motifs among Rosaceae PEBP proteins. Codon usage analysis showed slightly biased codon usage across five gene lineages. With selection pressure analysis, we detected strong purifying selection constraining divergence within most lineages, while positive selection driving the divergence of FT-like and TFL1-like genes from the MFT-like gene clade. Spatial and temporal expression analyses revealed the essential role of FT in regulating floral bud breaking and blooming in P. mume. By employing a weighted gene co-expression network approach, we inferred a putative FT regulatory module required for dormancy release and blooming in P. mume. Conclusions We have characterized the PEBP family genes in nine Rosaceae species and examined their phylogeny, genomic syntenic relationship, duplication pattern, and expression profiles during flowering process. These results revealed the evolutionary history of PEBP genes and their functions in regulating floral bud development and blooming among Rosaceae tree species.


Abstract
Background Phosphatidylethanolamine-binding proteins (PEBPs) constitute a common gene family found among animals, plants and microbes. Plant PEBP proteins play an important role in regulating owering time, plant architecture as well as seed dormancy. Though PEBP family genes have been well studied in Arabidopsis and other model species, less is known about these genes in perennial trees.

Results
To understand the evolution of PEBP genes and their functional role in owering control, we identi ed 56 PEBP members belonging to three gene clades (MFTlike, FT-like, and TFL1-like) and ve lineages (FT, BFT, CEN, TFL1, and MFT) across nine Rosaceae perennial species. Structural analysis revealed highly conserved gene structure and protein motifs among Rosaceae PEBP proteins. Codon usage analysis showed slightly biased codon usage across ve gene lineages. With selection pressure analysis, we detected strong purifying selection constraining divergence within most lineages, while positive selection driving the divergence of FT-like and TFL1-like genes from the MFT-like gene clade. Spatial and temporal expression analyses revealed the essential role of FT in regulating oral bud breaking and blooming in P. mume. By employing a weighted gene co-expression network approach, we inferred a putative FT regulatory module required for dormancy release and blooming in P. mume.

Conclusions
We have characterized the PEBP family genes in nine Rosaceae species and examined their phylogeny, genomic syntenic relationship, duplication pattern, and expression pro les during owering process. These results revealed the evolutionary history of PEBP genes and their functions in regulating oral bud development and blooming among Rosaceae tree species.

Background
Proper timing of owering is a key adaptive strategy in plant species, especially temperate woody perennials [1][2][3]. The owering time in annual or biennials is largely determined by the timing of the transition from vegetative growth to reproductive growth [4,5]. However, in temperate tree species, ower buds initiate and develop during summer, undergo a short period of dormancy, exit dormancy after exposure to chilling temperatures and nally bloom in suitable environments [6]. Therefore, the blooming time of temperate woody perennials is mainly determined by intrinsic state of ower buds and external environments [7,8]. Within the context of global climate change, warm winters and irregular occurrences of extreme weather have disrupted the timing of spring phenological events in tree species, increased the risk of frost damage, and caused abnormal fertility and poor fruit setting due to insu cient winter chill [9][10][11][12]. Therefore, it is important to study the owering time control in perennial species and understand their adaptation mechanisms in synchronizing the timing of oral bud breaking and reproduction with local climate [10,13,14].
Phosphatidylethanolamine-binding proteins (PEBPs) form a superfamily of genes containing a PEBP domain, which is highly conserved across taxa, from bacteria and insects to mammals and plants [15][16][17]. Mammalian PEBPs are globular proteins composed of a functional binding site for acetate, phosphate groups and phosphorylethanolamine [18,19]. Plant PEBP homologs share similar conserved motifs, except their C-terminal part is deleted [20,21]. Animal PEBP proteins were reported to function as serine proteases or Raf kinase inhibitors, controlling cell growth and differentiation [22][23][24][25]. In plants, PEBP genes are central regulators in determining the owering time, plant architecture and seed germination [26][27][28][29][30]. In angiosperms, members of the PEBP family fall into three clades of genes: FLOWERING LOCUS T (FT), TERMINAL FLOWER 1 (TFL1) and MOTHER OF FT AND TFL1 (MFT) [31,32]. It was reported that MFT-like genes exist in both basal land plants and seed plants, while FT-like and TFL1-like genes were only found in gymnosperms and angiosperms, indicating that the MFT clade might be the evolutionary ancestor to FT-like and TFL1-like genes [32,33]. Despite extensive sequence similarity among PEBP members, their functions have diverged from each other [34].
FT and TFL1 are two major PEBP proteins that are well studied in Arabidopsis and in many other plant species [35][36][37][38]. In Arabidopsis, FT acts as a oral signal transducer, moving from leaves to the shoot apical meristem to promote owering, while TFL1 maintains in orescence meristem identity in shoot apex by antagonizing FT function [39][40][41]. The balance of FT and TFL1 modulates oral transition and in orescence architecture by affecting determinacy of meristem identity [30,42]. FT and TFL1 share ∼60% of their amino acid sequence identity, but only a few amino acid changes can convert FT from a oral promoter to a TFL1-like oral repressor [37,43]. In addition to FT and TFL1, the Arabidopsis PEBP gene family includes MOTHER OF FT AND TFL1 (MFT), TWIN SISTER OF FT (TSF), BROTHER OF FT AND TFL1 (BFT), and CENTRORADIALIS (CEN) [27]. MFT integrates abscisic acid (ABA) and gibberellic acid (GA) signaling pathways and acts in a PIF1-dependent manner to repress seed germination under far-red light [28,44]. TSF encodes the closest homolog of FT and resembles FT as a oral inducer under non-inductive SD conditions [45]. BFT and CEN are two oral repressors in Arabidopsis, and the overexpression of either one resulted in a late owering phenotype similar to plants overexpressing TFL1 [46][47][48].
Although the PEBP gene family has been recognized as key oral regulators in model species, their molecular evolution and function remains less clear in woody perennials. The Rosaceae family consists of over 2500 species from approximately 90 genera, most of which are native to temperate zones around the world [49][50][51]. Prunus is a large genus belonging to the tribe Amygdaleae and contains about 430 species, many of which are important fruit crops, such as plums, cherries, apricots and peaches [52]. Additionally, Prunus includes a large number of spring-blooming trees with high ornamental and economic value. Prunus mume is one of the earliest owering species, which blooms in late winter or early spring, followed by apricots, peaches, cherries and plums that ower during March to April. Apple and pear trees from the tribe Maleae bloom much later, around April to May in Northern China [53]. With the divergent owering times among Rosaceae tree species, it is of great interest to investigate the evolution of PEBP family genes and their functional roles in governing owering time among Rosaceae tree species.
The phylogeny structure inferred from the alignment of PEBP domains was generally in accordance with that of the whole protein sequence alignment, suggesting the PEBP domain as the major factor driving the evolution of Rosaceae PEBPs (Figure 3a). Five motifs covering 160 amino acids were identi ed by the MEME program among Rosaceae PEBP proteins (Figure 3b). Among these, Motifs 1, 2, 4, and 5 together spread over the whole PEBP domain ( Figure  3b). Motifs DPDXP (Asp-Pro-Asp-X-Pro) and GIHR (Gly-Ile-His-Arg), which are essential for anion-binding activity, were present in the fourth exon of the PEBPs ( Figure 1). We also found that residues distinguishing FT-like genes from TFL1-like proteins were conserved among the two gene lineages ( Figure S3).

Microsynteny and duplication analysis of PEBP genes
To understand the evolution origin of PEBP family genes, we performed inter-and intra-genomic synteny analysis with MCScanX for Arabidopsis and seven Rosacea species with chromosome-level genome assemblies. We observed large interspecies collinear blocks between four Prunus species, P. avium, P. persica, P. armeniaca, and P. mume, which indicates high level of macrosynteny among Prunus species ( Figure S4). The genome comparisons between R. occidentalis and M. domestica and between M. domestica and P. avium revealed large-scale chromosomal rearrangements including translocation and fusion-ssion events that possibly occurred during the genome evolution of Rubus, Malus, and Prunus genera ( Figure S4). Based on intra-genomic comparisons, we classi ed the duplication origin of orthologous gene pairs for Arabidopsis and other Rosacea species (Table S1). Among all duplication types, whole-genome duplication (WGD)/segmental duplication was the major type for M. domestica, tandem duplicated genes were mostly found in A. thaliana, P. armeniaca, and P. persica, and dispersed duplication events were enriched in the genomes of R. occidentalis, P. mume, P. avium, and P. dulcis (Table  S1). Furthermore, we characterized the duplication modes of PEBP family genes across species (Table S2; Figure 4; Figure S5). In Arabidopsis, R. occidentalis, and P. armeniaca, all PEBP gene members were predicted to be originated from dispersed duplications ( Figure S5; Table S2). In four Prunus species, FTs, MFTs, and BFTs were classi ed as having dispersed duplication, while TFL1-like and CEN-like genes were classi ed as exhibiting WGD/segmental duplication ( Figure S5; Table S2). The inter-genomic comparison of Prunus species con rmed that TFL1 and CEN genes were within shared syntenic blocks between species, indicating a shared duplication origin of TFL1-like and CEN-like genes in Prunus species (Figure 4). Within the genome of M. domestica, we detected seven syntenic blocks consisting of three WGD/segmental duplication gene pairs, including MdTFL1-MdTFL2, MdCEN1-MdCEN2, and MdBFT1-MdBFT2, and two dispersed duplication events related to MdFT and MdMFT ( Figure S5; Table S2). The inter-genomic comparisons between M. domestica and R. occidental and between M. domestica and P. avium also con rmed that the duplicated gene pairs MdCEN1-MdCEN2, MdTFL1-MdTFL2, and MdBFT1-MdBFT2 are likely resulted from an independent WGD event unique to the Malus tribe ( Figure 4).

Codon usage bias and other gene parameters
We observed differentially preferred codons and different gene features across ve Rosaceae PEBP gene lineages ( Figure 5; Figure S6; Table 2). For arginine, codons AGA and AGG were most frequently used by all lineages compared with other codons ( Figure S6). Codon UCC encoding serine was mostly used in CEN-like and MFT-like proteins, while codon UCU was mostly employed by the FT and TFL1 lineages ( Figure S6). We also observed signi cant differences in other gene parameters among different PEBP lineages (all Kruskal-Wallis tests pval<0.01) ( Figure 5; Table 2). The codon adaptive index (CAI) of MFT and TFL1 genes is signi cantly larger than those of the other gene groups (Kruskal-Wallis test pval= 2.75e -7 ) ( Figure 5; Table 2). In contrast, the effective number of codons (ENC) estimated for MFT and TFL1 genes is much lower than those of the other groups (Kruskal-Wallis test pval= 0.005) ( Figure 5; Table 2). The average ENC values ranging from 51.24 to 54.95 indicated weak codon bias among PEBP genes. Analysis of the GC content revealed that MFT lineage genes had much higher GC content indices compared to other genes ( Figure 5; Table 2). In contrast, TFL1 and BFT lineage genes appear to have lower GC1% and GC3% but relatively higher GC2% compared to other groups ( Figure 5; Table 2). All gene parameters showed no variation among species ( Figure S7; Kruskal-Wallis test pval>0.05). Strong pairwise correlations between gene parameters were observed (Table S3). For example, the CAI was positively correlated with the total GC% and GC3% (both correlation coe cient r≥0.56) but was negatively correlated with the ENC (r=-0.62) (Table S3). On the other hand, the ENC displayed a negative correlation with the total GC content and GC3% (Table S3).

Molecular evolution of different PEBPs lineages
To investigate the evolution of PEBP genes in Rosaceae species, we performed selection scans on coding sequences of all PEBPs using the branch model, site model, and branch site model in the CODEML program of PAML (Table 3; Table S4; Table S5). Branch models with different ω parameters speci ed for foreground lineages (i.e., FT-like, TFL1-like, CEN-like, and BFT-like lineages and FT/TFL1 clades) were compared with the xed ratio model (Table S4). The likelihood ratio tests (LRT) on models specifying individual lineages of FT, TFL1, CEN, and BFT genes as the foreground branch showed no signi cant difference in ω between the foreground and background branch (P>0.05) (Table S4). However, the LRT test on the branch model specifying the FT and TFL1 clades as the foreground branch suggested signi cant divergence among FT/TFL1 and MFT clade genes (P<0.001) (Table S4). We then applied the site model LRT test and detected signs of positive selection among sites of PEBP proteins (Table S5). The branch-site LRT tests further revealed strong positive selection within TFL1 lineage and slight positive selection within FT lineages at speci c protein sites ( Table 3). The Empirical Bayes model suggested modest selection at positions 19 and 106 when FT lineage was set as the foreground branch and at positions 11 and 18 when TFL1 lineage was set as the foreground branch (Table 3). We further validated the results by performing selective pressure analysis on ve gene lineages separately with the software Selecton. Only the FT and TFL1 lineages showed the signature of positive selection, in which residues 40N, 56N, 128S, and 181L in the FT lineage (with RoFT1 as the reference gene) and 4T, 73V, 134P, 141S, 157L, and 161S in the TFL1 lineage (with PyTFL as the reference gene) were mostly selected ( Figure 6). In contrast, the genes of the other three lineages all showed signs of purifying selection across most sites ( Figure S8).

Cis-acting element analysis of the FT promoter
We extracted the 2000 bp region of FT genes and scanned for putative cis-elements by searching against the PlanPan and the PlantCARE databases (Table  4). We compared the type and copy number of cis-elements for 11 FT genes from A. thaliana, P. trichocarpa, M. domestica, Pyrus communis, R. occidentalis, P. armeniaca, P. avium, P. mume, and P. persica (Table 4). Within the promoter region of the investigated FTs, three to ten CCACA boxes (binding site for CO) were identi ed across nine species, while none were found for PtFT2 (Table 4). CArG boxes, the binding site for the MADS transcription factor, were present in all FT promoters, among which the AtFT promoter contained the most (Table 4). Light-response elements including the G-box, AE-box, GATA-motif, GT1-motif, and TCT-motif were present within all FT genes but in different types (Table 4). In addition, binding sites for MYB, MYC transcription factor, ethylene-responsive transcription factor, and abscisic-acid responsive element (ABRE) were present in all FT promoters (Table 4). Gibberellin-responsive elements of different types were detected in FT promoters, with GARE-motif in the promoters of MdFT, PcFT1, RoFT1, PvFT and P-box in the promoters of AtFT, PtFT2, PaFT, PvFT, PmFT, and PpFT. We also observed some cis-elements with species-speci c distribution patterns. For example, the low-temperature responsiveness (LTR) element was only detected within the promoters of AtFT, PcFT1, RoFT1, PvFT and PmFT ( Table 4). The W-box, which is the binding site for WRKY transcription factor, was detected exclusively in RoFTs, PtFT2, and Prunus FT promoter regions (Table 4).

Tissue-speci c expression patterns of PEBPs
To explore the functional roles of PEBP genes, we examined their expression patterns in different tissues of four Rosaceae species, P. persica, P. mume, P. yedoensis, and R. occidentalis (Figure 7a-d). In general, we observed a differentiated expression preference of PEBP genes across different tissues ( Figure 7). Among the ve PEBP subfamilies, FT-like and TFL1-like genes were expressed in both vegetative tissues such as leaf and stem, and reproductive organs such as ower bud and fruit ( Figure 7). The transcription of CENs, as the closest paralogs of TFL1, was barely detected in any organs, except in the root tissues of P. mume ( Figure 7). MFT was only detected in seed embryos of P. persica and fruit tissues of P. yedoensis and R. occidentalis (Figure 7). BFT was detected in the fruit tissues of all species but was relatively highly expressed in leaf and stem tissues in P. yedoensis and R. occidentals, respectively ( Figure 7). We validated the tissue-speci c expressions of ve PEBP genes by real-time quantitative PCR (qRT-PCR) in P. mume ( Figure S9). PmFT is highly expressed in oral buds compared with its expression in leaf and stem, which is consistent with result of the above tissue transcriptome sequencing in P. mume (Figure 7; Figure S9). PmTFL and PmCEN were relatively highly expressed in root tissues ( Figure S9). PmBFT and PmMFT was barely detected in the four examined tissue types ( Figure S9). The somewhat inconsistent tissue-speci c expression patterns of PEBP orthologs across examined species are likely a result of non-uniformity in the sampling time, plant physiological state, and tissue speci city across four independent studies. Despite the inconsistency, the divergent expression of PEBP members across different tissue types indicates signi cant functional differentiation of PEBP gene lineages.

Expression analysis of PEBP genes during oral bud development in P. mume
We analyzed the expression of PEBP genes in ower buds of different developmental stages from July 10 th , 2019 to January 12 th , 2020 by qRT-PCR analysis. The expression of PmFT rst decreased as the bud initiated the oral meristem from July to August, increased as oral organ initiated and developed (from August to October), slightly decreased during bud dormancy, and then signi cant increased as the oral bud exited dormancy and bloomed ( Figure 8). PmBFT maintained a low expression level throughout the whole process, with only a minor increase during oral bud development in August and September ( Figure   8). The other PEBP members retained barely detected expression levels in oral buds of all developmental stages ( Figure 8). These results imply that PmFT is possibly the primary PEBP member participating in regulating oral bud development and bud ushing in P. mume.
Co-expression network analysis of FT during the blooming process in P. mume To explore the regulatory network of FT in owering regulation in trees, we reanalyzed the transcriptome changes of P. mume during dormancy release and the oral bud opening process [55] and performed a weighted co-expression network analysis (WGCNA). We identi ed 23 modules with distinct expression patterns ( Figure S10a). Module-trait association analysis revealed four modules, 'brown', 'turquoise', 'dark green', and 'salmon', associated with the progression of bud ushing (R 2 > 0.8). Among them, module 'brown' showed the strongest correlation with the FPKM of PmFT ( Figure S10b). The 'brown' module genes were signi cantly enriched in biological processes including cell cycle (GO: 0007049), ower development (GO:0009908), glucan metabolic process (GO:0009251), auxin transport (GO:0060918), and responses to abiotic stimulus (GO: 0009628). We further identi ed the top 50 genes most associated with PmFT and 15 known owering-related genes such as PmLFY, PmAP1, and PmCOL (Table S6) [56,57]. Among genes in the 'brown' module, SVP (SHORT VEGETATIVE PHASE), SOC1 (SUPPRESSOR OF OVEREXPRESSION OF CO 1), GI (GIGANTEA), and CIB1 (CRYPTOCHROME-INTERACTING BASIC-HELIX-LOOP-HELIX 1) were previously identi ed as key players in the FT-dependent oral regulation in Arabidopsis [58,59] (Figure 9a). Four tandem-duplicated PmDAMs (PmDAM1, PmDAM4, PmDAM5, PmDAM6) from the 'brown' module also exhibited expression patterns negatively correlated with that of PmFT (Figure 9a-b). The expression patterns of other known oral regulators such as COL (CONSTANS-LIKE) from the 'turquoise', LHY1 (LATE ELONGATED HYPOCOTYL 1) and AP1 (APETALA1) from 'dark green' module were not highly correlated with PmFT (R 2 < 0.62) (Figure 9b). PmFT showed a relatively weak transcription level in endodormant oral buds ( Figure 9b). As the oral bud continued accumulating chilling units and exiting dormancy, PmFT expression signi cantly increased and showed the highest expression in ushing buds (Figure 9b). PmCIB1 and 37 other genes showed similar expression patterns to that of PmFT, while PmPHYB (Pm008367), PmGI, PmLHY, PmCOL, PmSVP, PmSOC1, and four PmDAMs displayed contrasting expression patterns, with their expression decreasing as the oral buds exited endodormancy (Figure 9b). The expression patterns of FT and its coexpressed genes were further veri ed by qRT-PCR analysis (Figure 9c).

Discussion Evolution trajectory of PEBP family genes in Rosaceae genomes
PEBPs form an ancient gene family central to many plant developmental processes, including oral transition, plant architecture, and seed germination [30,32,60]. In Arabidopsis, the PEBP family constitutes six genes grouped into three distinct clades, FT-like (FT and TSF), TFL1-like (TFL1 and CEN), and MFT-like genes [31]. Though previous studies have characterized the functions of PEBP family genes in model plants, none have focused on a comparative analysis of the PEBP family in tree species. Our study conducted a systematic search across nine Rosaceae genomes and identi ed 56 PEBP family genes orthologous to six Arabidopsis genes, FT/TSF, TFL1, CEN, BFT, and MFT. The number of PEBP family members in Prunus species (chromosome 2n = 2x = 16) was approximately the same as that in Arabidopsis ( ve to six copies), while PEBP members were expanded in M. domestica and Pyrus communis (chromosome 2n = 2x = 34). Genome synteny and duplication analyses together supported that duplicated ortholog pairs MdTFL1-MdTFL2, MdCEN1-MdCEN2, and MdBFT1-MdBFT2 are likely originated from a recent whole-genome duplication (WGD) event that occurred in the Maleae clade after splitting from Prunus [61]. However, only one copy of MdFT, MdMFT, and PcMFT was retained in apples and pears, indicating that the duplicated copy may have been lost during species evolution after the WGD [62]. The duplication mode analysis also suggested a shared origin of TFL1 and CEN from segmental or WGD duplication in Prunus species (Table S2). Previous studies reported that the angiosperm TFL1-like gene experienced duplication after splitting from basal angiosperms, followed by functional divergence, resulting in TFL1 and CEN gene lineages in eudicots [63]. Given the conserved sequence alignment of Prunus TFL1/CEN orthologs with other Rosaceae species, it is unlikely that Prunus TFL1/CEN arose from a recent segmental duplication or WGD unique to Prunus species. Therefore, the syntenic relationship may have been caused by the preservation of genomic segments containing TFL1, CEN, and their neighboring genes through rounds of chromosome rearrangements during Prunus species evolution. In Arabidopsis, the TSF gene, which is a homolog of FT, highly resembles FT in its coding sequence and owering promoting role [64]. The absence of TSF in the Rosaceae genome suggests that the gene duplication of FT/TSF possibly occurred in Brassicaceae after splitting from their common ancestors [65].
The PEBP gene family experienced two ancient duplications, giving rise to three types: FT-like genes promoting owering, TFL1-like genes repressing owering and maintaining indeterminate state of meristems, and MFT-like genes controlling seed germination [17,27,32]. The phylogenetic analysis suggests that Rosaceae PEBPs can be clustered into three distinct clades (FT, TFL1, and MFT), which is consistent with other species [17,27,32]. The FT-like clade can be further divided into FT and BFT lineages, and the TFL1-like clade can be divided into TFL1 and CEN lineages. Based on maximum-likelihood test on branch models specifying different gene lineages (FT, TFL1, CEN, MFT, and BFT) as the foreground branch, we detected no evidence of positive selection acting on any of them. However, we observed signi cant selection acting on FT/TFL1 clade genes with the MFT clade speci ed as the background branch, which supports the theory that functional divergence of the FT/TFL1 clade occurred after splitting from the MFT clade [33]. Through likelihood ratio tests on branchsite models, we detected a few slightly selected codons within the FT lineage and a few strongly selected codons in the TFL1 lineage, which is consistent with results of Selecton analysis on individual lineages. In summary, these results indicate that adaptive evolution is driving the divergence of the FT and TFL1 clades from the MFT clade, as well as the diversi cation among FTs and TFL1s in Rosaceae species. These result are consistent with a previous study reporting that positive selection on FT-like genes especially within the fourth exon is driving their divergence from MFT and TFL1 clade [17]. We also observed strong purifying selection constraining protein evolution within the MFT, CEN, and BFT lineages in Rosaceae species. However, this does not rule out the possibility of positive selection acting on a few codons masked by strong purifying selection in preserving the other sites [17].
Additionally, we examined the codon usage patterns of PEBP genes across Rosaceae species. Codon usage bias refers to the nonrandom choice of synonymous codons in speci c genes or species and can affect the translation e ciency and accuracy, protein folding, and biological functions [66,67]. The codon usage pattern usually re ects the balanced effect of mutation pressure and selection constraints during long-term evolution [68,69]. Several codons for amino acids were differentially preferred across ve PEBP lineages. Among all codons, the most frequently used codon for arginine was AGG for FT, CEN, and MFT and AGA for the BFT and TFL1 lineages ( Figure S6). Several other codons, including TCC, TCA, TCT for serine and CCT for proline, were preferred by speci c PEBP gene lineages, indicating differentially selected codons by different PEBP gene lineages. To further understand the factors in uencing codon usage patterns, we compared the GC content, gene length, CAI, and ENC of different PEBP lineages and species. The CAI measures the optimal codon usage for a gene and is commonly used as an index for the expression level [70]. The ENC has been widely used to determine the level of codon bias for individual genes [71]. We observed signi cant differences in these gene features estimated for different gene lineages but not for species. Despite the differences, all genes had a relatively high CAI (range 0.81-0.87) and moderate ENC (above 47), indicating high translational e ciency and slightly biased codon usage among PEBP genes. Furthermore, the strong pairwise correlations between ENC and GC content, ENC and CAI indicate that the nucleotide composition and gene expression level are two factors possibly contributing to the differentiated codon preference among different PEBP gene lineages [69].
Functional role of FT/TFL1 genes in Rosaceae tree species Structural analysis of Rosaceae PEBP proteins revealed a highly conserved gene structure and amino acid sequence, especially within the PEBP functional domain (Figure 1; Figure 3; Figure S2). All PEBP family genes shared a common gene structure with exactly four exons of similar sizes. Among the conserved protein motifs, the anion-binding D-P-D-x-P and G-x-H-R motifs are important for the conformation of the ligand binding site in PEBP proteins [72]. Mutations close to this region may affect the binding of FT protein with phosphate ions and thus alter its interaction with FD (FLOWERING LOCUS D) [73]. Segment B on exon 4 encodes an external loop, and together with its adjacent segment C, determines the opposite functions of FT and TFL1 in Arabidopsis [35]. Another key protein motif is the 14-3-3 binding domain that is essential for FT/TFL1 interaction with 14-3-3 receptors to modulate owering [20]. Key residues within these motifs are critical in determining FT/TFL1 functions. For example, the substitution of an amino acid (replacing His-88 in TFL1 with Tyr) can convert TFL1 into a oral promoter [37]. In another study, speci c mutations at four residues-Glu-109, Trp-138, Gln-140, and Asn-152-converted FT into a TFL1-like repressor [43]. The amino acids at each of these critical positions were highly conserved and speci c to FT-like and TFL1-like proteins, which suggests that the oral promoting and repressing role of FT/TFL1 genes in Rosaceae species is possibly conserved.
Recent molecular studies have characterized the function of Rosaceae FT/TFL-like genes in several Rosaceae perennials [33]. The overexpression of MdFT in both Arabidopsis and apple lead to precocious owering [74]. The ectopic expression of PmFT and RoFT in tobacco leads to extremely advanced owering [75]. Similarly, the late-owering phenotype of Arabidopsis ft mutant can be rescued by overexpressing PpFT, indicating the conserved oral promoting role of FT in examined Rosaceae species [76]. On the other hand, prolonged vegetative growth and a late-owering phenotype were observed for transgenic Arabidopsis/tobacco overexpressing PpTFL1, PmTFL1, RoTFL1, MdTFL1-1/2, suggesting that the Rosaceae TFL1-like genes can complement the TFL1 function in Arabidopsis [77][78][79].
Despite the conservative function of Rosaceae FT/TFL1-like genes in herbaceous plant systems, their regulatory roles in perennial trees may differ. For example, two homologs of PcFTs showed differed annual expression patterns in the apical buds of Pyrus communis [80]. The ectopic expression of PcFT2 caused early owering in tobacco but delayed dormancy and leaf senescence in M. domestica [80]. Another study in pears reported that the expression of FTs was not induced in the reproductive meristem prior to oral initiation, while the transcripts of TFL1s rapidly decreased and maintained a very low level, indicating the essential role of TFL1 in oral induction in Pyrus pyrifolia [36]. In our study, the minimal level of TFL1 throughout all oral bud stages may indicate that the repression of TFL1 is necessary for determinate oral meristem identity and terminal ower formation during oral bud development in P. mume. The multifaceted role of FT/TFL1-like genes was also observed in other tree species [33]. In poplar, PtFT1 functions as a oral promoter activated by chilling temperatures, while vegetative growth and dormancy breaking are promoted by PtFT2 [81]. Plum trees transformed with PtFT1 displayed a shrub-like growth habit, a reduced chilling requirement, and insensitivity to short-day signals [82]. In gymnosperms, FT-like genes exhibited contrasting roles in regulating growth cycling and bud setting [83]. For example, expression of FT/TFL1-like genes in Norway spruce (PaFTL2) and Scots pine (PsFTL2) increase during bud setting in autumn and decrease during bud bursting in the next spring [84][85][86]. Thus, FT/TFL1-like genes may undertake some novel functions concerning oral transition, plant architecture, and growth-dormancy cycling during the evolution of tree species.

Regulatory role of FT in promoting bud break and blooming in perennial trees
Flowering is a major developmental process that is key to the tness and reproduction of higher plants [87]. Plants have synchronized their seasonal timing of owering with favorable environmental conditions to ensure sexual reproduction success and seed production [87,88]. The regulation of owering times requires an intricate network of signaling pathways, which has been studied in many plant species but is best characterized in Arabidopsis [57,87,89]. FT functions as a gene hub integrating ve major oral induction pathways, including the photoperiodic pathway, vernalization pathway, autonomous pathway, gibberellin pathway, and age pathway [56,59]. In Arabidopsis, the transcription of FT is activated by the transcription factor CONSTANS (CO), which is affected by the circadian regulatory GI [90,91]. The GI-CO-FT module not only is used to regulate photoperiod-dependent owering in Arabidopsis and temperate cereals [92,93] but also showed a conserved function in regulating short-day induced bud dormancy in poplar [94]. In addition to CO, SVP, FLC (FLOWERING LOCUS C), and PIF4 (PHYTOCHROME INTERACTING FACTOR 4) from the vernalization pathway can also regulate FT transcription through directly binding the FT promoter or intronic regions [90,[95][96][97]. Upon induction by long-photoperiod signals, FT, together with other oral pathway integrators SOC1 and LFY (LEAFY), activates oral meristem identity genes such as AP1, APETALA2 (AP2), FRUITFULL (FUL), CAULIFLOWER (CAL), and LFY, which convert the vegetative meristem to oral meristem in Arabidopsis [59,98,99].
Though owering regulation is well understood in model species, it is still unclear in temperature tree species. Unlike annual or biennials, many trees in temperate environments initiate oral buds in the preceding summer, cease growth in autumn, with oral buds remaining dormant during winter, and then bloom early in spring after exposure to chilling temperatures [6,10]. Therefore, perennial owering marks the event of the oral bud exiting dormancy and ushing instead of the time of oral meristem initiation in annual species [6]. So far, many studies on oral bud breaking regulation have been reported; however, the molecular mechanism is still far from complete. Apart from regulating oral initiation, FT has been suggested to participate in regulating bud dormancy in temperate trees [100]. Poplar exhibited constitutive expression of FT1 initiated ower-like structures directly from tissue culture and showed delayed growth cessation in short-days [81,94], while FT2 was predominantly expressed during vegetative growth and is likely responsible for growth cessation and vegetative bud set [81]. The ectopic expression of poplar FT1 in plum causes precious owering and reduces the chilling requirement for dormancy release [82]. Moreover, Rinne et al. (2011) reported that FT is hyper-induced during bud breaking in poplar, indicating that FT may also participate in regulating dormancy release in poplar [101]. In pear, chilling reduces the expression of DAM genes, which are well-known oral repressors, releasing the repression of FT and promoting oral bud breaking [102,103]. Our expression analysis con rmed that FT is signi cantly induced during chilling-mediated oral bud breaking in P. mume.
To further understand the regulatory module of FT during oral bud breaking, we used WGCNA and identi ed a number of candidate genes whose expression patterns strongly correlated with FT in P. mume. Among these candidates, PmDAM1, PmDAM4, PmDAM5, and PmDAM6 were found to be downregulated during the progression of bud breaking. Another MADS-box gene PmSVP displaying a similar expression pattern to that of PmDAMs was reported to maintain bud dormancy in apples [104]. Thus, PmDAMs and PmSVP may function as FT repressors in the same manner as in Arabidopsis by binding to the CArG box in the promoter region of PmFT [105]. A number of genes previously identi ed upstream FT, including PmCOL, PmGI, and PmCIB1, were found to be induced by chilling in endodormant buds before the activation of PmFT. These genes may act directly or indirectly to activate FT expression during dormancy release in P. mume. We also observed that some known FT regulated genes, namely, AP1, SOC1, and LFY, peaked before the induction of FT, indicating their functional role during ower bud development prior to bud breaking [6,106]. Additionally, a number of FT co-expressed genes were annotated to pathways that did not show relatedness to bud breaking or owering in previous studies. Future functional studies are required to characterize the regulatory mechanisms of FT in oral induction and bud breaking in Rosaceae tree species.

Conclusion
In this study, we systemically characterized the PEBP gene family in nine Rosaceae species and examined their gene structure, protein features, evolutionary trajectories, and expression pro les. The 56 PEBP genes can be divided into three major clades, namely, FT-like, TFL1-like, and MFT-like genes. We observed highly conserved protein motifs and gene structure among PEBP genes. Selection scans showed that positive selection is driving the divergence of the FT and TFL1 clades, while strong purifying selection is restraining diversi cation within most lineages. Expression analysis of PEBP genes suggested the essential role of FT in oral bud development and blooming. Furthermore, we identi ed a number of FT co-expressed genes, revealing a FT-related regulatory model in Prunus species different from those in annual or biennial plants. In summary, the comprehensive analysis of the PEBP family in our study provided evidence of structural and functional conservation of PEBP genes among Rosaceae woody perennials and provided insight into the adaptive evolution of the PEBP gene family over the evolutionary history of perennial trees.

Phylogenetic analysis
Multiple sequence alignment was performed using the protein sequences with software MUSCLE v3.8 [120] and was visualized with GeneDoc v2.6 [121].
Phylogenetic trees were constructed using neighbor-joining (NJ) method with MEGA7 [122], maximum likelihood (ML) analysis with RAxML v8.1 [123], and Bayesian inference (BI) with MrBayes 3.1 [124]. The BI method was performed with 100,000 generations of MCMC processes. Bayesian inference was performed with 100,000 generations of Markov-chain Monte Carlo (MCMC) simulations, discarding the rst 2500 trees as 'burn-in'. With consistent tree topologies inferred by these three approaches, the neighbor-joining tree was chosen to display the phylogeny of Rosaceae full PEBP protein sequences. Furthermore, amino acids within the regions of predicted PEBP domains were extracted and used to construct a PEBP domain tree by the NJ method.

Gene structure and protein motif detection
The exon and intron locations of PEBP genes were analyzed by comparing the coding sequences with their genome sequences. The MEME (Multiple Expectation Maximization for Motif) online tool (http://meme-suite.org/tools/meme) was used to predict protein motifs [125]. The protein motifs were further annotated with the Pfam [118], SMART [117] and CDD [119] online tools. The chromosome distributions of PEBP genes were obtained based on genome GFF3 les. Finally, the gene structures, protein motifs, and chromosome locations were visualized with the software TBtools [126].

Microsynteny analysis and codon usage evaluation
To identify the synteny of PEBP family genes among species, we performed all-to-all BLASTP between the genomes of A. thaliana, Rubus occidentalis, M. domestica, P. avium, P. persica, P. armeniaca, and P. mume. We also performed self-blast by comparing protein-coding genes against their own genome using BLASTP. All BLASTP hits with e-values <1e -10 were used as input for software MCScanX (Multiple Collinearity Scan toolkit) [127] to identify possible collinear blocks within and between genomes of different species. Based on the self-blast results, we classi ed the duplication origin of orthologous genes pairs including PEBP family genes with the 'duplicate_gene_classi er' toolkit built in MCScanX for each species. All intra/inter-genomic synteny relationships were visualized with TBtools [126].
Gene parameters including the GC content (total GC%, GC1%, GC2%, and GC3%), CAI, and ENC were computed using CAICal (http://genomes.urv.cat/CAIcal/) [128,129]. CAI provides an estimate of directional translational selection in optimizing the codon usage patterns of genes and is used to predict highly expressed genes [70]. ENC is a number between 20 to 61 that measures the degree of codon usage bias (where ENC=20 refers to the preference of only one codon per amino acid, while ENC=61 refers to complete unbiased codon usage) [130]. We compared these gene parameters for FT, TFL1, CEN, BFT, and MFT gene lineages and across species using the Kruskal Wallis Test with the 'kruskal.test' function in R. The relative synonymous codon usage (RSCU) is de ned as the ratio of the observed codon frequency to the expected frequency of all synonymous codons per amino acid and is calculated using software MEGA7 [131].

Molecular evolution of PEBP genes
To investigate the signatures of positive selection on Rosaceae PEBP genes, we extracted the coding sequences of PEBP genes and aligned them with MUSCLE v3.8 [120]. The sequence alignment was then trimmed with Gblocks [132] in 'codon' mode, and the resulting alignments were used to infer phylogenetic relationships with RAxML [123]. The ratios (ω) of nonsynonymous substitution sites (dN) and synonymous substitution sites (dS) were computed for each PEBP lineage gene using the branch model, site model and branch-site model with the codeml package in PAML 4.0 [133]. To test the hypothesis of adaptive evolution in speci c PEBP lineages and across sites, we performed likelihood ratio tests to evaluate the t of branch models (FT, TFL1, CEN, BFT and (FT, TFL1, CEN, BFT) set as foreground branch), site models, and branch site models. The positively selected sites were detected by Bayes Empirical Bayes analysis in PAML 4.0 [133]. To better visualize the site-speci c selection on amino acids within each PEBP lineage, we performed a selection pressure test with site model M8 and visualized the results with Selecton Server [134].

Cis-element analysis of the FT promoter region
To investigate the conservation of the cis-regulatory model of FT genes across different species, we extracted the 2 kb upstream region of the start codon (ATG) and submitted the sequences to the PlantCARE [135] and PlantPan 2.0 databases [136]. The cis-acting elements predicted by both methods were integrated and considered as putative cis-acting elements.

Tissue-speci c expression pro les of PEBP genes
The tissue transcriptome sequencing data of P. mume, P. yedoensis, P. persica, and Rubus occidentalis was retrieved from four independent studies: GSE4760162 from the GEO database [108] and SRP136962, SRA053230, and SRP149938 from the NCBI SRA database [109,137]. The raw SRA les were rst dumped to FASTQ format using SRA toolkit and preprocessed with Trimmomatic v0.38 [138] to trim off poor-quality reads. Clean paired reads were aligned with the reference genomes of P. mume, P. yedoensis, P. persica, and Rubus occidentalis, respectively, with software HISAT2 [139]. The genic count was computed with HTSeq [140] and normalized to RPKM with R package 'edgeR' [141]. The RPKM value of each PEBP gene across different tissues of P. mume, P. persica, P. yedoensis, and Rubus occidentalis was extracted and visualized using the 'pheatmap' package in R. The relative expression of PEBP genes in leaf, stem, root, and oral bud tissues was tested in P. mume using real-time PCR analysis with detailed procedure described below.
Expression analysis of PEBP genes during the ower bud development process To further understand the functional role of PEBP genes in oral bud initiation and the bud ushing process, we performed real-time quantitative PCR analysis to examine the temporal expression patterns of PEBP genes. Lateral oral bud samples were collected from P. mume 'Fei Lve' tree grown in the Jiufeng sunlight greenhouse approximately every four weeks from July 10 th , 2019 to January 12 th , 2020. and ending 20 °C. We used protein phosphatase 2A (PP2A) as an internal reference and calculated the relative transcription levels of target genes using the 2−ΔΔCt method [142]. The primers used for qRT-PCR experiments are listed (Table S7).
Co-expression network of FT during the blooming process in P. mume To investigate the functional role of FT during oral bud ushing, we obtained the transcriptome data of four successive stages during dormancy release and blooming in P. mume from a previous study reported by Zhang et al. (2018). The procedure of sample collection, RNA extraction, sequencing library construction, quality control, and gene expression quanti cation was described in detail [55]. We normalized the gene expression and performed weighted gene co-expression network analysis with WGCNA v1.67 package in R [143]. The Dynamic Tree Cut algorithm was applied to detect gene modules (power β of 4; height cutoff of 0.3; minimal module size of 30). To identify the key modules coexpressed with FT, we calculated the module-trait association and ranked genes by their correlation with the FPKM value of PmFT. Finally, the top 50 candidate genes (R 2 > 0.6) coexpressed with PmFT and 15 FT interacting factors identi ed in Arabidopsis owering pathways [56,57] were selected to construct the coexpression network of FT. The FT regulatory network was visualized with Cytoscape 3.1 [144]. The expression levels of FT and putative co-expressed genes were further validated by qRT-PCR analysis. The primers are described in the supplementary data (Table S8).

Declarations
Ethics approval and consent to participate Not applicable

Consent for publication
Not applicable Availability of data and materials The datasets analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that no competing interests exist.

Funding
This work was nancially supported by the Fundamental Research Funds for the Central Universities (NO. BLX201828) and Special Fund for Beijing Common Construction Project.

Authors' contributions
Zhang M designed and conducted the study; Li P and Yan XL performed the blast analysis and sequence curation; Wang J and Cheng T provided help with transcriptome analysis; Zhang Q supervised the project and revised the manuscript.  The organism column indicates in which organism the motif was characterized.   Phylogenetic tree of PEBPs from Rosaceae species and A. thaliana constructed by the neighbor-joining method. All PEBP proteins can be clustered into three clades and ve subfamilies Figure 2 Phylogenetic tree of PEBPs from Rosaceae species and A. thaliana constructed by the neighbor-joining method. All PEBP proteins can be clustered into three clades and ve subfamilies   Inter-genomic synteny blocks related to PEBP family genes in A. thaliana, R. occidentalis, M. domestica, P. avium, P. persica, P. armeniaca and P. mume.
Chromosomes of Rosaceae species are labeled as Ro, Md, Pv, Pp, Pa, and Pm and are colored differently. We used purple, red, orange, green, and blue lines to connect collinear blocks containing MFTs, FTs, CENs, TFL1s, and BFTs, respectively.
Chromosomes of Rosaceae species are labeled as Ro, Md, Pv, Pp, Pa, and Pm and are colored differently. We used purple, red, orange, green, and blue lines to connect collinear blocks containing MFTs, FTs, CENs, TFL1s, and BFTs, respectively.         Co-expression network of FT during oral bud blooming in P. mume. (a) Cytoscape visualization of candidate genes co-expressed with PmFT during dormancy release. Candidate genes from the 'brown', 'dark green, 'green-yellow, 'turquoise', and 'cyan' modules are colored in brown, green, green-yellow, turquoise, and cyan, respectively. The circle size represents the signi cance of gene expression correlation with PmFT. (b) Expression patterns of PmFT and putative coexpressed genes during oral bud blooming. (c) Relative expression of PmFT and putative co-expressed genes veri ed by qRT-PCR analysis.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.