The evolutionary analysis of PEBP Gene Family among Rosaceae tree species

Phosphatidylethanolamine-binding proteins (PEBPs) are a common gene family found among animals, plants and microbes. Plant PEBP proteins play an important role in regulating flowering time, as well as seed and bud dormancy. PEBP proteins can be divided into three major clades: FLOWERING LOCUS T-like (FT-like), TERMINAL FLOWER1-like (TFL1-like), and MOTHER OF FT AND TFL1-like (MFT-like). Though PEBP family genes have been well studied in Arabidopsis and other model species, their functional role in perennial trees is not fully understood. To characterize the evolution of PEBP genes and their role in flowering control among Rosaceae species, we identified a total of 46 PEBP members in seven Rosaceae species. Sequence and gene structure analysis revealed highly conserved intron/exon distributions and featured motifs among Rosaceae PEBP proteins. Analysis of synonymous/nonsysnonymous substitution rates showed purifying selection constraining divergence within most lineages, while positive selection appears to have driven divergence of FT-like and TFL-like genes from the MFT clade. The expression of PEBP genes varied among different tissues indicating their functional divergence during gene family evolution. Furthermore, by employing a weighted gene co-expression network approach, we inferred a putative FT regulatory module essential for dormancy release and floral induction in P. mume. Our study sheds new light on the evolution of PEBP genes and their functional roles in controlling flowering time among Rosaceae tree species. , P. persica , P. mume , P. yedoensis , P. avium and P. dulcis . Gene notation was assigned to each PEBP based on their Arabidopsis ortholog.


Introduction
Phosphatidylethanolamine-binding proteins (PEBPs) form a superfamily of genes containing the PEBP domain, which is highly conserved across taxa, from bacteria to insects to mammals and plants [1][2][3]. Mammalian PEBPs are globular proteins composed of a functional binding site for acetate, phosphate groups and phosphorylethanolamine [4,5]. Plant PEBP homologs share similar conserved motifs except their C-terminal part is deleted [6,7]. The animal PEBP proteins were reported to function as serine protease or Raf kinase inhibitors controlling cell growth and differentiation [8][9][10][11] Rosaceae tree species, it is of great interest to dissect the evolution and their roles in contributing to flowering time variation in Rosaceae tree species. Although PEBP gene family has been recognized as key floral regulators in model plant species, their molecular evolution and function remains less clear in Rosaceae perennials. Here we provide a systematic study on the molecular evolution and function of PEBP gene family in Rosaceae tree species. We identified PEBP gene family across seven Rosaceae species and evaluated the conservation of their sequence, gene structure, and protein motifs. We then used synonymous/nonsynonymous substitution ratios to test for selection signatures on PEBP family genes. We further examined the expression pattern of PEBP genes across different organs and inferred a co-expression network of FT during dormancy break and floral induction in P. mume.

Characterization of PEBP genes in Rosaceae species
By combing HMM and BLAST searches, we identified 45 PEBP-like proteins across seven Rosaceae species (Table 1). Each putative gene was validated by blasting against SMART, Pfam and NCBI CDD to ensure they contain PEBP domains. We then assigned all Rosaceae PEBPs to their closest Arabidopsis homologs (Figure 1). In total, we retried 45 Rosaceae PEBPs including 9 FT/TSF-like, 9 TFL-like, 9 CEN-like, 8 MFT-like and 10 BFT-like genes (  (Table 1).

Phylogenetic analyses
Phylogenetic trees were constructed based on protein sequence alignment of Rosaceae PEBPs using three approaches: neighbor-joining, maximum likelihood, and Bayesian inference methods (Figure 2; Figure S1). All three phylogenetic trees shared similar topology ( Figure 2; Figure S1). The phylogenetic tree inferred showed that the total 51 PEBP proteins can be clustered into three major clades, FT-clade, TFL-clade, and MFTclade ( Figure 2). The FT-clade could be further split into FT/TSF-like genes and BFT-like genes, TFL-clade into TFL1-like and CEN-like genes ( Figure 2). Within each subfamily, Prunus genes grouped closely together, while genes from Maleae species formed a separate groupFigure 2). Among Prunus PEBPs, proteins of P. dulcis and P. persica from Amygdalus subgenus first grouped together, then with that of P. mume, and with proteins of P. yedoensis and P. avium from Cerasus subgenus ( Figure 2).

Structural analysis of PEBP family genes
Roseaceae PEBP family genes displayed conserved genomic structure and high amino acid sequence similarity with each other ( Figure S2; Figure 3). The length of coding regions of PEBPs ranged from 519 to 579 bps, with FT-like genes between 525 to 529bps, MFT-like genes between 519 to 579 bps, BFT-like genes between 522 to 525 bps, TFL1-like genes between 519 to 534 bps, and CEN-like genes between 522 to 528 bps. Gene structure analysis revealed a rather loose gene structure among all PEBPs consisting of four exons and three introns ( Figure S2). For example, BFT-like genes harbor shortest introns of total length larger than 200 bp ( Figure S2). Most Rosaceae PEBP genes were located on different chromosomes or scaffolds except MdTFL2 and MdFT collocated on chromosome 12, PpCEN and PpFT co-located on chromosome 6 ( Figure S3).

Molecular evolutionary analysis of PEBPs
To investigate the evolution of PEBP genes in Rosaceae species, we performed selection scans using branch model, site model, and branch site model in CODEML program of PAML (Table 2; Table S1; Table S2). Branch models specifying different ω parameters for foreground lineages (i.e. FT-like, TFL-like, CEN-like, and BFT-like lineages, and (FT, BFT, TFL, CEN))) were compared with fixed ratio model (Table S1). The likelihood ratio tests (LRT) on models specifying individual lineages of FT, TFL, CEN, BFT genes as foreground branch showed no significant ω difference between foreground and background branches (P>0.05) (Table S1). However, the LRT test on branch model 5 suggested that there is significant divergence between genes in the FT/TFL clade and those in MFT clade (P<0.001) (Table S1). We then applied the site model LRT and found no significant differences in ω values among sites of the PEBP coding sequence (Table S2). However, the branch-site LRT test revealed selection at specific sites in FT/TFL lineages ( Table 2). The Bayes Empirical Bayes model suggested modest selection at at position 28, and position 10, 108, 128 when FT/TFL-clade and TFL lineage were set as foreground branch (Table 2).

Cis-acting element analysis of FT promoter
We extracted the 2000 bp region of FT genes and scanned for putative cis-element by searching against PlanPan and the PlantCARE database (Table 3). We compared the type and copy number of cis-element for 10 FT genes from A. thaliana, P. trichocarpa, M. domestica, Pyrus communis, P. mume, P. persica, P. dulcis, and P. yedoensis (Table 3).
Within the promoter region of investigated FTs, five to seven CCACA boxes (binding site for CO) were identified across 8 species, while none were found within 2kb promoter region of PtFT2 (Table 3). CArG boxes, binding site for MADS transcription factor, were found among all FT promoters, with AtFT promoter containing the most (Table 3). In addition, binding sites for MYB, MYC transcription factor, and ethylene-responsive transcription factor were present in all FT promoters (Table 3). Gibberellin-responsive elements of different types were present in all promoters: GARE-motif in promoter of MdFT, PcFT, PyFT1 and PyFT2, while P-box detected in promoter of AtFT, PtFT2, PmFT, PpFT, PyFT1 and PyFT2. Additionally, some cis-element showed species-specific distribution pattern. For example, low-temperature responsiveness element was only detected within promoter of AtFT, PcFT1, and PmFT (Table 3), while W-box, binding sites for WRKY transcription factor, were detected exclusively among Prunus FT promoters and PtFT2 (Table 3).

Tissue-specific expression pattern of PEBPs
To explore the functional role of PEBP genes, we examined their expression pattern in different tissues of three Prunus species P. persica, P. mume, and P. yedoensis (Fgiure 4).
In general, all tissues had expression of at least one of the PEBPs (Figure 4). and 'turquoise' were not highly correlated with that of PmFT (R 2 < 0.62) (Figure 5a). The FT co-expressed genes were annotated to other biological processes including phosphomonoester hydrolysis, membrane trafficking, and ethylene signaling pathways (Table S3) Co-expression network of FT in regulating flowering in P. mume The flowering time regulation pathway has been studied in many species, but is best characterized in Arabidopsis [57,71]. It is well understood that FT is a hub gene integrating four major pathways, including photoperiodic pathway, temperature pathway, autonomous pathway, and gibberellin pathway [56]. The expression of FT is facilitated by transcription factor CONSTANS (CO), which peaks at the end of the day under long photoperiod [72,73]. Other genes involved in temperature pathway, SVP (SHORT VEGETATIVE PHASE), FLC (FLOWERING LOCUS C), and PIF4 (PHYTOCHROME INTERACTING FACTOR 4) can also regulate FT transcription through directly binding FT promoter or intronic regions [72,[74][75][76]. Upon induction, FT travels from the leaves to the shoot apical meristem and binds to FD to regulate meristem identify genes such as AP1 ( APETALA 1) to produce reproductive organs [77,78]. Though flowering regulatory module concerning FT is well understood in annuals or biennial plants, it is still unclear in temperature tree species. Unlike annual or biennials, floral initiation in temperate trees starts in the preceding summer, floral buds enter dormancy during winter, and then bloom in the spring [79]. Thus, flowering in temperate woody perennials involves chilling-induced dormancy release, resumption of floral organ growth, and production of reproductive gametes, which is more complex than flowering in annuals or biennials [79,80].
To gain insights into the regulatory network of FT during floral induction in perennials, we used WGCNA and identified a number of candidate genes whose expression pattern strongly correlated with FT in P. mume. Among these candidate, PmDAM1, PmDAM4,

Molecular evolution of PEBP genes
To investigate signatures of positive selection on Rosaceae PEBP genes, we first extracted the coding sequence for PEBP genes and aligned them with MUSCLE v3.8 [99]. The sequence alignment was then trimmed with Gblocks [106] in 'condon' mode and the resulting alignments were used to infer phylogenetic relationships with RAxML [102]. The ratio (ω) of synonymous substitution sites (dS) and nonsynonymous substitution sites (dN) were computed for each lineage and site of PEBP family genes using branch model, site model and branch-site model with codeml package in PAML 4.0 [107]. To test the hypothesis of adaptive evolution in specific PEBP lineages and across sites, we performed likelihood ratio tests to evaluate the fit of branch models (FT, TFL, CEN, BFT set as foreground branch), site models (M0, M1a, M2a, M7 and M8) and branch site models. The positive selected sites were detected with Bayes Empirical Bayes analysis in PAML 4.0 [107].

Cis-elements analysis of FT promoter region
To investigate the conservation of cis-regulatory model of FT genes across species, we extracted the 2kb upstream region of the start codon (ATG) and submitted the sequences to PlantCARE [108] and PlantPan 2.0 database [109]. The cis-acting elements predicted with these two methods were integrated and considered as putative cis-acting element.

Tissue-specific expression profile of PEBP genes
The RNA-seq data for different tissues of P. mume, P. yedoensis, and P. persica was retrieved from three independent studies: GSE40162 from GEO database [89], SRP136962 and SRA053230 from NCBI SRA database [90,110]. The raw SRA files were first dumped to FASTQ format using SRA-toolkit and preprocessed with Trimmomatic v0.38 [111] to trim off poor quality reads. Clean paired reads were aligned with reference genome of P. mume, P. yedoensis, and P. persica respectively with software HISAT2 [112]. The genic count was computed with HTSeq [113] and normalized to FPKM with R package 'edgeR' [114]. To investigate tissue specific expression of PEBP genes, we extracted the FPKM value for each PEBP gene across different tissues of P. mume, P. persica and P. yedoensi.
The expression profile of PEBP genes was visualized using the 'pheatmap' package in R.

Availability of data and materials
The datasets analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that no competing interests exist.         Table S2.pdf Figure S5.pdf Table S1.pdf Figure S4.pdf Table S3.pdf