A single mutation in ACTR8 gene results in lineage-specific expression in primates CURRENT STATUS: UNDER REVISION

Background: Alternative splicing (AS) generates various transcripts from a single gene and thus plays a significant role in transcriptomic diversity and proteomic complexity. Alu elements are primate-specific transposable elements (TEs) and can provide a donor or acceptor site for AS. In a study on TE-mediated AS, we recently identified a novel AluSz6-exonized ACTR8 transcript in the crab-eating monkey (Macaca fascicularis). In the present study, we sought to analyse the molecular mechanism of AluSz6-derived exonization in the ACTR8 gene. Results: We performed RT-PCR and genomic PCR using tissue RNA and DNA samples from the crab-eating monkey and various other primates, including humans, to study AluSz6-derived exonization in the ACTR8 gene and the expression of transcript variants. AluSz6 integration was estimated to have occurred before the divergence of simians and prosimians. The novel transcript was expressed only in Old World monkeys and apes, including humans. Lineage-specific expression of the ACTR8 gene was due to a ‘G’ duplication in the AluSz6 sequences of the Old World monkey and ape lineages. Six alternative transcripts (TV1-TV6) generated by various AS mechanisms were newly identified. Based on in-silico analysis, the alternative transcripts were predicted to be translated into new isoforms with C-terminal deletions. Conclusion: The ‘G’ duplication, together with TE exonization and AS via various mechanisms, resulted in a different fate of ACTR8 gene expression during primate evolution.


Background
Alternative splicing (AS) is a molecular mechanism that produces various transcripts and diverse proteins from a single gene and plays an important role in genomic and phenotypic complexity [1,2]. In the human genome, more than 95% of pre-mRNAs undergo AS [3,4]. AS mechanisms are classified into five types: exon skipping, alternative 3′ splice site (SS), alternative 5′ SS, intron retention, and mutual exclusion [5,6]. According to recent studies, transposable elements (TEs) are closely related to the AS mechanisms [7][8][9]. A transcriptome study of the crab-eating monkey reported that 10% of AS events were associated with TEs [10].
TEs are mobile genetic elements that can change their position in the genome and thus can affect the sequence and structure of genes. Accordingly, they can modulate gene functions in a relatively short time and are considered an evolutionary driving force [11]. They make up over 45% of the human genome and are categorized into two classes: Class I TEs, or retrotransposons, which require reverse transcription for their activation, and Class II TEs, or DNA transposons, which encode transposases responsible for their excision and insertion. Retrotransposons comprise approximately 42% of the human genome and include endogenous retroviruses (ERVs), long interspersed elements (LINEs), short interspersed elements (SINEs), and SVA elements, which are composites of SINE-R, variable number of tandem repeats (VNTRs), and Alu elements [12]. Alu elements, which belong to the SINEs, are the most successful of all TEs in the primate genome. Alu elements are slightly less than 300 base pairs (bp) in length, present at more than 1 million copies, and widely dispersed within introns and genes in the human genome [13]. Alu elements generally consist of distinct monomeric left and right arms, an A-rich linker, and a poly(A) tail. These features allow Alu elements to regulate gene expression through AS and alternative polyadenylation [14,15]. Non-allelic Alu/Alu recombination can cause genomic instability and has contributed to primate genome divergence [16]. DNA rearrangements caused by Alu elements contribute to genetic disease [17,18]. Thus, Alu elements are involved in transcription regulation and affect gene function through the generation of new isoforms.
In a study on TE-mediated AS, we recently identified a novel AluSz6-exonized ACTR8 transcript in a crab-eating monkey. Actr8 (Actin-Related Protein 8 Homolog, Arp8) is a member of the Actin superfamily and has 15-72% sequence identity with Actin, to which it is structurally and evolutionarily similar [19]. Although actin-related proteins (Arps) contain the ATP-binding pocket termed the actin fold, their functions are distinct from that of Actin [20]. Unlike Actin, Arps are predominantly located in the nucleus and have been associated with nucleosome remodeling, histone acetylation, histone variant exchange, transcription regulation, and DNA repair [21][22][23]. Actr8 is a key component of the INO80 complex, which has critical functions in DNA replication, repair, and recombination as well as in transcription and heterochromatin maintenance [24,25]. Actr8 is involved in the ATPase activity of the INO80 complex and recruits the complex to DNA damage sites, and mutation or deletion of Actr8 has effects similar to INO80 deletion [26]. Further, Actr8 is involved in transcriptional activation via regulating promoter architecture and contributes to the regulation of cellular function via cytoskeleton organization [27]. Despite the numerous functional studies on Actr8, information on the alternative transcripts and evolution of the ACTR8 gene is limited. Therefore, this study aimed to explain a series of mutational events in the ACTR8 gene during primate evolution.

Structural analysis of the ACTR8 gene in various primates
Using the NCBI genome database, we conducted a structural analysis of the ACTR8 gene in nine primates, including humans, which revealed that AluSz6 is located in the 7th intron in antisense orientation (Fig. 1A). The ACTR8 gene is composed of 13 exons. The length of the untranslated region (UTR) differs between species, but the open reading frame (ORF) is highly conserved and encodes 624 amino acids. Remarkably, the squirrel monkey ACTR8 gene has 12 exons and encodes a shorter protein of 616 amino acids. Transcripts containing AluSz6 were not found in all primates. Next, we performed genomic PCR using genomic DNA samples from the nine primates to determine the integration time of AluSz6. Amplicons containing AluSz6 were detected in all the primates studied, including hominoids (human, chimpanzee, and gorilla), Old World monkeys (rhesus monkey, crab-eating monkey, and African green monkey), New World monkeys (marmoset and squirrel monkey), and a prosimian (ring-tailed lemur) (Fig. 1B). These findings indicated that AluSz6 integrated into the primate genome before the divergence of the simian and prosimian lineages.
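The timing inference above rests on a simple parsimony argument: if the insertion is present in every sampled simian group and in the prosimian, a single ancestral integration before their split is the most economical explanation. A minimal Python sketch of that reasoning follows. The presence calls mirror the genomic PCR result described for Fig. 1B, while the dictionary layout and function names are illustrative assumptions and not part of the study's workflow.

```python
# Presence of the AluSz6-containing amplicon per species, as described for Fig. 1B.
# The grouping labels follow the text; the data structure itself is illustrative.
ALU_PRESENT = {
    "hominoid": {"human": True, "chimpanzee": True, "gorilla": True},
    "Old World monkey": {"rhesus monkey": True, "crab-eating monkey": True,
                         "African green monkey": True},
    "New World monkey": {"marmoset": True, "squirrel monkey": True},
    "prosimian": {"ring-tailed lemur": True},
}

SIMIAN_GROUPS = {"hominoid", "Old World monkey", "New World monkey"}


def groups_with_insertion(calls):
    """Return the groups in which every sampled species carries the insertion."""
    return {group for group, species in calls.items() if all(species.values())}


def predates_simian_prosimian_split(calls):
    """Parsimony check: insertion present in all simian groups and in prosimians."""
    carriers = groups_with_insertion(calls)
    return SIMIAN_GROUPS <= carriers and "prosimian" in carriers


print(predates_simian_prosimian_split(ALU_PRESENT))  # True
```

With only one prosimian sampled, the check is as strong as that single amplicon; absence in the lemur would instead place the integration on the simian branch.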

Alternative transcripts containing the Alu-derived transcript of the ACTR8 gene
To confirm the occurrence of AluSz6-derived exonization of the ACTR8 gene in the crab-eating monkey, reverse transcription PCR (RT-PCR) was performed using two validation primer pairs (V1 and V2) (Fig. 2A and Additional file 1: Table 1). The V1 primers were designed to identify transcript variants and the V2 primers were used to detect the transcripts containing the Alu-derived exon. In total, seven transcripts were identified in the crab-eating monkey; the V1 primers yielded five transcripts and the V2 primers two (Fig. 2B). Sequence analysis of the transcripts revealed that the variants originated from multiple AS events, including exon skipping, alternative 3′ SS and 5′ SS, intron retention, mutual exclusion, and Alu-exonization (Fig. 2C) [5,6]. The TV1 transcript skips exon 8 and carries exon 9a, which arises through an alternative 3′ SS and is 19 bp longer than exon 9. The TV2 transcript has exon 7a and an AluSz6-derived exon, which were generated by mutual exclusion and Alu-exonization, respectively. TV3 and TV4 have the same AluSz6-derived exon, but carry exon 9 and exon 9a, respectively, through differential alternative 3′ SSs. TV5 is generated by simultaneous AluSz6 exonization and intron retention. TV6 has a longer AluSz6-derived exon due to a differential alternative 5′ SS.
Generally, Alu-derived exonization transcripts exhibit tissue-specific expression patterns [28]. Therefore, we profiled ACTR8 gene expression in various tissues of the crab-eating monkey, including cerebellum, cerebrum, heart, kidney, lung, pancreas, spleen, and testis. Specific RT-PCR primers for each transcript variant were designed based on the splice junctions (Fig. 2C). RT-PCR analysis did not reveal tissue-specific ACTR8 gene expression; the original transcript was ubiquitously expressed in all tissues evaluated, whereas the variants TV1-TV6 showed low or no expression (Fig. 2D). We further investigated ACTR8 gene expression in the cerebellum of other primates (humans, rhesus monkey, African green monkey, marmoset, and squirrel monkey) using RT-PCR with transcript variant-specific primers (Fig. 2E). In humans and Old World monkeys, all transcript variants showed expression patterns similar to those in the crab-eating monkey. Remarkably, in New World monkeys, only the original transcript was expressed.
Thus, the original transcript was ubiquitously expressed in all species studied, whereas transcript variants showed lineage-specific expression.

Lineage-specific ACTR8 gene transcript expression
Comparative sequence analysis of the AluSz6-derived exon in nine primates was conducted by multiple sequence alignment (Fig. 3 and Additional file 1: Fig. S1). A novel 'G' duplication was found at the 5′ SS of the AluSz6-derived exon in Old World monkeys and apes, providing a new canonical 5′ SS. In New World monkeys, this duplication was not present and hence, the canonical 5′ SS was not created.
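How a single-base duplication can create a donor site is easy to see on a toy sequence. The sketch below is only an illustration: the sequences are hypothetical and are not the AluSz6 sequence, the consensus MAG|GTRAGT is the commonly cited 5′ SS motif, and the function names are made up for this example.

```python
import re

# Consensus 5' splice site spanning the exon/intron boundary: MAG|GTRAGT
# (M = A or C, R = A or G). Real donor sites often deviate from this motif.
DONOR_CONSENSUS = re.compile(r"[AC]AG(?=GT[AG]AGT)")


def duplicate_base(seq, index):
    """Return seq with the base at `index` duplicated (a 1-bp duplication)."""
    return seq[:index + 1] + seq[index] + seq[index + 1:]


def donor_sites(seq):
    """Return 0-based positions of the first intronic base (the G of GT)."""
    return [m.end() for m in DONOR_CONSENSUS.finditer(seq)]


# Hypothetical toy sequences, NOT the real AluSz6 sequence.
ancestral = "TTCAGTAAGTCC"               # CAG is followed by TAAGT: no consensus match
derived = duplicate_base(ancestral, 4)   # duplicate the 'G' at index 4

print(derived)                 # TTCAGGTAAGTCC
print(donor_sites(ancestral))  # []
print(donor_sites(derived))    # [5]
```

In the toy case the duplicated 'G' completes the exonic AG and shifts the intronic GT into register, so the ancestral sequence has no consensus match while the derived one has exactly one.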
In the squirrel monkey, exons 2 and 3 were found to be longer than in the other primates (Fig. 1A). Therefore, we examined the splice site of each exon. Interestingly, the squirrel monkey has a specific sequence, a 'TA' acceptor splice site, whereas Old World monkeys and apes have no 5′ SS in this region (Additional file 1: Fig. S2). Marmoset and lemur have a 'TA' acceptor splice site like the squirrel monkey, but we did not experimentally confirm whether marmoset and lemur have longer exons 2 and 3 (Additional file 1: Fig. S2).
The TV2 transcript carrying exon 7a showed lineage-specific expression. The splice sites (donor and acceptor sites) were well conserved in all primates evaluated (Additional file 1: Fig. S1). We therefore reasoned that the lineage-specific expression might be caused by differences in the surrounding sequences and analysed the branch point using SVM-BPfinder. Multiple candidate branch points, including "TTATAAGAT", were identified. This sequence was located 21 bp upstream of the 3′ SS of exon 7a (Additional file 1: Fig. S1). Old World monkeys and apes, but not New World monkeys and prosimians, acquired this branch point. Thus, the lineage-specific mutually exclusive exon, exon 7a, was probably spliced owing to the branch point difference (Additional file 1: Fig. S1).
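Locating a candidate branch-point motif relative to the 3′ SS can be sketched as a plain string search. This is not how SVM-BPfinder scores candidates; it only illustrates the distance bookkeeping. The toy introns are hypothetical, and measuring the distance from the last base of the motif to the end of the intron is an assumption, since the text does not state which position the 21 bp refers to.

```python
def motif_distances_to_3ss(intron, motif):
    """Distances (bp) from the end of each motif hit to the intron's 3' end."""
    hits = []
    start = intron.find(motif)
    while start != -1:
        hits.append(len(intron) - (start + len(motif)))
        start = intron.find(motif, start + 1)
    return hits


BRANCH_MOTIF = "TTATAAGAT"  # candidate branch-point sequence reported in the text

# Hypothetical toy introns, NOT real ACTR8 sequence: a GT...AG intron with a
# pyrimidine-rich stretch between the candidate motif and the terminal AG.
with_bp = "GTAAGT" + "C" * 10 + BRANCH_MOTIF + "T" * 19 + "AG"
without_bp = "GTAAGT" + "C" * 10 + "TTGTGGGGT" + "T" * 19 + "AG"

print(motif_distances_to_3ss(with_bp, BRANCH_MOTIF))     # [21]
print(motif_distances_to_3ss(without_bp, BRANCH_MOTIF))  # []
```

Applied to aligned intron sequences from each species, the same search would flag which lineages carry the motif at a comparable distance from the acceptor site.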
Notably, the functional domains (ATP- and nucleotide-binding sites), which are the most crucial for ACTR8 function, are well preserved in TV2-TV6. According to previous studies, the N-terminal region of ACTR8 is critical for functional activity and N-terminal deletions have deleterious effects on the expressed protein, whereas deletions in the C-terminal region do not have such effects [29-31].
Based on our findings and previously reported experimental results [32,33], we suggest that the ACTR8 gene can produce a lineage-specific protein through AluSz6 integration and subsequent splicing events.

Discussion
Alu elements belong to the SINE family of TEs and are primate-specific repetitive elements. In humans, they make up 11% of the genome. Alu elements emerged 65 million years ago, before the radiation of primates, by a fusion of the 5′ and 3′ ends of the 7SL RNA gene [34]. Alu elements have been continuously inserted into host genomes, which has contributed to an increase in transcriptome diversity and has had positive effects on primate evolution [35][36][37][38]. However, insertion of individual Alu elements can also negatively impact genome stability through the induction of gene mutation, potentially leading to genetic diseases and cancer [13,36,39,40]. Alu elements that integrate into introns are known to have very little effect on the phenotype [41]. Alu-derived exons tend to be alternatively spliced at a higher frequency than the original exons [42,43].
AluSz6 is classified as the second oldest Alu lineage, which is estimated to be ~30 million years old and is present at 551,383 full-length copies in humans [44]. [...] in Old World monkeys and apes. Therefore, the AluSz6 integrated into the ACTR8 gene existed in a dormant state as "junk DNA" in the genome, and it was awakened and expressed as part of the ACTR8 gene in the Old World monkey and ape lineages owing to the 'G' duplication, a tiny mutation (Fig. 3). This duplication was carefully re-confirmed by comparative analysis of the AluSz6 sequences from the UCSC Genome Browser in different primate species (Additional file 1: Fig. S4). Duplication events can be induced by two molecular mechanisms: secondary unequal homologous recombination or DNA replication-derived errors [40,51,52]. We found that the single 'G' duplication in AluSz6 is a result of a DNA replication error and increased ACTR8 transcript diversity specifically in Old World monkeys and apes (Fig. 2D, E).
Actr8 binds to H2A via its C-terminus and to DNA via its N-terminus. Related studies have reported that loss of the C-terminus has no effect on functional activity [29,33].
Interestingly, our newly identified transcript variants are expressed and are predicted to encode a short protein with a C-terminal deletion caused by a premature termination codon (PTC). To understand the lineage-specific evolution of Actr8 in primates, we identified and analysed Actr8 protein isoforms in primates and various other mammals (mouse, rat, dog, horse, cow, and pig) (Fig. 5). As expected, Old World monkeys and apes expressed various isoforms, whereas the other species had neither diverse isoforms nor the C-terminal deletion isoform. This result explains the appearance of the novel ACTR8 protein in primates.
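The effect of a premature termination codon on the predicted protein can be illustrated with a minimal translation sketch. The sequences below are hypothetical toy ORFs, not ACTR8 transcripts; the standard genetic code is assumed, and the function name is illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_until_stop(cds):
    """Translate from the first base and stop at the first in-frame stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy sequences, NOT real ACTR8 transcripts.
reference = "ATGGCTGAAGGTCTGTAA"
# A toy exonized insert carrying an in-frame TGA introduces a PTC.
variant = "ATGGCT" + "TGA" + "GAAGGTCTGTAA"

print(translate_until_stop(reference))  # MAEGL
print(translate_until_stop(variant))    # MA
```

The variant keeps the N-terminal residues and loses everything downstream of the PTC, which is the pattern of C-terminal truncation described for the predicted isoforms.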
Taken together, our findings indicate that the AluSz6 integration event occurred prior to primate radiation. During the primate evolution of the ACTR8 gene, the emergence of the tiny 'G' duplication and the creation of the Alu-derived exon provided a lineage-specific splice site in Old World monkeys and apes (Fig. 6). At the same time, Old World monkeys and apes acquired a branch point of "TATATAAGAT" sequence, and these species finally gained lineage-specific transcripts. In addition, New World monkeys have lineage-specific transcripts because of specific sequences that function as splice donor sites.

Conclusion
The current study revealed step-by-step evolutionary events in the ACTR8 gene that contributed to transcriptome diversity and the generation of novel isoforms of this gene in primates.

Methods
Total RNA and genomic DNA extraction
The total RNA samples isolated from different human tissues, including the cerebellum, cerebrum, heart, kidney, lung, liver, spleen, and testis, were purchased from Clontech Laboratories, Inc. The total RNA of non-human primates was extracted from the specified tissues: the crab-eating monkey (cerebellum, cerebrum, heart, kidney, lung, pancreas, spleen, and testis), the African green monkey (cerebrum), and the common marmoset (cerebrum). The RNA samples were isolated using the [...] The GAPDH gene was used as a standard control and analysed using specific primer pairs (S: 5′-GAA ATC CCA TCA CCA TCT TCC AGG-3′, AS: 5′-GAG CCC CAG CCT TCT CCA TG-3′) designed based on the human GAPDH sequence.

Molecular cloning and sequencing
For the validation and sequencing of the PCR products, all PCR products were separated on a 1.5% agarose gel, purified with the Gel SV Extraction kit (GeneAll), and cloned into the RBC T&A Cloning Vector. The cloned DNA was isolated using the Plasmid DNA Mini-prep kit (GeneAll) and sequenced by Macrogen. The validation of primate DNA samples and alternative transcripts was performed as mentioned above.

Acquisition and quantitative analysis of PCR-gel images
The PCR products separated on ethidium bromide-stained gels were scanned using gel imaging software (Vision-Capt, Vilber). Band intensity was calculated from the band volume, and relative intensity was expressed as volume %.
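One common reading of volume % is the share of each band's volume in the summed volume of all bands in the same lane. The sketch below assumes that definition, which may differ from the calculation performed by the imaging software, and uses hypothetical band volumes in arbitrary units.

```python
def volume_percent(band_volumes):
    """Express each band's volume as a percentage of the lane total."""
    total = sum(band_volumes.values())
    if total == 0:
        raise ValueError("lane has no signal")
    return {band: 100 * volume / total for band, volume in band_volumes.items()}


# Hypothetical band volumes (arbitrary units) for one lane.
lane = {"original": 600, "TV1": 150, "TV2": 250}

print(volume_percent(lane))  # {'original': 60.0, 'TV1': 15.0, 'TV2': 25.0}
```

Because the values are normalized within a lane, they compare transcript variants inside one sample; comparing across tissues would additionally need a loading control such as the GAPDH band.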

Branch-point analysis
Splicing is catalysed by the spliceosome, a large RNA-protein complex that binds to the branch point site, a short sequence upstream of the acceptor splice site (3′ SS). As [...]

Comparative in-silico analysis of Actr8 protein in different species
Actr8 protein sequences were collected from the NCBI database in FASTA format for the following species: human, chimpanzee, gibbon, crab-eating monkey, rhesus monkey, African green monkey, marmoset, rat, dog, horse, cow, and pig. The BioEdit program was used for multiple sequence comparison. Pairwise alignments were used to determine the identity and similarity between species.
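Percent identity from a pairwise alignment can be computed by counting identical residues over the aligned columns. The sketch below is a generic illustration and not the BioEdit calculation: the aligned sequences are hypothetical, and using all columns with at least one residue as the denominator is an assumption, since identity is reported under several conventions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns (excluding gap-gap columns) with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        columns += 1
        if a == b:
            matches += 1
    return 100 * matches / columns if columns else 0.0


# Hypothetical aligned protein fragments, NOT real Actr8 sequences.
seq_a = "MKT-LLV"
seq_b = "MKTALIV"

print(round(percent_identity(seq_a, seq_b), 1))  # 71.4
```

A truncated isoform aligned against the full-length protein would score low under this convention because every missing C-terminal residue counts as a mismatch, so identity over the shared region may be the more informative figure in that case.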

Consent for publication
Not applicable.

Availability of data and materials
All figures and tables generated in this study are available in this article and its additional files.

Competing interests
The authors declare that they have no competing interests.

AluSz6 of RTL could not be analyzed for an unenrolled reason, but we confirmed the AluSz6 sequence of RTL experimentally.

Schematic representation of the experimental methods and in silico analysis of the ACTR8 isoform in primates, mouse, rat, dog, horse, cow, and pig. Phylogenetic analysis of the ACTR8 isoform in mammals. Orange represents the truncated isoform with C-terminal deletion.

Figure 6 Schematic representation of the evolutionary events in primate ACTR8 genes. Events during the primate evolution of the ACTR8 gene are summarized.
