Evolution and distribution of RNA polymerase II regulatory sites from RNA polymerase III dependant mobile Alu elements



The primate-specific Alu elements, which originated 65 million years ago, exist in over a million copies in the human genome. These elements have been involved in genome shuffling and various diseases not only through retrotransposition but also through large-scale Alu-Alu mediated recombination. Only a few Alu subfamilies are currently retropositionally active and show insertion/deletion polymorphisms with associated phenotypes. Retroposition occurs by means of RNA intermediates transcribed from an RNA polymerase III promoter comprising the A-Box and B-Box of these elements. Alus have also been shown to harbour a number of transcription factor binding sites, as well as hormone responsive elements. The distribution of Alus in the human genome has been shown to be non-random, and these elements are increasingly being implicated in diverse functions such as transcription, translation, response to stress, nucleosome positioning and imprinting.


We conducted a retrospective analysis of putative functional sites, such as the RNA pol III promoter elements and RNA pol II regulatory elements like hormone responsive elements and ligand-activated receptor binding sites, in Alus of various evolutionary ages. We observe a progressive loss of RNA pol III transcriptional potential with concomitant accumulation of RNA pol II regulatory sites. We also observe a significant over-representation of Alus harboring these sites in promoter regions of signaling and metabolism genes of chromosome 22, when compared to genes of information pathway components, structural and transport proteins. This difference between functional categories is less pronounced in the intronic regions of the same genes.


Our study suggests that Alu elements, through retrotransposition, could distribute functional and regulatable promoter elements, which in the course of subsequent selection might be stabilized in the genome. Exaptation of regulatory elements into pre-existing genes through Alus could thus have contributed to the evolution of novel regulatory networks in primate genomes. With such a wide spectrum of regulatory sites present in Alus, it also becomes imperative to screen for variations in these sites in candidate genes, since Alus are otherwise repeat-masked in studies pertaining to the identification of predisposition markers.


In the post-genome-sequence era, repetitive sequences, erstwhile considered junk and devoid of function, are increasingly being implicated in many cellular functions, genome organization and diseases [1–8]. Alu repeats, which belong to the SINE (short interspersed nuclear element) family of repetitive sequences, are present exclusively in primate genomes. These elements, which are ~300 bp in length, originated from the 7SL RNA gene and comprise two similar, but not identical, subunits [9–12]. Each element contains a bipartite promoter for RNA polymerase III, a poly(A) tract located between the monomers, a 3'-terminal poly(A) tract and a number of CpG dinucleotides, and is flanked by short direct repeats [13, 14]. Based on certain diagnostic site mutations, they have been broadly classified into three subfamilies: old (Alu J), middle (Alu S) and young (Alu Y) [15, 16]. Further, some of the Alu Y sequences are very recent and exhibit polymorphisms, indicating that they have recently undergone retroposition [17].

Alus have been shown to harbor a number of regulatory sites, such as hormone response elements (HREs) and several ligand-activated transcription factor binding sites [18–24]. These sites regulate the expression of downstream genes, in some cases in a temporal or tissue-specific manner. Most of the regulatory sites in Alus have been reported during the course of characterization of specific genes [25–32]. In addition, the intrinsic A-Box and B-Box RNA polymerase III (RNA pol III) promoter sequences and the recombinogenic sites present in these elements are involved in retrotransposition and recombination [10].

The non-uniform distribution of Alus on the chromosomes, originally demonstrated through banding studies [33, 34], has recently been substantiated by genome sequence analysis [35]. It has been observed that Alus show not only a non-random pattern of distribution in the human chromosomes but also varying densities within genes. Additionally, in a genome-wide expression analysis, co-variation of expression of gene pairs has been attributed to a sequence similarity metric in the upstream region of the promoter, predominantly contributed by Alu repeats present in these regions [36]. These effects of Alus have been shown to be completely independent of the effects of isochore (GC) composition on Alu density as well as on gene expression [34–36].

Because Alus are repetitive and otherwise conserved, the various permutations and combinations of these regulatory elements within them are mostly excluded from genetic analysis. Since Alus occupy a tenth of the human genome, it is imperative to identify those that might assume function in the proper context. Our primary aim in this analysis is to find out whether any bias exists in the distribution of transcriptional regulatory sites in Alus of various evolutionary ages, and in the distribution of such Alus with respect to the functional classes of genes.

Results and Discussion

Distribution of functional sites in Alus is position specific

As a first step toward examining the role of these regulatory sites, we mapped their most probable positions on Alus using algorithms developed in-house (Figure 1). This was carried out on 500 Alus from each of the Alu Jo, Alu Jb, Alu Sx, Alu Sc, Alu Yb8 and Alu Y subfamilies. The classification of these evolutionarily distinct subfamilies is based on diagnostic sites [15, 16, 37, 38]. In addition, members of the most recent, retropositionally active and polymorphic Alus were included in the analysis [39, 40]. Though the polymorphic Alus belong to the Alu Y subfamily, they were treated as a separate category since insertion/deletion of these Alus has been associated with many phenotypes and diseases [2]. The regulatory sites show positional conservation across all subfamilies in which they are represented (Table 1). However, these sites are distinct from the diagnostic sites used for classifying Alus, which suggests that they have not arisen randomly in different subfamilies.

Table 1 Position of sites analysed in Alu repeats in various subfamilies.
Figure 1

Representation of regulatory sites on Alu elements. 500 representative Alu sequences from each of the subfamilies of distinct evolutionary ages were selected for identification of the most probable regulatory sites. 126 polymorphic Alus (POLY) from younger subfamilies, which show insertion/deletion polymorphisms, were also analysed. Sites were identified using a local alignment-based program as well as a probabilistic modelling approach. These sites are positionally conserved in all subfamilies.

Evolution of regulatory sites is biased and clustered in Alus

Nearly all the analyzed regulatory sites for RNA polymerase II (RNA pol II) are distributed in the region between the A-Box and the B-Box, with more clustering near the B-Box region (Figure 1). There is an evolutionary age-specific loss or gain of these sites in various subfamilies, leading to a bias in their distribution (Figure 2). Newly transposing Alus have methylated CpG sites, which are prone to transition. Many sites seem to have evolved as a consequence of these transitions. The regulatory elements are most abundant in the middle subfamilies and least represented in the younger Alus. Some sites, like AP1, ERE and nCARE, are present in older and middle Alus but rarely in the younger and polymorphic Alus. An opposite trend is observed for CETP, for which the highest density is observed in the younger active and polymorphic Alus. RARE and TRE sites are retained in all subfamilies, whereas LXR is specific to the middle Alu subfamilies (Figure 2). Curiously, nCARE, which is also present in the 7SL RNA, the progenitor of Alus, is not equally represented in all Alus: it has a higher density in the older and middle Alus and is very poorly represented in the younger subfamilies.
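The idea that a site can arise from a CpG transition can be illustrated with a minimal sketch. It assumes the standard outcome of methyl-cytosine deamination (CG read as TG, or as CA when the event occurs on the opposite strand). The function names and the toy sequences are hypothetical and are not taken from the sites analysed here.

```python
def cpg_transition_variants(sequence):
    """Return the set of sequences produced by one transition at any CpG."""
    seq = sequence.upper()
    variants = set()
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "CG":
            # Deamination of the methylated C on this strand: CG -> TG.
            variants.add(seq[:i] + "TG" + seq[i + 2:])
            # Deamination on the opposite strand reads as CG -> CA here.
            variants.add(seq[:i] + "CA" + seq[i + 2:])
    return variants


def sites_created_by_cpg_decay(ancestral, site):
    """Variants of `ancestral` that contain `site` although `ancestral` does not."""
    if site in ancestral.upper():
        return set()
    return {v for v in cpg_transition_variants(ancestral) if site in v}


if __name__ == "__main__":
    # Toy strings only; they are not real binding sites.
    print(sorted(cpg_transition_variants("ACGT")))
    print(sites_created_by_cpg_decay("GGCGTCA", "TGTCA"))
```

Only single transitions are enumerated; sites that require two or more changes would need the function to be applied repeatedly.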

Figure 2

Distribution of regulatory sites in various Alu subfamilies as well as polymorphic Alus. On the X-axis Alus of different evolutionary ages as well as polymorphic Alus (POLY) are represented. On the Y-axis the percentage of elements carrying these sites in various subfamilies is indicated.

Evolution from retropositionally active to transcriptionally active Alu elements

The majority of Alu retroposition ceased at least 30 million years ago and only a few Alu subfamilies are still active [15, 17, 41]. Transcription of Alus is a prerequisite for retrotransposition, and there is regulation not only during transcription initiation but also at the level of transcript stability [42]. Alu elements are transcribed by RNA pol III from a promoter composed of two properly spaced, conserved sequence motifs, an upstream element named the A-Box and a downstream element called the B-Box, which are essential for efficient transcription. Deletion of the B-Box sequence within the Alu repeat completely abolishes transcriptional activity. In the absence of the A-Box sequence, the efficiency of transcription is reduced 10- to 20-fold, but the B-Box sequence is still capable of directing initiation 70 bp upstream [43, 44]. An intact A-Box is therefore a critical determinant of RNA pol III dependent retropositional activity. In addition, in vitro as well as in vivo studies have shown that the 'G' and 'T' residues at the 1st and 3rd positions of the B-Box, respectively, are critical for its function [45]. Our analysis of the distribution of these promoter elements shows that the polymorphic Alu sequences have the highest density of the A-Box (70%) and that it is almost absent in the older subfamilies (Figure 2). Since the younger Alus are considered to be transcriptionally more active, this fits well with the loss of this site in the course of evolution due to accumulation of mutations. The B-Box motif with the sequence G(A/T)T(C/T)RANNC shows a similar trend to the A-Box. Interestingly, a fraction of the older Alu subfamilies still retains the B-Box sequence. However, the 'A' residue at the second position, which has not been shown to be critical for transcription, is a diagnostic nucleotide [39] for the younger subfamilies. This could account for the increased proportion of B-Box matches in the younger families.

We observe a very curious distribution of the B-Box motif if we restrict the pattern to the experimentally validated sequence GTT(C/T)GAGAC (B'Box in Figure 2). Alu Sx and Alu Sc show the highest density of matches to this pattern, followed by the older subfamilies, whereas fewer than 2% of Alu Y and polymorphic Alus carry it. The "C" at the 4th position in this case is mutated to "T" in the older families. The Yb8 family, which has been reported to be transcriptionally and retropositionally active among the younger subfamilies, retains the B'-Box element in a significant fraction. This suggests that even though retropositionally competent younger Alus are hypothesized to be transcriptionally active, only a minority retains the consensus B'-Box. It is possible that the enhancing activity of the A-Box is sufficient to drive transcription from the weaker B'-Box in the younger subfamilies. Our findings agree well with an earlier study in which the presence of all subfamilies in the RNA polymerase III driven Alu transcript pool was reported [46]. That study also observed that, although there was a quantitative bias towards younger subfamilies and younger members of these subfamilies (based on their relative presence in the transcript pool compared to their abundance in the genome), there was preferential expression of the middle subfamilies relative to the most active subfamilies. Our observations, therefore, further rule out the hypothesis that transcription may be biased only towards retropositionally active subfamilies of Alu elements. This could be the reason why only a fraction of younger Alus is currently retrotranspositionally active. The presence and retention of the B-Box, coupled with the near absence of the A-Box, in the Alu Sx and Alu Sc families suggests a basal level of transcription from these elements, which could be enhanced through binding of other regulatory proteins under certain conditions such as stress [47].

Additionally, given the evidence for naturally occurring Alu antisense as well as edited Alu transcripts [48, 49], transcribing Alus could play a major role in as yet unknown biological processes.
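The two B-Box patterns quoted in this section can be compared on any set of sequences with a short sketch. It is a plain pattern match written for illustration, not the programs used in this study. The IUPAC shorthand (W for A/T, Y for C/T), the function names and the toy sequences are assumptions.

```python
import re

# IUPAC codes needed for the motifs in the text (R = purine, N = any base).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]",
}

# Motifs as written in the text, with (A/T) as W and (C/T) as Y.
B_BOX = "GWTYRANNC"        # G(A/T)T(C/T)RANNC
B_PRIME_BOX = "GTTYGAGAC"  # GTT(C/T)GAGAC


def motif_to_regex(motif):
    """Translate an IUPAC motif into a compiled regular expression."""
    body = "".join(IUPAC[base] for base in motif)
    # The lookahead allows overlapping occurrences to be reported.
    return re.compile("(?=(" + body + "))")


def find_sites(sequence, motif):
    """Return (0-based start, matched text) for every motif occurrence."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


def fraction_with_site(sequences, motif):
    """Fraction of sequences carrying at least one occurrence of the motif."""
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if find_sites(seq, motif))
    return hits / len(sequences)


if __name__ == "__main__":
    toy = ["AAGTTCGAGACCC", "GATTAAGGC", "ACGTACGT"]  # not real Alu sequences
    print(fraction_with_site(toy, B_BOX))
    print(fraction_with_site(toy, B_PRIME_BOX))
```

Because every B'-Box match is also a B-Box match, the fraction returned for the restricted pattern can never exceed that for the degenerate one. Only the forward strand is scanned in this sketch.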

Exaptation of Alus in the transcriptional regulatory repertoire

Alus have been demonstrated to exert effects at the transcriptional, post-transcriptional and translational levels. In an earlier study on complete chromosomes 21 and 22, we demonstrated that Alu elements are clustered in genes of signaling, metabolic and transport proteins and are rarely present in genes of structural and information proteins [50]. This clustering bias was found to be independent of genomic location, GC content, gene length and intronic content. To further address whether the Alus harboring transcriptional regulatory sites also show a selective distribution, and thereby exert effects on transcription, we analyzed their distribution in the genes of various functional categories of chromosome 22. Two different datasets of Alus harboring regulatory sites were analyzed: 1) promoter region Alus and 2) intronic region Alus. The promoter region Alus of genes involved in metabolism and signaling were significantly richer in regulatory sites than those of information, structure and transport genes (F value = 4.86, df = 4, 40, p-value < 0.0027). In the intronic regions, the distinction in their distribution with respect to functional categories was less significant, though the intronic regions also harboured Alus containing regulatory sites (F value = 2.92, df = 4, 40, p-value = 0.032). This observation is noteworthy because the genes of the signaling and metabolic pathways are more subject to regulation by cellular cues such as hormonal triggers. Most of the Alus in the promoters belong to the middle Alu S families, and younger Alus are rarely present. Since younger Alus harbour few regulatory sites and actively retropose, it is possible that there is negative selection against their insertion in the promoter sites of genes of information pathways and structural proteins [see the supplementary data].

Alu movements and aberrant gene expression

Gene inversions, duplications and formation of pseudogenes have been extensively reported to be mediated through both retrotransposition and recombination of Alus. This, in many cases, has also been associated with aberrant gene expression. For instance, the presence of an AML site in an Alu upstream of the MPO gene was first demonstrated to be associated with Acute Myelocytic Leukemia [20]. This is due to the presence of a strong SP1 site within AML, which leads to overexpression of the MPO gene. AML sites are most abundant in younger and polymorphic Alus, and a single base pair transition results in the MPO site, which is present predominantly in the members of older subfamilies. In the case of polymorphic Alus, many sequences that do not show 100% conservation of the AML site still retain the SP1 site. Interestingly, the core recombinogenic site is also most predominant in younger and polymorphic Alus. The presence of recombinogenic sites in polymorphic Alus could therefore not only contribute to genome shuffling but also serve to distribute ectopic sites such as AML through retrotransposition and recombination (Figure 2).

Regulatory region distribution through Alu expansion

Analysis of regulatory sites within Alus suggests that polymorphic Alus have the potential to transpose and recombine, which allows them to integrate at random sites in the genome. They also harbour potential regulatory sites, which could evolve to become accessory sites for RNA pol II transcription, as revealed by their clustering in older subfamilies. Further, through the acquisition of novel functions, an Alu sequence could become part of the transcriptional repertoire involved in the regulation of downstream or associated genes and create novel regulatory networks (Figure 3). These results are also consistent with the hypothesis of Kidwell [51] on the evolution of transposable elements, which proposed a three-stage life cycle for class II transposable elements: invasion and amplification, followed by mutation and maturity, and finally senescence and fading. In the case of Alus, instead of fading, they could also evolve to become members of the host regulatory machinery.

Figure 3

Alu expansion and evolution of regulatory sites. With the help of LINEs, an Alu may continue to retrotranspose or may become inactive or be negatively selected. Alternatively, it may integrate upstream of a gene, accumulate mutations, evolve RNA pol II regulatory sites, become stabilized and control gene expression. This is supported by the presence of sparse regulatory sites, an intact A-Box and recombinogenic sites in the younger and active Alus, by the accumulation of regulatory sites in older Alu subfamilies, and by the significant presence of Alus harbouring regulatory sites in the promoter-encompassing regions of the genes of signaling and metabolic pathways.


Comparison of sequences in the regulatory regions of many homologous genes in human has shown accumulation of Alus, not only after divergence from non-human primates but also during primate evolution [52]. Perhaps recruitment of cis-regulatory elements responsive to cellular cues through Alu elements could result in altered spatial and temporal transcription of genes as well as create novel metabolic and signaling networks. These might contribute to the observable physiological complexity of humans and other primates [53]. Additionally, the events that defined the speciation of human from chimpanzee (with which it shares nearly 99% identity at the coding level) still elude identification and might to some extent reside in such genomic elements. These issues can now be addressed through comparison of these sites in human and chimpanzee.

Currently, Alus are repeat-masked in all studies pertaining to the identification of predisposition markers in complex disorders. Alus harbour binding sites for a wide spectrum of nuclear receptors, which play a major role in maintaining the normal physiological state and affect processes as diverse as development, reproduction and general metabolism. It therefore becomes imperative to screen for variations in these sites. Such variations might have important consequences in the candidate genes for those complex diseases that are triggered in response to hormonal imbalances as well as other environmental cues.

Methods

126 polymorphic Alu sequences cited in the literature [39, 40] were retrieved using NCBI BLAST and RepeatMasker software [54, 55]. The analysis was carried out on Alu repeats of human chromosome 22. A randomly selected representative set of approximately 500 Alu sequences from each of the subfamilies of distinct evolutionary ages (Alu Jb, Alu Jo, Alu Sx, Alu Sc, Alu Yb8 and Alu Y) was used for the analysis. Sequences were retrieved from the Sanger Institute home page, June 2001 release [56]. In addition, Alus were analyzed within 5000 base pairs upstream of genes of chromosome 22, in the regulatory regions encompassing promoter sequences, as well as inside their intronic regions.

Collection of biologically active sites

Information about the regulatory sites and their sequences was collected from various literature sources (Table 2). Characteristic features of the sites are given below. We selected those regulatory sites that have been shown to function in Alu elements. The A-Box and B-Box sequences define the bipartite internal promoter, which binds RNA polymerase III. The MPO and AML sites, which are 14 nucleotides long, differ by an A/G at the 5th position of the sequence; a transition from G to A at this site converts the MPO allele to AML, resulting in the formation of a strong SP1 binding site and overexpression of the downstream gene. AP1 sites bind the AP-1 transcription factor, which is a dimeric complex that contains members of the JUN, FOS, ATF and MAF protein families. Hormone responsive elements (HREs) are a superfamily of binding sites for ligand-activated nuclear hormone receptors for thyroid hormone (TRE), retinoic acid (RARE) and vitamin D, which regulate gene transcription. Estrogen response elements (EREs) are binding sites for the estrogen receptor (ER), a ligand-activated enhancer protein that is a member of the steroid/nuclear receptor superfamily and transactivates gene expression in response to estradiol. The negative calcium response element type 2 (nCARE) is a regulatory DNA sequence that inhibits transcription in response to raised extracellular calcium levels. The liver X receptor (LXR), a nuclear receptor, is involved in different cell-signaling pathways. The CETP site is an orphan receptor site in the Alu in the promoter of the cholesteryl ester transfer protein (CETP) gene; CETP plays a key role in reverse cholesterol transport by mediating the transfer of cholesteryl ester from HDL to atherogenic apolipoprotein B-containing lipoproteins.

Table 2 Sequences of regulatory elements analysed in Alu repeats.

Computational methods

Two different programs were written in order to locate the most probable biologically significant regions. A local alignment-based program, Xalign, was implemented in C++ on Red Hat Linux 7.3. This program finds the probable sites by aligning the consensus of a regulatory site with the query sequence. Multiple queries with a size of up to 600 nucleotides can be processed at a time. Another program, Promotif, was implemented in C++ on Red Hat Linux 7.3 using a probabilistic modeling approach. It uses a position weight matrix, normalization of the positions with a conservation index (Ci value), and inter-nucleotide dependence in terms of a transition matrix to find the sites. Position weight matrices were generated using the Gibbs Motif Sampler for every site included in the program. The sequences for position weight matrix generation were carefully selected based on the sequence and length reported for each binding site. The final length for the search was fixed at the lowest length observed. This provides an element-specific matrix with a lower chance of selecting non-RE regions. For each of the sites analyzed, the program has a built-in transition matrix, position weight matrix and conservation index. Batch analysis of over a thousand Alu sequences can be performed with this program.
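The internals of Xalign are not described here, so the following is only a minimal sketch of the general idea of locating a site by local alignment of a consensus against a query. It uses a textbook Smith-Waterman recurrence. The scoring values, the function name and the toy sequences are assumptions.

```python
def local_align(consensus, query, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment.

    Returns (score, query_start, query_end) so that query[query_start:query_end]
    is the query segment of the best-scoring local alignment.
    """
    rows, cols = len(consensus) + 1, len(query) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_cell = 0, (0, 0)

    def substitution(i, j):
        return match if consensus[i - 1] == query[j - 1] else mismatch

    # Fill the dynamic programming matrix, remembering the best cell.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + substitution(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_cell
    end = j
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, j, end


if __name__ == "__main__":
    score, start, end = local_align("GTTCGAGAC", "AAGTTCGAGACCC")
    print(score, start, end, "AAGTTCGAGACCC"[start:end])
```

A score threshold relative to the maximum attainable score (here twice the consensus length) would be needed to decide whether a hit counts as a site; that threshold is not specified in the text.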

A training set for the probabilistic model was created using annotated sequences from the literature as well as from the NCBI web page. Training was done on approximately 70% of the sequences and the rest of the sequences were used as the test set. Details of the program, along with the equations used, are available on request.
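Since the equations of Promotif are available only on request, the sketch below illustrates just the position weight matrix component and the 70/30 split, in a generic log-odds form. The conservation index and the transition matrix are not implemented. The pseudocount, the uniform background, the function names and the toy sites are assumptions.

```python
import math

BASES = "ACGT"


def split_sites(sites, train_fraction=0.7):
    """Split known sites into a training part and a test part, in order."""
    cut = round(len(sites) * train_fraction)
    return sites[:cut], sites[cut:]


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equal-length, upper-case ACGT sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all training sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score_window(pwm, window):
    """Sum of the log-odds weights of the bases in one window."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_site(pwm, sequence):
    """Return (score, 0-based start) of the highest-scoring window, or None."""
    width = len(pwm)
    if len(sequence) < width:
        return None
    scored = [
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scored)


if __name__ == "__main__":
    toy_sites = ["GTTCGAGAC", "GTTTGAGAC", "GTTCGAGAC", "GATCGAGAC"]  # illustrative
    pwm = build_pwm(toy_sites)
    print(best_site(pwm, "AAAAGTTCGAGACAAAA"))
```

Test sites held out by `split_sites` can be scored with `score_window` to choose a cut-off before scanning Alu sequences.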

Mapping of recently integrated and younger Alus

The 126 recently integrated Alus from younger subfamilies were located in the human genome using BLASTn at the NCBI server, and regulatory sites were mapped in these regions using the programs discussed above.

Association analysis

Alus in the promoter regions and intronic regions of functionally classified genes [50] of chromosome 22 were mapped, and the pattern of distribution of biologically significant sites was analyzed by ANOVA.
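The ANOVA design is not detailed beyond the reported degrees of freedom, so the following is a minimal sketch of a one-way ANOVA F statistic under the assumption that each functional category forms one group. With df = 4, 40, the arithmetic implies five groups and 45 observations in total; how those observations were defined is not stated. The function name and the example values are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(group) for group in groups)
    means = [sum(group) / len(group) for group in groups]
    grand_mean = sum(sum(group) for group in groups) / n

    # Variation of group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, means)
        for value in group
    )

    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


if __name__ == "__main__":
    # Made-up numbers, only to show the call.
    example = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
    print(one_way_anova(example))
```

The p-value follows from the F distribution with the returned degrees of freedom and would normally be obtained from a statistics library. The function raises a division error if all groups have zero within-group variance.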


  1. Hamdi HK, Nishio H, Tavis J, Zielinski R, Dugaiczyk A: Alu-mediated phylogenetic novelties in gene regulation and development. J Mol Biol. 2000, 299: 931-939. 10.1006/jmbi.2000.3795.

  2. Deininger PL, Batzer MA: Alu repeats and human disease. Mol Genet Metab. 1999, 67: 183-193. 10.1006/mgme.1999.2864.

  3. Szmulewicz MN, Novick GE, Herrera RJ: Effects of Alu insertions on gene function. Electrophoresis. 1998, 19: 1260-1264.

  4. Muratani K, Hada T, Yamamoto Y, Kaneko T, Shigeto Y, Ohue T, Furuyama J, Higashino K: Inactivation of the cholinesterase gene by Alu insertion: possible mechanism for human gene transposition. Proc Natl Acad Sci U S A. 1991, 88: 11315-11319.

  5. Wallace MR, Andersen LB, Saulino AM, Gregory PE, Glover TW, Collins FS: A de novo Alu insertion results in neurofibromatosis type 1. Nature. 1991, 353: 864-866. 10.1038/353864a0.

  6. Brahmachari SK, Meera G, Sarkar PS, Balagurumoorthy P, Tripathi J, Raghavan S, Shaligram U, Pataskar S: Simple repetitive sequences in the genome: structure and functional significance. Electrophoresis. 1995, 16: 1705-1714.

  7. Conrad M, Brahmachari SK, Sasisekharan V: DNA structural variability as a factor in gene expression and evolution. Biosystems. 1986, 19: 123-126. 10.1016/0303-2647(86)90024-9.

  8. Makalowski W: Genomic scrap yard: how genomes utilize all that junk. Gene. 2000, 259: 61-67. 10.1016/S0378-1119(00)00436-4.

  9. Labuda D, Striker G: Sequence conservation in Alu evolution. Nucleic Acids Res. 1989, 17: 2477-2491.

  10. Schmid C, Maraia R: Transcriptional regulation and transpositional selection of active SINE sequences. Curr Opin Genet Dev. 1992, 2: 874-882.

  11. Schmid CW: Alu: structure, origin, evolution, significance and function of one-tenth of human DNA. Prog Nucleic Acid Res Mol Biol. 1996, 53: 283-319.

  12. Ullu E, Tschudi C: Alu sequences are processed 7SL RNA genes. Nature. 1984, 312: 171-172.

  13. Rowold DJ, Herrera RJ: Alu elements and the human genome. Genetica. 2000, 108: 57-72. 10.1023/A:1004099605261.

  14. Mighell AJ, Markham AF, Robinson PA: Alu sequences. FEBS Lett. 1997, 417: 1-5. 10.1016/S0014-5793(97)01259-3.

  15. Shen MR, Batzer MA, Deininger PL: Evolution of the master Alu gene(s). J Mol Evol. 1991, 33: 311-320.

  16. Jurka J, Milosavljevic A: Reconstruction and analysis of human Alu genes. J Mol Evol. 1991, 32: 105-121.

  17. Batzer MA, Arcot SS, Phinney JW, Alegria-Hartman M, Kass DH, Milligan SM, Kimpton C, Gill P, Hochmeister M, Ioannou PA, Herrera RJ, Boudreau DA, Scheer WD, Keats BJ, Deininger PL, Stoneking M: Genetic variation of recent Alu insertions in human populations. J Mol Evol. 1996, 42: 22-29.

  18. Tomilin NV, Bozhkov VM: Human nuclear protein interacting with a conservative sequence motif of Alu-family DNA repeats. FEBS Lett. 1989, 251: 79-83. 10.1016/0014-5793(89)81432-2.

  19. Hudson LG, Ertl AP, Gill GN: Structure and inducible regulation of the human c-erb B2/neu promoter. J Biol Chem. 1990, 265: 4389-4393.

  20. Piedrafita FJ, Molander RB, Vansant G, Orlova EA, Pfahl M, Reynolds WF: An Alu element in the myeloperoxidase promoter contains a composite SP1-thyroid hormone-retinoic acid response element. J Biol Chem. 1996, 271: 14412-14420. 10.1074/jbc.271.24.14412.

  21. Babich V, Aksenov N, Alexeenko V, Oei SL, Buchlow G, Tomilin N: Association of some potential hormone response elements in human genes with the Alu family repeats. Gene. 1999, 239: 341-349. 10.1016/S0378-1119(99)00391-1.

  22. Chesnokov I, Bozhkov V, Popov B, Tomilin N: Binding specificity of human nuclear protein interacting with the Alu-family DNA repeats. Biochem Biophys Res Commun. 1991, 178: 613-619.

  23. Vansant G, Reynolds WF: The consensus sequence of a major Alu subfamily contains a functional retinoic acid response element. Proc Natl Acad Sci U S A. 1995, 92: 8229-8233.

  24. Norris J, Fan D, Aleman C, Marks JR, Futreal PA, Wiseman RW, Iglehart JD, Deininger PL, McDonnell DP: Identification of a new subclass of Alu DNA repeats which can function as estrogen receptor-dependent transcriptional enhancers. J Biol Chem. 1995, 270: 22777-22782. 10.1074/jbc.270.39.22777.

  25. Almenoff JS, Jurka J, Schoolnik GK: Induction of heat-stable enterotoxin receptor activity by a human Alu repeat. J Biol Chem. 1994, 269: 16610-16617.

  26. Ashfield R, Ashcroft SJ: Cloning of the promoters for the beta-cell ATP-sensitive K-channel subunits Kir6.2 and SUR1. Diabetes. 1998, 47: 1274-1280.

  27. Austin GE, Lam L, Zaki SR, Chan WC, Hodge T, Hou J, Swan D, Zhang W, Racine M, Whitsett C, et al.: Sequence comparison of putative regulatory DNA of the 5' flanking region of the myeloperoxidase gene in normal and leukemic bone marrow cells. Leukemia. 1993, 7: 1445-1450.

  28. Brini AT, Lee GM, Kinet JP: Involvement of Alu sequences in the cell-specific regulation of transcription of the gamma chain of Fc and T cell receptors. J Biol Chem. 1993, 268: 1355-1361.

  29. Britten RJ: DNA sequence insertion and evolutionary variation in gene regulation. Proc Natl Acad Sci U S A. 1996, 93: 9374-9377. 10.1073/pnas.93.18.9374.

  30. Britten RJ: Evolutionary selection against change in many Alu repeat sequences interspersed through primate genomes. Proc Natl Acad Sci U S A. 1994, 91: 5992-5996.

  31. Chang SF, Scharf JG, Will H: Structural and functional analysis of the promoter of the hepatic lipase gene. Eur J Biochem. 1997, 247: 148-159. 10.1111/j.1432-1033.1997.00148.x.

  32. Le Goff W, Guerin M, Chapman MJ, Thillet J: A CYP7A promoter binding factor site and Alu repeat in the distal promoter region are implicated in regulation of human CETP gene expression. J Lipid Res. 2003, 44: 902-910. 10.1194/jlr.M200423-JLR200.

  33. Filatov LV, Mamayeva SE, Tomilin NV: Non-random distribution of Alu-family repeats in human chromosomes. Mol Biol Rep. 1987, 12: 117-122.

  34. Korenberg JR, Rykowski MC: Human genome organization: Alu, lines, and the molecular structure of metaphase chromosome bands. Cell. 1988, 53: 391-400. 10.1016/0092-8674(88)90159-6.

  35. 35.

    Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M., Dedhia N, Blocker H, Hornischer K, Nordsiek G, Agarwala R, Aravind 
L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowski J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, Szustakowki J, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S: Initial sequencing and analysis of the human genome. Nature. 2001, 409: 860-921. 10.1038/35057062.

  36.

    Hon LS, Jain AN: Compositional structure of repetitive elements is quantitatively related to co-expression of gene pairs. J Mol Biol. 2003, 332: 305-310. 10.1016/S0022-2836(03)00926-4.

  37.

    Carroll ML, Roy-Engel AM, Nguyen SV, Salem AH, Vogel E, Vincent B, Myers J, Ahmad Z, Nguyen L, Sammarco M, Watkins WS, Henke J, Makalowski W, Jorde LB, Deininger PL, Batzer MA: Large-scale analysis of the Alu Ya5 and Yb8 subfamilies and their contribution to human genomic diversity. J Mol Biol. 2001, 311: 17-40. 10.1006/jmbi.2001.4847.

  38.

    Arcot SS, Adamson AW, Risch GW, LaFleur J, Robichaux MB, Lamerdin JE, Carrano AV, Batzer MA: High-resolution cartography of recently integrated human chromosome 19-specific Alu fossils. J Mol Biol. 1998, 281: 843-856. 10.1006/jmbi.1998.1984.

  39.

    Batzer MA, Deininger PL: Alu repeats and human genomic diversity. Nat Rev Genet. 2002, 3: 370-379. 10.1038/nrg798.

  40.

    Roy-Engel AM, Carroll ML, Vogel E, Garber RK, Nguyen SV, Salem AH, Batzer MA, Deininger PL: Alu insertion polymorphisms for the study of human genomic diversity. Genetics. 2001, 159: 279-290.

  41.

    Batzer MA, Kilroy GE, Richard PE, Shaikh TH, Desselle TD, Hoppens CL, Deininger PL: Structure and variability of recently inserted Alu family members. Nucleic Acids Res. 1990, 18: 6793-6798.

  42.

    Aleman C, Roy-Engel AM, Shaikh TH, Deininger PL: Cis-acting influences on Alu RNA levels. Nucleic Acids Res. 2000, 28: 4755-4761. 10.1093/nar/28.23.4755.

  43.

    Perez-Stable C, Ayres TM, Shen CK: Distinctive sequence organization and functional programming of an Alu repeat promoter. Proc Natl Acad Sci U S A. 1984, 81: 5291-5295.

  44.

    Perez-Stable C, Shen CK: Competitive and cooperative functioning of the anterior and posterior promoter elements of an Alu family repeat. Mol Cell Biol. 1986, 6: 2041-2052.

  45.

    Murphy MH, Baralle FE: Directed semisynthetic point mutational analysis of an RNA polymerase III promoter. Nucleic Acids Res. 1983, 11: 7695-7700.

  46.

    Shaikh TH, Roy AM, Kim J, Batzer MA, Deininger PL: cDNAs derived from primary and small cytoplasmic Alu (scAlu) transcripts. J Mol Biol. 1997, 271: 222-234. 10.1006/jmbi.1997.1161.

  47.

    Liu WM, Chu WM, Choudary PV, Schmid CW: Cell stress and translational inhibitors transiently increase the abundance of mammalian SINE transcripts. Nucleic Acids Res. 1995, 23: 1758-1765.

  48.

    Perl A, Colombo E, Samoilova E, Butler MC, Banki K: Human transaldolase-associated repetitive elements are transcribed by RNA polymerase III. J Biol Chem. 2000, 275: 7261-7272. 10.1074/jbc.275.10.7261.

  49.

    Levanon EY, Eisenberg E, Yelin R, Nemzer S, Hallegger M, Shemesh R, Fligelman ZY, Shoshan A, Pollock SR, Sztybel D, Olshansky M, Rechavi G, Jantsch MF: Systematic identification of abundant A-to-I editing sites in the human transcriptome. Nat Biotechnol. 2004, 22: 1001-1005. 10.1038/nbt996.

  50.

    Grover D, Majumder PP, Rao CB, Brahmachari SK, Mukerji M: Nonrandom distribution of alu elements in genes of various functional categories: insight from analysis of human chromosomes 21 and 22. Mol Biol Evol. 2003, 20: 1420-1424. 10.1093/molbev/msg153.

  51.

    Kidwell MG, Lisch DR: Perspective: transposable elements, parasitic DNA, and genome evolution. Evolution Int J Org Evolution. 2001, 55: 1-24.

  52.

    Hamdi H, Nishio H, Zielinski R, Dugaiczyk A: Origin and phylogenetic distribution of Alu DNA repeats: irreversible events in the evolution of primates. J Mol Biol. 1999, 289: 861-871. 10.1006/jmbi.1999.2797.

  53.

    Hamdi H, Nishio H, Zielinski R, Dugaiczyk A: Origin and phylogenetic distribution of Alu DNA repeats: irreversible events in the evolution of primates. J Mol Biol. 1999, 289: 861-871. 10.1006/jmbi.1999.2797.

  54.

    National Center for Biotechnology Information.

  55.

    RepeatMasker server.

  56.

    Ensembl Genome Data Resources.


Acknowledgements

We thank Krishna Kumar and S Suganya for computational support. Financial support from the Council of Scientific and Industrial Research (CSIR) through projects CMM0016 (to MM) and CMM0017 (to SKB) is duly acknowledged.

Author information



Corresponding author

Correspondence to Mitali Mukerji.

Additional information

Authors' contributions

RS developed the algorithms and programs for identifying regulatory and significant regions, analysed the distribution of these sites in Alu subfamilies, performed the association analysis and drafted the manuscript. DG was involved in the chromosome 22 analyses. SKB participated in the design of the study. MM conceived the study and participated in its design, analysis, coordination and manuscript preparation. All authors read and approved the final manuscript.

Electronic supplementary material

Supplementary data: The analysis of the promoter and intronic regions was performed using the data in the supplementary table file, supplementary table 3_ravishankar et al (format: .xls). For human chromosome 22, the file lists, for each regulatory element found within Alu repeats in the 5' flanking promoter and intronic regions, the accession number, the associated Alu family, the respective positions, the functional class of the region and further details. The zipped file name is supplementary Details about the programs used are available on request for academic users. (ZIP 311 KB)

About this article

Cite this article

Shankar, R., Grover, D., Brahmachari, S.K. et al. Evolution and distribution of RNA polymerase II regulatory sites from RNA polymerase III dependant mobile Alu elements. BMC Evol Biol 4, 37 (2004).


  • Regulatory Site
  • Cholesteryl Ester Transfer Protein
  • Genome Shuffling
  • Position Weight Matrix
  • Hormone Responsive Element