PLoS Biol. 2018 Mar 5;16(3):e2003067.
doi: 10.1371/journal.pbio.2003067. eCollection 2018 Mar.

Spliced integrated retrotransposed element (SpIRE) formation in the human genome

Peter A Larson et al. PLoS Biol. 2018.

Abstract

Human Long interspersed element-1 (L1) retrotransposons contain an internal RNA polymerase II promoter within their 5' untranslated region (UTR) and encode two proteins (ORF1p and ORF2p) required for their mobilization (i.e., retrotransposition). The evolutionary success of L1 relies on the continuous retrotransposition of full-length L1 mRNAs. Previous studies identified functional splice donor (SD), splice acceptor (SA), and polyadenylation sequences in L1 mRNA and provided evidence that a small number of spliced L1 mRNAs retrotransposed in the human genome. Here, we demonstrate that the retrotransposition of intra-5'UTR or 5'UTR/ORF1 spliced L1 mRNAs leads to the generation of spliced integrated retrotransposed elements (SpIREs). We identified a new intra-5'UTR SpIRE that is ten times more abundant than previously identified SpIREs. Functional analyses demonstrated that both intra-5'UTR and 5'UTR/ORF1 SpIREs lack cis-acting transcription factor binding sites and exhibit reduced promoter activity. The 5'UTR/ORF1 SpIREs also produce nonfunctional ORF1p variants. Finally, we demonstrate that sequence changes within the L1 5'UTR over evolutionary time, which permitted L1 to evade the repressive effects of a host protein, can lead to the generation of new L1 splicing events, which, upon retrotransposition, generate a new SpIRE subfamily. We conclude that splicing inhibits L1 retrotransposition, that SpIREs generally represent evolutionary "dead-ends" in the L1 retrotransposition process, that mutations within the L1 5'UTR alter L1 splicing dynamics, and that retrotransposition of the resultant spliced transcripts can generate interindividual genomic variation.

Conflict of interest statement

JVM is an inventor on the patent: “Kazazian, H.H., Boeke, J.D., Moran, J.V., and Dombroski, B.A. Compositions and methods of use of mammalian retrotransposons. Application No. 60/006,831; Patent No. 6,150,160; Issued November 21, 2000.” JVM has not made any money from this patent and voluntarily discloses this information.

Figures

Fig 1. L1 mRNA contains potential SD and SA sites.
(A) Schematic of a full-length, retrotransposition-competent genomic L1. Top: the 5′ and 3′ UTRs (gray rectangles), ORF1 (yellow rectangle), and ORF2 (blue rectangle) are indicated in the schematic. The approximate positions of sense transcription initiation and antisense transcription initiation are indicated with black arrows on the top and bottom of the 5′UTR, respectively. The approximate positions of the coiled-coil (CC), RNA recognition motif (RRM), and C-terminal domain (CTD) are indicated in black lettering in ORF1. The endonuclease (EN), reverse transcriptase (RT), and cysteine-rich (C) domains are indicated in white lettering in ORF2. The 3′UTR ends in a poly(A) tract (AN). The L1 is flanked by target-site duplications (black arrowheads) in genomic DNA (black helical lines). Bottom: a magnified schematic of the 5′UTR and 5′ end of ORF1. The black arrow indicates the relative position of sense transcription initiation. The SD (red) and SA (green) sequences used to generate SpIREs are indicated above the 5′UTR (gray rectangle) and ORF1 (yellow rectangle). The positions of the SD and SA sequences relative to L1.3 are indicated with superscript numbers. The relative positions of cis-acting transcription factor binding sequences are indicated in the 5′UTR. (B–D) Schematics of the splicing events generating SpIRE97/622, SpIRE97/790, and SpIRE97/976. The SD (red underlined GU nucleotides) and SA (green underlined AG nucleotides) demark the intron boundaries used to generate each class of SpIRE. The left half of the figure depicts the L1 mRNA sequence before splicing and the right half of the figure depicts L1 mRNA after splicing.
AN, poly(A) tract; C, cysteine-rich; CC, coiled-coil; CTD, C-terminal domain; EN, endonuclease; L1, Long interspersed element-1; ORF, open reading frame; poly(A), polyadenosine; RRM, RNA recognition motif; RT, reverse transcriptase; RUNX3, runt related transcription factor 3; SA, splice acceptor; SD, splice donor; SpIRE, spliced integrated retrotransposed element; SP1, specificity protein 1; SRY, sex determining region Y; UTR, untranslated region; YY1, yin and yang 1.
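The splicing events in panels B–D can be illustrated with a minimal Python sketch. It assumes that each SpIRE name gives the last retained nucleotide upstream of the intron and the first retained nucleotide downstream of it (1-based, relative to L1.3); this reading is consistent with the U99C SD and A620C SA mutations described for Fig 2, but it is an inference rather than a stated convention. The function names and the placeholder sequence are hypothetical, and no real L1.3 sequence is used.

```python
def splice(seq: str, last_exon_nt: int, first_exon_nt: int) -> str:
    """Remove the intron lying between two 1-based, inclusive exon boundaries."""
    if not (1 <= last_exon_nt < first_exon_nt <= len(seq)):
        raise ValueError("coordinates out of range")
    # Python slices are 0-based and end-exclusive.
    return seq[:last_exon_nt] + seq[first_exon_nt - 1:]


def intron_length(last_exon_nt: int, first_exon_nt: int) -> int:
    """Number of nucleotides removed by the splicing event."""
    return first_exon_nt - last_exon_nt - 1


# Assumed coordinates, read directly from the SpIRE names.
SPIRES = {
    "SpIRE97/622": (97, 622),
    "SpIRE97/790": (97, 790),
    "SpIRE97/976": (97, 976),
}

# Placeholder 1,000-nt transcript; NOT real L1 sequence.
toy_transcript = "N" * 1000

for name, (upstream, downstream) in SPIRES.items():
    spliced = splice(toy_transcript, upstream, downstream)
    print(name, "removes", intron_length(upstream, downstream), "nt;",
          "toy transcript shrinks to", len(spliced), "nt")
```

Under this reading, the three events remove progressively longer stretches of the 5′UTR, and the 97/976 event extends into ORF1.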
Fig 2. Intra-5′UTR splicing reduces L1 promoter activity.
(A) Schematic of the luciferase constructs and the relative position of northern blot probes. The L1.3 5′UTR (gray rectangle) was used to drive the transcription of the firefly luciferase reporter gene (green rectangle) present in plasmid pGL4.11. The following plasmids were created: pPLWTLUC contains the full-length L1.3 5′UTR; pPL97/622LUC contains the SpIRE97/622 5′UTR; pPLSDmLUC contains a U99C SD mutation (red asterisk) in the L1.3 5′UTR; pPLSAmLUC contains an A620C SA mutation (light blue asterisk) in the L1.3 5′UTR. The relative positions of complementary riboprobes used in the northern blot experiments (ribonucleotides 7–99 [purple line], ribonucleotides 103–336 [red line], and the 3′ end of the luciferase gene [blue line]) are indicated below the schematic. (B) Representative northern blots. The black arrowhead indicates the predicted size of full-length L1/luciferase mRNA (about 2.7 kb). Construct names are indicated above the gel lanes; UTF = untransfected HeLa-JVM cells. The probe used in the northern blot experiment is indicated below the autoradiograph. Actin served as an mRNA loading control (2.1 kb). RNA size standards (kb) (Millennium RNA Markers) are indicated to the left of the autoradiograph panels. (C) Results from the luciferase assays. The x-axis indicates the name of the luciferase expression plasmid. The y-axis indicates the relative firefly luciferase units normalized to a co-transfected Renilla luciferase internal control. These data represent the averages of three biological replicates (S2 Table). Each biological replicate contained six technical replicates. Error bars indicate the standard deviation between three biological replicates. P-values were determined using a Student one-tailed t test. (D) Results from RT-PCR assays: a 1.2% agarose gel depicting the results from a representative qualitative RT-PCR experiment. DNA size markers (1 kb Plus DNA Ladder) are indicated at the left of the gel. Plasmid names are indicated above the gel; UTF = untransfected HeLa-JVM cells, H2O = water control for PCR reactions. The inset below the gel indicates the major (* and #) and minor (+) cDNA products detected in the experiments.
FL, full-length; H2O, water control for PCR reactions; kb, kilobase; L1, Long interspersed element-1; M, marker; RT-PCR, reverse transcription PCR; SA, splice acceptor; SD, splice donor; SpIRE, spliced integrated retrotransposed element; UTF, untransfected HeLa-JVM cells; UTR, untranslated region; WT, wild-type.
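The normalization described for panel C can be sketched in Python as follows. This is only an illustration under stated assumptions: the caption does not say how technical replicates were combined, so the sketch averages the firefly/Renilla ratios within each biological replicate and then reports the mean and sample standard deviation across biological replicates. All readings below are invented, the function names are hypothetical, and only two technical replicates per biological replicate are shown for brevity.

```python
from statistics import mean, stdev


def normalized_ratio(firefly, renilla):
    """Mean firefly/Renilla ratio over the technical replicates of one biological replicate."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired")
    return mean(f / r for f, r in zip(firefly, renilla))


def summarize(biological_replicates):
    """Return (mean, sample SD) of the normalized ratio across biological replicates.

    Assumption: the error bars correspond to a sample standard deviation.
    """
    ratios = [normalized_ratio(f, r) for f, r in biological_replicates]
    return mean(ratios), stdev(ratios)


# Invented readings: (firefly, Renilla) per biological replicate.
example = [
    ([200.0, 400.0], [100.0, 200.0]),
    ([300.0, 600.0], [100.0, 200.0]),
    ([400.0, 800.0], [100.0, 200.0]),
]

avg, sd = summarize(example)
print(f"normalized luciferase activity: {avg:.2f} +/- {sd:.2f}")
```

Dividing by the co-transfected Renilla signal corrects each well for differences in transfection efficiency before constructs are compared.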
Fig 3. ORF1p expression from intra-5′UTR and 5′UTR/ORF1 SpIREs.
(A) Schematics of the engineered L1 constructs. The L1 5′UTR (gray rectangle), ORF1 (yellow rectangle), and ORF2 (blue rectangle) are indicated in the constructs. Relative positions of the SpIRE97/622 and SpIRE97/976 deletions (red triangles) are indicated on the bottom two constructs, respectively. The CMV promoter (white arrowhead) and the mneoI retrotransposition indicator cassette (green rectangle = neo gene sequence; black “v” line = intron interrupting neo coding sequence, SD = splice donor site, SA = splice acceptor site) are indicated at the 5′ and 3′ ends of the constructs, respectively. The black lollipop at the 3′ end on top of the constructs indicates the sense SV40 polyadenylation signal. The black arrow and gray lollipop on the bottom of the constructs are embedded within the mneoI retrotransposition indicator cassette and indicate an SV40 early promoter and herpes simplex virus thymidine kinase polyadenylation signal, respectively, in the antisense orientation. (B) Representative ORF1p western blot from WCLs. Molecular weight standards (kDa) are indicated to the left of the image. The black arrowhead indicates the predicted size of full-length ORF1p (about 40 kDa). Construct names are indicated above the image; pCEP/GFP = negative control. The antibody used in the western blot experiment is indicated to the right of the gel (α-N-ORF1p). The eIF3 protein (110 kDa) served as a loading control. Western blots were performed three times, yielding similar results. (C) Schematic of ORF1 and relative location of antibody binding. Top: The relative positions in ORF1 (yellow rectangle) of the SA sequence at nucleotides 974–975 (green), the canonical ORF1 initiator methionine (AUG, black, 40 kDa), the two putative initiator methionine codons (AUG, orange, 33 kDa; AUG, blue, 27 kDa), and the N- and C-terminal epitopes recognized by the ORF1p Ab (red and purple stars, respectively) are indicated in the figure. 
(D) Representative western blots from WCLs: molecular weight standards (kDa) are indicated to the left of the gels. The predicted sizes of full-length ORF1p (black arrowhead) and the N-terminal truncated ORF1p variants (orange and blue arrows, respectively) are highlighted on the gel. Construct names are indicated above the image; pCEP/GFP = negative control. The antibodies used in the western blot experiments are indicated to the left (α-N-ORF1p) and right (α-C-ORF1p) of the gel images, respectively. The eIF3 protein (110 kDa) served as a loading control. The unlabeled band at about 25 kDa in the α-C-ORF1p experiment is an unknown cross-reacting product that was not detected in RNPs or with an antibody to a C-terminal ORF1p T7-gene10 epitope tag (S4A Fig and S4B Fig). Western blots were performed three times, yielding similar results.
α-C-ORF1p, C-terminal ORF1p antibody; α-eIF3, eukaryotic initiation factor 3 antibody; α-N-ORF1p, N-terminal ORF1p antibody; Ab, antibody; AUG, translation initiation codon; CMV, cytomegalovirus; kDa, kilodalton; L1, Long interspersed element-1; ORF, open reading frame; SA, splice acceptor; SD, splice donor; SpIRE, spliced integrated retrotransposed element; UTR, untranslated region; WCL, whole cell lysate.
Fig 4. Intra-5′UTR and 5′UTR/ORF1 SpIREs are retrotransposition-defective.
(A) Results from the SpIRE97/622 retrotransposition assay. The x-axis indicates the construct names. The y-axis indicates the relative retrotransposition efficiency (%). The CMV promoter either augments L1 expression (+CMV, black bars) or is absent (ΔCMV, gray bars) from the L1 expression construct. The relative retrotransposition efficiencies are normalized to pJM101/L1.3 (set at 100%). The pJM105/L1.3 plasmid served as a negative control. The images and data are from one representative experiment (S3 Table). Error bars represent the standard deviation of technical triplicates for the depicted assay. Each assay was repeated three times, yielding similar results. (B) Results from the SpIRE97/976 retrotransposition assay. The x-axis indicates the construct names. The y-axis indicates the relative retrotransposition efficiency (%). A CMV promoter augments L1 expression (+CMV, black bars). The relative retrotransposition efficiencies are normalized to pJM101/L1.3 (set at 100%). The pJM105/L1.3 plasmid served as a negative control. The images and data are from one representative experiment (S4 Table). Error bars represent the standard deviation of technical triplicates for the depicted assay. Each assay was repeated three times, yielding similar results. (C) Results from the SpIRE97/976 trans-complementation assay. The x-axis indicates the “reporter” (top text) and the “driver” (bottom text) construct names. The y-axis indicates the relative trans-complementation efficiency (%). The results of each assay were normalized to the pPL97/976/L1.3 “reporter” plasmid + pJBM561 “driver” plasmid co-transfection experiment, which was set at 100%. The image at the bottom right-hand side of the figure represents the efficiency of pJM101/L1.3 retrotransposition in cis. The pPL97-976/L1.3 “reporter” plasmid + pCEP4 “driver” plasmid co-transfection experiment served as a negative control. The images and data are from one representative experiment (S5 Table). Error bars represent standard deviations of technical triplicates for the depicted experiment. Each assay was repeated four times, yielding similar results.
CMV, cytomegalovirus; L1, Long interspersed element-1; ORF, open reading frame; SpIRE, spliced integrated retrotransposed element; UTR, untranslated region.
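The "set at 100%" normalization used in these panels can be sketched in Python as follows. This is a simplified illustration: it assumes the raw readout is a colony count per technical replicate, ignores any correction for transfection efficiency, and uses a sample standard deviation. The counts are invented and the function name is hypothetical.

```python
from statistics import mean, stdev


def relative_efficiency(counts, reference_counts):
    """Express technical-replicate counts as a percentage of the reference mean.

    Returns (mean %, sample SD %) across the technical replicates.
    """
    reference_mean = mean(reference_counts)
    if reference_mean == 0:
        raise ValueError("reference construct produced no events")
    percents = [100.0 * c / reference_mean for c in counts]
    return mean(percents), stdev(percents)


# Invented technical triplicates (counts per well).
reference = [200, 180, 220]   # stands in for the construct set at 100%
test_construct = [20, 10, 30]

print(relative_efficiency(reference, reference))       # reference against itself
print(relative_efficiency(test_construct, reference))  # test construct vs. reference
```

Because every value is divided by the same reference mean, the reference construct averages exactly 100% and its error bar reflects only the spread of its own technical replicates.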
Fig 5. Sequence changes within the 5′UTR affect intra-5′UTR splice site choice.
(A) Schematic of the L1PA1 and L1PA3 5′UTRs. Top schematic: the relative positions of the SD (red lettering), SA (green lettering), and putative branch point sequence (ACCTCAC, black lettering) in the L1PA1 5′UTR that led to the formation of SpIRE97/790 are indicated. Superscript numbers indicate the first and last nucleotide of the indicated sequence. Note that nucleotide positions are indicated in the context of L1.3 (accession #L19088). Numbers below the branch point (underlined A; 95.75) and above the SA A788G789 (84.95) indicate the predicted score of those sequences for utilization in a splicing reaction, as determined using Human Splicing Finder v.3.0 (http://www.umd.be/HSF3/) [105]. Note that predicted scores above 80 are considered “strong” [105]. Bottom schematic: the relative positions of the SD (red lettering), SAs A851G852 (purple lettering) and A916G917 (green lettering), and putative branch point sequence (TCCAGAG, black lettering) in the L1PA3 5′UTR are indicated. Superscript numbers indicate the first and last nucleotide of the indicated sequence. Numbers below the branch point (underlined A; 75.73) and SAs A851G852 (83.75) and A916G917 (79.66) indicate the predicted strength of those sequences for utilization in a splicing reaction, as determined using Human Splicing Finder v.3.0 (http://www.umd.be/HSF3/) [105]. The L1PA3 5′UTR contains a 129-bp sequence (gray triangle) containing the SA A851G852 that was lost in the transition from the L1PA3 to L1PA2/L1PA1 subfamilies. The 129-bp deletion repositions the SA A916G917 of L1PA3 closer to a putative branch point in the 5′UTR of the L1PA2/L1PA1 subfamilies (now noted as A788G789 in the top schematic), leading to a higher predicted score (84.95 in PA1 compared to 79.66 in PA3). (B) Schematic of luciferase constructs and results from luciferase assays.
Top panel: the L1RP 5′UTR (gray rectangle) was used to drive the transcription of the firefly luciferase reporter gene (green rectangle) present in plasmid pGL4.11. The following plasmids were created: pJBMWTLUC contains the full-length L1RP 5′UTR; pJBMWT129PA4LUC contains the 129-bp (black box in 5′UTR) sequence derived from L1PA4 within the L1RP 5′UTR; pJBMWT129SCRLUC contains a scrambled version of the 129-bp sequence (black and white striped box) within the 5′UTR. Bottom panel: luciferase assay. The x-axis indicates the name of the luciferase expression plasmid. The y-axis indicates the relative firefly luciferase units normalized to a co-transfected Renilla luciferase internal control. These data represent the averages of three biological replicates (S6 Table). Each biological replicate contained six technical replicates. Error bars indicate the standard deviation between three biological replicates. P-values were determined using a Student one-tailed t test and “n.s.” indicates that there was no statistical difference. (C) Results from the EGFP retrotransposition assay: the x-axis indicates the construct names. The y-axis indicates the relative retrotransposition efficiency (%). The relative retrotransposition efficiencies are normalized to pL1RP-EGFP (set at 100%). The data are from one representative experiment (S7 Table). Error bars represent the standard deviation of technical triplicates for the depicted assay. Each assay was repeated four times, yielding similar results. (D) Results from RT-PCR assays: a 2.0% agarose gel depicting the results from a representative qualitative RT-PCR experiment. DNA size markers (1 kb Plus DNA Ladder) are shown at the left of the gel. Plasmid names are indicated above the gel; UTF = untransfected HeLa-JVM cells, H2O = water control for PCR reactions. The right half of the agarose gel (“NO RT”) indicates the results from a representative experiment conducted without the addition of reverse transcriptase. 
The inset to the right of the gel indicates the major (*, **, and ***) and minor (+, @, and $) cDNA products detected in the experiments. The assay was repeated four times, yielding similar results. H2O, water control for PCR reactions; L1, Long interspersed element-1; RT-PCR, reverse transcription PCR; SA, splice acceptor; SD, splice donor; SpIRE, spliced integrated retrotransposed element; UTF, untransfected HeLa-JVM cells; UTR, untranslated region.
Fig 6. A working model for the generation of SpIREs.
(A) Canonical L1 retrotransposition. An L1 is transcribed from a genomic location (red chromosome). Translation of the mRNA (multicolored wavy line) occurs in the cytoplasm and ORF1p (yellow circles) and ORF2p (blue oval) bind back onto their respective mRNA (cis-preference) to form an RNP. The L1 RNP then enters the nucleus and a de novo L1 insertion occurs at a new genomic location (green chromosome) by TPRT. This insertion, if full length, could act as a source element, giving rise to new insertions (green arrow) at a new genomic location (gray chromosome). (B) Retrotransposition of intra-5′UTR spliced L1 isoform. A full-length L1 element is transcribed from its genomic location (red chromosome) and undergoes intra-5′UTR splicing. Translation of the mRNA (multicolored wavy line) occurs in the cytoplasm and ORF1p (yellow circles) and ORF2p (blue oval) bind back onto their respective mRNA (cis-preference) to form an RNP. The L1 RNP then enters the nucleus and L1 mRNAs subject to intra-5′UTR splicing can undergo a single round of retrotransposition (green chromosome) by TPRT. However, because the intra-5′UTR splicing event deletes sequences required for L1 promoter activity, the resultant insertion is unlikely to undergo subsequent rounds of retrotransposition in future generations (dashed green arrow). (C) Retrotransposition of 5′UTR/ORF1 spliced L1 isoform. An L1 is transcribed from its genomic location (red chromosome) and is subject to 5′UTR/ORF1 splicing. Translation of the mRNA (multicolored wavy line) occurs in the cytoplasm; however, because translation initiates at downstream AUG codons, ORF1p (yellow circles) is truncated and nonfunctional. The 5′UTR/ORF1 spliced L1 mRNA therefore relies on a wild-type source of ORF1p supplied in trans from another L1. In the rare instance that trans-complementation occurs (dotted arrow), it is highly unlikely that the resultant SpIRE will generate RNAs that can undergo retrotransposition in future generations (dashed thin green arrow).
L1, Long interspersed element-1; ORF, open reading frame; RNP, ribonucleoprotein particle; SpIRE, spliced integrated retrotransposed element; TPRT, target-site primed reverse transcription; UTR, untranslated region.

References

    1. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409: 860–921. doi: 10.1038/35057062
    2. Grimaldi G, Skowronski J, Singer MF. Defining the beginning and end of KpnI family segments. EMBO J. 1984;3: 1753–9.
    3. Kazazian HH Jr., Moran JV. The impact of L1 retrotransposons on the human genome. Nat Genet. 1998;19: 19–24. doi: 10.1038/ng0598-19
    4. Ostertag EM, Kazazian HH Jr. Twin priming: a proposed mechanism for the creation of inversions in L1 retrotransposition. Genome Res. 2001;11: 2059–65. doi: 10.1101/gr.205701
    5. Brouha B, Schustak J, Badge RM, Lutz-Prigge S, Farley AH, Moran JV, et al. Hot L1s account for the bulk of retrotransposition in the human population. Proc Natl Acad Sci U S A. 2003;100: 5280–5. doi: 10.1073/pnas.0831042100