PLoS Biol. 2018 Mar 5;16(3):e2003067.

doi: 10.1371/journal.pbio.2003067. eCollection 2018 Mar.

Spliced integrated retrotransposed element (SpIRE) formation in the human genome

Peter A Larson¹, John B Moldovan¹, Naveen Jasti¹, Jeffrey M Kidd¹,², Christine R Beck¹, John V Moran¹,³

Affiliations

¹ Department of Human Genetics, University of Michigan Medical School, Ann Arbor, Michigan, United States of America.
² Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, Michigan, United States of America.
³ Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, Michigan, United States of America.

PMID: 29505568
PMCID: PMC5860796
DOI: 10.1371/journal.pbio.2003067

Peter A Larson et al. PLoS Biol. 2018.

Abstract

Human Long interspersed element-1 (L1) retrotransposons contain an internal RNA polymerase II promoter within their 5' untranslated region (UTR) and encode two proteins (ORF1p and ORF2p) required for their mobilization (i.e., retrotransposition). The evolutionary success of L1 relies on the continuous retrotransposition of full-length L1 mRNAs. Previous studies identified functional splice donor (SD), splice acceptor (SA), and polyadenylation sequences in L1 mRNA and provided evidence that a small number of spliced L1 mRNAs retrotransposed in the human genome. Here, we demonstrate that the retrotransposition of intra-5'UTR or 5'UTR/ORF1 spliced L1 mRNAs leads to the generation of spliced integrated retrotransposed elements (SpIREs). We identified a new intra-5'UTR SpIRE that is ten times more abundant than previously identified SpIREs. Functional analyses demonstrated that both intra-5'UTR and 5'UTR/ORF1 SpIREs lack cis-acting transcription factor binding sites and exhibit reduced promoter activity. The 5'UTR/ORF1 SpIREs also produce nonfunctional ORF1p variants. Finally, we demonstrate that sequence changes within the L1 5'UTR over evolutionary time, which permitted L1 to evade the repressive effects of a host protein, can lead to the generation of new L1 splicing events, which, upon retrotransposition, generate a new SpIRE subfamily. We conclude that splicing inhibits L1 retrotransposition, that SpIREs generally represent evolutionary "dead-ends" in the L1 retrotransposition process, that mutations within the L1 5'UTR alter L1 splicing dynamics, and that retrotransposition of the resultant spliced transcripts can generate interindividual genomic variation.

Conflict of interest statement

JVM is an inventor on the patent: “Kazazian, H.H., Boeke, J.D., Moran, J.V., and Dombroski, B.A. Compositions and methods of use of mammalian retrotransposons. Application No. 60/006,831; Patent No. 6,150,160; Issued November 21, 2000.” JVM has not made any money from this patent and voluntarily discloses this information.

Figures

**Fig 1. L1 mRNA contains potential SD and SA sites.**
(A) Schematic of a full-length retrotransposition-competent genomic L1. Top: the 5′ and 3′ UTRs (gray rectangles), ORF1 (yellow rectangle), and ORF2 (blue rectangle) are indicated in the schematic. The approximate positions of sense transcription initiation and antisense transcription initiation are indicated with black arrows on the top and bottom of the 5′UTR, respectively. The approximate positions of the coiled-coil (CC), RNA recognition motif (RRM), and C-terminal domain (CTD) are indicated in black lettering in ORF1. The endonuclease (EN), reverse transcriptase (RT), and cysteine-rich (C) domain are indicated in white lettering in ORF2. The 3′UTR ends in an A_N. The L1 is flanked by target-site duplications (black arrowheads) in genomic DNA (black helical lines). Bottom: a magnified schematic of the 5′UTR and 5′ end of ORF1. The black arrow indicates the relative position of sense transcription initiation. The SD (red) and SA (green) sequences used to generate SpIREs are indicated above the 5′UTR (gray rectangle) and ORF1 (yellow rectangle). The positions of the SD and SA sequences relative to L1.3 are indicated with superscript numbers. The relative positions of *cis*-acting transcription factor binding sequences are indicated in the 5′UTR. (B–D) Schematics of the splicing events generating SpIRE_97/622, SpIRE_97/790, and SpIRE_97/976. The SD (red underlined GU nucleotides) and SA (green underlined AG nucleotides) demarcate the intron boundaries used to generate each class of SpIRE. The left half of the figure depicts the L1 mRNA sequence before splicing and the right half of the figure depicts L1 mRNA after splicing.
A_N, poly(A) tract; C, cysteine-rich; CC, coiled-coil; CTD, C-terminal domain; EN, endonuclease; L1, Long interspersed element-1; ORF, open reading frame; poly(A), polyadenosine; RRM, RNA recognition motif; RT, reverse transcriptase; RUNX3, runt related transcription factor 3; SA, splice acceptor; SD, splice donor; SpIRE, spliced integrated retrotransposed element; SP1, specificity protein 1; SRY, sex determining region Y; UTR, untranslated region; YY1, yin and yang 1.
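
The SpIRE names in this legend encode the exon junction in L1.3 coordinates. The Python sketch below illustrates that bookkeeping under one assumption: coordinates are 1-based, the first number is the last nucleotide retained upstream of the intron, and the second is the first nucleotide retained downstream. This reading is inferred from the legends (the U₉₉C and A₆₂₀C splice-site mutations in Fig 2, and the SA at nucleotides 974–975 in Fig 3) and is not stated outright. The function names and the toy sequence are hypothetical.

```python
def intron_length(last_upstream_nt: int, first_downstream_nt: int) -> int:
    """Number of nucleotides removed when joining two 1-based exon positions."""
    if first_downstream_nt <= last_upstream_nt:
        raise ValueError("downstream position must follow upstream position")
    return first_downstream_nt - last_upstream_nt - 1


def splice(rna: str, last_upstream_nt: int, first_downstream_nt: int) -> str:
    """Join rna[1..last_upstream_nt] to rna[first_downstream_nt..end] (1-based)."""
    intron = rna[last_upstream_nt:first_downstream_nt - 1]
    # Canonical introns begin with GU (splice donor) and end with AG (splice acceptor).
    if not (intron.startswith("GU") and intron.endswith("AG")):
        raise ValueError("intron does not have GU...AG boundaries")
    return rna[:last_upstream_nt] + rna[first_downstream_nt - 1:]


# Junctions named in the legend, interpreted as (last upstream nt, first downstream nt).
SPIRE_JUNCTIONS = {
    "SpIRE_97/622": (97, 622),
    "SpIRE_97/790": (97, 790),
    "SpIRE_97/976": (97, 976),
}

for name, (upstream, downstream) in SPIRE_JUNCTIONS.items():
    print(name, intron_length(upstream, downstream), "nt removed")

# Toy RNA (hypothetical, not L1 sequence): exon AAAC, intron GUCCCAG, exon UUU.
toy = "AAAC" + "GUCCCAG" + "UUU"
print(splice(toy, 4, 12))
```

Under this reading, the three splicing events would remove 524, 692, and 878 nucleotides, respectively.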

**Fig 2. Intra-5′UTR splicing reduces L1 promoter activity.**
(A) Schematic of the luciferase constructs and the relative position of northern blot probes. The L1.3 5′UTR (gray rectangle) was used to drive the transcription of the firefly luciferase reporter gene (green rectangle) present in plasmid pGL4.11. The following plasmids were created: pPL_WTLUC contains the full-length L1.3 5′UTR; pPL_97/622LUC contains the SpIRE_97/622 5′UTR; pPL_SDmLUC contains a U₉₉C SD mutation (red asterisk) in the L1.3 5′UTR; pPL_SAmLUC contains an A₆₂₀C SA mutation (light blue asterisk) in the L1.3 5′UTR. The relative positions of complementary riboprobes used in the northern blot experiments (ribonucleotides 7–99 [purple line], ribonucleotides 103–336 [red line], and the 3′ end of the luciferase gene [blue line]) are indicated below the schematic. (B) Representative northern blots. The black arrowhead indicates the predicted size of full-length L1/luciferase mRNA (about 2.7 kb). Construct names are indicated above the gel lanes; UTF = untransfected HeLa-JVM cells. The probe used in the northern blot experiment is indicated below the autoradiograph. Actin served as an mRNA loading control (2.1 kb). RNA size standards (kb) (Millenium RNA Markers) are indicated to the left of the autoradiograph panels. (C) Results from the luciferase assays. The x-axis indicates the name of the luciferase expression plasmid. The y-axis indicates the relative firefly luciferase units normalized to a co-transfected Renilla luciferase internal control. These data represent the averages of three biological replicates (S2 Table). Each biological replicate contained six technical replicates. Error bars indicate the standard deviation between three biological replicates. P-values were determined using a Student one-tailed t test. (D) Results from RT-PCR assays: A 1.2% agarose gel depicting the results from a representative qualitative RT-PCR experiment. DNA size markers (1 kb Plus DNA Ladder) are indicated at the left of the gel. 
Plasmid names are indicated above the gel; UTF = untransfected HeLa-JVM cells, H2O = water control for PCR reactions. The inset below the gel indicates the major (* and #) and minor (+) cDNA products detected in the experiments. FL, full-length; H2O, water control for PCR reactions; kb, kilobase; L1, Long interspersed element-1; M, marker; RT-PCR, reverse transcription PCR; SA, splice acceptor; SD, splice donor; SpIRE, spliced integrated retrotransposed element; UTF, untransfected HeLa-JVM cells; UTR, untranslated region; WT, wild-type.
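
The normalization described for the luciferase assays in panel C can be sketched in Python as follows. The readings are invented placeholders (the measured values are in S2 Table), the function and variable names are hypothetical, and expressing the result as a percentage of the wild-type construct is an assumption about how the relative units were derived. The one-tailed t test is not reproduced here.

```python
from statistics import mean, stdev


def normalized_activity(firefly, renilla):
    """Mean firefly/Renilla ratio over the technical replicates of one biological replicate."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired")
    return mean(f / r for f, r in zip(firefly, renilla))


def summarize(biological_replicates):
    """Mean and standard deviation across biological replicates."""
    per_replicate = [normalized_activity(f, r) for f, r in biological_replicates]
    return mean(per_replicate), stdev(per_replicate)


# Hypothetical readings: three biological replicates, six technical replicates each,
# given as (firefly readings, Renilla readings).
wild_type = [
    ([1000] * 6, [100] * 6),
    ([1100] * 6, [100] * 6),
    ([900] * 6, [100] * 6),
]
spliced = [
    ([200] * 6, [100] * 6),
    ([250] * 6, [100] * 6),
    ([150] * 6, [100] * 6),
]

wt_mean, wt_sd = summarize(wild_type)
sp_mean, sp_sd = summarize(spliced)
relative_percent = 100 * sp_mean / wt_mean

print(f"wild type: {wt_mean:.2f} +/- {wt_sd:.2f}")
print(f"spliced:   {sp_mean:.2f} +/- {sp_sd:.2f} ({relative_percent:.0f}% of wild type)")
```

Averaging technical replicates first and computing the standard deviation only across biological replicates mirrors the legend's statement that error bars reflect variation between the three biological replicates.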

**Fig 3. ORF1p expression from intra-5′UTR and 5′UTR/ORF1 SpIREs.**
(A) Schematics of the engineered L1 constructs. The L1 5′UTR (gray rectangle), ORF1 (yellow rectangle), and ORF2 (blue rectangle) are indicated in the constructs. Relative positions of the SpIRE_97/622 and SpIRE_97/976 deletions (red triangles) are indicated on the bottom two constructs, respectively. The CMV promoter (white arrowhead) and the *mneoI* retrotransposition indicator cassette (green rectangle = *neo* gene sequence; black “v” line = intron interrupting *neo* coding sequence, SD = splice donor site, SA = splice acceptor site) are indicated at the 5′ and 3′ ends of the constructs, respectively. The black lollipop at the 3′ end on top of the constructs indicates the sense SV40 polyadenylation signal. The black arrow and gray lollipop on the bottom of the constructs are embedded within the *mneoI* retrotransposition indicator cassette and indicate an SV40 early promoter and herpes simplex virus thymidine kinase polyadenylation signal, respectively, in the antisense orientation. (B) Representative ORF1p western blot from WCLs. Molecular weight standards (kDa) are indicated to the left of the image. The black arrowhead indicates the predicted size of full-length ORF1p (about 40 kDa). Construct names are indicated above the image; pCEP/GFP = negative control. The antibody used in the western blot experiment is indicated to the right of the gel (α-N-ORF1p). The eIF3 protein (110 kDa) served as a loading control. Western blots were performed three times, yielding similar results. (C) Schematic of ORF1 and relative location of antibody binding. Top: The relative positions in ORF1 (yellow rectangle) of the SA sequence at nucleotides 974–975 (green), the canonical ORF1 initiator methionine (AUG, black, 40 kDa), the two putative initiator methionine codons (AUG, orange, 33 kDa; AUG, blue, 27 kDa), and the N- and C-terminal epitopes recognized by the ORF1p Ab (red and purple stars, respectively) are indicated in the figure. 
(D) Representative western blots from WCLs: molecular weight standards (kDa) are indicated to the left of the gels. The predicted sizes of full-length ORF1p (black arrowhead) and the N-terminal truncated ORF1p variants (orange and blue arrows, respectively) are highlighted on the gel. Construct names are indicated above the image; pCEP/GFP = negative control. The antibodies used in the western blot experiments are indicated to the left (α-N-ORF1p) and right (α-C-ORF1p) of the gel images, respectively. The eIF3 protein (110 kDa) served as a loading control. The unlabeled band at about 25 kDa in the α-C-ORF1p experiment is an unknown cross-reacting product that was not detected in RNPs or with an antibody to a C-terminal ORF1p T7-*gene10* epitope tag (S4A Fig and S4B Fig). Western blots were performed three times, yielding similar results. α-C-ORF1p, C-terminal ORF1p antibody; α-eIF3, eukaryotic initiation factor 3 antibody; α-N-ORF1p, N-terminal ORF1p antibody; Ab, antibody; AUG, translation initiation codon; CMV, cytomegalovirus; kDa, kilodalton; L1, Long interspersed element-1; ORF, open reading frame; SA, splice acceptor; SD, splice donor; SpIRE, spliced integrated retrotransposed element; UTR, untranslated region; WCL, whole cell lysate.

**Fig 4. Intra-5′UTR and 5′UTR/ORF1 SpIREs are retrotransposition-defective.**
(A) Results from the SpIRE_97/622 retrotransposition assay. The x-axis indicates the construct names. The y-axis indicates the relative retrotransposition efficiency (%). The CMV promoter either augments L1 expression (+CMV, black bars) or is absent (ΔCMV, gray bars) from the L1 expression construct. The relative retrotransposition efficiencies are normalized to pJM101/L1.3 (set at 100%). The pJM105/L1.3 plasmid served as a negative control. The images and data are from one representative experiment (S3 Table). Error bars represent the standard deviation of technical triplicates for the depicted assay. Each assay was repeated three times, yielding similar results. (B) Results from the SpIRE_97/976 retrotransposition assay. The x-axis indicates the construct names. The y-axis indicates the relative retrotransposition efficiency (%). A CMV promoter augments L1 expression (+CMV, black bars). The relative retrotransposition efficiencies are normalized to pJM101/L1.3 (set at 100%). The pJM105/L1.3 plasmid served as a negative control. The images and data are from one representative experiment (S4 Table). Error bars represent the standard deviation of technical triplicates for the depicted assay. Each assay was repeated three times, yielding similar results. (C) Results from the SpIRE_97/976 *Trans*-complementation assay. The x-axis indicates the “reporter” (top text) and the “driver” (bottom text) construct names. The y-axis indicates the relative *Trans*-complementation efficiency (%). The results of each assay were normalized to the pPL_97/976/L1.3 “reporter” plasmid + pJBM561 “driver plasmid” co-transfection experiment, which was set at 100%. The image at the bottom right-hand side of the figure represents the efficiency of pJM101/L1.3 retrotransposition in *cis*. The pPL_97-976/L1.3 “reporter” plasmid + pCEP4 “driver plasmid” co-transfection experiment served as a negative control. The images and data are from one representative experiment (S5 Table). 
Error bars represent standard deviations of technical triplicates for the depicted experiment. Each assay was repeated four times, yielding similar results. CMV, cytomegalovirus; L1, Long interspersed element-1; ORF, open reading frame; SpIRE, spliced integrated retrotransposed element; UTR, untranslated region.

**Fig 5. Sequence changes within the 5′UTR affect intra-5′UTR splice site choice.**
(A) Schematic of the L1PA1 and L1PA3 5′UTRs. Top schematic, the relative positions of the SD (red lettering), SA (green lettering), and putative branch point sequence (ACCTCAC, black lettering) in the L1PA1 5′UTR that led to the formation of SpIRE_97/790 are indicated in the schematic. Superscript numbers indicate the first and last nucleotide of the indicated sequence. Note that nucleotide positions are indicated in the context of L1.3 (accession #L19088). Numbers below the branch point (underlined A; 95.75) and above the SA A₇₈₈G₇₈₉ (84.95) indicate the predicted score of those sequences for utilization in a splicing reaction, as determined using Human Splicing Finder v.3.0 (http://www.umd.be/HSF3/) [105]. Note that predicted scores above 80 are considered “strong” [105]. Bottom schematic, the relative positions of the SD (red lettering), SAs A₈₅₁G₈₅₂ (purple lettering), A₉₁₆G₉₁₇ (green lettering), and putative branch point sequence (TCCAGAG, black lettering) in the L1PA3 5′UTR are indicated in the schematic. Superscript numbers indicate the first and last nucleotide of the indicated sequence. Numbers below the branch point (underlined A; 75.73) and SAs A₈₅₁G₈₅₂ (83.75) and A₉₁₆G₉₁₇ (79.66) indicate the predicted strength of those sequences for utilization in a splicing reaction, as determined using Human Splicing Finder v.3.0 (http://www.umd.be/HSF3/) [105]. The L1PA3 5′UTR contains a 129-bp sequence (gray triangle) containing the SA A₈₅₁G₈₅₂ that was lost in the transition from the L1PA3 to L1PA2/L1PA1 subfamilies. The 129-bp deletion results in repositioning the SA A₉₁₆G₉₁₇ in L1PA3 to closer proximity of a putative branch point in the L1PA2/L1PA1 subfamilies 5′UTR (now noted as A₇₈₈G₇₈₉ in the top schematic), leading to a higher predicted score (84.95 in PA1 compared to 79.66 in PA3). (B) Schematic of luciferase constructs and results from luciferase assays. 
Top panel: the L1_RP 5′UTR (gray rectangle) was used to drive the transcription of the firefly luciferase reporter gene (green rectangle) present in plasmid pGL4.11. The following plasmids were created: pJBM_WTLUC contains the full-length L1_RP 5′UTR; pJBM_WT129^PA4LUC contains the 129-bp (black box in 5′UTR) sequence derived from L1PA4 within the L1_RP 5′UTR; pJBM_WT129^SCRLUC contains a scrambled version of the 129-bp sequence (black and white striped box) within the 5′UTR. Bottom panel: luciferase assay. The x-axis indicates the name of the luciferase expression plasmid. The y-axis indicates the relative firefly luciferase units normalized to a co-transfected Renilla luciferase internal control. These data represent the averages of three biological replicates (S6 Table). Each biological replicate contained six technical replicates. Error bars indicate the standard deviation between three biological replicates. P-values were determined using a Student one-tailed t test and “n.s.” indicates that there was no statistical difference. (C) Results from the EGFP retrotransposition assay: the x-axis indicates the construct names. The y-axis indicates the relative retrotransposition efficiency (%). The relative retrotransposition efficiencies are normalized to pL1_RP-EGFP (set at 100%). The data are from one representative experiment (S7 Table). Error bars represent the standard deviation of technical triplicates for the depicted assay. Each assay was repeated four times, yielding similar results. (D) Results from RT-PCR assays: a 2.0% agarose gel depicting the results from a representative qualitative RT-PCR experiment. DNA size markers (1 kb Plus DNA Ladder) are shown at the left of the gel. Plasmid names are indicated above the gel; UTF = untransfected HeLa-JVM cells, H2O = water control for PCR reactions. The right half of the agarose gel (“NO RT”) indicates the results from a representative experiment conducted without the addition of reverse transcriptase. 
The inset to the right of the gel indicates the major (*, **, and ***) and minor (+, @, and $) cDNA products detected in the experiments. The assay was repeated four times, yielding similar results. H2O, water control for PCR reactions; L1, Long interspersed element-1; RT-PCR, reverse transcription PCR; SA, splice acceptor; SD, splice donor; SpIRE, spliced integrated retrotransposed element; UTF, untransfected HeLa-JVM cells; UTR, untranslated region.

**Fig 6. A working model for the generation of SpIREs.**
(A) Canonical L1 retrotransposition. An L1 is transcribed from a genomic location (red chromosome). Translation of the mRNA (multicolored wavy line) occurs in the cytoplasm and ORF1p (yellow circles) and ORF2p (blue oval) bind back onto their respective mRNA (*cis*-preference) to form an RNP. The L1 RNP then enters the nucleus and a de novo L1 insertion occurs at a new genomic location (green chromosome) by TPRT. This insertion, if full length, could act as a source element, giving rise to new insertions (green arrow) at a new genomic location (gray chromosome). (B) Retrotransposition of intra-5′UTR spliced L1 isoform. A full-length L1 element is transcribed from its genomic location (red chromosome) and undergoes intra-5′UTR splicing. Translation of the mRNA (multicolored wavy line) occurs in the cytoplasm and ORF1p (yellow circles) and ORF2p (blue oval) bind back onto their respective mRNA (*cis*-preference) to form an RNP. The L1 RNP then enters the nucleus and L1 mRNAs subject to intra-5′UTR splicing can undergo a single round of retrotransposition (green chromosome) by TPRT. However, because the intra-5′UTR splicing event deletes sequences required for L1 promoter activity, the resultant insertion is unlikely to undergo subsequent rounds of retrotransposition in future generations (dashed green arrow). (C) Retrotransposition of 5′UTR/ORF1 spliced L1 isoform. An L1 is transcribed from its genomic location (red chromosome) and is subject to 5′UTR/ORF1 splicing. Translation of the mRNA (multicolored wavy line) occurs in the cytoplasm; however, because translation initiates at downstream AUG codons, ORF1p (yellow circles) is truncated and nonfunctional. The 5′UTR/ORF1 spliced L1 mRNA therefore relies on a wild-type source of ORF1p supplied from another L1 in *trans*.
In the rare instance that *trans*-complementation occurs (dotted arrow), it is highly unlikely that the resultant SpIRE will generate RNAs that can undergo retrotransposition in future generations (dashed thin green arrow). L1, Long interspersed element-1; ORF, open reading frame; RNP, ribonucleoprotein particle; SpIRE, spliced integrated retrotransposed element; TPRT, target-site primed reverse transcription; UTR, untranslated region.


Comment in

Reading the tea leaves: Dead transposon copies reveal novel host and transposon biology.
McLaughlin RN Jr. PLoS Biol. 2018 Mar 5;16(3):e2005470. doi: 10.1371/journal.pbio.2005470. eCollection 2018 Mar. PMID: 29505560. Free PMC article.



Grants and funding

R01 GM060518/GM/NIGMS NIH HHS/United States
