J Vis Exp. 2019 May 19;(147):10.3791/59771.
doi: 10.3791/59771.

RNA Next-Generation Sequencing and a Bioinformatics Pipeline to Identify Expressed LINE-1s at the Locus-Specific Level

Tiffany Kaul et al. J Vis Exp.

Abstract

Long INterspersed Elements-1 (LINE-1s/L1s) are repetitive elements that can copy themselves and randomly insert into the genome, resulting in genomic instability and mutagenesis. Understanding the expression patterns of individual L1 loci will help clarify the biology of this mutagenic element. This autonomous element makes up a significant portion of the human genome, with over 500,000 copies, though 99% of them are truncated and defective. This abundance, together with the predominance of defective copies, makes it difficult to distinguish authentically expressed L1s from L1-related sequences transcribed as part of other genes. The repetitive nature of the elements also makes it difficult to determine which specific L1 locus is expressed. To overcome these challenges, we present an RNA-Seq bioinformatic approach that identifies L1 expression at the locus-specific level. In summary, we collect cytoplasmic RNA, select for polyadenylated transcripts, and use strand-specific RNA-Seq analyses to uniquely map reads to L1 loci in the human reference genome. We visually curate each L1 locus with uniquely mapped reads to confirm transcription from its own promoter, and we adjust the mapped transcript reads to account for the mappability of each individual L1 locus. We applied this approach to the prostate tumor cell line DU145 to demonstrate that the protocol can detect expression from a small number of full-length L1 elements.
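A minimal sketch of the unique-mapping step, under stated assumptions: alignments are represented by a hypothetical `Alignment` record instead of being read from a BAM file, single-end reads are assumed, and a read counts as uniquely mapped when exactly one alignment was reported for it. Real pipelines usually take this information from aligner output (for example the SAM `NH` tag or MAPQ), which this toy version does not model; it is an illustration, not the authors' code.

```python
from collections import Counter
from typing import List, NamedTuple


class Alignment(NamedTuple):
    read_name: str
    chrom: str
    pos: int     # 0-based leftmost aligned position
    strand: str  # "+" or "-"


def keep_uniquely_mapped(alignments: List[Alignment]) -> List[Alignment]:
    """Return only alignments whose read was placed at exactly one locus."""
    # Count how many alignments were reported for each read name.
    hits_per_read = Counter(a.read_name for a in alignments)
    # A read reported more than once is a multimapper and cannot be
    # assigned to a single L1 locus, so it is discarded.
    return [a for a in alignments if hits_per_read[a.read_name] == 1]


# Hypothetical toy input: r2 aligns equally well to two L1 copies.
alns = [
    Alignment("r1", "chr1", 100, "+"),
    Alignment("r2", "chr1", 500, "-"),
    Alignment("r2", "chr7", 900, "-"),
]
unique = keep_uniquely_mapped(alns)
print([a.read_name for a in unique])  # ['r1']
```

With paired-end data both mates share a read name, so the count would need to be taken per mate pair rather than per alignment record.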

Figures

Figure 1: Workflow schematic.
Graphically described are the steps to identify expressed L1s in a human sample. Steps 1 and 2 do not need to be repeated if the appropriate files are already available; these files may be downloaded from Supplement File 1a-b and Supplement File 2. The red boxes indicate the steps in which the bedtools coverage program is used to count the number of reads mapping to L1s in the sense orientation. The loci with sense-oriented reads are the L1s that should be manually curated.
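The counting step in the red boxes can be pictured with the following toy Python sketch. It is conceptually similar to what `bedtools coverage -s` (same-strand overlaps only) reports, but it is a naive illustration, not the authors' command: the `Interval` type, the loci, and the reads are hypothetical, and each read's strand is assumed to already represent the transcript strand (with many strand-specific library preparations, read 1 maps to the strand opposite the transcript and would need to be flipped first).

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int       # 0-based, inclusive (BED convention)
    end: int         # 0-based, exclusive
    strand: str      # "+" or "-"; for reads, assumed to be the transcript strand
    name: str = ""


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share >= 1 base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def count_sense_reads(loci: List[Interval], reads: List[Interval]) -> Dict[str, int]:
    """Count reads overlapping each L1 locus on the same (sense) strand."""
    return {
        locus.name: sum(
            1 for r in reads if r.strand == locus.strand and overlaps(locus, r)
        )
        for locus in loci
    }


loci = [
    Interval("chr1", 1_000, 7_000, "+", "L1_a"),   # hypothetical loci
    Interval("chr2", 5_000, 11_000, "-", "L1_b"),
]
reads = [
    Interval("chr1", 1_500, 1_600, "+"),    # sense read inside L1_a
    Interval("chr1", 2_000, 2_100, "-"),    # antisense to L1_a: not counted
    Interval("chr2", 10_950, 11_050, "-"),  # sense read overlapping L1_b's edge
    Interval("chr2", 200, 300, "-"),        # outside both loci
]
print(count_sense_reads(loci, reads))  # {'L1_a': 1, 'L1_b': 1}
```

The nested loop costs O(loci × reads), which is fine for illustration; real data calls for an interval index or a sorted sweep, as dedicated tools such as bedtools use.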
Figure 2: Examples of curated L1 loci in DU145.
Loaded into IGV are the reference genome, the full-length L1 GFF annotation file matching the reference genome version (Supplement File 1), the DU145 BAM file, and the genomic HeLa BAM file used to assess mappability, all of which are available upon request from the authors. Arrows have been added to show the direction of each annotated L1. Red arrows and reads are oriented from right to left; blue arrows and reads are oriented from left to right. a) In IGV, this L1 locus appears to be expressed from its own promoter, as there are no sense-oriented reads for over 5 kb upstream of the L1. This L1 has low mappability, is not within a gene, and shows evidence of the expected antisense promoter activity. b) This L1 locus appears to be expressed from its own promoter, as there are no sense-oriented reads for over 5 kb upstream of the L1. This L1 has low mappability and lies within a gene oriented in the opposite direction. c) This L1 locus was rejected as an expressed L1 because there are sense-oriented reads within 5 kb upstream. This L1 lies within a gene oriented in the same direction, so the transcript reads most likely originate from the promoter of that expressed gene. d) This L1 locus was rejected as an expressed L1 because there are sense-oriented reads within 5 kb upstream. This L1 lies downstream of a highly expressed gene oriented in the same direction, so the transcript reads most likely originate from that gene's promoter and extend beyond its normal terminator. e) This L1 locus was rejected as an expressed L1 because there are sense-oriented reads within 5 kb upstream. This L1 is not within or near any gene annotated in the reference genome, so transcripts originating within and upstream of the L1 suggest an unannotated promoter.
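The purely positional rule in this caption (reject a locus if sense-oriented reads occur within 5 kb upstream of the L1) can be expressed as a small check. This is a hypothetical pre-screening helper, not part of the published protocol; the gene-context judgments in panels a-e remain manual, and the coordinates below are invented.

```python
from typing import Iterable, NamedTuple

UPSTREAM_WINDOW_BP = 5_000  # window size taken from the Figure 2 caption


class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def upstream_window(l1: Interval, size: int = UPSTREAM_WINDOW_BP) -> Interval:
    """Region immediately 5' of the L1, on the L1's own strand."""
    if l1.strand == "+":
        # The 5' end of a plus-strand element is its leftmost coordinate.
        return Interval(l1.chrom, max(0, l1.start - size), l1.start, "+")
    # The 5' end of a minus-strand element is its rightmost coordinate.
    return Interval(l1.chrom, l1.end, l1.end + size, "-")


def has_upstream_sense_reads(l1: Interval, reads: Iterable[Interval],
                             size: int = UPSTREAM_WINDOW_BP) -> bool:
    """True if any same-strand read overlaps the upstream window."""
    win = upstream_window(l1, size)
    return any(
        r.chrom == win.chrom and r.strand == l1.strand
        and r.start < win.end and win.start < r.end
        for r in reads
    )


# Hypothetical loci and reads.
l1_plus = Interval("chr1", 20_000, 26_000, "+")
l1_minus = Interval("chr3", 20_000, 26_000, "-")
reads = [
    Interval("chr1", 16_000, 16_100, "+"),  # sense, 4 kb upstream of l1_plus
    Interval("chr3", 15_000, 15_100, "-"),  # 3' side of l1_minus: not upstream
    Interval("chr3", 27_000, 27_100, "+"),  # upstream of l1_minus but antisense
]
print(has_upstream_sense_reads(l1_plus, reads))   # True: candidate for rejection
print(has_upstream_sense_reads(l1_minus, reads))  # False: consistent with own promoter
```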
Figure 3: Background noise originates from truncated L1s as well.
Our L1 annotation excludes truncated L1s because they are a major source of background noise. Arrows have been added to show the direction of each annotated L1. Blue arrows and reads are oriented from left to right. a) An example of a truncated L1 of the L1MB5 subfamily that is 2,706 bp long. In IGV, it is apparent that the reads originate from the downstream extension of an expressed gene's transcript. b) Another example of a truncated L1, an L1PA11 element 4,767 bp long. In IGV, it is apparent that the reads mapping uniquely to this L1 originate from the expressed exon within which the L1 lies.
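To illustrate excluding truncated copies from an annotation, here is a sketch that filters L1 records by length. The 6,000 bp cutoff and the `L1Record` type are assumptions made for this example (a full-length human L1 is roughly 6 kb); the authors' full-length, intact annotation is provided in Supplement File 1 and may rely on other criteria, such as intact open reading frames, which this length filter does not check.

```python
from typing import List, NamedTuple

# Hypothetical cutoff, not the authors' criterion: a full-length human L1 is
# roughly 6 kb, so shorter annotated copies are treated as 5'-truncated here.
MIN_FULL_LENGTH_BP = 6_000


class L1Record(NamedTuple):
    name: str
    subfamily: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

    @property
    def length(self) -> int:
        return self.end - self.start


def full_length_only(records: List[L1Record],
                     min_len: int = MIN_FULL_LENGTH_BP) -> List[L1Record]:
    """Drop annotated L1 copies shorter than min_len."""
    return [r for r in records if r.length >= min_len]


records = [
    L1Record("trunc_1", "L1MB5", 0, 2_706),   # length from Figure 3a
    L1Record("trunc_2", "L1PA11", 0, 4_767),  # length from Figure 3b
    L1Record("fl_1", "L1HS", 0, 6_050),       # hypothetical full-length copy
]
print([r.name for r in full_length_only(records)])  # ['fl_1']
```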
Figure 4: Transcript reads in the DU145 prostate tumor cell line that map uniquely to all full-length, intact L1s in the human genome.
In black are the loci identified as authentically expressed after manual curation, and in red are the loci rejected as authentically expressed after manual curation. In grey are loci with fewer than ten mapped reads each; because these loci represent a small fraction of transcript reads, they were not manually curated. The x-axis tick marks denote every 100 full-length, intact L1s. Approximately 4,500 loci are not shown because they had zero mapped reads.
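The grouping used in this figure can be written as a simple triage function. The ten-read threshold comes from the caption; the function name, locus names, and counts are hypothetical.

```python
from typing import Dict, List

CURATION_MIN_READS = 10  # Figure 4: loci with fewer reads were not curated


def triage_loci(read_counts: Dict[str, int]) -> Dict[str, List[str]]:
    """Split loci into those sent to manual curation, low-count, and zero-count."""
    groups: Dict[str, List[str]] = {"curate": [], "below_threshold": [], "no_reads": []}
    for locus, n in read_counts.items():
        if n == 0:
            groups["no_reads"].append(locus)         # not plotted in Figure 4
        elif n < CURATION_MIN_READS:
            groups["below_threshold"].append(locus)  # shown in grey
        else:
            groups["curate"].append(locus)           # inspected by hand in IGV
    return groups


counts = {"L1_a": 250, "L1_b": 10, "L1_c": 3, "L1_d": 0}  # hypothetical counts
print(triage_loci(counts))
# {'curate': ['L1_a', 'L1_b'], 'below_threshold': ['L1_c'], 'no_reads': ['L1_d']}
```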
Figure 5: Transcript reads that map uniquely to authentically expressed full-length, intact L1s in the DU145 prostate tumor cell line.
Shown are the numbers of transcript reads that map to specific loci in DU145 cells after manual curation.
Figure 6: Reads mapping to authentically expressed L1s, adjusted for mappability.
Shown are the numbers of transcript reads, adjusted by locus-specific mappability scores, that map to manually curated L1 loci in DU145 cells.
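The exact mappability adjustment is not spelled out in this summary, so the sketch below assumes one simple form: each locus has a mappability score in (0, 1], read as the fraction of the locus where reads can be placed uniquely (for example, estimated from genomic sequencing data such as the HeLa BAM file mentioned in Figure 2), and the adjusted count is the raw count divided by that score. Both the formula and the scores are assumptions for illustration, not the authors' method.

```python
from typing import Dict


def adjust_for_mappability(raw_counts: Dict[str, int],
                           mappability: Dict[str, float]) -> Dict[str, float]:
    """Scale each locus's read count by 1 / mappability (assumed formula)."""
    adjusted: Dict[str, float] = {}
    for locus, n in raw_counts.items():
        m = mappability[locus]
        # A score of 0 would mean no read can be placed uniquely; refuse to
        # divide by it rather than silently producing infinity.
        if not 0.0 < m <= 1.0:
            raise ValueError(f"mappability for {locus} must be in (0, 1], got {m}")
        adjusted[locus] = n / m
    return adjusted


raw = {"L1_a": 120, "L1_b": 120}          # hypothetical raw counts
scores = {"L1_a": 1.0, "L1_b": 0.25}      # hypothetical mappability scores
print(adjust_for_mappability(raw, scores))  # {'L1_a': 120.0, 'L1_b': 480.0}
```

Under this assumption, two loci with the same raw count diverge after adjustment: the less mappable locus is scaled up because only part of its true signal could be assigned uniquely.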
