Whole-genome resequencing allows detection of many rare LINE-1 insertion alleles in humans

Adam D Ewing¹, Haig H Kazazian Jr

PMID: 20980553
PMCID: PMC3106331
DOI: 10.1101/gr.114777.110

Genome Res. 2011 Jun;21(6):985–990. Epub 2010 Oct 27.

Affiliation

¹ The McKusick-Nathans Institute for Genetic Medicine, The Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA.

Abstract

High-throughput sequencing has recently begun to revolutionize the study of structural variants in the genomes of humans and other species. More recently, this and other technologies have been applied to the study of human retrotransposon insertion polymorphisms (RIPs), yielding an unprecedented catalog of common and rare variants arising from insertional mutagenesis. At the same time, the 1000 Genomes Project has released an enormous amount of whole-genome sequence data. In this article, we present evidence for 1016 L1 (LINE-1) insertions, compiled across all studies to date, that are not represented in the reference human genome assembly; many of these appear to be specific to populations or groups of populations, particularly Africans. Additionally, a cross-comparison of several studies shows that, on average, 27% of surveyed nonreference insertions are present in only one study, indicating that many RIPs occur at low frequency.

Figures

**Figure 1.**
Histograms of filled-site allele frequencies estimated from L1 insertion site presence/absence counts for reference and nonreference L1 insertion sites (Supplemental Fig. S2a,b). (A) Estimated allele frequencies for detected L1 insertions shared with the reference genome. (B) Estimated allele frequencies for detected L1 insertions not present in the reference genome assembly.
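The caption does not say exactly how presence/absence counts are converted into a frequency. One simple reading, assumed here and not taken from the paper's Methods, is that presence and absence are tallied per allele at each site, so the filled-site frequency is the fraction of scored alleles carrying the L1. The sketch below computes that fraction and bins the values into a histogram like those in panels A and B; the function names, the ten-bin layout, and the example counts are all hypothetical.

```python
from collections import Counter


def filled_site_frequency(present, absent):
    """Fraction of scored alleles at a site that carry the L1 insertion (filled).

    Assumption: `present` and `absent` are per-allele counts for one site.
    """
    total = present + absent
    if total == 0:
        raise ValueError("no presence/absence observations for this site")
    return present / total


def frequency_histogram(freqs, n_bins=10):
    """Count frequencies into n_bins equal-width bins on [0, 1]; 1.0 falls in the last bin."""
    counts = Counter(min(int(f * n_bins), n_bins - 1) for f in freqs)
    return [counts[i] for i in range(n_bins)]


# Made-up (present, absent) allele counts for five sites.
sites = [(1, 3), (2, 2), (3, 1), (4, 0), (0, 4)]
freqs = [filled_site_frequency(p, a) for p, a in sites]
hist = frequency_histogram(freqs)
```

Under a genotype-based design (counting carriers rather than alleles) the estimate would differ, so this should be read only as an illustration of turning counts into a frequency distribution.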

**Figure 2.**
L1 insertions present in only one data set. Each column shows the fraction of all validated insertions in the indicated data set that are found in no other data set (unique). Depending on the study, insertions can be validated by site-specific PCR, by sequencing that spans the element, or by presence in another independent data set. Iskow et al. (2010) presented data generated using two different sequencing methods, as noted in the column labels. We analyzed the paired-end Illumina data from the 1000 Genomes Project (The 1000 Genomes Project Consortium 2010); Beck et al. (2010) employed a fosmid-end resequencing strategy; dbRIP cross-references data sets generated using a wide variety of techniques (Wang et al. 2006); and Ewing and Kazazian (2010) used Illumina sequencing.
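The "unique" fraction plotted for each data set, and the cross-study average quoted in the abstract, can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes each insertion is recorded as a hypothetical `(chrom, position)` pair and that two calls from different studies count as the same insertion when they lie on the same chromosome within a hypothetical `WINDOW` of 100 bp.

```python
from statistics import mean

WINDOW = 100  # hypothetical: calls this close on the same chromosome are treated as one site


def is_shared(site, other_sites, window=WINDOW):
    """True if any site in other_sites is on the same chromosome within `window` bp."""
    chrom, pos = site
    return any(c == chrom and abs(p - pos) <= window for c, p in other_sites)


def unique_fractions(datasets, window=WINDOW):
    """Map each data set name to the fraction of its validated sites absent from every other set."""
    fractions = {}
    for name, sites in datasets.items():
        if not sites:
            continue  # the fraction is undefined for a data set with no validated sites
        others = [s for other, ss in datasets.items() if other != name for s in ss]
        unique = [s for s in sites if not is_shared(s, others, window)]
        fractions[name] = len(unique) / len(sites)
    return fractions


# Made-up validated insertion sites for three hypothetical studies.
datasets = {
    "study_A": [("chr1", 1000), ("chr2", 5000), ("chr3", 7000), ("chr4", 9000)],
    "study_B": [("chr1", 1030), ("chr5", 2000)],
    "study_C": [("chr2", 5000)],
}
fractions = unique_fractions(datasets)
average_unique = mean(fractions.values())
```

The pairwise scan is quadratic, which is adequate for illustration; for genome-wide call sets an interval index per chromosome would be the more practical choice.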

**Figure 3.**
Bioinformatic procedures for identifying nonreference L1 insertions from whole-genome resequencing data. (Open boxes) Mapped reads indicating the presence of a nonreference L1; (gradient boxes) nonreference L1 insertions; (thicker horizontal lines) genomic sequence. (A) Identification of a nonreference L1 insertion from short-insert paired-end sequence reads. Short-insert paired-end reads in which one end matches the reference genome and the other matches an L1 reference are clustered based on their mapping locations in the human genome reference assembly (*top*). The detection criteria discussed in Methods are labeled with numbers: (1) The 3′ end of the L1 insertion must be represented. (2) Reads must form tight clusters based on their locations on both the reference genome and the reference L1. (3) The minimum distance between the locations of the genomic reads must be <100 bp; this interval contains the L1 insertion site (vertical bar). The orientation of each read is annotated next to the open box representing its mapped position. (B) L1 insertions may be inverted at the 5′ end (Ostertag and Kazazian 2001), resulting in reads that align to the reference L1 in the same orientation at both the 5′ and 3′ ends of the L1 element. (C) Examples of outlier reads that are filtered as described in Methods. (1) The shaded paired read is an outlier because the locations of its reads on the L1 and on the reference genome do not satisfy criterion 2 in panel A. (2) The shaded paired read is an outlier with respect to its location on the reference L1. (3) The shaded paired read is an outlier with respect to its location on the reference genome, relative to the other reads in the cluster. (D) Identifying reads that span the 3′ junction between the L1 poly-A tail and the reference genome sequence. Reads with 5′ or 3′ poly-T or poly-A stretches of at least six bases (1) are trimmed (2) and aligned to the reference genome assembly (3). Trimmed reads aligning within the predicted L1 insertion site (*A*, 3) are identified (4).
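The clustering and detection criteria in panel A can be sketched as below. This is a simplified illustration under stated assumptions, not the published pipeline: the `DiscordantPair` record, the L1 reference length, the clustering gap, and the "near the 3′ end" window are hypothetical placeholders, and only the <100 bp limit comes from the caption. It also assumes that genomic ends on the `+` strand lie upstream of the insertion and those on the `-` strand downstream, and it reads criterion 3 as the gap between those two flanking groups. Cluster tightness on the L1 coordinate (criterion 2) and the outlier filtering in panel C are not modeled.

```python
from dataclasses import dataclass

# Hypothetical placeholders, except MAX_SITE_GAP, which is the <100 bp limit from the caption.
L1_LENGTH = 6019          # assumed length of the L1 reference sequence (bp)
CLUSTER_GAP = 500         # max spacing between consecutive genomic ends in one cluster
THREE_PRIME_WINDOW = 500  # how close to the L1 3' end a read must map to count for criterion 1
MAX_SITE_GAP = 100        # criterion 3


@dataclass
class DiscordantPair:
    """A read pair with one end on the reference genome and the other on an L1 reference."""
    chrom: str
    genome_start: int   # leftmost aligned base of the genomic end
    genome_end: int     # rightmost aligned base of the genomic end
    genome_strand: str  # '+' or '-'
    l1_pos: int         # aligned position of the other end within the L1 reference


def cluster_pairs(pairs, max_gap=CLUSTER_GAP):
    """Single-linkage clustering: a pair joins the current cluster if it is on the same
    chromosome and starts within max_gap bp of the previous pair."""
    clusters = []
    for p in sorted(pairs, key=lambda q: (q.chrom, q.genome_start)):
        prev = clusters[-1][-1] if clusters else None
        if prev is not None and prev.chrom == p.chrom and p.genome_start - prev.genome_start <= max_gap:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters


def call_insertion(cluster):
    """Return (chrom, left, right) bounding the putative insertion site, or None."""
    # Criterion 1: at least one read must map near the 3' end of the L1 reference.
    if not any(p.l1_pos >= L1_LENGTH - THREE_PRIME_WINDOW for p in cluster):
        return None
    upstream = [p for p in cluster if p.genome_strand == '+']
    downstream = [p for p in cluster if p.genome_strand == '-']
    if not upstream or not downstream:
        return None
    left = max(p.genome_end for p in upstream)
    right = min(p.genome_start for p in downstream)
    # Criterion 3: the gap between the flanking read groups must be < 100 bp.
    # It can be negative when the flanks overlap, e.g. across a target-site duplication.
    if right - left >= MAX_SITE_GAP:
        return None
    return (cluster[0].chrom, min(left, right), max(left, right))


# Example usage with made-up read pairs.
pairs = [
    DiscordantPair("chr1", 1000, 1100, "+", 5900),
    DiscordantPair("chr1", 1050, 1150, "+", 5950),
    DiscordantPair("chr1", 1200, 1300, "-", 5800),
    DiscordantPair("chr2", 50000, 50100, "+", 300),
]
calls = [c for c in map(call_insertion, cluster_pairs(pairs)) if c]
```

In this toy input, the chr1 cluster passes both checks and yields a 50 bp candidate interval, while the lone chr2 pair fails criterion 1 because its L1 end maps far from the 3′ end.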
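Steps 1, 2, and 4 of panel D can be sketched as follows; step 3, realignment to the reference assembly, needs a real aligner and is replaced here by invented alignment hits. Only the six-base threshold comes from the caption. The minimum length kept after trimming, the `(read_id, chrom, junction_pos)` hit format, and all function names are hypothetical, and the sketch trims a tail from one end only (checking the 5′ end first).

```python
import re

MIN_TAIL = 6        # from the caption: poly-A/poly-T stretch of at least six bases
MIN_REMAINING = 20  # hypothetical: shortest trimmed read worth realigning

# Homopolymer A or T runs at the 5' (leading) or 3' (trailing) end of a read.
LEADING_TAIL = re.compile(r"^(?:A{%d,}|T{%d,})" % (MIN_TAIL, MIN_TAIL))
TRAILING_TAIL = re.compile(r"(?:A{%d,}|T{%d,})$" % (MIN_TAIL, MIN_TAIL))


def trim_polya(read):
    """Strip one terminal poly-A/poly-T run; return None if there is none or too little remains."""
    m = LEADING_TAIL.match(read)
    if m:
        trimmed = read[m.end():]
    else:
        m = TRAILING_TAIL.search(read)
        if not m:
            return None
        trimmed = read[:m.start()]
    return trimmed if len(trimmed) >= MIN_REMAINING else None


def junction_reads(aligned, site):
    """Select realigned trimmed reads whose junction falls inside a predicted insertion site.

    aligned: iterable of (read_id, chrom, junction_pos), where junction_pos is the genomic
    coordinate at which the removed tail was attached (assumed to be computed by the caller).
    site: (chrom, left, right) interval predicted from the paired-end clusters.
    """
    chrom, left, right = site
    return [rid for rid, c, pos in aligned if c == chrom and left <= pos <= right]


# Example usage with made-up reads and coordinates.
reads = {"r1": "ACGT" * 6 + "A" * 10, "r2": "T" * 7 + "ACGT" * 6, "r3": "ACGT" * 6}
trimmed = {rid: trim_polya(seq) for rid, seq in reads.items()}
# Invented alignment hits standing in for step 3.
hits = [("r1", "chr1", 1170), ("r2", "chr1", 5000)]
supporting = junction_reads(hits, ("chr1", 1150, 1200))
```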

References

1. The 1000 Genomes Project Consortium 2010. A map of human genome variation from population-scale sequencing. Nature 467: 1061–1073
1. Ahn SM, Kim TH, Lee S, Kim D, Ghang H, Kim DS, Kim BC, Kim SY, Kim WY, Kim C 2009. The first Korean genome sequence and analysis: full genome sequencing for a socio-ethnic group. Genome Res 19: 1622–1629
1. Akagi K, Li J, Stephens RM, Volfovsky N, Symer DE 2008. Extensive variation between inbred mouse strains due to endogenous L1 retrotransposition. Genome Res 18: 869–880
1. Badge RM, Alisch RS, Moran JV 2003. ATLAS: A system to selectively identify human-specific L1 insertions. Am J Hum Genet 72: 823–838
1. Beck CR, Collier P, Macfarlane C, Malig M, Kidd JM, Eichler EE, Badge RM, Moran JV 2010. LINE-1 retrotransposition activity in human genomes. Cell 141: 1159–1170
