Nucleic Acids Res. 2023 Nov 27;51(21):11453-11465.
doi: 10.1093/nar/gkad821.

The landscape of human SVA retrotransposons

Chong Chu et al. Nucleic Acids Res.

Abstract

SINE-VNTR-Alu (SVA) retrotransposons are evolutionarily young and still-active transposable elements (TEs) in the human genome. Several pathogenic SVA insertions have been identified that directly mutate host genes to cause neurodegenerative and other types of diseases. However, due to their sequence heterogeneity and complex structures as well as limitations in sequencing techniques and analysis, SVA insertions have been less well studied than other mobile element insertions. Here, we identified polymorphic SVA insertions from 3646 whole-genome sequencing (WGS) samples of >150 diverse populations and constructed a polymorphic SVA insertion reference catalog. Using 20 long-read samples, we also assembled reference and polymorphic SVA sequences and characterized the internal hexamer/variable-number-tandem-repeat (VNTR) expansions as well as differing SVA activity for SVA subfamilies and human populations. In addition, we developed a module to annotate both reference and polymorphic SVA copies. By characterizing the landscape of both reference and polymorphic SVA retrotransposons, our study enables more accurate genotyping of these elements and facilitates the discovery of pathogenic SVA insertions.

Figures

Figure 1.
SVA retrotransposon analysis workflow. (A) First, we run the xTea germline module on 3646 whole-genome samples. The integrated call set provides a comprehensive SVA reference map, defines population-specific SVA insertions, and identifies ‘hot’ source elements based on transductions. (B) In addition, we run the xTea long-read module on 20 Oxford Nanopore and PacBio long-read samples, and construct the full copies of both polymorphic and reference SVAs. (C) We developed a new module for SVA annotation. With refined annotation, we annotate the internal structure of the fully constructed SVAs, which allows us to characterize the distribution of hexamer and VNTR lengths and construct the SVA phylogenetic tree to explore SVA activity.
Figure 2.
Polymorphic SVA insertions from diverse populations and accuracy benchmarking. (A) Based on the 7554 polymorphic insertions we identified, the number of SVA insertions per sample is shown. On average, African samples have more SVA insertions, and Central Asian samples have fewer SVA insertions. (B) Between our 7554 polymorphic SVA insertions and the 6417 released in gnomAD-SV, only 1565 were shared. Most of the gnomAD-SV-specific insertions had low population allele frequency (AF) (<0.01); for xTea-specific ones, the AF distribution was shifted to the right, with the majority at higher AF (>0.01). The overlapping insertions showed similar density in the two groups, with a small portion showing lower AF in gnomAD-SV. (C) We identified and annotated 635 SVA insertions from the HPRC pan-genome graph. 39 of the 42 samples used to construct the graph had short-read data. To benchmark the performance of xTea, we ran it on those 39 samples and generated 716 polymorphic SVA insertions. Between the two sets, 515 are in common, 201 are xTea-specific, and 120 are HPRC-specific. In comparison, the HPRC and the gnomAD-SV (v2.1.1) sets have only 323 (50.9%) in common. (D) We selected 9 samples for further analysis. Among the xTea calls, those overlapping with the HPRC are shown in purple. Of the rest, some overlap with the call set generated by Sniffles2 (an SV caller for long-read data). (E, F) We used PCR to validate the calls overlapping with Sniffles2. 7 (out of 9) and 10 (out of 11) candidates were validated for HG02055 and HG02145, respectively.
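The overlap counts in panel (C) can be cross-checked with a few lines of arithmetic. The sketch below only re-derives percentages from the counts quoted in the caption; the variable names and the `fraction` helper are illustrative assumptions, not part of xTea or the HPRC tooling.

```python
# Counts quoted in the Figure 2C caption (names are illustrative).
xtea_total = 716          # xTea calls on the 39 HPRC samples with short-read data
hprc_total = 635          # SVA insertions annotated from the HPRC pan-genome graph
shared = 515              # insertions found in both sets
xtea_only = 201           # xTea-specific insertions
hprc_only = 120           # HPRC-specific insertions
hprc_gnomad_shared = 323  # insertions shared by HPRC and gnomAD-SV (v2.1.1)


def fraction(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Shared plus set-specific counts should reproduce each set's total.
assert shared + xtea_only == xtea_total
assert shared + hprc_only == hprc_total

# Share of HPRC insertions recovered by each short-read call set.
hprc_in_xtea = fraction(shared, hprc_total)
hprc_in_gnomad = fraction(hprc_gnomad_shared, hprc_total)

print(f"HPRC insertions also in xTea:      {hprc_in_xtea}%")
print(f"HPRC insertions also in gnomAD-SV: {hprc_in_gnomad}%")
```

Dividing by the HPRC total in both cases puts the xTea and gnomAD-SV comparisons on the same denominator, which is how the caption's 50.9% figure is defined.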
Figure 3.
Population-specific polymorphic SVA insertions and the reference SVA copies. (A) Within the 7554 polymorphic insertions we identified, many population-specific insertions had high AF, especially for OCN followed by AFR. (B) PCA analysis showed the population specificity of the SVA insertions, especially for AFR, EAS and EUR. (C) Of the 7554 polymorphic insertions, 2670 and 4884 were of full length and truncated, respectively. Of the 5107 reference SVA copies, 1927 and 3180 were full-length and truncated, respectively. (D) Among the subfamilies, SVA_D and SVA_A were well represented. Within all the subfamilies, more than half of the SVA copies (2618/5107) fell in intronic regions.
Figure 4.
Polymorphic SVA insertions from long reads and internal repeat expansion. (A) From 20 long-read samples, we fully constructed 26 SVA_D, 125 SVA_E, 145 SVA_F, 18 SVA_F1 and specifically 39 CH10_SVA_F polymorphic SVA insertions. (B) 78% (274/353) of SVA insertions are found in ≤3 samples and 51% (179/353) are found in only one sample, indicating that SVAs are young and active. Fully assembled SVA copies provide the opportunity to examine SVA length. (C) The length of both reference and polymorphic SVA copies is variable among the subfamilies. On average, SVA_E and SVA_F are longer than other subfamilies, while SVA_A is longer than SVA_B, SVA_C and SVA_D. (D) The length of the hexamer is also variable by subfamily (SVA_F1 and CH10_SVA_F do not have a hexamer and are thus not shown), with some polymorphic SVA_E copies having long hexamers. (E) Similarly, the length of the VNTR regions is variable by subfamily and it is the major contributor to the variable length of the full copies. For SVA_D, SVA_E and SVA_F, the polymorphic copies are clearly longer than the reference ones (C).
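The totals and percentages in panels (A) and (B) follow directly from the per-subfamily counts. A minimal sketch of that bookkeeping, using only numbers quoted in the caption (the dictionary and function names are assumptions made for illustration):

```python
# Fully constructed polymorphic insertions per subfamily (Figure 4A caption).
per_subfamily = {
    "SVA_D": 26,
    "SVA_E": 125,
    "SVA_F": 145,
    "SVA_F1": 18,
    "CH10_SVA_F": 39,
}

# The denominator used in panel (B) is the sum over subfamilies.
total_insertions = sum(per_subfamily.values())


def percent(count: int, total: int) -> int:
    """Return count/total as a whole-number percentage."""
    return round(100 * count / total)


in_at_most_three_samples = 274  # insertions seen in <=3 samples (panel B)
in_one_sample = 179             # insertions seen in exactly one sample (panel B)

pct_rare = percent(in_at_most_three_samples, total_insertions)
pct_singleton = percent(in_one_sample, total_insertions)

print(total_insertions, pct_rare, pct_singleton)
```

The check confirms that the five subfamily counts account for the 353 insertions used as the denominator in panel (B).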
Figure 5.
Phylogenetic analysis of SVA retrotransposons and activity by subfamily. For subfamilies SVA_E and SVA_F, we selected the long read-assembled polymorphic SVA insertions with an integrated SINE-R region and merged them with the full-length reference SVA copies. (A) Left: We built the phylogenetic tree for the 118 polymorphic and 100 reference full-length SVA_E copies. The highlighted branch (in blue) is the youngest and a very active branch with 57 (out of 70) polymorphic SVA copies. Right: The corresponding tree for 156 polymorphic and 192 full-length reference SVA_F copies. Surprisingly, some middle-aged branches are active. For example, the green and purple branches have 41 (out of 68) and 45 (out of 52) polymorphic SVA copies, respectively. (B) We summarized the source copies that have ≥5 offspring insertions from the germline insertion set called from the 3646 samples, divided by subfamily (SVA_E or SVA_F) and transduction type (5′ or 3′). From SVA_F, one ‘hot’ SVA_E source element at chr17 has 65 offspring insertions with a 5′ transduction. (C) The first column shows the total number of SVA transductions per population. The table shows the population AF for selected ‘hot’ SVA_E and SVA_F source elements. Each column is one selected hot SVA source element, and each cell is the ratio of the number of offspring from the specific population to the total number of transductions of the population.
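The per-cell normalization described for panel (C) can be sketched as below. This is an illustration of the stated ratio only: the function name, the population labels `pop_A` and `pop_B`, and all counts are hypothetical placeholders, not values from the study.

```python
def offspring_ratios(offspring_by_pop: dict, transductions_by_pop: dict) -> dict:
    """For one source element, divide each population's offspring count
    by that population's total number of SVA transductions."""
    ratios = {}
    for pop, total in transductions_by_pop.items():
        offspring = offspring_by_pop.get(pop, 0)
        # Guard against populations with no transductions at all.
        ratios[pop] = offspring / total if total else 0.0
    return ratios


# Hypothetical toy counts, NOT taken from the paper.
toy_transductions = {"pop_A": 40, "pop_B": 10}  # all transductions per population
toy_offspring = {"pop_A": 10, "pop_B": 5}       # offspring of one 'hot' source element

ratios = offspring_ratios(toy_offspring, toy_transductions)
print(ratios)
```

Normalizing by each population's own transduction total, rather than by sample count, lets a source element's relative contribution be compared across populations of different sizes.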
