Meta-Analysis
Viruses. 2023 Feb 7;15(2):465.
doi: 10.3390/v15020465.

Specialized DNA Structures Act as Genomic Beacons for Integration by Evolutionarily Diverse Retroviruses

Hinissan P Kohio et al. Viruses.

Abstract

Retroviral integration site targeting is not random and plays a critical role in expression and long-term survival of the integrated provirus. To better understand the genomic environment surrounding retroviral integration sites, we performed a meta-analysis of previously published integration site data from evolutionarily diverse retroviruses, including new experimental data from HIV-1 subtypes A, B, C and D. We show here that evolutionarily divergent retroviruses exhibit distinct integration site profiles with strong preferences for integration near non-canonical, non-B-form DNA structures (non-B DNA). We also show that in vivo-derived HIV-1 integration sites are significantly more enriched in transcriptionally silent regions and transcription-silencing non-B DNA features of the genome compared to in vitro-derived HIV-1 integration sites. Integration sites from individuals infected with HIV-1 subtype A, B, C or D viruses exhibited different preferences for common genomic and non-B DNA features. In addition, we identified several integration site hotspots shared between different HIV-1 subtypes, all of which were located in slipped DNA, a non-B DNA feature. Together, these data show that although evolutionarily divergent retroviruses exhibit distinct integration site profiles, they all target non-B DNA for integration. These findings provide new insight into how retroviruses integrate into genomes for long-term survival.

Keywords: HIV; genome; integration; integration hotspots; non-B DNA; retroviruses; slipped DNA.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Evolutionarily diverse retroviruses exhibit distinct integration site preferences. (A) Heatmaps depicting the fold enrichment or depletion of integration sites near common genomic features compared to matched random controls. Darker shades represent larger fold-changes in the ratio of integration sites to matched random control sites (blue indicates enrichment; red indicates depletion). Bins represent the distance of the integration sites from each genomic feature: Bin 0 = within the feature; Bin 1 = 1–499 bp; Bin 2 = 500–4999 bp; Bin 3 = 5000–49,999 bp; Bin 4 = >49,999 bp away from the feature. Heatmaps of the diverse retrovirus genera were superimposed on a BioNJ tree constructed from their reverse transcriptase amino acid sequences using the Dayhoff substitution model with 1000 bootstraps. All branches are scaled according to the number of amino acid changes per site. The phylogenetic tree shows only the evolutionary relatedness of the different retrovirus genera. Significant differences are denoted by asterisks (* p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001; Fisher’s exact test, two-tailed). HIV-1 = human immunodeficiency virus type 1, SIV = simian immunodeficiency virus (isolated from a pig-tailed macaque), FIV = feline immunodeficiency virus, HTLV-1 = human T-lymphotropic virus type 1, MLV = murine leukemia virus, FV = foamy virus, ASLV = avian sarcoma-leukosis virus, MMTV = mouse mammary tumor virus. (B) Proportion of retroviral integration sites located within genes, compared to the random control (blue lines). (C) Nuclear localization of integration sites, determined by quantifying the proportion of total integrations that fell within a lamina-associated domain (LAD; scored as 1) as opposed to outside an LAD (scored as 0).
(D) Pairwise analysis of the retroviral integration site profile preferences (based on fold enrichment and depletion values within 5000 bp of each feature), using Euclidean distance as the distance measure (Heatmapper) [56]. Weaker relationships between retroviral integration site profiles are indicated by darker red in the pairwise distance matrix, whereas stronger relationships are indicated by darker blue. * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001; n.s., not significant; Fisher’s exact test, two-sided. inf (infinite): one or more integrations were observed where zero integrations were expected by chance. nan (not a number): zero integrations were observed and zero were expected by chance.
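To make the binning and the inf/nan convention concrete, here is a minimal Python sketch of how fold enrichment per distance bin could be computed from distances to the nearest feature. It assumes the caller already has one distance (in bp) per integration site and per matched random control site; the function names are hypothetical and this is not the authors' pipeline. Fold enrichment is taken as the fraction of integration sites in a bin divided by the fraction of control sites in the same bin.

```python
import math

# Distance bins from the Figure 1 legend (bp from the nearest feature):
# 0 = within the feature, 1 = 1-499, 2 = 500-4999, 3 = 5000-49,999, 4 = >49,999.
BIN_UPPER_EDGES = [1, 500, 5000, 50000]  # exclusive upper edges of bins 0-3


def distance_bin(distance_bp: int) -> int:
    """Map a non-negative distance (bp) to a feature onto bin index 0-4."""
    if distance_bp < 0:
        raise ValueError("distance must be non-negative")
    for index, upper in enumerate(BIN_UPPER_EDGES):
        if distance_bp < upper:
            return index
    return len(BIN_UPPER_EDGES)


def bin_counts(distances):
    """Count how many distances fall into each of the five bins."""
    counts = [0] * (len(BIN_UPPER_EDGES) + 1)
    for d in distances:
        counts[distance_bin(d)] += 1
    return counts


def fold_enrichment_by_bin(site_distances, control_distances):
    """Fraction of integration sites / fraction of matched control sites per bin.

    Following the legend: observed > 0 with expected 0 gives inf, 0/0 gives nan.
    """
    observed = bin_counts(site_distances)
    expected = bin_counts(control_distances)
    n_obs, n_exp = sum(observed), sum(expected)
    if n_obs == 0 or n_exp == 0:
        raise ValueError("both site sets must be non-empty")
    result = []
    for o, e in zip(observed, expected):
        if e == 0:
            result.append(math.inf if o > 0 else math.nan)
        else:
            result.append((o / n_obs) / (e / n_exp))
    return result
```

For example, `fold_enrichment_by_bin([0, 10, 10, 600], [0, 0, 10, 600])` gives 0.5, 2.0 and 1.0 for bins 0 to 2 and nan for the two empty bins. Values above 1 correspond to the enrichment colour in the heatmaps, values below 1 to depletion.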
Figure 2
Evolutionarily diverse retroviruses target non-B DNA for integration. (A) Heatmaps illustrating the fold enrichment or depletion of unique retroviral integration sites near non-B DNA features compared to matched random controls. Darker shades represent larger fold-changes in the ratio of integration sites to matched random control sites (blue indicates enrichment; red indicates depletion). The distance in base pairs from the non-B DNA features is shown above each heatmap. Heatmaps of the diverse retrovirus genera were superimposed on a BioNJ tree constructed from their reverse transcriptase amino acid sequences using the Dayhoff substitution model with 1000 bootstraps. All branches are scaled according to the number of amino acid changes per site. The phylogenetic tree shows only the evolutionary relatedness of the different retrovirus genera. (B) Fold change in the percentage of integration sites within 500 bp of various non-B DNA features compared to the random control. (C) Pairwise analysis of the retroviral integration site profile preferences (based on fold enrichment and depletion values within 500 bp of each feature), using Euclidean distance as the distance measure (Heatmapper) [56]. Weaker relationships between retroviral integration site profiles are indicated by darker red in the pairwise distance matrix, whereas stronger relationships are indicated by darker blue. Significant differences are denoted by asterisks (* p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001; Fisher’s exact test, two-sided).
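The pairwise distance matrices in panel C compare whole enrichment profiles rather than single features. A minimal Python sketch of that idea, assuming each virus is summarised as a list of finite fold-enrichment values in a fixed feature order (inf/nan entries would first have to be removed or capped by the caller): the distance between two viruses is the Euclidean distance between their profile vectors, so larger values mean more dissimilar targeting. The function name is hypothetical, and Heatmapper's own preprocessing and colour scaling are not reproduced here.

```python
import math


def pairwise_euclidean(profiles):
    """Return (names, matrix) with matrix[i][j] = Euclidean distance between
    the enrichment profiles of names[i] and names[j].

    profiles: dict mapping a virus name to a list of finite fold-enrichment
    values, one per feature, in the same order for every virus.
    """
    names = list(profiles)
    if len({len(v) for v in profiles.values()}) > 1:
        raise ValueError("all profiles must have the same length")
    for name, values in profiles.items():
        if not all(math.isfinite(x) for x in values):
            raise ValueError(f"profile {name!r} contains inf or nan")
    # math.dist computes the Euclidean distance between two points (Python 3.8+).
    matrix = [[math.dist(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix
```

With hypothetical profiles `{"v1": [1.0, 2.0], "v2": [4.0, 6.0], "v3": [1.0, 2.0]}`, v1 and v3 are at distance 0 (identical targeting) and v1 and v2 at distance 5.0; in the figure's colour scheme the latter pair would sit toward the red end of the matrix.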
Figure 3
Integration site profiles differ between in vitro- and in vivo-derived datasets. (A–C) Heatmaps illustrating the fold enrichment or depletion of unique integration sites (compared to the matched random control) near common genomic features for in vivo-derived datasets (n = 22,372 sites) (A), in vitro-derived datasets (n = 67,659 sites) (B), and in vivo-derived compared with in vitro-derived sites (C). Numbers represent the fold-change in the percentage of integration sites. (D) Comparison of the percentage of integration sites within 5000 bp of common genomic features between in vitro- and in vivo-derived datasets. (E) Venn diagram showing the number of genes targeted for integration that were unique to, or shared by, the in vivo- and in vitro-derived integration site datasets. (F–H) Heatmaps illustrating the fold enrichment of unique integration sites, compared to the matched random control, near non-B DNA for in vivo-derived (F) and in vitro-derived (G) datasets, and in vivo-derived compared with in vitro-derived sites (H). (I) Comparison of the percentage of integration sites within 500 bp of non-B DNA between in vitro- and in vivo-derived datasets. Significant differences are denoted by asterisks (* p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001; Fisher’s exact test, two-sided).
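Comparisons such as panels D and I reduce to a 2x2 table: for example, in vivo sites within 5000 bp of a feature versus farther away, against the same split for in vitro sites. The two-sided Fisher's exact test used throughout these legends can be sketched in plain Python as below. This is an illustrative implementation, not the authors' code; in practice `scipy.stats.fisher_exact(table, alternative="two-sided")` does the same job. Working in log space with `lgamma` keeps it fast for tables with tens of thousands of sites.

```python
from math import exp, lgamma


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every table that
    is no more likely than the observed one.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    log_total = _log_comb(n, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {
        x: exp(_log_comb(row1, x) + _log_comb(row2, col1 - x) - log_total)
        for x in range(lo, hi + 1)
    }
    p_observed = probs[a]
    # Small relative tolerance so tables tied with the observed one are not
    # dropped because of floating-point rounding.
    p_value = sum(p for p in probs.values() if p <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)
```

The classic table [[3, 1], [1, 3]] gives p = 34/70 (about 0.486), a useful sanity check. In the panel D setting, `a` and `b` would be the in vivo counts inside and outside the 5000 bp window and `c` and `d` the in vitro counts; the counts themselves are not given in the legend.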
Figure 4
HIV-1 subtypes A, B, C and D have different integration site preferences for genomic features. (A) Comparison of the percentage of integration sites in vivo near common genomic features between HIV-1 subtypes A, B, C and D. Inset numbers represent the percentage of total integrations located directly within the feature. Statistical comparisons were performed relative to subtype B. Significant differences are denoted by asterisks (* p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001; Fisher’s exact test, two-sided). (B) Heatmaps depicting the fold enrichment or depletion of integration sites near common genomic features compared to the matched random control. Darker shades represent higher fold-changes in the ratio of integration sites to matched random control sites. Distance bins in A and B represent the distance in base pairs of the integration sites from the genomic feature. (C) Pairwise analysis of the integration site profile preferences (based on fold enrichment and depletion values within 5000 bp of each feature), using Euclidean distance as the distance measure (Heatmapper) [56]. Weaker relationships between integration site profiles are indicated by darker red in the pairwise distance matrix, whereas stronger relationships are indicated by darker blue. Significant differences are denoted by asterisks (* p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001; Fisher’s exact test, two-sided).
Figure 5
HIV-1 subtypes A, B, C and D have different integration site preferences for non-B DNA. (A) Comparison of the percentage of integration sites in vivo near non-B DNA features between HIV-1 subtypes A, B, C and D. Inset percentages refer to the total integrations within 500 bp of the feature. Statistical comparisons were performed relative to subtype B. Significant differences are denoted by asterisks (* p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001; Fisher’s exact test, two-sided). (B) Heatmaps depicting the fold enrichment or depletion of integration sites near non-B DNA compared to the matched random control. Darker shades represent higher fold-changes in the ratio of integration sites to matched random control sites. Bins in A and B represent the distance in base pairs of the integration sites from the non-B DNA feature. (C) Pairwise analysis of the integration site profile preferences (based on fold enrichment and depletion values within 500 bp of each feature), using Euclidean distance as the distance measure (Heatmapper) [56]. Weaker relationships between integration site profiles are indicated by darker red in the pairwise distance matrix, whereas stronger relationships are indicated by darker blue. Significant differences are denoted by asterisks (* p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001; Fisher’s exact test, two-sided). (D) Estimates of evolutionary divergence over sequence pairs between groups, shown as the number of amino acid substitutions per site averaged over all sequence pairs between groups. Analyses were conducted using the Poisson correction model and involved 486 amino acid sequences. The coding data were translated assuming the standard genetic code table. All ambiguous positions were removed for each sequence pair (pairwise deletion option), leaving a total of 265 positions in the final dataset. Evolutionary analyses were conducted in MEGA X.
(E) Amino acid alignment of the C-terminal domain of HIV-1 integrase from subtypes A, B, C and D.
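Panel D combines three steps: pairwise deletion of ambiguous positions, the Poisson correction d = -ln(1 - p), where p is the proportion of differing amino acids, and averaging over all between-group sequence pairs. A minimal Python sketch of those steps, assuming pre-aligned protein sequences of equal length; which characters count as "ambiguous" (here `X`, `-` and `?`) is an assumption, and this does not reproduce MEGA X's exact handling of missing data.

```python
import math
from itertools import product

# Characters treated as ambiguous and excluded pair by pair (an assumption).
AMBIGUOUS = frozenset("X-?")


def poisson_distance(seq1: str, seq2: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p) after pairwise deletion."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x in AMBIGUOUS or y in AMBIGUOUS:
            continue  # pairwise deletion: skip this position for this pair only
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no comparable positions")
    p = differences / compared
    # log1p(-p) is ln(1 - p), computed accurately for small p.
    return math.inf if p >= 1 else -math.log1p(-p)


def mean_between_group_distance(group_a, group_b) -> float:
    """Average Poisson distance over all pairs with one sequence from each group."""
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    distances = [poisson_distance(s, t) for s, t in product(group_a, group_b)]
    return sum(distances) / len(distances)
```

For two sequences differing at one of four comparable positions, p = 0.25 and d = -ln(0.75), about 0.288, slightly above p because the correction accounts for multiple substitutions at the same site.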
Figure 6
Integration hotspots identified from individuals infected with different HIV-1 subtypes. (A) 1000-bp windows of genomic DNA hosting two or more integration sites (“hotspots”) were quantified for each HIV-1 subtype and summarized as the percentage of all integration sites that fell within a hotspot. ** p < 0.01; *** p < 0.001; ns, not significant; Fisher’s exact test, two-sided. (B) All genes targeted by each HIV-1 subtype were compared to identify genes targeted uniquely by one subtype or by more than one subtype. The Venn diagram shows the number of unique and shared genes between the different subtypes. (C) All genes hosting two or more integration sites (‘gene hotspots’) were filtered for each HIV-1 subtype and compared to identify genes targeted by two or more subtypes. The ribbons emerging from each subtype in the Circos plot connect to the genes (each represented by a different colored box) shared with other subtypes. The multicolored bars next to each gene name summarize the subtypes targeting that gene. (D) All genes hosting two or more integration sites that were <1000 bp apart (‘gene super-hotspots’) were filtered for each HIV-1 subtype and compared. The names of the gene hotspots targeted by two or more subtypes are shown at their approximate locations on the respective human chromosomes. Identical chromosomal locations indicate integration sites shared between the different datasets.
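The hotspot definitions in panels A and D can be expressed as simple counting rules. The Python sketch below assumes fixed, non-overlapping 1000-bp windows aligned to multiples of 1000 on each chromosome; the legend does not say whether windows were fixed or sliding, so this choice (and the function names) are assumptions. A known side effect of fixed windows is that two sites a few base pairs apart but straddling a window boundary are not counted as a hotspot, which the gene-level super-hotspot rule (distance between sites) avoids.

```python
from collections import Counter

WINDOW_BP = 1000  # window size from the Figure 6 legend


def hotspot_percentage(sites, window_bp=WINDOW_BP, min_sites=2):
    """Percentage of integration sites that fall in a hotspot window.

    sites: iterable of (chromosome, position) pairs, positions in bp.
    Windows are fixed bins [k * window_bp, (k + 1) * window_bp) on each
    chromosome (an assumption; sliding windows would also fit the legend).
    """
    sites = list(sites)
    if not sites:
        return 0.0
    window_counts = Counter((chrom, pos // window_bp) for chrom, pos in sites)
    in_hotspots = sum(n for n in window_counts.values() if n >= min_sites)
    return 100.0 * in_hotspots / len(sites)


def is_super_hotspot(positions_in_gene, max_gap_bp=1000):
    """True if at least two sites in one gene lie less than max_gap_bp apart."""
    ordered = sorted(positions_in_gene)
    # After sorting, only adjacent sites need to be checked.
    return any(b - a < max_gap_bp for a, b in zip(ordered, ordered[1:]))
```

For the hypothetical sites `[("chr1", 100), ("chr1", 900), ("chr1", 1500), ("chr2", 100)]`, only the first window on chr1 holds two sites, so half of the sites (50.0%) are in hotspots.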
Figure 7
Integration site hotspots for HIV-1 subtypes A, B, C and D are located in non-B DNA. (A) Genomic sequences were extracted from a window spanning 100 nucleotides upstream and 100 nucleotides downstream of each integration site. Sequences from integration sites located in hotspots were compared with sequences from sites outside hotspots using DiffLogo. Consensus sequences were analyzed for the presence of non-B DNA motifs, represented by colored lines above each DiffLogo image (orange, slipped DNA motif; blue, G4 DNA motif). The top half of each DiffLogo represents sequences from hotspots and the bottom half represents sequences from non-hotspots. (B–D) Example sequences and graphical representations of slipped DNA (B), G4 DNA (C) and combined slipped plus G4 DNA (D) features.
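The window extraction in panel A, plus a quick check for one of the motif classes it mentions, can be sketched in Python as follows. Two assumptions are labelled here: the site is a 0-based coordinate and the window is the 100 nt before it plus the 100 nt starting at it; and G4 motifs are screened with a Quadparser-style regular expression (four runs of three or more G separated by 1–7 nt loops, or the C-rich equivalent for the opposite strand). The legend does not say how motifs were detected, so this regex screen is a swapped-in illustration, not the authors' method, and it does not attempt slipped DNA detection.

```python
import re

FLANK_NT = 100  # nucleotides taken on each side of the site (Figure 7 legend)

# Quadparser-style putative G4 pattern; the C-rich version catches a G4
# formed on the opposite strand.
G4_PLUS = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3,}G{3,}")
G4_MINUS = re.compile(r"(?:C{3,}[ACGTN]{1,7}){3,}C{3,}")


def flanking_window(sequence: str, site: int, flank: int = FLANK_NT) -> str:
    """Return the `flank` nt before 0-based position `site` and the `flank`
    nt starting at it, clipped at the sequence ends and upper-cased."""
    start = max(0, site - flank)
    end = min(len(sequence), site + flank)
    return sequence[start:end].upper()


def has_g4_motif(window: str) -> bool:
    """True if the window matches the putative G4 pattern on either strand."""
    return bool(G4_PLUS.search(window) or G4_MINUS.search(window))
```

A window containing the telomeric-like repeat `GGGTTAGGGTTAGGGTTAGGG` is flagged, whereas a poly-A window is not. Windows from hotspot and non-hotspot sites collected this way are the kind of input a position-wise comparison such as DiffLogo works on.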

References

1. Daniel R., Greger J.G., Katz R.A., Taganov K.D., Wu X., Kappes J.C., Skalka A.M. Evidence That Stable Retroviral Transduction and Cell Survival Following DNA Integration Depend on Components of the Nonhomologous End Joining Repair Pathway. J. Virol. 2004;78:8573–8581. doi: 10.1128/JVI.78.16.8573-8581.2004.
2. Wu X., Li Y., Crise B., Burgess S.M. Transcription Start Regions in the Human Genome Are Favored Targets for MLV Integration. Science. 2003;300:1749–1751. doi: 10.1126/science.1083413.
3. Felice B., Cattoglio C., Cittaro D., Testa A., Miccio A., Ferrari G., Luzi L., Recchia A., Mavilio F. Transcription Factor Binding Sites Are Genetic Determinants of Retroviral Integration in the Human Genome. PLoS ONE. 2009;4:e4571. doi: 10.1371/journal.pone.0004571.
4. Trobridge G.D., Miller D.G., Jacobs M.A., Allen J.M., Kiem H.-P., Kaul R., Russell D.W. Foamy Virus Vector Integration Sites in Normal Human Cells. Proc. Natl. Acad. Sci. USA. 2006;103:1498–1503. doi: 10.1073/pnas.0510046103.
5. Barr S.D., Leipzig J., Shinn P., Ecker J.R., Bushman F.D. Integration Targeting by Avian Sarcoma-Leukosis Virus and Human Immunodeficiency Virus in the Chicken Genome. J. Virol. 2005;79:12035–12044. doi: 10.1128/JVI.79.18.12035-12044.2005.
