PLoS Pathog. 2006 Jun;2(6):e60.
doi: 10.1371/journal.ppat.0020060. Epub 2006 Jun 23.

Retroviral DNA integration: viral and cellular determinants of target-site selection

Mary K Lewinski et al. PLoS Pathog. 2006 Jun.

Abstract

Retroviruses differ in their preferences for sites for viral DNA integration in the chromosomes of infected cells. Human immunodeficiency virus (HIV) integrates preferentially within active transcription units, whereas murine leukemia virus (MLV) integrates preferentially near transcription start sites and CpG islands. We investigated the viral determinants of integration-site selection using HIV chimeras with MLV genes substituted for their HIV counterparts. We found that transferring the MLV integrase (IN) coding region into HIV (to make HIVmIN) caused the hybrid to integrate with a specificity close to that of MLV. Addition of MLV gag (to make HIVmGagmIN) further increased the similarity of target-site selection to that of MLV. A chimeric virus with MLV Gag only (HIVmGag) displayed targeting preferences different from those of both HIV and MLV, further implicating Gag proteins in targeting as well as IN. We also report a genome-wide analysis indicating that MLV, but not HIV, favors integration near DNase I-hypersensitive sites (i.e., within ±1 kb), and that HIVmIN and HIVmGagmIN also favored integration near these features. These findings reveal that IN is the principal viral determinant of integration specificity; they also reveal a new role for Gag-derived proteins, and strengthen models for integration targeting based on tethering of viral IN proteins to host proteins.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Retroviral DNA Integration and the Chimeric Viruses Used in This Study
(A) The DNA breaking and joining reactions mediating integration. Gray ovals represent integrase monomers, thick red lines represent viral DNA, black lines represent target DNA, and dots represent 5′ ends. (1) Linear blunt-ended viral cDNA is bound by integrase as part of the preintegration complex. (2) Integrase removes two nucleotides from the 3′ ends of the viral DNA, exposing recessed 3′ hydroxyl groups. (3) IN joins the recessed 3′ ends of viral DNA to the target DNA. (4) Unpairing of the target DNA between the joined ends of the viral DNA yields gaps in the target DNA. (5) DNA repair enzymes fill in the gaps. (6) The provirus is flanked by repeated segments of the target DNA.
(B) Chimeric HIV derivatives containing segments of MLV. At the top is the HIV parent virus, with vpr and env inactivated and the puromycin resistance gene in place of nef. Following that are the chimeras, with substitutions of MLV gag gene segments (MA-, p12-, and CA-encoding regions) for HIV MA and CA, or substitution of MLV IN for HIV IN, or both [20,21]. The MLV genome (indicated by an asterisk) is shown for comparison. The MLV used in this study (MLVPuro) was an MLV-based vector (LPCX) encoding the puromycin resistance gene with Gag, Pol, and Env provided in trans. Although we refer to “Gag” in the text, we note that Gag is in fact a polyprotein which is cleaved into individual functional proteins by the action of the viral protease.
(C) Target-sequence duplication lengths made by HIV, MLV, and the chimeric viruses.
(D) Primary sequences at the site of integration for HIV, MLV, the chimeric viruses, and a previously published MLV dataset (MLV-Burgess). On the x-axis are the top strand positions surrounding the point of integration, which is represented by the blue arrow and line (between positions −1 and 0). For each dataset, the proportion of each base at a given location was divided by the proportion of that base in the matched random control set, such that a base with a y value >1 is present at an increased frequency, while a base with a y value <1 is present at a decreased frequency compared to random sites. A dashed red box surrounds the target sequence that is duplicated upon integration.
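The normalization described in panel (D) can be sketched in a few lines. This is a minimal illustration and not the authors' code: it assumes that the sequences flanking each integration site and each matched random control site have already been extracted, aligned on the point of integration, and trimmed to equal length. The function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def base_proportions(seqs, pos):
    """Proportion of each base at index `pos` across equal-length sequences.

    Only A, C, G, and T are counted; other symbols (e.g. N) are ignored.
    """
    counts = Counter(s[pos].upper() for s in seqs)
    total = sum(counts[b] for b in BASES)
    return {b: counts[b] / total for b in BASES}

def enrichment_by_position(sites, controls):
    """Per-position ratio: proportion in `sites` / proportion in `controls`.

    A value > 1 means the base is over-represented relative to the
    control set; a value < 1 means it is under-represented.
    """
    result = []
    for pos in range(len(sites[0])):
        obs = base_proportions(sites, pos)
        ctl = base_proportions(controls, pos)
        result.append({
            # The ratio is undefined when the base never occurs in controls.
            b: (obs[b] / ctl[b]) if ctl[b] > 0 else float("nan")
            for b in BASES
        })
    return result
```

Calling `enrichment_by_position(site_seqs, control_seqs)` returns one dictionary per aligned position, which corresponds to the y values plotted for each base along the x-axis.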
Figure 2. Positions of Retroviral Integration Sites on the Human Chromosomes
The human chromosomes are shown numbered. Centromeric regions (which are mostly unsequenced) are shown in gray. Relative gene density is indicated in the top bar on each chromosome by the intensity of the cyan coloration. Integration-site datasets (lower bars) are color-coded as indicated. Sites of integration near transcription start sites (within ± 5 kb), CpG islands (within ± 1 kb of a CpG midpoint), or two DNase I cleavage sites are shown as red dashes; other sites are black.
Figure 3. Frequency of Integration near Transcription Start Sites, CpG Islands, and DNase I Cleavage Sites, Illustrating the Contribution of MLV IN to Specificity
(A and B) The percentage of integration sites (per kb) within each interval is shown for (A) transcription start sites, and (B) CpG islands. (C) DNase I cleavage sites. For each dataset, the proportion of integration sites found within ± 1 kb of two DNase I cleavage sites was divided by the proportion in the matched random control set. The dotted line represents the expected bar height if the observed data did not differ from the random control set. L1 is displayed as the ratio over an unmatched random set. Three asterisks denote p < 0.0001 by chi-square comparison to random sites. A single asterisk denotes p = 0.0396.
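The comparison in panel (C) combines a windowed count with a chi-square test against random sites. The sketch below illustrates that idea under simplifying assumptions that are not from the paper: all coordinates lie on a single chromosome, a site counts as "near" when at least one feature falls within the window (the caption's criterion involves two cleavage sites), and the test is a plain 2x2 chi-square without continuity correction. All function names are hypothetical.

```python
import math
from bisect import bisect_left

def is_near(position, sorted_features, window=1000):
    """True if any feature lies within +/- `window` bp of `position`.

    `sorted_features` must be sorted in ascending order.
    """
    i = bisect_left(sorted_features, position - window)
    return i < len(sorted_features) and sorted_features[i] <= position + window

def count_near(positions, sorted_features, window=1000):
    """Return (number of positions near a feature, total positions)."""
    hits = sum(is_near(p, sorted_features, window) for p in positions)
    return hits, len(positions)

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic and p-value (1 degree of freedom) for the table

        experimental: a near, b not near
        random:       c near, d not near
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, the survival function is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

The bar height in panel (C) would then correspond to `(a / (a + b)) / (c / (c + d))`, with a ratio of 1 matching the dotted line.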
Figure 4. Diagram of the Relationship of Transcription-Factor Binding Sites Enriched in the MLVPuro, MLV-Burgess, HIVmIN, and HIVmGagmIN Integration-Site Datasets
The genomic sequences within 1 kb of each integration site were used for analysis. Ten matched random-control integration sites were compared to each experimental integration site. The cut-off value for over-representation was 2.0-fold. All comparisons achieved p ≤ 0.001. The number of enriched transcription-factor binding sites in each dataset is shown with the number of factors unique to each in parentheses. The edge labels show the number of commonly enriched sites between pairs of datasets.
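The fold over-representation filter described in this caption can be illustrated as follows. This is a sketch under stated assumptions, not the authors' pipeline: the inputs are per-factor counts of windows that contain at least one predicted binding site, the cut-off is treated as inclusive (fold of at least 2.0), and the p ≤ 0.001 significance filter is omitted. The function names and the factor labels used in any example are hypothetical.

```python
def enriched_factors(exp_counts, ctl_counts, n_exp, n_ctl, cutoff=2.0):
    """Return the set of factors whose binding sites are over-represented.

    exp_counts / ctl_counts map factor name -> number of windows with a
    binding site; n_exp / n_ctl are the total numbers of windows.
    """
    enriched = set()
    for factor, exp_hits in exp_counts.items():
        ctl_hits = ctl_counts.get(factor, 0)
        if ctl_hits == 0:
            continue  # fold is undefined; skipped in this sketch
        fold = (exp_hits / n_exp) / (ctl_hits / n_ctl)
        if fold >= cutoff:
            enriched.add(factor)
    return enriched

def shared_and_unique(enriched_by_dataset):
    """For each dataset, count enriched factors and those unique to it;
    for each pair of datasets, count the factors enriched in both."""
    names = list(enriched_by_dataset)
    summary = {}
    for name in names:
        others = set().union(
            *(enriched_by_dataset[o] for o in names if o != name)
        )
        own = enriched_by_dataset[name]
        summary[name] = (len(own), len(own - others))
    pairs = {
        (a, b): len(enriched_by_dataset[a] & enriched_by_dataset[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    return summary, pairs
```

With ten matched random controls per experimental site, `n_ctl` would be ten times `n_exp`. The per-dataset tuples from `shared_and_unique` mirror the node labels in the diagram (total, with the unique count in parentheses), and the pair counts mirror the edge labels.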
Figure 5. Effects of Gag Proteins on Integration Targeting
(A) Clustering by the machine learning algorithm RandomForest, illustrating an influence of Gag determinants as well as IN. For a detailed description of the method, see Protocol S3. (B) An analysis of the G/C percentage at integration sites.
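The G/C percentage in panel (B) is a simple per-sequence statistic. A minimal sketch follows, assuming the sequence window around each integration site has already been extracted; the window size and the function names are assumptions, since the caption does not specify them.

```python
def gc_percent(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        return float("nan")  # no unambiguous bases in the window
    return 100.0 * gc / acgt

def mean_gc_percent(seqs):
    """Mean G/C percentage over a collection of sequence windows."""
    values = [gc_percent(s) for s in seqs]
    return sum(values) / len(values)
```

Comparing `mean_gc_percent` for each integration-site dataset against the value for its matched random control set gives one number per dataset, which is the kind of quantity summarized in the panel.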
Figure 6. Effects of Selection for Provirus Gene Expression on the Distribution of Integration Sites
(A) Frequencies of integration in RefSeq genes in the HIV-Burgess and HIVPuro datasets. (B) Comparison of the relative frequency of integration in the HIV-Burgess and HIVPuro datasets as a function of transcriptional intensity. The proportion of integration sites from each dataset in regions of varying transcriptional intensity was plotted increasing from left to right along the x-axis (in groups divided according to deciles of density). For “expressed genes” in this plot, we counted the number of genes whose expression level in HeLa cells was in the upper 1/8th of genes assayed on the HG-U133A microarray. The p-value given is the result of fitting a cubic polynomial to the expressed gene-density values. (C) Frequencies of integration in RefSeq genes in the MLV-Burgess and MLVPuro datasets. (D) Comparison of the relative frequency of integration in the MLV-Burgess and MLVPuro datasets as a function of transcriptional intensity. See Protocols S4 and S5 for details and additional plots.
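The decile grouping used in panels (B) and (D) can be sketched as below. This is an illustration only: it assumes each integration site has already been annotated with the expressed-gene density of its surrounding region, and it derives the decile boundaries from a reference distribution supplied by the caller. Which distribution defines the deciles is an assumption here, the cubic polynomial fit is not included, and the function name is hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles

def decile_proportions(site_densities, reference_densities):
    """Proportion of sites falling in each decile of the reference densities.

    Returns a list of ten proportions ordered from the lowest-density
    decile to the highest.
    """
    cuts = quantiles(reference_densities, n=10)  # nine cut points
    counts = [0] * 10
    for density in site_densities:
        counts[bisect_right(cuts, density)] += 1
    total = len(site_densities)
    return [c / total for c in counts]
```

Running this once per dataset (for example, once for each of the two datasets compared in a panel) yields two ten-element series that can be plotted from left to right along the x-axis.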

References

    1. Check E. Gene therapy put on hold as third child develops cancer. Nature. 2005;433:561.
    2. Hacein-Bey-Abina S, von Kalle C, Schmidt M, Le Deist F, Wulffraat N, et al. A serious adverse event after successful gene therapy for X-linked severe combined immunodeficiency. N Engl J Med. 2003;348:255–256.
    3. Hacein-Bey-Abina S, von Kalle C, Schmidt M, McCormack MP, Wulffraat N, et al. LMO2-associated clonal T cell proliferation in two patients after gene therapy for SCID-X1. Science. 2003;302:415–419.
    4. Carteau S, Hoffmann C, Bushman F. Chromosome structure and human immunodeficiency virus type 1 cDNA integration: Centromeric alphoid repeats are a disfavored target. J Virol. 1998;72:4005–4014.
    5. Holman AG, Coffin JM. Symmetrical base preferences surrounding HIV-1, avian sarcoma/leukosis virus, and murine leukemia virus integration sites. Proc Natl Acad Sci U S A. 2005;102:6103–6107.
