Gene Ther. 2008 Sep;15(18):1294-8.
doi: 10.1038/gt.2008.99. Epub 2008 Jun 26.

Automated analysis of viral integration sites in gene therapy research using the SeqMap web resource

B Peters et al. Gene Ther. 2008 Sep.

Abstract

Research in gene therapy involving genome-integrating vectors now often includes analysis of vector integration sites across the genome using methods such as ligation-mediated PCR (LM-PCR) or linear amplification-mediated PCR (LAM-PCR). To help researchers analyze these sites and the functions of nearby genes, we have developed SeqMap (http://seqmap.compbio.iupui.edu/), a secure, web-based, comprehensive vector integration site management tool that automatically analyzes and annotates large numbers of vector integration sites derived from LM-PCR experiments in human and model organisms against a common genome database. We believe the use of this resource will enable better reproducibility and understanding of these important data.

Figures

Figure 1. Automated workflow for annotation in SeqMap
First, the user specifies the vector sequence to be removed from the input sequences; this only needs to be done once per LM-PCR protocol. Next, sequences are submitted in FASTA format, one entry for each submitted sequence. Each sequence is then mapped to the genome using a three-step protocol. First, Chaos, a local alignment method, identifies regions of vector in the input sequence, and these regions are replaced with the letter 'N'. Second, Censor, a repeat-masking algorithm, is applied to the vector-removed sequence to remove repetitive elements. Finally, the resulting sequence is mapped to the genome build using Blat. RefSeq (http://www.ncbi.nlm.nih.gov/RefSeq/) gene products near the integration site are then identified using both the Ensembl and UCSC annotation databases; only genes with Entrez Gene IDs or MGI IDs are considered. Distances are measured from the specified integration site to the 'txStart' location in UCSC or the 'Start' location in Ensembl. Parameters for each of these tools are published in our online reference guide at http://seqmap.compbio.iupui.edu/.
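The masking and distance steps in this caption can be illustrated with a minimal Python sketch. It is not SeqMap's implementation and does not call Chaos, Censor or Blat. It assumes the vector and repeat hits are already available as half-open (start, end) intervals, and that repeats are masked with 'N' in the same way as vector (the caption only says they are removed). The function names `mask_intervals` and `nearest_gene`, the gene identifiers and all coordinates are hypothetical.

```python
def mask_intervals(seq, intervals, mask_char="N"):
    """Replace each half-open [start, end) interval of seq with mask_char.

    Assumption: intervals come from an external aligner or repeat masker;
    this function only applies the masking.
    """
    chars = list(seq)
    for start, end in intervals:
        # Clamp to the sequence bounds so out-of-range hits are harmless.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


def nearest_gene(site_pos, genes):
    """Return (gene_id, signed_distance) for the closest transcription start.

    genes is an iterable of (gene_id, tx_start) pairs on the same chromosome
    as the integration site. The signed distance is tx_start - site_pos.
    """
    gene_id, tx_start = min(genes, key=lambda g: abs(g[1] - site_pos))
    return gene_id, tx_start - site_pos


# Hypothetical read with a vector hit at positions 0-3 and a repeat at 8-9.
read = "ACGTACGTACGT"
vector_masked = mask_intervals(read, [(0, 4)])
fully_masked = mask_intervals(vector_masked, [(8, 10)])
print(fully_masked)  # NNNNACGTNNGT

# Hypothetical transcription starts around an integration site at 1000.
print(nearest_gene(1000, [("GENE_A", 400), ("GENE_B", 1250)]))  # ('GENE_B', 250)
```

Masking rather than deleting keeps the coordinates of the remaining bases unchanged, which makes it easier to relate the mapped region back to the original read.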
Figure 2. Example of an integration site submission
At left is the submission summary ('A' in figure); below it, each submission is summarized by sequence name (B). At right is a specific integration site summary page (C). Each sequence submission is summarized by whether it is confirmed by a technician, whether it is used for analysis, whether it has a technician comment, whether it is found on the genome, and which gene UCSC and Ensembl have each annotated as the closest. Genes in the RTCGD are highlighted in bold. A gene structure map shows any original sequence errors (“N's”), the blocks removed by vector removal, any repetitive elements, the region mapped to the genome, and the proposed integration site. The structure of the input sequence is shown in user-configurable colors in both the image map and the sequences. Also shown is a summary of all status flags, comments and nearby genes found, with links to a complete summary of that region of the genome for each annotation database. For model organisms, human orthologs are provided using the Jackson Laboratory ortholog tables (http://www.informatics.jax.org/). The sequences output by each step of the preparation process are also displayed.
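The per-sequence summary described in this caption can be pictured as a small record. The Python sketch below is an assumption made for illustration, not SeqMap's data model: the class `SiteSummary`, its field names, the helper `analysis_ready` and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SiteSummary:
    """Hypothetical record mirroring the flags listed in the caption."""
    name: str
    confirmed: bool                      # confirmed by a technician
    used_for_analysis: bool
    comment: Optional[str]               # technician comment, if any
    on_genome: bool                      # a genomic hit was found
    closest_gene_ucsc: Optional[str]
    closest_gene_ensembl: Optional[str]

    def annotations_agree(self) -> bool:
        """True when both databases report the same closest gene."""
        return (self.closest_gene_ucsc is not None
                and self.closest_gene_ucsc == self.closest_gene_ensembl)


def analysis_ready(sites: List[SiteSummary]) -> List[SiteSummary]:
    """Keep sites that are confirmed, marked for analysis, and mapped."""
    return [s for s in sites
            if s.confirmed and s.used_for_analysis and s.on_genome]


# Hypothetical submissions.
sites = [
    SiteSummary("seq1", True, True, None, True, "GENE_A", "GENE_A"),
    SiteSummary("seq2", True, True, "check strand", True, "GENE_B", "GENE_C"),
    SiteSummary("seq3", False, True, None, False, None, None),
]
ready = analysis_ready(sites)
print([s.name for s in ready])               # ['seq1', 'seq2']
print([s.annotations_agree() for s in ready])  # [True, False]
```

Flagging sites where the two annotation databases disagree is one plausible use of such a record, since the caption notes that the closest gene is reported separately for UCSC and Ensembl.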
Figure 3. Example analysis of specific groups of integration sites
On the gene summary page for a submission, a map of the genomic region around an integration site is displayed for each integration event, with the BLAT hit shown on both the UCSC and Ensembl datasets. Nearby transcripts are displayed in green, with an arrow pointing in the coding direction. Exons (red boxes) and introns (black lines) are also displayed, showing transcript structure.
