Nucleic Acids Res. 2011 Sep 1;39(16):6864-78.
doi: 10.1093/nar/gkr337. Epub 2011 May 23.

Genome-wide analysis of mobile genetic element insertion sites

Kamal Rawal et al. Nucleic Acids Res. 2011.

Abstract

Mobile genetic elements (MGEs) account for a significant fraction of eukaryotic genomes and are implicated in altered gene expression and disease. We present an efficient computational protocol for MGE insertion site analysis. ELAN, the suite of tools described here, uses standard techniques to identify different MGEs and their distribution on the genome. One component, DNASCANNER, analyses known MGE insertion sites for signals based on a combination of local physical and chemical properties. ISF (insertion site finder) is a machine-learning tool that incorporates information derived from DNASCANNER and uses a support vector machine to classify a given DNA sequence as a potential insertion site or not. We have studied the genomes of Homo sapiens, Mus musculus, Drosophila melanogaster and Entamoeba histolytica via a protocol whereby DNASCANNER is used to identify a common set of statistically important signals flanking the insertion sites in the various genomes. These signals are used in ISF for insertion site prediction, and the current accuracy of the tool is over 65%. We find similar signals at gene boundaries and splice sites. Together, these data suggest a common insertion mechanism that operates in a variety of eukaryotes.
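
The classification step attributed to ISF above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the abstract does not say which SVM implementation, kernel or feature encoding ISF uses, so a linear SVM trained by plain subgradient descent on the hinge loss is swapped in, and the two toy features (fraction of A and of T) stand in for the DNASCANNER-derived signals. The function names and training sequences are hypothetical.

```python
def features(seq):
    """Toy feature vector (assumption): fraction of A and of T in the sequence."""
    seq = seq.upper()
    n = len(seq)
    return [seq.count("A") / n, seq.count("T") / n]


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=500):
    """Linear SVM via subgradient descent on the L2-regularised hinge loss.

    X is a list of feature vectors, y a list of +1 / -1 labels.
    Returns (weights, bias).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 shrinkage on the weights (the bias is not regularised).
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Sample is inside the margin: step along the hinge subgradient.
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def predict(seq, w, b):
    """Return +1 (potential insertion site) or -1 (not) for one sequence."""
    score = sum(wj * xj for wj, xj in zip(w, features(seq))) + b
    return 1 if score >= 0 else -1


# Made-up training data: AT-rich sequences labelled +1, GC-rich labelled -1.
train_seqs = ["AAAATTAAAA", "TAAAAAATTA", "GCGCGGCCGC", "GGCATGCCGC"]
train_labels = [1, 1, -1, -1]
w, b = train_linear_svm([features(s) for s in train_seqs], train_labels)
```

Calling `predict(candidate, w, b)` on a new sequence then gives a site / non-site label. A real application would replace `features` with the physical and chemical property profiles the abstract describes.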

Figures

Figure 1.
Schematic view of the various programs in the ELAN pipeline.
Figure 2.
Flow chart depicting the sequence of procedures followed by DNASCANNER to generate profiles for given DNA sequences. The score as a function of position (x) is computed by a formula (shown as an image in the original and not reproduced here) in which the parametric score of substring j (a di- or trinucleotide), derived from the parameter file, is summed over the m substrings generated for a window of size w.
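
The windowed scoring described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not DNASCANNER's code: the function name `profile`, the parameter table `TOY_PARAMS` and its values are hypothetical, and the sketch reports the plain sum over substrings because the caption says the scores are "summed"; any normalisation in the original formula is unknown here.

```python
def profile(seq, params, k=2, w=10):
    """Score every window of width w along seq.

    For the window starting at position x, the score is the sum of the
    parametric scores of its m = w - k + 1 overlapping k-mers
    (k=2 for dinucleotides, k=3 for trinucleotides).
    """
    seq = seq.upper()
    scores = []
    for x in range(len(seq) - w + 1):
        window = seq[x:x + w]
        kmers = [window[j:j + k] for j in range(w - k + 1)]
        # k-mers missing from the table (e.g. containing N) contribute 0 here.
        scores.append(sum(params.get(kmer, 0.0) for kmer in kmers))
    return scores


# Hypothetical dinucleotide parameter table; the values are made up.
TOY_PARAMS = {"AA": 1.0, "AT": 0.5, "TA": 0.5, "TT": 1.0}

toy_profile = profile("AAAATTTTGGGGCCCC", TOY_PARAMS, k=2, w=4)
```

Plotting `toy_profile` against window position gives the kind of property-versus-position curve shown in the later figures.
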
Figure 3.
Adenine density upstream of insertion sites of Alu in human chromosome 2. The y-axis represents the value of the property under study (here this is the ‘A Rule’) and the x-axis represents the position with respect to insertion site (taken as position 0). In all cases, the properties we examine have been computed for both positive and negative datasets.
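
A position-wise density of this kind, computed separately for positive and negative datasets, might look like the sketch below. The exact definition of the 'A Rule' is not given in this caption, so the sketch assumes it is simply the fraction of aligned sequences carrying an adenine at each position relative to the insertion site. The function name and the toy sequences are hypothetical.

```python
def adenine_density(flanks):
    """Fraction of sequences with an 'A' at each aligned column.

    flanks: equal-length sequences aligned so that the insertion site
    falls at the same column in every sequence.
    """
    if not flanks:
        return []
    length = len(flanks[0])
    if any(len(s) != length for s in flanks):
        raise ValueError("all flanking sequences must have the same length")
    return [
        sum(s[i].upper() == "A" for s in flanks) / len(flanks)
        for i in range(length)
    ]


# Made-up toy datasets: sequences flanking real sites vs. control sequences.
positive = ["GCAAAA", "TCAAAT", "GGAAAA"]
negative = ["GCTGCA", "TCGATC", "AGCTTG"]

pos_density = adenine_density(positive)
neg_density = adenine_density(negative)
```

Comparing `pos_density` with `neg_density` column by column shows where the positive set is enriched for adenine relative to the controls.
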
Figure 4.
Distribution of Alu elements across the human genome. Four classes of elements (see ‘Materials and Methods’ section) are indicated with four different colors. The y-axis represents the frequency of elements found on the different chromosomes (marked along the x-axis).
Figure 5.
Various signals upstream of the insertion sites of Alu in chromosome 2. The y-axis represents the value of the property and the x-axis gives the position relative to the insertion site (taken to be 0), as in Figure 3.
Figure 6.
(A) DNA-denaturation profile of pre-insertion loci of the ABCD1 gene, reported to be disrupted by an Alu element at position 0. (B) Bendability profile of pre-insertion loci of the Dystrophin gene, reported to be disrupted by an L1 element at position 0. (C) DNA-denaturation profile of pre-insertion loci of the Spectrin gene, reported to be disrupted by an SVA element at position 0. (D) DNA-denaturation profile of pre-insertion loci of the APC gene, reported to be disrupted by an Alu element at position 0. In this example, the 1000 bp of sequence flanking the insertion site of the L1 element is shown.
Figure 7.
(A) DNA-denaturation profile of DNA sequences from EPD comprising B. taurus promoters (−500 bp) and genes (400 bp). Position +1 represents the TSS of the gene. The window size was 100 bp of the total length. (B) Propeller twist profile of the same dataset. (C) Propeller twist profile of Xenopus promoters. (D) DNA-denaturation profile of viral genes.
Figure 8.
Most instances of the canonical TTAAAA motif are unrelated to known Alu insertion sites. Shown here is a histogram of distances from each TTAAAA motif to the nearest Alu insertion site; >70% of motifs are >100 bp away from the nearest Alu (see http://nldsps.jnu.ac.in/elan.html for more details).
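
The distance analysis behind this figure can be sketched as follows. The sketch assumes that distance is measured from the motif's start coordinate to the insertion-site coordinate on the same sequence; the caption does not specify this, and the function names and toy data are hypothetical.

```python
from bisect import bisect_left


def motif_positions(seq, motif="TTAAAA"):
    """Start indices of every (possibly overlapping) occurrence of motif."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def nearest_distance(pos, sorted_sites):
    """Absolute distance from pos to the closest value in sorted_sites."""
    if not sorted_sites:
        raise ValueError("at least one insertion site is required")
    i = bisect_left(sorted_sites, pos)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = sorted_sites[max(i - 1, 0):i + 1]
    return min(abs(pos - s) for s in candidates)


def fraction_farther_than(seq, sites, cutoff=100, motif="TTAAAA"):
    """Fraction of motif occurrences lying more than cutoff bp from any site."""
    sites = sorted(sites)
    dists = [nearest_distance(p, sites) for p in motif_positions(seq, motif)]
    return sum(d > cutoff for d in dists) / len(dists) if dists else 0.0


# Made-up toy sequence: one motif far from the site, one close to it.
toy_seq = "TTAAAA" + "G" * 200 + "TTAAAA" + "G" * 10
toy_sites = [210]
toy_fraction = fraction_farther_than(toy_seq, toy_sites, cutoff=100)
```

Binning the per-motif distances instead of thresholding them would give the histogram itself; the `bisect` lookup keeps each nearest-site query logarithmic in the number of sites.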
