Nucleic Acids Res. 2011 Jul;39(Web Server issue):W567-75.
doi: 10.1093/nar/gkr506.

inGAP-sv: a novel scheme to identify and visualize structural variation from paired end mapping data

Ji Qi et al. Nucleic Acids Res. 2011 Jul.

Abstract

Mining genetic variation from personal genomes is a crucial step towards investigating the relationship between genotype and phenotype. However, compared to the detection of SNPs and small indels, characterizing large and particularly complex structural variation is much more difficult and less intuitive. In this article, we present a new scheme (inGAP-sv) to detect and visualize structural variation from paired-end mapping data. Under this scheme, abnormally mapped read pairs are clustered based on the location of a gap signature. Several important features, including local depth of coverage, mapping quality and associated tandem repeats, are used to evaluate the quality of predicted structural variation. Compared with other approaches, it can detect many more large insertions and complex variants with a lower false discovery rate. Moreover, inGAP-sv, written in the Java programming language, provides a user-friendly interface and runs on multiple operating systems. It can be freely accessed at http://ingap.sourceforge.net/.

Figures

Figure 1.
The pipeline of SV detection in inGAP-sv. (A) A three-step strategy to detect SVs. (B) The workflow of depth-of-coverage (DOC) signature detection. Briefly, a DOC signature is a part of the reference sequence covered by far fewer normally mapped reads than the local physical DOC (pDOC). As shown in B, a gap is first identified by pairing its left and right boundaries. In a region with continuously descending pDOC values, the left boundary is set to the location whose pDOC is smaller than three-quarters of its upstream local pDOC. The right boundary is determined by the same rule. Incomplete gap signatures with a boundary on only one side, which possibly result from sequencing coverage bias, are ignored. Subsequently, inGAP-sv uses abnormally mapped read pairs adjacent to each gap to recalibrate its boundaries. Finally, a gap with recalibrated boundaries is filtered out if its average pDOC exceeds three-quarters of the local pDOC value.
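The boundary rule in panel B can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the inGAP-sv implementation (which is written in Java): the function name `find_doc_gaps`, the flank size `window` used to estimate the "local" pDOC, and the way the right boundary mirrors the left one are all hypothetical choices. The recalibration step that uses abnormally mapped read pairs is omitted.

```python
from statistics import mean


def find_doc_gaps(pdoc, window=5, ratio=0.75):
    """Return (left, right) index pairs of candidate gaps in a pDOC profile.

    pdoc   : list of per-position physical depth-of-coverage values
    window : flank size used to estimate the local pDOC (an assumption)
    ratio  : three-quarters threshold taken from the figure caption
    """
    n = len(pdoc)

    def drops_from_upstream(i):
        # pDOC at i is below three-quarters of the upstream local pDOC
        return i >= window and pdoc[i] < ratio * mean(pdoc[i - window:i])

    def rises_to_downstream(j):
        # mirror rule: pDOC at j is below three-quarters of the downstream pDOC
        return j + window < n and pdoc[j] < ratio * mean(pdoc[j + 1:j + 1 + window])

    gaps = []
    i = window
    while i < n:
        if not drops_from_upstream(i):
            i += 1
            continue
        left = i
        # look for the matching right boundary
        j = left
        while j < n and not rises_to_downstream(j):
            j += 1
        if j == n:
            break  # incomplete signature (left boundary only): ignored
        while j + 1 < n and rises_to_downstream(j + 1):
            j += 1
        right = j
        # final filter: drop the gap if its average pDOC is not low enough
        flank = pdoc[left - window:left] + pdoc[right + 1:right + 1 + window]
        if mean(pdoc[left:right + 1]) <= ratio * mean(flank):
            gaps.append((left, right))
        i = right + 1
    return gaps


profile = [20] * 10 + [4] * 6 + [20] * 10
print(find_doc_gaps(profile))
```

A profile that drops and never recovers yields no gap, which corresponds to the "incomplete gap signature" case in the caption.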
Figure 2.
Illustrations of PEM patterns for different types of SVs. Grey links indicate normally mapped read pairs with proper read orientation and distance. Light blue links represent read pairs with proper read orientation but longer distance, which may indicate a deletion event in the query sequence. Green links represent read pairs with proper orientation but shorter distance, and thus indicate an insertion. Dark blue links show read pairs with abnormal orientation, in which paired ends are mapped to the wrong strand(s). Yellow lines indicate single-end mapped reads (SE reads), in which only one of the paired reads is mapped. Pink lines indicate a pair of reads mapped to different chromosomes. All gap signatures for the different SVs are circled with blue ovals. (A) For a small insertion (smaller than the insert size), a fraction of the paired reads (in green) that span the insertion are mapped too closely on the reference. Meanwhile, the insertion is surrounded by a set of single-end mapped reads (in yellow). (B) For a large insertion, no paired reads can span the insertion and only single-end mapped reads are present. (C) For a homozygous deletion, all the paired reads (in light blue) are mapped farther apart than expected. (D) For a heterozygous deletion, normally mapped pairs (in grey) span the gap signature. (E) A translocation is represented by two sets of distantly mapped pairs and one set of inverted mapped pairs (in dark blue). (F) An inversion causes the paired reads to change orientation, so that both ends map to the same strand. (G) A segmental tandem duplication is represented by one set of distantly mapped reads and one set of inverted mapped reads.
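The link colours above amount to a per-pair classification, which the following Python sketch makes explicit. It is an illustration only, not inGAP-sv code: the `ReadPair` type, the `classify_pair` function, the category labels, the cutoff of `n_sd` standard deviations around the mean insert size, and the assumption that a proper pair is forward/reverse with the forward read leftmost are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadPair:
    # chrom is None when that end of the pair could not be mapped
    chrom1: Optional[str]
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: Optional[str]
    pos2: int
    strand2: str


def classify_pair(pair, mean_insert, sd_insert, n_sd=3.0):
    """Assign a read pair to one of the PEM categories in the figure."""
    if pair.chrom1 is None and pair.chrom2 is None:
        return "unmapped"
    if pair.chrom1 is None or pair.chrom2 is None:
        return "single_end"            # yellow: only one end is mapped
    if pair.chrom1 != pair.chrom2:
        return "inter_chromosomal"     # pink: ends on different chromosomes
    # order the two ends by reference position
    (lpos, lstrand), (rpos, rstrand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if (lstrand, rstrand) != ("+", "-"):
        return "abnormal_orientation"  # dark blue: wrong strand(s)
    distance = rpos - lpos
    if distance > mean_insert + n_sd * sd_insert:
        return "too_far"               # light blue: may indicate a deletion
    if distance < mean_insert - n_sd * sd_insert:
        return "too_close"             # green: may indicate an insertion
    return "normal"                    # grey


pair = ReadPair("chr1", 1000, "+", "chr1", 2400, "-")
print(classify_pair(pair, mean_insert=400, sd_insert=30))
```

Keeping the categories as plain labels means the pairs in each category can then be clustered around a gap signature, as the abstract describes, without committing to an SV type at the single-pair level.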
Figure 3.
Performance of indel detection by inGAP-sv on both simulated (A and B) and real data sets (C and D). (A) True positive rate of indel detection at different levels of standard deviation of insert size. When the insert size variance increases, the true positive rate of deletion detection decreases slightly. (B) True positive rate as a function of indel size. inGAP-sv fails to detect very small indels (<50 bp), but works well on large indels. (C) Size of gap signatures detected by inGAP-sv at different sequencing depths from NA18507. (D) Distribution of indel sizes predicted by inGAP-sv with 38X data from NA18507. Blue bars indicate the number of small deletions, while red bars indicate small insertions. Moreover, inGAP-sv detected 729 deletions >400 bp and 524 insertions larger than the insert size, which are not shown in the graph.
Figure 4.
Performance comparison between inGAP-sv and other tools (BreakDancer, VariationHunter and Spanner). (A) The gold standard SV set (GS2) is used to assess the detection sensitivity of the four methods for the individual NA12878. inGAP-sv can call 80% of deletions in GS2, which is slightly higher than the other three tools. Most importantly, the number of deletions identified by both inGAP-sv and another tool (shown in blue) is significantly higher than the number identified without inGAP-sv (shown in red). The detailed list of identified deletions for each tool is shown in Supplementary Table S1. (B) A Venn diagram compares the deletion calls made by the four tools on chr20 of NA12878.
