J Integr Bioinform. 2021 Nov 16;18(4):20210032.
doi: 10.1515/jib-2021-0032.

Fast alignment of reads to a variation graph with application to SNP detection

Maurilio Monsu et al. J Integr Bioinform. 2021.

Abstract

Sequencing technologies have provided the basis of most modern genome sequencing studies due to their high base-level accuracy and relatively low cost. One of the most demanding steps is mapping reads to the human reference genome. Reliance on a single human reference genome can introduce substantial biases in downstream analyses. Pangenomic graph reference representations offer an attractive approach for storing genetic variation. Moreover, known variants can be included in the reference to make read mapping, variant calling, and genotyping variant-aware. Only recently has a framework for variation graphs, vg [Garrison E, Adam MN, Siren J, et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 2018;36:875-9], improved variation-aware alignment and variant calling in general. The major bottleneck of vg is the high cost of mapping reads to a variation graph. In this paper we study the problem of SNP calling on a variation graph and present a fast read alignment tool named VG SNP-Aware. VG SNP-Aware is able to align reads exactly to a variation graph and detect SNPs based on these aligned reads. The results show that VG SNP-Aware can efficiently map reads to a variation graph, with a 40× speedup with respect to vg and similar accuracy in SNP detection.

Keywords: SNP detection; reads alignment; variation graph.

Figures

Figure 1: Example of a variation graph with a biallelic SNP. The two alternative bases are included in two distinct paths.
Figure 2: (a) Mapping of the read “ATGTT”, which includes an alternative allele of an SNP, on the variation graph in the forward direction; (b) mapping of read “AAAAT”, which maps to the reference, on the complemented variation graph in the reverse direction.
Figure 3: The vg pipeline: graph construction, graph indexing, reads mapping, variant calling.
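
The idea behind Figures 1 and 2, exact mapping of a read onto a graph in which each biallelic SNP opens two alternative paths, can be illustrated with a small sketch. This is not the VG SNP-Aware implementation, whose data structures are not described here. It is a brute-force toy under stated assumptions: the graph is modelled as a linear reference plus a dictionary of SNP positions and alternative bases, the reverse direction is handled by reverse-complementing the read rather than complementing the graph, and the reference string, the SNP position, and the names `Hit`, `exact_hits`, `REFERENCE` and `SNPS` are all hypothetical. Only the two reads come from Figure 2.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Hit:
    position: int       # 0-based start on the forward reference
    strand: str         # "+" for forward, "-" for reverse complement
    alt_alleles: tuple  # ((snp_position, base), ...) alternative alleles the read carries


def exact_hits(read: str, reference: str, snps: dict) -> list:
    """Find every exact placement of `read` on a toy variation graph.

    A read base is accepted at a position if it equals the reference
    base or, when the position is a known SNP, the alternative base.
    Any other mismatch rejects the placement (no gaps, no errors).
    """
    hits = []
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        for start in range(len(reference) - len(query) + 1):
            alts = []
            for offset, base in enumerate(query):
                pos = start + offset
                if base == reference[pos]:
                    continue                  # follows the reference path
                if snps.get(pos) == base:
                    alts.append((pos, base))  # follows the alternative path
                    continue
                break                         # true mismatch: reject this start
            else:
                hits.append(Hit(start, strand, tuple(alts)))
    return hits


# Hypothetical toy graph: one biallelic SNP (C/T) at position 4.
REFERENCE = "CATGCTATTTTG"
SNPS = {4: "T"}

print(exact_hits("ATGTT", REFERENCE, SNPS))  # forward hit carrying the alternative allele
print(exact_hits("AAAAT", REFERENCE, SNPS))  # reverse-strand hit on the reference path
```

In this toy, the read “ATGTT” is placed at position 1 on the forward strand and reports the alternative allele T at the SNP, while “AAAAT” is placed at position 6 on the reverse strand with no alternative alleles. Scanning every start position costs time proportional to the reference length times the read length, so a practical tool would rely on an index instead.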
