iScience. 2019 Aug 30:18:20-27.
doi: 10.1016/j.isci.2019.07.011. Epub 2019 Jul 12.

MALVA: Genotyping by Mapping-free ALlele Detection of Known VAriants

Luca Denti et al. iScience.

Abstract

The amount of genetic variation discovered in human populations is growing rapidly, leading to challenging computational tasks such as variant calling. Standard methods for addressing this problem include read mapping, a computationally expensive procedure; thus, mapping-free tools have been proposed in recent years. These tools focus on isolated, biallelic SNPs, providing limited support for multi-allelic SNPs and short insertions and deletions of nucleotides (indels). Here we introduce MALVA, a mapping-free method to genotype an individual from a sample of reads. MALVA is the first mapping-free tool able to genotype multi-allelic SNPs and indels, even in high-density genomic regions, and to effectively handle a huge number of variants. MALVA requires one order of magnitude less time to genotype a donor than alignment-based pipelines while providing similar accuracy. Remarkably, on indels, MALVA provides even better results than the most widely adopted variant discovery tools.

Keywords: Bioinformatics; Biological Sciences; Genetics; Genomics.

Conflict of interest statement

The authors declare no competing interests.

Figures

Graphical abstract

Figure 1. Time and RAM Required by Each Tool to Analyze Both Datasets. The running times are partitioned by steps performed, whereas the RAM usage represents the peak memory of the entire process. For ease of presentation, we denoted the FullGenome dataset as FG and the HalfGenome dataset as HG. Note that we did not include VarGeno running time and RAM usage on the FullGenome dataset since it crashed after 20 min.

Figure 2. Influence of Indel Size on the Recall Achieved by the Four Considered Tools on the FullGenome Dataset. The histogram shows the frequency distribution (on logarithmic scale) of the indels with respect to their length. The scatterplot shows the recall of the tools with respect to the indel size.

Figure 3. Comparison between Real Genotype (Provided by the 1000 Genomes Project) and Genotype Called by MALVA and VarGeno. HomoRef stands for Homozygous Reference, HetRef stands for Heterozygous Reference, HomoAlt stands for Homozygous Alternate, HetAlt stands for Heterozygous Alternate, and Uncalled means that the given variant was not called by the tool.

Figure 4. Comparison between Real Genotype (Provided by the 1000 Genomes Project) and Genotype Called by MALVA and VarGeno, Normalized by Rows. HomoRef stands for Homozygous Reference, HetRef stands for Heterozygous Reference, HomoAlt stands for Homozygous Alternate, HetAlt stands for Heterozygous Alternate, and Uncalled means that the given variant was not called by the tool.
