F1000Res. 2022 Jul 11;11:775.
doi: 10.12688/f1000research.122840.1. eCollection 2022.

vcferr: Development, validation, and application of a single nucleotide polymorphism genotyping error simulation framework

V P Nagraj et al. F1000Res. 2022.

Abstract

Motivation: Genotyping error can impact downstream single nucleotide polymorphism (SNP)-based analyses. Simulating various modes and levels of error can help investigators better understand potential biases caused by miscalled genotypes. Methods: We have developed and validated vcferr, a tool to probabilistically simulate genotyping error and missingness in variant call format (VCF) files. We demonstrate how vcferr could be used to address a research question by introducing varying levels of different types of error into a sample in a simulated pedigree, and we assess how kinship analysis degrades as a function of the type and level of error. Software availability: vcferr is available for installation via PyPI (https://pypi.org/project/vcferr/) or conda (https://anaconda.org/bioconda/vcferr). The software is released under the MIT license with source code available on GitHub (https://github.com/signaturescience/vcferr).
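
To make the idea of probabilistic error and missingness simulation concrete, the following is a minimal Python sketch. It is not vcferr's implementation or interface: the names `ERROR_MODEL`, `simulate_call`, and `simulate_sample`, the genotype strings, and every probability value are assumptions chosen for illustration. Each true genotype is mapped to a set of possible miscalls (including a missing call, written `./.`), and one random draw per site decides whether the call is altered.

```python
import random

# Hypothetical error model (illustrative values, not vcferr's parameters):
# for each true genotype, the probability of observing a different
# genotype or a missing call ("./.") instead.
ERROR_MODEL = {
    "0/0": {"0/1": 0.01, "./.": 0.02},
    "0/1": {"0/0": 0.05, "1/1": 0.01, "./.": 0.02},
    "1/1": {"0/1": 0.01, "./.": 0.02},
}


def simulate_call(true_gt, model, rng):
    """Return the observed genotype for one site after one random draw."""
    draw = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    for observed, prob in model.get(true_gt, {}).items():
        cumulative += prob
        if draw < cumulative:
            return observed  # an error or missing call was drawn
    return true_gt  # no error drawn: the call is left unchanged


def simulate_sample(genotypes, model, seed=None):
    """Apply the error model independently to every genotype of a sample."""
    rng = random.Random(seed)
    return [simulate_call(gt, model, rng) for gt in genotypes]


# Usage: perturb a toy sample and count how many calls changed.
truth = ["0/0", "0/1", "1/1", "0/1"] * 5
observed = simulate_sample(truth, ERROR_MODEL, seed=1)
n_changed = sum(t != o for t, o in zip(truth, observed))
print(f"{n_changed} of {len(truth)} genotypes altered")
```

In this sketch the probabilities listed for a genotype must sum to at most 1; the remaining probability mass is the chance that the call is left correct. A real VCF workflow would read and write records with a VCF parser rather than operate on plain strings.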

Keywords: GWAS; benchmarking; bioinformatics; genealogy; kinship; python; simulation.

Conflict of interest statement

No competing interests were disclosed.

Figures

Figure 1. Impact of genotyping error on inferred degree classification accuracy.
Each line corresponds to one type of error, with its error rate stepped from 0 to 0.2 in increments of 0.01 while all other types of error are held constant at 0. The solid black line at the bottom represents the accuracy expected from guessing, which is the floor for relationship degree classification.
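
The one-at-a-time sweep described in the caption can be sketched as a grid of simulation settings. This is an illustrative reconstruction only: the error-type labels in `ERROR_TYPES` and the helper `build_sweep` are hypothetical and are not taken from the paper or from vcferr. Rates are generated from integer percentages to avoid floating-point drift when stepping by 0.01.

```python
# Hypothetical labels for error types; the paper's actual categories may differ.
ERROR_TYPES = ["het_to_homref", "het_to_homalt", "homref_to_het", "homalt_to_het"]


def build_sweep(error_types, max_rate_pct=20, step_pct=1):
    """List (swept_type, rates) settings, varying one error type at a time.

    For each setting, the swept type takes a rate between 0 and
    max_rate_pct / 100, and every other type is held at 0.
    """
    settings = []
    for swept in error_types:
        for pct in range(0, max_rate_pct + 1, step_pct):
            rates = {name: 0.0 for name in error_types}
            rates[swept] = pct / 100
            settings.append((swept, rates))
    return settings


sweep = build_sweep(ERROR_TYPES)
print(len(sweep), "settings;", "last:", sweep[-1])
```

Each setting would then drive one simulation run, with the resulting classification accuracy plotted against the swept rate to give one line per error type.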
