Wellcome Open Res. 2017 May 23:2:33.
doi: 10.12688/wellcomeopenres.11640.1. eCollection 2017.

ClinVar data parsing

Xiaolei Zhang et al. Wellcome Open Res.

Abstract

This software repository provides a pipeline for converting raw ClinVar data files into analysis-friendly tab-delimited tables, and also provides these tables for the most recent ClinVar release. Separate tables are generated for genome builds GRCh37 and GRCh38 as well as for mono-allelic variants and complex multi-allelic variants. Additionally, the tables are augmented with allele frequencies from the ExAC and gnomAD datasets, as these are often consulted when analyzing ClinVar variants. Overall, this work provides ClinVar data in a format that is easier to work with and can be loaded directly into a variety of popular analysis tools such as R, Python pandas, and SQL databases.
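As an illustration of loading a tab-delimited table into a SQL database, the following is a minimal sketch using only the Python standard library. The inline sample data, the column names (`chrom`, `pos`, `ref`, `alt`, `clinical_significance`), the table name `clinvar`, and the helper `load_tsv_into_sqlite` are all hypothetical and chosen for illustration; the actual tables produced by the pipeline may use different columns and should be checked against the repository.

```python
import csv
import io
import sqlite3

# Hypothetical miniature table; the real ClinVar tables may use
# different column names and contain many more columns.
SAMPLE_TSV = (
    "chrom\tpos\tref\talt\tclinical_significance\n"
    "1\t12345\tA\tG\tPathogenic\n"
    "2\t67890\tC\tT\tBenign\n"
    "X\t1357\tG\tA\tPathogenic\n"
)


def load_tsv_into_sqlite(handle, conn, table="clinvar"):
    """Copy a tab-delimited file with a header row into a SQLite table.

    Every column is stored as TEXT to keep the sketch simple.
    Returns the number of data rows inserted.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
n_rows = load_tsv_into_sqlite(io.StringIO(SAMPLE_TSV), conn)

# Example query: positions of variants labelled Pathogenic in the sample.
pathogenic = conn.execute(
    "SELECT chrom, pos FROM clinvar WHERE clinical_significance = ?",
    ("Pathogenic",),
).fetchall()
print(n_rows, pathogenic)
```

To use a real file, an opened file handle (for example `open(path, newline="")`, where `path` points to a downloaded table) can be passed in place of the `io.StringIO` object. With pandas installed, `pandas.read_csv(path, sep="\t")` should read the same kind of file into a DataFrame.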

Keywords: ClinVar; Mendelian disease; XML parsing; pathogenic variants; variant interpretation.

Conflict of interest statement

Competing interests: No competing interests were disclosed.

Figures

Figure 1. The workflow to parse ClinVar data.
