Hum Mutat. 2018 Nov;39(11):1690-1701.
doi: 10.1002/humu.23637.

ClinGen Allele Registry links information about genetic variants

Piotr Pawliczek et al. Hum Mutat. 2018 Nov.

Abstract

Effective exchange of information about genetic variants is currently hampered by the lack of readily available, globally unique variant identifiers that would enable aggregation of information from different sources. The ClinGen Allele Registry addresses this problem by providing (1) globally unique "canonical" variant identifiers (CAids) on demand, either individually or in large batches; (2) access to variant-identifying information in a searchable Registry; (3) links to allele-related records in many commonly used databases; and (4) services for adding links to information about registered variants in external sources. A core element of the Registry is a canonicalization service, implemented using an in-memory sequence alignment-based index, which groups variant identifiers denoting the same nucleotide variant and assigns unique and dereferenceable CAids. More than 650 million distinct variants are currently registered, including those from gnomAD, ExAC, dbSNP, and ClinVar, as well as a small number of variants registered by Registry users. The Registry is accessible both via a web interface and programmatically via well-documented Hypertext Transfer Protocol (HTTP) Representational State Transfer Application Programming Interfaces (REST APIs). For programmatic interoperability, the Registry content is accessible in the JavaScript Object Notation for Linked Data (JSON-LD) format. We present several use cases and demonstrate how the linked information may provide raw material for reasoning about a variant's pathogenicity.
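As a rough illustration of the programmatic access described in the abstract, the sketch below builds an HGVS lookup URL and reads a CAid out of a JSON-LD style record. The host name `reg.example.org`, the `/allele?hgvs=` path, the sample record, and both helper functions are assumptions made for illustration; they are not taken from the paper, and the Registry's own API documentation is the authority on real endpoints and response fields.

```python
import json
from typing import Optional
from urllib.parse import quote

# Hypothetical base URL (assumption); substitute the documented Registry host.
REGISTRY_BASE = "https://reg.example.org"


def allele_lookup_url(hgvs: str, base: str = REGISTRY_BASE) -> str:
    """Build a lookup URL for an HGVS expression.

    HGVS strings contain ':' and '>', so the value is percent-encoded.
    The '/allele?hgvs=' path is an assumed, illustrative layout.
    """
    return f"{base}/allele?hgvs={quote(hgvs, safe='')}"


def extract_caid(record: dict) -> Optional[str]:
    """Return the CAid from the JSON-LD '@id' IRI, or None if absent.

    Assumes the dereferenceable IRI ends in the identifier itself.
    """
    iri = record.get("@id", "")
    tail = iri.rstrip("/").rsplit("/", 1)[-1]
    return tail if tail.startswith("CA") else None


# Hypothetical JSON-LD style response, for illustration only.
SAMPLE_RESPONSE = """
{
  "@context": "https://reg.example.org/schema/allele.jsonld",
  "@id": "https://reg.example.org/allele/CA000001",
  "type": "nucleotide"
}
"""

if __name__ == "__main__":
    print(allele_lookup_url("NM_000546.5:c.215C>G"))
    print(extract_caid(json.loads(SAMPLE_RESPONSE)))
```

Because the `@id` value is itself a URL, a client can treat the CAid both as a stable key for joining records across sources and as an address to dereference for the full record.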

Keywords: HGVS representation; linked data; pathogenicity of genetic variants; variant centric resources; variant identifiers.

Conflict of interest statement

SEP is a member of the Baylor Genetics Scientific Advisory Panel. AM is an employee of BCM and performs integration consulting services for BCM‐developed software including Genboree through IP Genesis, Inc. LB is employed by Sunquest Information Systems, a commercial laboratory software vendor. The other authors do not have any conflicts of interest. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Figures

Figure 1
Conceptual model of Allele Registry entities based on the Allele Model developed by ClinGen Data Model Working Group
Figure 2
(a) Design and workflow of ClinGen Allele Registry. (b) Screenshot of current core registry‐hosted links for a typical variant in the user interface
Figure 3
Registry API services permit on‐demand linking of variant information from external sources. (i) The external source indicates their RFC6570 URI template for their API and, optionally, for their UI. (ii) Then the external source associates one or more parameters with CAids about which they have information via PUT requests to the Registry API. Bulk uploads of associations are also supported. These parameters will be used to fill the templates, thereby creating the appropriate link. (iii) The Registry can now include links to these external sources in addition to its own core variant metadata. For the Allelic Epigenome case, because their API directly employs CAids, no parameter values need be supplied when registering a link via the PUT requests to the Registry. In contrast, if CIViC were to add links from Registry alleles to their data, two parameter values (p1, p2) would be registered for each CAid. Based on the CIViC templates shown, both parameter values are needed to construct the appropriate web page URL, whereas only one is needed to form the CIViC “api” URL
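The template-filling step in the caption can be sketched with a minimal RFC 6570 Level 1 (simple string) expansion. The function `expand_level1`, the host `external.example.org`, the template shapes, the variable name `CAid`, and the parameter values are all hypothetical; they only mirror the pattern described above (two parameters for a web page URL, one for an API URL) and are not the actual templates of CIViC or any other resource.

```python
import re
from urllib.parse import quote

# Matches simple Level 1 expressions such as {p1}; operators like {+x} are not handled.
_VAR = re.compile(r"\{([A-Za-z0-9_]+)\}")


def expand_level1(template: str, values: dict) -> str:
    """Expand an RFC 6570 Level 1 URI template.

    Each {name} is replaced by the percent-encoded value; a variable with
    no supplied value expands to the empty string, as the RFC specifies.
    """
    def fill(match: "re.Match") -> str:
        name = match.group(1)
        if name not in values:
            return ""
        return quote(str(values[name]), safe="")

    return _VAR.sub(fill, template)


# Hypothetical templates an external source might register (assumptions).
UI_TEMPLATE = "https://external.example.org/genes/{p1}/variants/{p2}"
API_TEMPLATE = "https://external.example.org/api/variants/{p2}"
CAID_TEMPLATE = "https://external.example.org/alleles/{CAid}"

if __name__ == "__main__":
    # Parameter values associated with one CAid (made-up numbers).
    params = {"p1": "12", "p2": "34"}
    print(expand_level1(UI_TEMPLATE, params))   # uses both parameters
    print(expand_level1(API_TEMPLATE, params))  # uses only p2
    # A source whose API is keyed by CAid needs no extra parameters.
    print(expand_level1(CAID_TEMPLATE, {"CAid": "CA000001"}))
```

Keeping the template with the source and only small parameter values with each CAid means a source can change its URL layout by updating one template rather than every registered link.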
Figure 4
Reference sequences currently supported by the Registry. NM, NP, and NR represent known reference sequences, and XM, XP, and XR represent modeled reference sequences, from RefSeq (O'Leary et al., 2016). NC represents chromosome sequences, whereas NW, NT, and NG represent various genomic scaffolds. LRG, LRGt, and LRGp are genomic, transcript, and protein sequences from the Locus Reference Genomic Database (MacArthur et al., 2014). ENST and ENSP are transcript and amino acid sequences from ENSEMBL (Aken et al., 2016)
Figure 5
Query and registration functions accessible via the Registry web interface. (a) Example of an HGVS‐based search from the Registry landing page (left) and a typical page presented to the user when the variant is not registered. For logged‐in users, one click on “Get Identifier” provides a canonical allele identifier. (b) Search interface for fuzzy queries where the exact transcript for which the variation is defined is not known (left). Results of example queries are shown on the right
Figure 6
Adoption of canonical allele identifiers by variant‐centric resources. (a) ClinGen variant and gene curation interface, (b) CIViC, and (c) ClinVar. Other systems that use Allele Registry identifiers (including ClinGen Pathogenicity Calculator and Database of pathogenic variants at Keio University) are not shown for brevity

References

Aken, B. L., Ayling, S., Barrell, D., Clarke, L., Curwen, V., Fairley, S., … Searle, S. M. J. (2016). The Ensembl gene annotation system. Database (Oxford). doi: 10.1093/database/baw093
Bean, L. J., & Hegde, M. R. (2016). Gene variant databases and sharing: Creating a global genomic variant database for personalized medicine. Human Mutation, 37(6), 559–563.
Chang, X., & Wang, K. (2012). wANNOVAR: Annotating genetic variants for personal genomes via the web. Journal of Medical Genetics, 49(7), 433–436.
Fokkema, I. F., Taschner, P. E., Schaafsma, G. C., Celli, J., Laros, J. F., & den Dunnen, J. T. (2011). LOVD v.2.0: The next generation in gene variant databases. Human Mutation, 32(5), 557–563.
Forbes, S. A., Beare, D., Boutselakis, H., Bamford, S., Bindal, N., Tate, J., … Campbell, P. J. (2017). COSMIC: Somatic cancer genetics at high‐resolution. Nucleic Acids Research, 45(D1), D777–D783.
