Sci Data. 2021 May 4;8(1):124.
doi: 10.1038/s41597-021-00905-y.

A resource to explore the discovery of rare diseases and their causative genes


Friederike Ehrhart et al. Sci Data. 2021.

Abstract

Here, we describe a dataset of monogenic rare diseases with a known genetic background, supplemented with manually extracted provenance for the disease itself and for the discovery of the underlying genetic cause. We assembled a collection of 4166 rare monogenic diseases and linked them to 3163 causative genes, annotated with OMIM and Ensembl identifiers and HGNC symbols. The PubMed identifiers of the publications that first described the rare diseases, and of the publications that identified the genes causing them, were added using information from OMIM, PubMed, Wikipedia, whonamedit.com, and Google Scholar. The data are available under a CC0 license as a spreadsheet and as RDF in a semantic model modified from DisGeNET, and were added to Wikidata. This dataset relies on publicly available data and publications with a PubMed identifier, but our effort to make the data interoperable and linked means that they can now be analysed. Our analysis revealed the timeline of rare disease and causative gene discovery and links it to developments in methods.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Workflow of information acquisition, creation of dataset and downstream analysis.
Fig. 2
(A) Number of first descriptions of rare diseases per year. (B) Number of publications per year reporting new disease descriptions (orange bars) and new gene-disease links (blue bars). The black dots indicate the (rolling) median number of years the diseases had been known before the causative gene was identified in that year. The data are displayed from 1984 onwards, the year from which more than four genes were consistently discovered per year. (C) Total current citation counts of the gene-disease relationship papers shown as blue bars in panel B, plotted against the year in which they were published. One dot represents one publication from our dataset.
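The quantity behind the black dots in panel B can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the record layout and the example years are hypothetical, and the sketch computes a plain per-year median because the caption does not specify the rolling window.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (year the disease was first described,
#                        year the causative gene was identified).
records = [
    (1966, 1999),
    (1950, 1999),
    (1990, 2000),
    (1985, 2001),
    (1995, 2001),
    (2000, 2001),
]

def median_lag_per_year(pairs):
    """Median number of years a disease had been known before its gene
    was identified, grouped by the year of gene identification."""
    lags = defaultdict(list)
    for described, gene_found in pairs:
        lags[gene_found].append(gene_found - described)
    return {year: median(values) for year, values in sorted(lags.items())}

lag_by_year = median_lag_per_year(records)
for year, lag in lag_by_year.items():
    print(year, lag)
```

A rolling variant would pool the lags of several neighbouring years before taking the median; the window width would have to be taken from the full paper.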
Fig. 3
(A) Network of gene-rare disease relationships. Blue nodes are genes (HGNC symbols); orange nodes are diseases (OMIM disease names). (B) The Rett syndrome causing genes pathway from WikiPathways (https://www.wikipathways.org/instance/WP4312) was imported as a network into the Cytoscape environment using the WikiPathways app for Cytoscape. Using the CyTargetLinker app, the MECP2 network was extended to predict and visualize the overlap with pathway genes causing other rare diseases, as provided by the gene-RD-Provenance_V2 linkset. The expression data were taken from Miller et al.; the data were originally produced by Lin et al. (C) Timeline of rare disease superclass descriptions in blocks of 20 years. The numbers are normalized to percentages of the maximum number of diseases discovered in each disease superclass (Table 4).
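The normalization described for panel C can be illustrated with a short sketch. It rests on one reading of the caption, namely that each 20-year block count is divided by the largest block count of the same superclass; the superclass names, years, and the block origin of 1900 are hypothetical and are not taken from the dataset.

```python
from collections import Counter, defaultdict

# Hypothetical (disease superclass, year of first description) pairs.
descriptions = [
    ("neurological", 1965), ("neurological", 1972),
    ("neurological", 1985), ("neurological", 1991),
    ("neurological", 1995), ("neurological", 1999),
    ("metabolic", 1975), ("metabolic", 2001),
    ("metabolic", 2005), ("metabolic", 2010),
]

def block_start(year, width=20, origin=1900):
    """First year of the fixed-width block that contains `year`."""
    return origin + ((year - origin) // width) * width

def normalised_timeline(items, width=20, origin=1900):
    """Per superclass, express each block count as a percentage of that
    superclass's largest block count."""
    counts = defaultdict(Counter)
    for superclass, year in items:
        counts[superclass][block_start(year, width, origin)] += 1
    timeline = {}
    for superclass, per_block in counts.items():
        peak = max(per_block.values())
        timeline[superclass] = {
            block: 100.0 * n / peak for block, n in sorted(per_block.items())
        }
    return timeline

timeline = normalised_timeline(descriptions)
for superclass, blocks in timeline.items():
    print(superclass, blocks)
```

Normalizing within each superclass makes the timing of peaks comparable between large and small superclasses, at the cost of hiding their absolute sizes.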
