Review
Nat Rev Genet. 2017 Oct;18(10):599-612.
doi: 10.1038/nrg.2017.52. Epub 2017 Aug 14.

Settling the score: variant prioritization and Mendelian disease

Karen Eilbeck et al. Nat Rev Genet. 2017 Oct.

Abstract

When investigating Mendelian disease using exome or genome sequencing, distinguishing disease-causing genetic variants from the multitude of candidate variants is a complex, multidimensional task. Many prioritization tools and online interpretation resources exist, and professional organizations have offered clinical guidelines for review and return of prioritization results. In this Review, we describe the strengths and weaknesses of widely used computational approaches, explain their roles in the diagnostic and discovery process and discuss how they can inform (and misinform) expert reviewers. We place variant prioritization in the wider context of gene prioritization, burden testing and genotype-phenotype association, and we discuss opportunities and challenges introduced by whole-genome sequencing.

Conflict of interest statement

The authors declare competing interests: see Web version for details.

Figures

Figure 1. A demonstration of the multiple possible effects of a single variant across transcripts and genes
The complexity of genomic annotation adds to the complexity of variant annotation. In this example, two genes, coiled-coil domain-containing 113 (CCDC113) and protease serine 54 (PRSS54) overlap on different strands of the genome, and both have multiple observed transcripts. Variants intersecting this extent of the genome show different effects depending on the gene and the transcript inspected. For example, the rs780162055 variant from the single nucleotide polymorphism database (dbSNP) is a missense variant with a protein effect for PRSS54 and a 3′ untranslated region (3′ UTR) variant for CCDC113. This proliferation of effects has data management implications for variant interpretation.
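The data management point in the Figure 1 caption can be made concrete with a minimal sketch: one variant carries several annotations, one per gene and transcript, and a tool has to decide which to report. The transcript labels and the severity ordering below are assumptions made for illustration and are not taken from the Review; only the variant identifier, the two gene names and their two effects come from the caption.

```python
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    """One predicted effect of a variant on one transcript of one gene."""
    variant_id: str
    gene: str
    transcript: str   # hypothetical labels, not real transcript accessions
    consequence: str  # Sequence Ontology style term


# Assumed ordering from least to most severe, for illustration only.
SEVERITY_ORDER = [
    "3_prime_UTR_variant",
    "synonymous_variant",
    "missense_variant",
    "stop_gained",
]


def most_severe_per_gene(annotations: List[Annotation]) -> Dict[str, Annotation]:
    """Keep, for each gene, the annotation with the most severe consequence."""
    rank = {term: i for i, term in enumerate(SEVERITY_ORDER)}
    best: Dict[str, Annotation] = {}
    for ann in annotations:
        current = best.get(ann.gene)
        if current is None or rank[ann.consequence] > rank[current.consequence]:
            best[ann.gene] = ann
    return best


# The same variant annotated against two overlapping genes (as in Figure 1).
annotations = [
    Annotation("rs780162055", "PRSS54", "PRSS54-tx1", "missense_variant"),
    Annotation("rs780162055", "CCDC113", "CCDC113-tx1", "3_prime_UTR_variant"),
]

summary = most_severe_per_gene(annotations)
for gene, ann in summary.items():
    print(gene, ann.transcript, ann.consequence)
```

Collapsing to one annotation per gene is only one possible policy; keeping every gene and transcript pair preserves more information at the cost of more records to review.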
Figure 2. Population stratification and regional constraint within a gene are critical to variant interpretation
a | For a particular variant, although the overall allele frequency may be low enough to be a plausible candidate with respect to a disease phenotype, the allele frequency is often substantially higher in specific subpopulations, thereby casting doubt on its relevance to rare disease phenotypes. In the example shown (source: http://gnomad.broadinstitute.org/variant/1-216172299-C-G), the rs79444516 variant of usherin (USH2A) is low in European populations but considerably higher in African populations. b | Constraint (that is, tolerance to genetic variation) can vary dramatically from region to region in a given gene. In this example, potassium voltage-gated channel subfamily Q member 2 (KCNQ2) shows higher constraint in the functionally important ion transport domain, as indicated by the scarcity of missense and loss-of-function (LOF) variants, relative to regions of lower functional importance in the same gene.
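The stratification effect described in panel a can be sketched as a simple filter that compares the pooled allele frequency with the highest subpopulation frequency. The counts, population labels and rarity threshold below are hypothetical values chosen for illustration; they are not the gnomAD figures for rs79444516, and the function names are not from any particular tool.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PopulationCount:
    """Allele count and number of genotyped alleles in one subpopulation."""
    population: str
    allele_count: int
    allele_number: int

    @property
    def frequency(self) -> float:
        return self.allele_count / self.allele_number if self.allele_number else 0.0


def overall_frequency(counts: List[PopulationCount]) -> float:
    """Allele frequency pooled across all subpopulations."""
    total_ac = sum(c.allele_count for c in counts)
    total_an = sum(c.allele_number for c in counts)
    return total_ac / total_an if total_an else 0.0


def max_population_frequency(counts: List[PopulationCount]) -> Tuple[str, float]:
    """Subpopulation with the highest allele frequency, and that frequency."""
    best = max(counts, key=lambda c: c.frequency)
    return best.population, best.frequency


def passes_rarity_filter(counts: List[PopulationCount], threshold: float) -> bool:
    """True only if the variant is rare in every subpopulation."""
    _, highest = max_population_frequency(counts)
    return highest <= threshold


# Hypothetical counts: rare when pooled, but not rare in population_B.
counts = [
    PopulationCount("population_A", allele_count=2, allele_number=60000),
    PopulationCount("population_B", allele_count=150, allele_number=10000),
]
RARE_THRESHOLD = 0.005  # assumed cut-off for illustration

print("pooled frequency:", overall_frequency(counts))
print("highest subpopulation:", max_population_frequency(counts))
print("passes filter:", passes_rarity_filter(counts, RARE_THRESHOLD))
```

With these made-up numbers the pooled frequency falls below the threshold while the subpopulation maximum does not, which is the situation the caption warns about.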
Figure 3. Phenotypes are described across a spectrum of granularity, and different terminologies are used to define these features
In this example, medium-chain acyl co-enzyme A dehydrogenase (ACADM) is used to show this granularity. At the broadest level, it is associated with the condition medium-chain acyl co-enzyme A dehydrogenase deficiency (MCADD), a metabolic disorder that is classified in databases such as Online Mendelian Inheritance in Man (OMIM) and Orphanet. Clinical terminologies such as SNOMED and MedGen may also be used to categorize the condition. A condition is generally composed of multiple clinical features (such as lethargy) that describe the observable phenotypes. The Human Phenotype Ontology (HPO) is a widely used terminology that describes these features, organized by the body system in which they manifest. A key product of the HPO is the set of phenotype-to-gene and phenotype-to-condition annotation files that are used in many downstream prioritization tools. At the most fine-grained level, the molecular phenotype of the patient is defined by clinical measurements such as the concentration of urine organic acids. The most widely used terminology for these measurements is provided by Logical Observation Identifiers Names and Codes (LOINC), a universal code system for clinical data. A patient may be identified early in life through newborn screening, by the detection of unusual ratios of metabolites, or later in life after experiencing one or more clinical features. These different levels of phenotype are used to guide the patient towards the most appropriate test and to guide the prioritization of genes and associated variants in the genetic analysis.
