BMC Bioinformatics. 2019 Oct 15;20(1):496.
doi: 10.1186/s12859-019-3026-8.

VarSight: prioritizing clinically reported variants with binary classification algorithms

James M Holt et al. BMC Bioinformatics.

Abstract

Background: When applying genomic medicine to a rare disease patient, the primary goal is to identify one or more genomic variants that may explain the patient's phenotypes. Typically, this is done through annotation, filtering, and then prioritization of variants for manual curation. However, prioritization of variants in rare disease patients remains a challenging task due to the high degree of variability in phenotype presentation and molecular source of disease. Thus, methods that can identify and/or prioritize variants to be clinically reported in the presence of such variability are of critical importance.

Methods: We tested the application of classification algorithms that ingest variant annotations along with phenotype information for predicting whether a variant will ultimately be clinically reported and returned to a patient. To test the classifiers, we performed a retrospective study on variants that were clinically reported to 237 patients in the Undiagnosed Diseases Network.
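The idea of using a binary classifier as a prioritization system can be sketched as follows: fit a classifier on labeled variants, then sort a patient's variants by the predicted probability of being reported. This is a minimal illustration only, not the authors' pipeline. The two features (a deleteriousness score and a phenotype match score), the synthetic training data, the plain logistic regression, and all function names are assumptions introduced for the example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Predicted probability that a variant with features x is reported."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def rank_variants(model, variants):
    """Return variant ids sorted from most to least likely to be reported."""
    scored = [(predict_proba(model, feats), vid) for vid, feats in variants.items()]
    return [vid for _, vid in sorted(scored, key=lambda t: -t[0])]


# Synthetic, made-up training data (hypothetical features, both in [0, 1]):
# [deleteriousness score, phenotype match score]; label 1 = clinically reported.
random.seed(0)
X_train, y_train = [], []
for _ in range(200):
    reported = 1 if random.random() < 0.2 else 0
    center = 0.8 if reported else 0.3
    X_train.append([min(1.0, max(0.0, random.gauss(center, 0.1))) for _ in range(2)])
    y_train.append(reported)

model = fit_logistic(X_train, y_train)

# One hypothetical patient case with three candidate variants.
case = {"var_a": [0.2, 0.3], "var_b": [0.9, 0.85], "var_c": [0.5, 0.4]}
ranking = rank_variants(model, case)
print(ranking)
```

Any classifier that outputs a score or probability could be substituted for the logistic regression here; the ranking step only needs a per-variant score.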

Results: We treated the classifiers as variant prioritization systems and compared them to four variant prioritization algorithms and two single-measure controls. The trained classifiers outperformed all other tested methods, with the best classifiers ranking 72% of all reported variants and 94% of reported pathogenic variants in the top 20.
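A "percentage ranked in the top 20" figure of this kind can be computed by ranking each case's variants by score and counting how many reported variants fall within the first k positions. The sketch below shows one plausible way to do this; the data layout, the function names, and the tie handling (ties keep insertion order) are assumptions for illustration, and the toy scores are invented.

```python
def rank_of(scores, target):
    """1-based rank of target when variants are sorted by descending score."""
    ordered = sorted(scores, key=lambda v: -scores[v])
    return ordered.index(target) + 1


def top_k_fraction(cases, k=20):
    """Fraction of reported variants, pooled across cases, ranked within the top k."""
    hits = total = 0
    for scores, reported in cases:
        for variant in reported:
            total += 1
            if rank_of(scores, variant) <= k:
                hits += 1
    return hits / total if total else 0.0


# Toy data: each case is (variant -> score, set of reported variants).
case_1 = ({f"v{i}": 30.0 - i for i in range(30)}, {"v3", "v25"})
case_2 = ({f"w{i}": 5.0 - i for i in range(5)}, {"w1"})

fraction = top_k_fraction([case_1, case_2], k=20)
print(f"{fraction:.3f}")
```

In this toy input, v3 and w1 land inside the top 20 of their cases and v25 does not, so two of the three reported variants count as hits.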

Conclusions: We demonstrated how freely available binary classification algorithms can be used to prioritize variants even in the presence of real-world variability. Furthermore, these classifiers outperformed all other tested methods, suggesting that they may be well suited for working with real rare disease patient datasets.

Keywords: Binary classification; Clinical genome sequencing; Variant prioritization.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Receiver operating characteristic and precision-recall curves. These figures show the performance of the four classifiers on the testing set after hyperparameter tuning and fitting to the training set. On the left, we show the receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate. On the right, we show the precision-recall curve. The area under each curve (AUROC or AUPRC) is reported beside each method in the legend.
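For readers unfamiliar with the two summary numbers in the legend, the sketch below computes them from a list of labels and scores. AUROC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative (ties count as one half), and AUPRC is approximated by average precision, which is one common estimator and not necessarily the one used for the figure. The function names and the toy data are assumptions.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive (ties not merged)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    true_positives = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    if true_positives == 0:
        raise ValueError("need at least one positive")
    return precision_sum / true_positives


# Toy example: two positives among five scored variants.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
roc_area = auroc(labels, scores)
pr_area = average_precision(labels, scores)
print(round(roc_area, 4), round(pr_area, 4))
```

When positives are rare, as reported variants are among all candidate variants, the precision-recall summary is usually the more informative of the two because it is not dominated by the large number of true negatives.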
