VarSight: prioritizing clinically reported variants with binary classification algorithms
- PMID: 31615419
- PMCID: PMC6792253
- DOI: 10.1186/s12859-019-3026-8
Abstract
Background: When applying genomic medicine to a rare disease patient, the primary goal is to identify one or more genomic variants that may explain the patient's phenotypes. Typically, this is done through annotation, filtering, and then prioritization of variants for manual curation. However, prioritization of variants in rare disease patients remains a challenging task due to the high degree of variability in phenotype presentation and molecular source of disease. Thus, methods that can identify and/or prioritize variants to be clinically reported in the presence of such variability are of critical importance.
Methods: We tested classification algorithms that ingest variant annotations along with phenotype information to predict whether a variant will ultimately be clinically reported and returned to a patient. To test the classifiers, we performed a retrospective study on variants that were clinically reported to 237 patients in the Undiagnosed Diseases Network.
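As a rough illustration of the Methods above, the sketch below trains a binary classifier that maps per-variant features to a probability of being clinically reported. It assumes plain logistic regression fit by batch gradient descent and two hypothetical, made-up features (a pathogenicity score and a phenotype-match score, both in [0, 1]). The abstract does not specify which classifiers or annotations were used, so neither the model nor the features should be read as the VarSight method.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Weighted sum of features plus bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_logistic_regression(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    learning_rate: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and a bias by batch gradient descent on the mean log-loss.

    labels[i] is 1 if variant i was clinically reported, otherwise 0.
    """
    n, n_features = len(features), len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            # Derivative of the log-loss with respect to the linear score.
            error = sigmoid(linear_score(weights, bias, x)) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_proba(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Estimated probability that a variant with features x is reported."""
    return sigmoid(linear_score(weights, bias, x))


# Hypothetical toy training data: [pathogenicity_score, phenotype_match_score].
train_x = [
    [0.9, 0.8], [0.8, 0.9], [0.95, 0.7], [0.7, 0.95],  # reported
    [0.1, 0.2], [0.2, 0.1], [0.05, 0.3], [0.3, 0.05],  # not reported
]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logistic_regression(train_x, train_y)
print(round(predict_proba(weights, bias, [0.85, 0.85]), 3))
print(round(predict_proba(weights, bias, [0.15, 0.15]), 3))
```

Because the classifier outputs a probability rather than only a hard label, its scores can be sorted to produce a per-patient ranking, which is how a binary classifier can double as a variant prioritization system.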
Results: We treated the classifiers as variant prioritization systems and compared them to four variant prioritization algorithms and two single-measure controls. The trained classifiers outperformed all other tested methods, with the best classifiers ranking 72% of all reported variants and 94% of reported pathogenic variants in the top 20.
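The top-20 figures in the Results describe a rank-based evaluation. A minimal sketch of such a metric, assuming each patient's candidate variants carry a classifier score and a set of reported variant IDs (hypothetical data structures, not the study's), orders variants by descending score and counts how many reported variants, pooled across patients, fall within the top k. Ties are broken pessimistically here; that is an assumption, since the abstract does not say how ties were handled.

```python
from typing import Dict, List, Set, Tuple


def reported_variant_ranks(scores: Dict[str, float], reported: Set[str]) -> List[int]:
    """1-based ranks of the reported variants when a patient's variants are
    ordered by descending classifier score.

    Ties are broken pessimistically (an assumption): a reported variant's rank
    is the number of variants whose score is greater than or equal to its own.
    """
    ranks = []
    for variant_id in reported:
        own_score = scores[variant_id]
        ranks.append(sum(1 for s in scores.values() if s >= own_score))
    return sorted(ranks)


def fraction_in_top_k(
    patients: List[Tuple[Dict[str, float], Set[str]]], k: int = 20
) -> float:
    """Fraction of all reported variants, pooled across patients, ranked <= k."""
    hits = 0
    total = 0
    for scores, reported in patients:
        for rank in reported_variant_ranks(scores, reported):
            total += 1
            if rank <= k:
                hits += 1
    return hits / total if total else 0.0


# Hypothetical patients: classifier scores per candidate variant, plus the
# IDs of the variants that were ultimately reported.
patients = [
    ({"v1": 0.9, "v2": 0.2, "v3": 0.7}, {"v3"}),   # v3 ranks 2nd
    ({"v4": 0.1, "v5": 0.95, "v6": 0.4}, {"v4"}),  # v4 ranks 3rd
]
print(fraction_in_top_k(patients, k=2))  # prints 0.5
```

Pooling hits across patients mirrors the phrasing "72% of all reported variants"; a per-patient average would weight patients with many reported variants differently.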
Conclusions: We demonstrated how freely available binary classification algorithms can be used to prioritize variants even in the presence of real-world variability. Furthermore, these classifiers outperformed all other tested methods, suggesting that they may be well suited for working with real rare disease patient datasets.
Keywords: Binary classification; Clinical genome sequencing; Variant prioritization.
Conflict of interest statement
The authors declare that they have no competing interests.
