Hum Mutat. 2017 Sep;38(9):1182-1192.
doi: 10.1002/humu.23280. Epub 2017 Jul 7.

Working toward precision medicine: Predicting phenotypes from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges

Roxana Daneshjou et al. Hum Mutat. 2017 Sep.

Abstract

Precision medicine aims to predict a patient's disease risk and best therapeutic options by using that individual's genetic sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype-phenotype prediction challenges; participants build models, undergo assessment, and share key findings. For CAGI 4, three challenges involved using exome-sequencing data: Crohn's disease, bipolar disorder, and warfarin dosing. Previous CAGI challenges included prior versions of the Crohn's disease challenge. Here, we discuss the range of techniques used for phenotype prediction as well as the methods used for assessing predictive models. Additionally, we outline some of the difficulties associated with making predictions and evaluating them. The lessons learned from the exome challenges can be applied to both research and clinical efforts to improve phenotype prediction from genotype. In addition, these challenges serve as a vehicle for sharing clinical and research exome data in a secure manner with scientists who have a broad range of expertise, contributing to a collaborative effort to advance our understanding of genotype-phenotype relationships.

Keywords: Crohn's disease; bipolar disorder; exomes; machine learning; phenotype prediction; warfarin.

Conflict of interest statement

R.M. has participated in Illumina-sponsored meetings over the past four years and received travel reimbursement and an honorarium for presenting at these events. Illumina had no role in decisions relating to the study/work to be published, in data collection and analysis, or in the decision to publish.

R.M. has participated in Pacific Biosciences-sponsored meetings over the past three years and received travel reimbursement for presenting at these events.

R.M. is a founder and shareholder of Orion Genomics, which focuses on plant genomics and cancer genetics.

R.M. is a scientific advisory board (SAB) member for RainDance Technologies, Inc.

Figures

Figure 1
Clustering of patients from the CAGI 2 Crohn’s Disease Challenge. The black and gray bars at the bottom represent the controls; the red represents the cases. Many of the controls cluster together, likely due to batch effects. For instance, the controls represented in black were sequenced separately from the gray controls and the cases.
Figure 2
Clustering of samples for CAGI 3 Crohn’s Disease challenge. Black represents controls, while red represents cases. This dataset included healthy family members of cases as well as random controls. Samples with a “ped” designation in the sample name came from a pedigree; samples that share the same “ped” number came from the same pedigree.
Figure 3
Clustering of samples for CAGI 4 Crohn’s Disease challenge. Black represents controls, and red represents cases.
Figure 4
CAGI 4 Crohn’s disease challenge distribution of AUCs across all methods.
Figure 5
CAGI 4 bipolar disorder challenge distribution of AUCs across all methods.
Figure 6A–D
A. R² between methods and actual dose. B. Sum of squared errors. C. Mean z-scores between predicted doses with standard deviations and actual doses. D. Mean coefficient of variation (CV) and mean CV multiplied by mean z-score.
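
The captions for Figures 4 to 6 name several assessment metrics without defining them. The Python sketch below shows one conventional way to compute each of them. The definitions are textbook assumptions, not the assessors' confirmed formulas: R² is taken as the squared Pearson correlation, the z-score as |actual − predicted| divided by the submitted standard deviation, and the CV as standard deviation divided by predicted dose. All function names are hypothetical.

```python
from statistics import mean


def auc(scores, labels):
    """Rank-based AUC: chance that a random case (label 1) outscores
    a random control (label 0); ties count as half a win."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def r_squared(predicted, actual):
    """Squared Pearson correlation (assumed reading of 'R²')."""
    mp, ma = mean(predicted), mean(actual)
    cov = sum((p - mp) * (a - ma) for p, a in zip(predicted, actual))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_a = sum((a - ma) ** 2 for a in actual)
    return cov * cov / (var_p * var_a)


def sum_squared_errors(predicted, actual):
    """Sum of squared differences between predicted and actual doses."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


def mean_abs_z(predicted, sds, actual):
    """Mean of |actual - predicted| / sd (assumed z-score definition)."""
    return mean(abs(a - p) / s for p, s, a in zip(predicted, sds, actual))


def mean_cv(predicted, sds):
    """Mean coefficient of variation, sd / predicted dose (assumed)."""
    return mean(s / p for p, s in zip(predicted, sds))


if __name__ == "__main__":
    # Made-up illustrative values, not data from any CAGI challenge.
    print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
    predicted, sds, actual = [30.0, 40.0], [5.0, 4.0], [35.0, 38.0]
    print(sum_squared_errors(predicted, actual))
    print(mean_abs_z(predicted, sds, actual))
    print(mean_cv(predicted, sds) * mean_abs_z(predicted, sds, actual))
```

Under these assumed definitions, a method that reports tight standard deviations is rewarded only when its predictions are also close to the actual dose, which is presumably why panel D combines the mean CV with the mean z-score.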
