Am J Hum Genet. 2014 Apr 3;94(4):599-610.
doi: 10.1016/j.ajhg.2014.03.010.

Phevor combines multiple biomedical ontologies for accurate identification of disease-causing alleles in single individuals and small nuclear families


Marc V Singleton et al. Am J Hum Genet. 2014.

Abstract

Phevor integrates phenotype, gene function, and disease information with personal genomic data to improve power to identify disease-causing alleles. Phevor works by combining knowledge resident in multiple biomedical ontologies with the outputs of variant-prioritization tools. It does so using an algorithm that propagates information across and between ontologies. This process enables Phevor to accurately reprioritize potentially damaging alleles identified by variant-prioritization tools in light of gene function, disease, and phenotype knowledge. Phevor is especially useful for single-exome and family-trio-based diagnostic analyses, the most common clinical scenarios and the ones for which existing personal-genome diagnostic tools are least accurate and most underpowered. Here, we present a series of benchmark analyses illustrating Phevor's performance characteristics. Also presented are three recent Utah Genome Project case studies in which Phevor was used to identify disease-causing alleles. Collectively, these results show that Phevor improves diagnostic accuracy not only for individuals presenting with established disease phenotypes but also for those with previously undescribed and atypical disease presentations. Importantly, Phevor is not limited to known diseases or known disease-causing alleles. As we demonstrate, Phevor can also use latent information in ontologies to discover genes and disease-causing alleles not previously associated with disease.
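The abstract describes reprioritizing a variant-prioritization tool's output using scores propagated through ontologies. The Python sketch is a toy illustration of that general idea only, not Phevor's published algorithm: the term graph, the gene annotations, the halving-per-step decay, and the multiplicative combination with a small floor for unannotated genes are all hypothetical choices made for illustration.

```python
from collections import deque

# Hypothetical toy ontology: undirected edges between phenotype terms.
EDGES = [
    ("seizure", "abnormal_nervous_system"),
    ("abnormal_nervous_system", "ataxia"),
    ("abnormal_liver", "cholestasis"),
]
NEIGHBORS = {}
for a, b in EDGES:
    NEIGHBORS.setdefault(a, []).append(b)
    NEIGHBORS.setdefault(b, []).append(a)

# Hypothetical gene-to-term annotations and variant-prioritization scores
# (higher = more likely damaging); GENE_A, GENE_B and GENE_C are tied on purpose.
GENE_TERMS = {"GENE_A": ["cholestasis"], "GENE_B": ["ataxia"], "GENE_C": [], "GENE_D": ["seizure"]}
VARIANT_SCORES = {"GENE_A": 0.9, "GENE_B": 0.9, "GENE_C": 0.9, "GENE_D": 0.5}


def propagate(seeds, neighbors, decay=0.5):
    """Give each reachable term decay**d, where d is its graph distance to the nearest seed."""
    score = {}
    queue = deque((term, 1.0) for term in seeds)
    while queue:
        term, s = queue.popleft()
        if score.get(term, 0.0) >= s:
            continue  # already reached with an equal or higher score
        score[term] = s
        for nb in neighbors.get(term, ()):
            queue.append((nb, s * decay))
    return score


def rerank(variant_scores, gene_terms, seeds, neighbors, floor=0.01):
    """Combine variant scores with ontology-derived gene scores; return genes best-first."""
    term_score = propagate(seeds, neighbors)
    combined = {}
    for gene, v in variant_scores.items():
        # A gene inherits the best score among the terms it is annotated to.
        g = max((term_score.get(t, 0.0) for t in gene_terms.get(gene, ())), default=0.0)
        combined[gene] = v * (floor + g)  # the floor keeps unannotated genes in the list
    # sorted() is stable even with reverse=True, so exact ties keep their input order.
    return sorted(combined, key=combined.get, reverse=True)


print(rerank(VARIANT_SCORES, GENE_TERMS, ["seizure"], NEIGHBORS))
# ['GENE_D', 'GENE_B', 'GENE_A', 'GENE_C']
```

In this toy run, three genes tie on the variant score; only GENE_B carries a term near the seed phenotype, so it rises above the other two, while GENE_D's weaker variant score is offset by an exact phenotype match.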


Figures

Figure 1
Variant Prioritization for Known Disease-Causing Alleles. Performance comparisons of four different variant-prioritization tools before (A) and after (B) postprocessing with Phevor. A known disease-causing allele was randomly selected from HGMD, and two copies of it were spiked into a single target exome at the reported genomic location; hence, these results model simple recessive diseases. This process was repeated 100 times, each time with a different randomly selected, already disease-associated gene, to determine margins of error. Bar charts show the percentage of trials in which the disease-associated gene was ranked among the top ten candidates genome-wide (red) or among the top 100 candidates (blue); white denotes a rank greater than 100 in the candidate list. For the Phevor analyses in (B), each tool’s output files were fed to Phevor along with a phenotype report containing the HPO terms annotated to each disease-associated gene. The table below the bar charts summarizes this information in more detail. Bars do not reach 100% because of false negatives, i.e., not every tool is able to prioritize every disease-causing allele. When a tool left the target gene’s disease-causing alleles unscored or predicted them to be benign, the gene was placed at the midpoint of the list of 22,107 annotated human genes.
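The benchmark summary used in this figure (and in Figure 3) reduces to a rank lookup and three percentages. The Python sketch assumes each tool's output has already been turned into a best-first list of gene names; the function names, the toy gene labels, and the exact rounding of the "midpoint" (position 11,054 of 22,107) are assumptions, not details given in the caption.

```python
N_GENES = 22_107  # annotated human genes, from the caption
# Assumed rounding: the middle 1-based position of a 22,107-gene list is 11,054.
MIDPOINT_RANK = (N_GENES + 1) // 2


def target_rank(ranked_genes, target):
    """1-based rank of the target gene; the list midpoint if the tool did not rank it."""
    if target in ranked_genes:
        return ranked_genes.index(target) + 1
    return MIDPOINT_RANK  # unscored or called benign: a false negative


def summarize(ranks):
    """Percentage of trials with the target in the top 10, top 100, or beyond rank 100."""
    n = len(ranks)
    top10 = sum(r <= 10 for r in ranks)
    top100 = sum(r <= 100 for r in ranks)
    return {
        "top10_pct": 100 * top10 / n,
        "top100_pct": 100 * top100 / n,
        "beyond100_pct": 100 * (n - top100) / n,
    }


# Four hypothetical trials; in the last one the tool missed the target gene.
ranks = [target_rank(["G1", "TARGET"], "TARGET"), 3, 50, target_rank(["G1"], "TARGET")]
print(summarize(ranks))
# {'top10_pct': 50.0, 'top100_pct': 75.0, 'beyond100_pct': 25.0}
```

Placing misses at the midpoint, rather than dropping them, keeps every trial in the denominator, so false negatives lower the bars instead of silently inflating them.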
Figure 2
Variant Prioritization for Genes Previously Unassociated with Disease. The procedure used in Figure 1B was repeated, but this time the disease-associated gene’s ontological annotations were removed from all but the specified ontologies before Phevor was run. For brevity, only VAAST results are shown. Removing all of the disease-associated gene’s annotations from every ontology mimics the case of a previously unreported allele in a gene with unknown GO function, process, and cellular location and no previous association with a known disease or phenotype. This is equivalent to running VAAST alone (“none”); the leftmost bar chart and table column summarize these results. The rightmost bar chart and table column (“All”) summarize the results of running VAAST and Phevor with the disease-associated gene’s current ontological annotations. The “GO only” column reports the results of removing the disease-associated gene’s phenotype annotations, i.e., discovery success with GO annotations alone. This column models Phevor’s ability to identify a disease association when a gene is annotated to GO but has no disease, human, or model-organism phenotype annotations. In contrast, the “MPO, HPO, and DO” column assesses the impact of removing a gene’s GO annotations while leaving its disease, human, and model-organism phenotype annotations intact.
Figure 3
Comparison of Phevor to the Exomiser’s PHIVE. Comparison of disease-allele identification success rates for Phevor and the PHIVE methodology, which is available through the Exomiser. The Exomiser is based on ANNOVAR’s filtering logic; thus, the Phevor comparison uses ANNOVAR as the variant-prioritization tool. Shown are the results of 100 searches for known recessive disease-associated genes. Identical variant files and phenotype descriptions were given to Exomiser + PHIVE and to ANNOVAR + Phevor. Bar charts show the percentage of searches in which the target (i.e., disease-associated) gene was ranked among the top ten candidates genome-wide (red) or among the top 100 candidates (blue); white denotes a rank greater than 100 in the candidate list. The table below the bar charts summarizes this information in more detail. Bars do not reach 100% because of false negatives, i.e., cases in which the tool reported the disease-causing allele to be nondeleterious; these genes were placed at the midpoint of the list of 22,107 annotated human genes.
Figure 4
Phevor Accuracy and Atypical Disease Presentation. To evaluate the impact of incorrect diagnosis or atypical phenotypic presentation on Phevor’s accuracy, we repeated the analysis shown in Figure 1, but this time we randomly shuffled the phenotype descriptions for each gene at runtime and used the same phenotype description for every member of a case cohort. For brevity, only VAAST results are shown. Results of running VAAST with and without Phevor are shown for case cohorts of one, three, and five unrelated individuals. As expected, providing Phevor with incorrect phenotype data substantially reduced its diagnostic accuracy. For a single affected individual, the fraction of cases in which Phevor ranked the damaged gene among the top ten candidates genome-wide fell from 100% to 26%. Nevertheless, Phevor still improved upon VAAST’s performance alone. With cohorts of three and five unrelated affected individuals, Phevor placed 95% of the damaged genes in the top ten candidates despite the misleading phenotype data, because the additional statistical power provided by VAAST with larger cohorts increasingly outweighed the incorrect prior probabilities supplied by Phevor.
Figure 5
Phevor Analyses of Three Clinical Cases. The x axis of each Manhattan plot shows the genomic coordinates of the candidate genes. The y axes show the log10 value of the ANNOVAR score, VAAST p value, or Phevor score, depending on the panel. Black filled circles denote the top-ranked gene(s), all of which had either the same ANNOVAR score or the same VAAST p value. Red circles denote the gene containing the disease-causing allele(s). For comparison with VAAST, we transformed the ANNOVAR scores to frequencies by dividing the number of candidate genes identified by ANNOVAR by the total number of annotated human genes.

(A) Phevor identified NFKB2 as a disease-associated gene. (Top) Results of running ANNOVAR (left) and VAAST (right) on the union of variants identified in the affected members of family A and in the affected individual from family B. Both ANNOVAR and VAAST identified a large number of equally likely candidate genes, and NFKB2 (shown in red) was among them in both cases. (Bottom) Using the VAAST output, Phevor identified a single best candidate, NFKB2; using the ANNOVAR output, Phevor ranked NFKB2 second (two other genes were tied for first place).

(B) Phevor identified a de novo variant in STAT1, an already disease-associated gene, as responsible for a previously undescribed phenotype. (Top) Results of running ANNOVAR (left) and VAAST (right) on the single affected individual’s exome. Both ANNOVAR and VAAST identified multiple candidate genes, and STAT1 (shown in red) was among them in both cases. (Bottom) Using the VAAST output, Phevor identified a single best candidate, STAT1; using the ANNOVAR output, STAT1 was the third-best candidate.

(C) Phevor identified a mutation in ABCB11, a known disease-associated gene. (Top) Results of running ANNOVAR (left) and VAAST (right) on the single affected child’s exome. Both ANNOVAR and VAAST identified a number of equally likely candidate genes, and ABCB11 (shown in red) was among them. (Bottom) Using both the ANNOVAR and VAAST outputs, Phevor identified a single best candidate, ABCB11.
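The ANNOVAR-to-frequency transform described in this caption is simple arithmetic. In the Python sketch, the function names and the 50-candidate example are hypothetical, and plotting the negated value (-log10) is an assumption based on the usual Manhattan-plot convention, since the caption only says "log10 value".

```python
import math

N_GENES = 22_107  # annotated human genes (Figures 1 and 3)


def annovar_frequency(n_candidates, n_genes=N_GENES):
    """Caption's transform: number of ANNOVAR candidate genes / total annotated genes."""
    if not 0 < n_candidates <= n_genes:
        raise ValueError("n_candidates must be between 1 and n_genes")
    return n_candidates / n_genes


def plot_heights(candidates, n_genes=N_GENES):
    """Assumed Manhattan-plot height, -log10(frequency), for every ANNOVAR candidate.

    Every candidate receives the same frequency, so ANNOVAR's candidates form a tie.
    """
    height = -math.log10(annovar_frequency(len(candidates), n_genes))
    return {gene: height for gene in candidates}


# Hypothetical: ANNOVAR leaves 50 equally likely candidate genes.
heights = plot_heights([f"GENE_{i}" for i in range(50)])
print(round(heights["GENE_0"], 2))
```

With 50 tied candidates, every gene plots at the same height of about 2.65, consistent with the rows of tied black circles the caption describes; the fewer candidates ANNOVAR leaves, the higher that shared height.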

