PLoS Comput Biol. 2020 Feb 7;16(2):e1007613.
doi: 10.1371/journal.pcbi.1007613. eCollection 2020 Feb.

SmartPhase: Accurate and fast phasing of heterozygous variant pairs for genetic diagnosis of rare diseases

Paul Hager et al. PLoS Comput Biol. 2020.

Abstract

There is an increasing need to use genome and transcriptome sequencing to genetically diagnose patients suffering from suspected monogenic rare diseases. The proper detection of compound heterozygous variant combinations as disease-causing candidates is a challenge in diagnostic workflows, as haplotype information is lost by currently used next-generation sequencing technologies. Consequently, computational tools are required to phase, or resolve the haplotype of, the high number of heterozygous variants in the exome or genome of each patient. Here we present SmartPhase, a phasing tool designed to efficiently reduce the set of potential compound heterozygous variant pairs in genetic diagnosis pipelines. The phasing algorithm of SmartPhase creates haplotypes using both parental genotype information and reads generated by DNA or RNA sequencing and is thus well suited to resolve the phase of rare variants. To inform the user about the reliability of a phasing prediction, SmartPhase computes a confidence score, which is essential for selecting error-free predictions. It incorporates existing haplotype information and applies logical rules to determine variants that can be excluded as causing a recessive, monogenic disease. SmartPhase can phase either all possible variant pairs in predefined genetic loci or preselected variant pairs of interest, thus keeping the focus on clinically relevant results. We compared SmartPhase to WhatsHap, one of the leading comparable phasing tools, using simulated data and a real clinical cohort of 921 patients. On both data sets, SmartPhase generated error-free predictions using our derived confidence score threshold. It outperformed WhatsHap with regard to the percentage of resolved pairs when parental genotype information is available. On the cohort data, SmartPhase enabled the exclusion of, on average, approximately 22% of the input variant pairs in each singleton patient and 44% in each trio patient.
SmartPhase is implemented as an open-source Java tool and freely available at http://ibis.helmholtz-muenchen.de/smartphase/.
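
To make the role of parental genotype information concrete, the following is a minimal Python sketch of how a pair of heterozygous variants in a child might be phased from trio genotypes using plain Mendelian logic. It is not SmartPhase's actual implementation, which is written in Java and also uses sequencing reads. The function names `alt_origin` and `phase_pair` and the encoding of genotypes as alternate-allele counts (0, 1 or 2) are assumptions made for illustration.

```python
def alt_origin(father_alt, mother_alt):
    """Infer which parent transmitted the alternate allele of a variant
    that is heterozygous in the child.

    Arguments are the parents' alternate-allele counts (0, 1 or 2).
    Returns "father", "mother", or None when the origin is ambiguous
    (e.g. both parents heterozygous) or inconsistent with inheritance.
    """
    # A homozygous-alternate parent must have transmitted the alternate allele.
    if father_alt == 2 and mother_alt < 2:
        return "father"
    if mother_alt == 2 and father_alt < 2:
        return "mother"
    # Only one parent carries the alternate allele at all.
    if father_alt >= 1 and mother_alt == 0:
        return "father"
    if mother_alt >= 1 and father_alt == 0:
        return "mother"
    return None


def phase_pair(variant_a, variant_b):
    """Phase two heterozygous child variants from parental genotypes.

    Each argument is a (father_alt, mother_alt) tuple for one variant.
    Returns "cis" if both alternate alleles come from the same parent,
    "trans" if they come from different parents, else "unresolved".
    """
    origin_a = alt_origin(*variant_a)
    origin_b = alt_origin(*variant_b)
    if origin_a is None or origin_b is None:
        return "unresolved"
    return "cis" if origin_a == origin_b else "trans"


# Variant A inherited from the father, variant B from the mother.
print(phase_pair((1, 0), (0, 1)))  # trans
# Both variants inherited from the father.
print(phase_pair((1, 0), (1, 0)))  # cis
# Both parents heterozygous for variant A: the trio alone cannot decide.
print(phase_pair((1, 1), (1, 0)))  # unresolved
```

In this sketch only a trans result keeps the pair as a compound heterozygous candidate; pairs that stay unresolved from the trio alone are where read-based evidence would be needed.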

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Visualization of the bit flag system.
If a variant pair could be phased, it is labeled as either cis or trans. Additionally, it can be labeled as innocuous. If a variant pair could not be phased, either there was too little evidence for calling cis or trans, or one or both variant alleles could not be found in the mapped reads.
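
The labels described in this caption can be modeled as a bit flag. The sketch below uses Python's `enum.IntFlag`. The flag names and bit values are hypothetical, since this page does not give the values used by SmartPhase, and the helpers `is_phased` and `describe` are illustrative only.

```python
from enum import IntFlag


class PhaseFlag(IntFlag):
    # Bit values are hypothetical; the caption names the labels only.
    CIS = 1                   # phased: both variants on the same haplotype
    TRANS = 2                 # phased: variants on opposite haplotypes
    INNOCUOUS = 4             # additional label for a phased pair
    NOT_ENOUGH_EVIDENCE = 8   # unphased: too little evidence for cis or trans
    ALLELE_NOT_FOUND = 16     # unphased: a variant allele is missing in the reads


def is_phased(flag):
    """A pair counts as phased when the cis or the trans bit is set."""
    return bool(flag & (PhaseFlag.CIS | PhaseFlag.TRANS))


def describe(flag):
    """Return the human-readable labels encoded in a flag value."""
    names = [
        (PhaseFlag.CIS, "cis"),
        (PhaseFlag.TRANS, "trans"),
        (PhaseFlag.INNOCUOUS, "innocuous"),
        (PhaseFlag.NOT_ENOUGH_EVIDENCE, "not enough evidence"),
        (PhaseFlag.ALLELE_NOT_FOUND, "allele not found"),
    ]
    return [label for bit, label in names if flag & bit]


flag = PhaseFlag.CIS | PhaseFlag.INNOCUOUS
print(int(flag), is_phased(flag), describe(flag))
# 5 True ['cis', 'innocuous']
```

Packing the labels into one integer lets a single output field carry both the phase call and the additional innocuous label.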
Fig 2. Boxplots showing the distribution of relative amounts of pairs labeled as cis, trans, and innocuous (only for trio phasing) as well as the percentages of pairs that are cleared, confidently cleared after removing low quality phasing predictions, and pairs that can be excluded as being non-pathogenic.
The plots show results for SmartPhase using only read information for 800 singleton patients (a), using both trio and read phasing for 121 trio patients (b) and the results for the same individuals using physical phasing information provided by the HaplotypeCaller of GATK (c) & (d).
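
The difference between cleared and confidently cleared pairs can be illustrated with a toy read-based phasing routine. This is a sketch under stated assumptions, not SmartPhase's algorithm: the confidence here is simply the fraction of spanning reads that agree with the call, the `min_confidence` default of 0.9 is an arbitrary placeholder rather than the threshold derived in the paper, and `phase_from_reads` is a hypothetical name.

```python
def phase_from_reads(observations, min_confidence=0.9):
    """Call cis or trans from reads that span both variant positions.

    observations: list of (allele_a, allele_b) tuples, one per read,
    where 0 is the reference allele and 1 the alternate allele.
    Returns (call, confidence); the call is "unresolved" when no read
    spans both sites or when the confidence is below min_confidence.
    """
    # Reads showing the same allele state at both sites (1,1 or 0,0)
    # support cis; reads showing mixed states (1,0 or 0,1) support trans.
    cis_support = sum(1 for a, b in observations if a == b)
    trans_support = len(observations) - cis_support
    total = cis_support + trans_support
    if total == 0:
        return ("unresolved", 0.0)

    call = "cis" if cis_support >= trans_support else "trans"
    confidence = max(cis_support, trans_support) / total
    if confidence < min_confidence:
        # Low-quality prediction: removed rather than reported.
        return ("unresolved", confidence)
    return (call, confidence)


print(phase_from_reads([(1, 1), (0, 0), (1, 1)]))
# ('cis', 1.0)
print(phase_from_reads([(1, 0), (0, 1), (1, 0), (1, 1)]))
# ('unresolved', 0.75)
print(phase_from_reads([(1, 0), (0, 1), (1, 0), (1, 1)], min_confidence=0.7))
# ('trans', 0.75)
```

The second and third calls use the same reads; only the threshold decides whether the prediction survives the quality filter.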
Fig 3. Boxplots showing the percentage of cleared pairs for SmartPhase (SP) and WhatsHap (WH) in read only and in combined read & trio mode on the 21,066 variant pairs identified in the 121 trio patients of the clinical WES data cohort.
