Review
Biomed Res Int. 2015;2015:923491. doi: 10.1155/2015/923491. Epub 2015 Apr 6.

Identifying Highly Penetrant Disease Causal Mutations Using Next Generation Sequencing: Guide to Whole Process

A Mesut Erzurumluoglu et al. Biomed Res Int. 2015.

Abstract

Recent technological advances have created challenges for geneticists, who need to adapt to a wide range of new bioinformatics tools and an expanding wealth of publicly available data (e.g., mutation databases and software). This wide range of methods and the diversity of file formats used in sequence analysis are a significant issue, with a considerable amount of time spent before anyone can even attempt to analyse the genetic basis of human disorders. Another point to consider is that, although many possess "just enough" knowledge to analyse their data, they do not make full use of the tools and databases that are available and do not fully understand how their data was created. The primary aim of this review is to document some of the key approaches and provide an analysis schema to make the analysis process more efficient and reliable in the context of discovering highly penetrant causal mutations/genes. This review will also compare the methods used to identify highly penetrant variants when data is obtained from consanguineous as opposed to nonconsanguineous individuals, and when Mendelian disorders are analysed as opposed to common complex disorders.

Figures

Figure 1
Steps in whole-exome sequencing. Understanding how the VCF file was created is important, as it can indicate where something may have gone wrong. The stages proceed from top to bottom, and we propose “consideration points” for each step (below each title).
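Because this figure stresses knowing how the VCF file was produced, a minimal sketch of reading one VCF data line and flagging fields that often hint at upstream problems may help. It assumes a single tab-separated data line with the standard fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) followed by FORMAT and sample columns. The function names and the thresholds (QUAL 30, read depth 10) are hypothetical and purely illustrative; real pipelines would use a dedicated VCF library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VcfRecord:
    chrom: str
    pos: int
    ref: str
    alt: list[str]
    qual: float | None          # None when the QUAL column is "."
    filter: str
    info: dict[str, str]
    samples: list[dict[str, str]] = field(default_factory=list)


def parse_info(text: str) -> dict[str, str]:
    """Split 'KEY=VAL;FLAG' into a dict; flag entries map to an empty string."""
    if text == ".":
        return {}
    return {key: value for key, _, value in (item.partition("=") for item in text.split(";"))}


def parse_line(line: str) -> VcfRecord:
    """Parse one tab-separated VCF data line (not a '#' header line)."""
    cols = line.rstrip("\n").split("\t")
    qual = None if cols[5] == "." else float(cols[5])
    samples = []
    if len(cols) > 9:
        keys = cols[8].split(":")  # FORMAT column, e.g. "GT:DP"
        samples = [dict(zip(keys, s.split(":"))) for s in cols[9:]]
    return VcfRecord(cols[0], int(cols[1]), cols[3], cols[4].split(","),
                     qual, cols[6], parse_info(cols[7]), samples)


def flag_suspicious(rec: VcfRecord, min_qual: float = 30.0, min_dp: int = 10) -> list[str]:
    """List reasons a call may need a closer look (illustrative thresholds only)."""
    reasons = []
    if rec.filter not in ("PASS", "."):
        reasons.append(f"FILTER={rec.filter}")
    if rec.qual is not None and rec.qual < min_qual:
        reasons.append(f"QUAL<{min_qual}")
    for i, sample in enumerate(rec.samples):
        dp = sample.get("DP", ".")
        if dp != "." and int(dp) < min_dp:
            reasons.append(f"sample{i}:DP<{min_dp}")
    return reasons
```

For a call with FILTER `LowQual`, QUAL 25 and a sample depth of 8, `flag_suspicious` reports all three reasons, pointing back to the calling and coverage stages rather than to the biology.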
Figure 2
Post-VCF file procedures (example for sequencing data). Every step here can be automated through the use of pipelines and bioinformatics tools. Whilst performing the steps listed above, one must always bear in mind the assumptions behind the procedures. Where feasible, ranking rare SNVs is advisable over filtering, as it allows the researcher to observe all variants as a continuum from most likely to least likely to be causal.
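The contrast between ranking and filtering can be sketched in a few lines. This is a toy illustration, not the authors' pipeline: it assumes each variant is already annotated with a predicted impact class (HIGH, MODERATE, LOW, MODIFIER, in the style of annotators such as Ensembl VEP) and a population allele frequency. The `Variant` type, both functions and the 1% cut-off are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Impact classes in the style of annotators such as Ensembl VEP; lower tier = more likely damaging.
IMPACT_TIER = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


@dataclass
class Variant:
    name: str
    impact: str    # predicted consequence class (assumed to be pre-annotated)
    pop_af: float  # population allele frequency (assumed to be pre-annotated)


def rank_variants(variants: list[Variant]) -> list[Variant]:
    """Order every variant from most to least likely causal; nothing is discarded.

    Sort key: impact tier first (unknown classes last), then rarity.
    """
    unknown = len(IMPACT_TIER)
    return sorted(variants, key=lambda v: (IMPACT_TIER.get(v.impact, unknown), v.pop_af))


def hard_filter(variants: list[Variant], max_af: float = 0.01) -> list[Variant]:
    """The filtering alternative: anything above max_af is dropped outright."""
    return [v for v in variants if v.pop_af <= max_af]


# Hypothetical toy data for illustration only.
EXAMPLE = [
    Variant("a", "LOW", 0.001),
    Variant("b", "HIGH", 0.02),
    Variant("c", "HIGH", 0.0005),
    Variant("d", "MODERATE", 0.0),
]
```

With this toy data, ranking orders the variants c, b, d, a, whereas the 1% hard filter keeps only a, c and d, silently losing b, a high-impact variant just above the cut-off. Ranking keeps such borderline candidates visible near the top of the list.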
Figure 3
Finding “the one” in Mendelian disorders: searching for the causal variant (using a WES example). After potentially causal variants are identified, one must put into practice what past literature suggests about the disorder and decide which path to follow in Figure 3. Familial (very rare) disorders are more likely to follow a recessive mode of inheritance; thus family data is crucial (to rule out the possibility of de novo mutations), and it is important to include as many family members as possible. For common Mendelian disorders following a recessive inheritance model, the possibility of compound heterozygotes should be taken into account when fitting the data to that model. Finally, functional post-analysis of candidate variant(s), especially in knockout mice, can be crucial. This figure serves as an example and by no means reflects an exhaustive model; there are alternative routes that researchers can take to identify Mendelian causal variants. If the family is consanguineous, identify regions with long runs of homozygosity (LRoH) in each individual and, amongst these regions, those shared by the affected but not by the unaffected individuals.
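The final step for consanguineous families, keeping LRoH regions shared by all affected individuals but not by the unaffected, is an interval intersection followed by a subtraction. The sketch below assumes the runs of homozygosity have already been called by a separate tool and are given as half-open (chromosome, start, end) tuples per individual. As a simplifying assumption, it excludes any stretch where an unaffected individual also has a run of homozygosity, without checking whether the haplotypes actually match. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)
Span = Tuple[int, int]


def _merge(spans: List[Span]) -> List[Span]:
    """Sort and merge overlapping or touching spans on one chromosome."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def _by_chrom(regions: List[Region]) -> Dict[str, List[Span]]:
    grouped: Dict[str, List[Span]] = defaultdict(list)
    for chrom, start, end in regions:
        grouped[chrom].append((start, end))
    return {chrom: _merge(spans) for chrom, spans in grouped.items()}


def _intersect(a: List[Span], b: List[Span]) -> List[Span]:
    """Two-pointer intersection of two merged, sorted span lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def _subtract(a: List[Span], b: List[Span]) -> List[Span]:
    """Parts of spans in a not covered by any span in b (both merged, sorted)."""
    out, j = [], 0
    for start, end in a:
        cur = start
        while j < len(b) and b[j][1] <= cur:  # skip b spans entirely to the left
            j += 1
        k = j
        while k < len(b) and b[k][0] < end:   # carve out each overlapping b span
            if b[k][0] > cur:
                out.append((cur, b[k][0]))
            cur = max(cur, b[k][1])
            k += 1
        if cur < end:
            out.append((cur, end))
    return out


def shared_by_affected_only(affected: List[List[Region]],
                            unaffected: List[List[Region]]) -> List[Region]:
    """Regions in an LRoH in every affected individual and in no unaffected one."""
    if not affected:
        return []
    per_affected = [_by_chrom(r) for r in affected]
    unaffected_union = _by_chrom([reg for r in unaffected for reg in r])
    common_chroms = set.intersection(*(set(d) for d in per_affected))
    result: List[Region] = []
    for chrom in sorted(common_chroms):
        shared = per_affected[0][chrom]
        for d in per_affected[1:]:
            shared = _intersect(shared, d[chrom])
        for start, end in _subtract(shared, unaffected_union.get(chrom, [])):
            result.append((chrom, start, end))
    return result
```

For example, with affected runs chr1:1000-5000 and chr1:2000-6000 and an unaffected run at chr1:3000-3500, the candidate regions are chr1:2000-3000 and chr1:3500-5000.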
Figure 4
Summary of the whole analysis process, from DNA sample to identification of the variant. The tools mentioned here are the ones we prefer to use for a variety of reasons, such as user-friendly documentation, ease of use, performance, multiplatform compatibility, and speed. See Supplementary Material and Methods for examples of parameters/commands to use where applicable.
Figure 5
Filtering steps applied to all mutations in the exome (primary ciliary dyskinesia (PCD) example). After all the filtering steps in the above figure are applied, the total is reduced to a single candidate. The numbers here are for illustration purposes only (adapted from [39]). The homozygosity step is added because PCD is an autosomal recessive disorder. Φ mutations are “predicted high impact” mutations as proposed by Alsaadi et al. [39] (see PHI_SO_terms.txt in Supplementary data).
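A filtering cascade like the one in this figure can be sketched as an ordered list of named predicates, recording how many variants survive each step. The step order, the 1% frequency cut-off and the toy variants are hypothetical. The set of Sequence Ontology terms is a small placeholder and is NOT the PHI_SO_terms.txt list from the Supplementary data. The homozygosity step mirrors the autosomal recessive assumption made for PCD.

```python
from typing import Any, Callable, Dict, List, Tuple

Variant = Dict[str, Any]
Step = Tuple[str, Callable[[Variant], bool]]

# Placeholder subset of Sequence Ontology terms; NOT the PHI_SO_terms.txt list from the review.
PLACEHOLDER_PHI_TERMS = {
    "stop_gained", "frameshift_variant", "splice_donor_variant", "splice_acceptor_variant",
}

STEPS: List[Step] = [
    ("passes QC", lambda v: v["filter"] == "PASS"),
    ("rare (population AF <= 1%)", lambda v: v["af"] <= 0.01),
    ("predicted high impact", lambda v: v["consequence"] in PLACEHOLDER_PHI_TERMS),
    # Autosomal recessive assumption, as for PCD: keep homozygous-alternate calls.
    ("homozygous in affected", lambda v: v["gt"] in ("1/1", "1|1")),
]


def apply_cascade(variants: List[Variant], steps: List[Step]):
    """Apply named filters in order; return survivors and the count after each step."""
    counts = [("all variants", len(variants))]
    for name, keep in steps:
        variants = [v for v in variants if keep(v)]
        counts.append((name, len(variants)))
    return variants, counts


# Hypothetical toy variants, for illustration only.
EXAMPLE = [
    {"id": "v1", "filter": "PASS", "af": 0.001, "consequence": "stop_gained", "gt": "1/1"},
    {"id": "v2", "filter": "PASS", "af": 0.2, "consequence": "stop_gained", "gt": "1/1"},
    {"id": "v3", "filter": "LowQual", "af": 0.0, "consequence": "frameshift_variant", "gt": "1/1"},
    {"id": "v4", "filter": "PASS", "af": 0.0, "consequence": "missense_variant", "gt": "1/1"},
    {"id": "v5", "filter": "PASS", "af": 0.005, "consequence": "frameshift_variant", "gt": "0/1"},
]
```

Running `apply_cascade(EXAMPLE, STEPS)` leaves only v1, with counts of 5, 4, 3, 2 and 1 after each successive step. As the Figure 2 caption notes, ranking may be preferable to such hard filters where feasible.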

References

1. Danecek P., Auton A., Abecasis G., et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–2158. doi: 10.1093/bioinformatics/btr330.
2. McKenna A., Hanna M., Banks E., et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research. 2010;20(9):1297–1303. doi: 10.1101/gr.107524.110.
3. Metzker M. L. Sequencing technologies—the next generation. Nature Reviews Genetics. 2010;11(1):31–46. doi: 10.1038/nrg2626.
4. Bonetta L. Whole-genome sequencing breaks the cost barrier. Cell. 2010;141(6):917–919. doi: 10.1016/j.cell.2010.05.034.
5. Pettersson E., Lundeberg J., Ahmadian A. Generations of sequencing technologies. Genomics. 2009;93(2):105–111. doi: 10.1016/j.ygeno.2008.10.003.
