Genet Med. 2022 Feb;24(2):332-343.
doi: 10.1016/j.gim.2021.09.015. Epub 2021 Nov 30.

The importance of automation in genetic diagnosis: Lessons from analyzing an inherited retinal degeneration cohort with the Mendelian Analysis Toolkit (MATK)

Erin Zampaglione et al. Genet Med. 2022 Feb.

Abstract

Purpose: In Mendelian disease diagnosis, variant analysis is a repetitive, error-prone, and time-consuming process. To address this, we have developed the Mendelian Analysis Toolkit (MATK), a configurable, automated variant ranking program.

Methods: MATK aggregates variant information from multiple annotation sources and uses expert-designed rules with parameterized weights to produce a ranked list of potentially causal solutions. MATK performance was measured by comparing MATK-aided analyses with analyses by human domain experts for 1060 families with inherited retinal degeneration (IRD), analyzed using an IRD-specific gene panel (589 individuals) and exome sequencing (471 families).

Results: When comparing MATK-assisted analysis with expert curation in both the IRD-specific gene panel and exome sequencing (1060 subjects), 97.3% of potential solutions found by experts were also identified by the MATK-assisted analysis (541 solutions identified with MATK of 556 solutions found by conventional analysis). Furthermore, MATK-assisted analysis identified 114 additional potential solutions from the 504 cases unsolved by conventional analysis.

Conclusion: MATK expedites the identification of likely solving variants in Mendelian traits and reduces variability stemming from human error and researcher bias. MATK facilitates data reanalysis to keep up with constantly improving annotation sources and next-generation sequencing processing pipelines. The software is open source and available at https://gitlab.com/matthew_maher/mendelanalysis.

Keywords: Automation; Mendelian analysis; Variant ranking.
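
The percentages in the Results can be re-derived from the counts quoted in the abstract. The short Python check below is only an illustrative sketch: the variable names are ours, not MATK's, and the 22.6% figure is computed here from 114 of 504 rather than stated by the authors.

```python
# Counts quoted in the abstract (variable names are illustrative, not from MATK).
panel_subjects = 589            # IRD-specific gene panel
exome_subjects = 471            # exome sequencing
conventional_solutions = 556    # potential solutions found by conventional expert analysis
matk_recovered = 541            # of those, also identified with MATK assistance
unsolved_by_conventional = 504  # cases unsolved by conventional analysis
matk_additional = 114           # additional potential solutions found with MATK

cohort_size = panel_subjects + exome_subjects
recovery_rate = matk_recovered / conventional_solutions
additional_yield = matk_additional / unsolved_by_conventional

print(f"cohort size: {cohort_size}")                                # 1060
print(f"expert solutions also found with MATK: {recovery_rate:.1%}")  # 97.3%
print(f"additional solutions among unsolved cases: {additional_yield:.1%}")  # 22.6%
```

Note that 556 + 504 also equals 1060, which is consistent with the reported counts describing solved and unsolved cases across the full cohort.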

Conflict of interest statement

The authors declare no conflict of interest related to the work presented in this manuscript. Naomi E. Wagner is currently a full-time employee of and owns stock in Invitae Corporation; however, none of the work she did on this study was done while at the company.

Figures

Figure 1. Standard variant assessment protocol for human analysts.
A) The process of NGS variant analysis without MATK, which involves hard filtering of variants down to a manageable list, then using the analyst’s disease-specific expertise to weigh the evidence for each variant, often going through multiple iterations of filtering. B) In the analysis with MATK, much of the domain expertise is encoded in the Gene Model File (GMF) and the evidence is weighed by the Annotation Binding Code (ABC) which allows for a standardized analysis. The GMF and the ABC are fully customizable. Created with BioRender.com
Figure 2. Functionality of MATK and study design.
A) The two major components of MATK. The Annotation Binding Code (ABC) is the weighting function used to assign a score to each variant. The function used on the IRD cohort was tuned empirically and utilized population frequency, sequence consequence, CADD scores, regulatory information, and prior reports, capped at 20 points. The Gene Model File (GMF) encapsulates disease-specific gene-level information such as known inheritance modes, allele frequencies, haploinsufficient dominant status, and the level of confidence in the field that a gene may be disease causing. B) 1060 IRD patients were analyzed with the IRD panel (589 subjects) and exome sequencing (471 subjects/families). Each sequence underwent two independent analysis rounds: one with a conventional presentation of variants and hard filtering performed by the analyst, and one with MATK-assisted analysis. Solutions from both were analyzed for overlap and discrepancies.
Figure 3. Comparison of 589 panel-sequenced IRD patients analyzed with and without MATK assistance.
A) A flow chart detailing all of the results obtained from the conventional and MATK-assisted variant analyses. There was a high degree of overlap between the two methods, with MATK-assisted analysts missing only 9 potential solutions and non-MATK-assisted analysts missing 78 potential solutions. The overlap of the solutions in all three confidence tiers is also presented. B) A bar graph illustrating the overall performance of both methods.
Figure 4. Comparison of Exomiser and MATK in 96 panel-sequenced samples.
Out of 56 total solutions found using MATK, Exomiser successfully ranked 20 total solutions, 15 in tier 1, three in tier 2, and two in tier 3, without finding any new solutions that were missed by MATK.
Figure 5. Comparison of 471 exome-sequenced IRD patients analyzed with MATK versus the conventional analysis pipeline.
A) Venn diagrams of the results obtained from the conventional and MATK-assisted variant analyses, showing a high degree of overlap between the two methods, with MATK-assisted analysts missing only 6 potential solutions and non-MATK-assisted analysts missing 36 potential solutions. B) A bar graph illustrating the overall performance of both methods. C) Comparison of gene-specific MATK vs generalized MATK. Out of 27 total solutions, the generalized MATK successfully ranked 17 of 21 solutions in tier 1, with an additional 3 partial solutions (monoallelic recessive solutions). All tier 2 and 2/3 of tier 3 solutions were also fully identified. The generalized MATK also found one solution that was unsolved in the gene-specific MATK analysis run and was determined to be a tier 1 solution.
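
The Figure 2 caption describes the ABC as a weighting function over several evidence types, capped at 20 points, with gene-level context supplied by the GMF. The Python sketch below illustrates that general idea only. It is not MATK's code: every class, field, weight, threshold, and gene name is a hypothetical placeholder, the confidence multiplier is our own simplification, and inheritance mode and haploinsufficiency are omitted for brevity. Only the five evidence types and the 20-point cap come from the caption.

```python
from dataclasses import dataclass

SCORE_CAP = 20  # the caption states the IRD cohort's ABC score was capped at 20 points


@dataclass
class Variant:
    gene: str
    population_af: float        # population allele frequency
    consequence: str            # predicted sequence consequence
    cadd: float                 # CADD score
    in_regulatory_region: bool  # regulatory information
    previously_reported: bool   # prior reports


@dataclass
class GeneModel:
    """Hypothetical stand-in for one Gene Model File entry."""
    max_af: float      # assumed gene-specific allele frequency ceiling
    confidence: float  # assumed 0-1 confidence that the gene is disease causing


# Assumed weights for illustration; the real function was tuned empirically.
CONSEQUENCE_POINTS = {"stop_gained": 8, "frameshift": 8, "splice": 6, "missense": 4}


def abc_score(variant: Variant, model: GeneModel) -> float:
    """Sum evidence points for one variant and apply the 20-point cap."""
    points = 0.0
    if variant.population_af <= model.max_af:
        points += 4                                  # rare enough for this gene
    points += CONSEQUENCE_POINTS.get(variant.consequence, 0)
    points += min(variant.cadd / 10.0, 4.0)          # CADD contributes at most 4 points
    if variant.in_regulatory_region:
        points += 2
    if variant.previously_reported:
        points += 6
    return min(points, SCORE_CAP)


def rank_variants(variants, gmf):
    """Return (score, variant) pairs, best first, for genes present in the GMF."""
    scored = [
        (abc_score(v, gmf[v.gene]) * gmf[v.gene].confidence, v)
        for v in variants
        if v.gene in gmf
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Usage with made-up genes and variants.
gmf = {
    "GENE_A": GeneModel(max_af=0.005, confidence=1.0),
    "GENE_B": GeneModel(max_af=0.001, confidence=0.5),
}
variants = [
    Variant("GENE_B", 0.01, "missense", 22.0, False, False),
    Variant("GENE_A", 0.0001, "stop_gained", 40.0, True, True),
]
ranked = rank_variants(variants, gmf)
for score, v in ranked:
    print(f"{v.gene}: {score:.1f}")
```

In this toy run the GENE_A variant collects 24 raw points and is capped at 20, so it ranks above the common GENE_B missense variant. Keeping the weights in one function and the gene-level data in a separate table mirrors the ABC/GMF split the caption describes, which is what makes both parts independently customizable.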
