[Preprint]. 2025 Mar 24:arXiv:2503.18810v1.

Combining multiplexed functional data to improve variant classification

Jeffrey D Calhoun et al. ArXiv.

Abstract

With the surge in the number of variants of uncertain significance (VUS) reported in ClinVar in recent years, there is an imperative to resolve VUS at scale. Multiplexed assays of variant effect (MAVEs), which allow the functional consequences of hundreds to thousands of genetic variants to be measured in a single experiment, are emerging as a source of evidence that can be used for clinical gene variant classification. Increasingly, there are multiple published MAVEs for the same gene, sometimes measuring different aspects of variant impact. Where multiple functional consequences need to be considered to gain a more complete understanding of variant effects for a given gene, combining data from multiple MAVEs may allow stronger evidence to be assigned, which could change variant classifications. Here, we provide guidance for combining such multiplexed functional data, following a stepwise process from data curation and collection to model generation and validation. We illustrate the potential of this approach by integrating multiplexed functional data from four MAVEs for the gene TP53. By following these steps, researchers can maximize the value of MAVEs, strengthen the functional evidence for clinical variant classification, reclassify more VUS, and potentially uncover novel mechanisms of pathogenicity for clinically relevant genes.

Figures

Figure 1: Assessment of assay performance prior to combining multiplexed functional data.
(A) Functional score distributions for a hypothetical perfect MAVE and for real-world MAVEs, shown for assumed functionally normal (synonymous & BLB) variants and assumed functionally abnormal (PTC & PLP) variants. (B) An example of real-world multiplexed functional data illustrating overlap between the synonymous and PTC variants scored in the assay. (C) Critical metrics for MAVE performance.
Figure 2: Evaluating the utility of the integrated functional score.
A method for combining data will involve merging the datasets from different MAVEs. A final score for each variant will be generated using an appropriate statistical or machine learning method. The integrated score must be assessed for improved variant resolution compared to the multiplexed functional data from individual MAVEs, based on how well the different scores distinguish BLB and PLP variants in the truth set.
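
The comparison described in the Figure 2 caption can be sketched with a simple separation metric over the truth set. The example below uses the area under the ROC curve (AUC), computed as the probability that a randomly chosen PLP variant scores higher than a randomly chosen BLB variant. This is only one possible metric and is not necessarily the one used in the paper. The function name `auc`, the assumption that higher scores mean "more functionally abnormal", and all score values are hypothetical and chosen purely for illustration.

```python
def auc(plp_scores, blb_scores):
    """AUC as the fraction of (PLP, BLB) pairs ranked correctly.

    Assumes higher score = more functionally abnormal (an assumption
    for this sketch). Tied pairs count as half a correct ranking.
    """
    wins = 0.0
    for p in plp_scores:
        for b in blb_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(plp_scores) * len(blb_scores))


# Hypothetical truth-set scores; these are NOT data from the paper.
single_assay = {"plp": [0.9, 0.4, 0.8, 0.3], "blb": [0.2, 0.5, 0.1, 0.35]}
integrated = {"plp": [0.9, 0.7, 0.8, 0.6], "blb": [0.2, 0.5, 0.1, 0.3]}

auc_single = auc(single_assay["plp"], single_assay["blb"])
auc_integrated = auc(integrated["plp"], integrated["blb"])

print(f"single-assay AUC: {auc_single:.4f}")
print(f"integrated AUC:   {auc_integrated:.4f}")
```

In this toy case the integrated score separates the two truth-set classes completely while the single assay does not. With real data, the same comparison would be run for each individual MAVE against the integrated score.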
Figure 3: Example of combining multiplexed functional data for four TP53 MAVEs.
(A) Principal component analysis of the 4 datasets. The first two principal components are shown. We then used an unsupervised machine learning method, K-means clustering, to visualize candidate BLB and PLP clusters within the dataset. For each method, we calculated the OddsPath (B) or the inverse of OddsPath (C) to determine what strength of evidence could be applied towards pathogenicity (B) or benignity (C) compared to two of the best performing single MAVEs alone.
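
The OddsPath calculation mentioned in the Figure 3 caption can be sketched as follows. The caption does not give the formula, so this sketch assumes the definition from the ClinGen functional-evidence recommendations (Brnich et al., 2020): OddsPath = [P2 × (1 − P1)] / [(1 − P2) × P1], where P1 is the proportion of pathogenic controls in the truth set and P2 is the proportion of pathogenic controls among variants with a functionally abnormal readout. The evidence-strength thresholds are also taken from that framework. The function names and control counts are hypothetical and are not results from the paper.

```python
def odds_path(n_plp_total, n_blb_total, n_plp_abnormal, n_blb_abnormal):
    """OddsPath for a functionally abnormal readout.

    P1: prior proportion of pathogenic controls in the truth set.
    P2: proportion of pathogenic controls among abnormal readouts.
    Assumes at least one BLB control reads as abnormal; otherwise
    P2 = 1 and the ratio is undefined.
    """
    p1 = n_plp_total / (n_plp_total + n_blb_total)
    p2 = n_plp_abnormal / (n_plp_abnormal + n_blb_abnormal)
    if p2 >= 1.0:
        raise ValueError("P2 = 1: no BLB control scored abnormal; "
                         "OddsPath is undefined without a correction")
    return (p2 * (1.0 - p1)) / ((1.0 - p2) * p1)


def pathogenic_strength(odds):
    """Map OddsPath to an evidence strength (assumed ClinGen thresholds)."""
    thresholds = [(350.0, "very strong"), (18.7, "strong"),
                  (4.3, "moderate"), (2.1, "supporting")]
    for cutoff, label in thresholds:
        if odds > cutoff:
            return label
    return "indeterminate"


# Hypothetical truth set: 20 PLP and 20 BLB controls, of which
# 19 PLP and 1 BLB fall in the functionally abnormal class.
odds = odds_path(20, 20, 19, 1)
strength = pathogenic_strength(odds)
print(f"OddsPath = {odds:.1f} -> {strength}")
```

For evidence towards benignity, the same ratio is computed for the functionally normal class and its inverse is compared against the thresholds, which matches the use of the inverse of OddsPath in panel (C).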
