Genome Med. 2024 Dec 3;16(1):143.

doi: 10.1186/s13073-024-01392-7.

Using multiplexed functional data to reduce variant classification inequities in underrepresented populations

Moez Dawood^{1,2,3}, Shawn Fayer^{4,5}, Sriram Pendyala^{5,6}, Mason Post⁴, Divya Kalra⁷, Karynne Patterson⁵, Eric Venner^{7,8}, Lara A Muffley^{4,5}, Douglas M Fowler^{4,5,9}, Alan F Rubin^{10,11}, Jennifer E Posey⁸, Sharon E Plon^{7,8}, James R Lupski^{7,8,12,13}, Richard A Gibbs^{7,8}, Lea M Starita^{4,5}, Carla Daniela Robles-Espinoza^{14,15}, Willow Coyote-Maestas^{16,17}, Irene Gallego Romero^{18,19,20,21}

Affiliations

¹ Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA. Moez.Dawood@bcm.edu.
² Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA. Moez.Dawood@bcm.edu.
³ Medical Scientist Training Program, Baylor College of Medicine, Houston, TX, USA. Moez.Dawood@bcm.edu.
⁴ Brotman Baty Institute for Precision Medicine, Seattle, WA, USA.
⁵ Department of Genome Sciences, University of Washington, Seattle, WA, USA.
⁶ Medical Scientist Training Program, University of Washington, Seattle, WA, USA.
⁷ Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
⁸ Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
⁹ Department of Bioengineering, University of Washington, Seattle, WA, USA.
¹⁰ Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia.
¹¹ Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia.
¹² Texas Children's Hospital, Houston, TX, USA.
¹³ Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA.
¹⁴ Laboratorio Internacional de Investigación Sobre El Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Querétaro, Qro, Mexico.
¹⁵ CASM, Wellcome Sanger Institute, Hinxton, UK.
¹⁶ Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, USA. willow.coyote-maestas@ucsf.edu.
¹⁷ Quantitative Biosciences Institute, University of California, San Francisco, USA. willow.coyote-maestas@ucsf.edu.
¹⁸ Human Genomics and Evolution, St Vincent's Institute of Medical Research, Fitzroy, 3065, Australia. irene.gallego@svi.edu.au.
¹⁹ School of BioSciences and Melbourne Integrative Genomics, The University of Melbourne, Royal Parade, Parkville, 3010, Australia. irene.gallego@svi.edu.au.
²⁰ Center for Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Riia 23B, 51010, Tartu, Estonia. irene.gallego@svi.edu.au.
²¹ Mary MacKillop Institute for Health Research, Australian Catholic University, Fitzroy, Australia. irene.gallego@svi.edu.au.

PMID: 39627863
PMCID: PMC11616159
DOI: 10.1186/s13073-024-01392-7

Moez Dawood et al. Genome Med. 2024.


Abstract

Background: Multiplexed Assays of Variant Effects (MAVEs) can test all possible single variants in a gene of interest. The resulting saturation-style functional data may help resolve variant classification disparities between populations, especially for Variants of Uncertain Significance (VUS).

Methods: We analyzed clinical significance classifications in 213,663 individuals of European-like genetic ancestry versus 206,975 individuals of non-European-like genetic ancestry from All of Us and the Genome Aggregation Database. Then, we incorporated clinically calibrated MAVE data into the Clinical Genome Resource's Variant Curation Expert Panel rules to automate VUS reclassification for BRCA1, TP53, and PTEN.

Results: Using two orthogonal statistical approaches, we show a higher prevalence (p ≤ 5.95e-06) of VUS in individuals of non-European-like genetic ancestry across all medical specialties assessed in all three databases. Further, in the non-European-like genetic ancestry group, higher rates of Benign or Likely Benign variants and of variants with no clinical designation (p ≤ 2.5e-05) were found across many medical specialties, whereas Pathogenic or Likely Pathogenic assignments were increased in individuals of European-like genetic ancestry (p ≤ 2.5e-05). Using MAVE data, we reclassified VUS in individuals of non-European-like genetic ancestry at a significantly higher rate than VUS in individuals of European-like genetic ancestry (p = 9.1e-03), effectively compensating for the VUS disparity. Further, essential code analysis showed equitable impact of MAVE evidence codes but inequitable impact of allele frequency (p = 7.47e-06) and computational predictor (p = 6.92e-05) evidence codes for individuals of non-European-like genetic ancestry.

Conclusions: Generation of saturation-style MAVE data should be a priority to reduce VUS disparities and produce equitable training data for future computational predictors.

Keywords: All of Us; Benign; Equity; Genetic ancestry; Inequity; MAVE; Missense; Multiplexed assay of variant effects; Pathogenic; VUS; Variants of uncertain significance; gnomAD.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: JRL has stock ownership in 23andMe, is a paid consultant for Regeneron Genetics Center, and is a coinventor on multiple US and European patents related to molecular diagnostics for inherited neuropathies, eye diseases, and bacterial genomic fingerprinting. JRL serves on the Scientific Advisory Board of Baylor Genetics. EV, JRL, and RAG declare that Baylor Genetics is a Baylor College of Medicine affiliate that derives revenue from genetic testing. BCM and Miraca Holdings have formed a joint venture with shared ownership and governance of Baylor Genetics which performs clinical microarray analysis and other genomic studies (exome sequencing and whole genome sequencing) for patient and family care. EV is a co-founder of Codified Genomics, a provider of genetic interpretation. The remaining authors declare that they do not have any competing interests.

Figures

**Fig. 1**
Multiplexed Assays of Variant Effects (MAVEs) produce saturation-level variant effect maps containing functional scores for every variant in a target locus. **a** General scheme depicting the workflow of a MAVE, starting with the design and construction of potentially every possible SNV or indel in a target locus. Next, the constructed variants are introduced into cells in vitro. MAVEs by their nature are able to test thousands of variants simultaneously across millions or potentially billions of cells, ensuring each variant is programmed across thousands of cells for functional interrogation. After engineering the variants into the cells, a multiplexable phenotype such as cellular viability over time or fluorescence of an expressed protein is measured. Changes in the measured molecular phenotype for each variant are then read out via next-generation sequencing. Functional scores are then calculated from the sequencing data for each variant. When used within the standard ACMG/AMP clinical interpretation framework, potential PS3/BS3 evidence codes of varying strengths, dependent on clinical calibration of the functional scores, can reclassify VUS. **b** Both the top and bottom maps show the N-terminus of *BRCA1* exon 3 for comparison purposes. The top map represents all the known ClinVar classifications for this particular locus as of November 2023 in the style of a MAVE variant effect map. The bottom map is an excerpt and adaptation of the *BRCA1* MAVE variant effect map from Findlay et al. [24], where the experimental functional scores are depicted by shading and mutational consequences by the outline color of each SNV box. For both maps, reference nucleotides are indicated by the letters based on their position in the GRCh37 reference genome (position numbers on the x-axis), the alternate nucleotides are indicated by the row labels (y-axis), and missing data are represented by the absence of a box. Notably, the MAVE variant effect map exhibits significantly higher information content with no missing SNV functional effects, while the map of clinical significance data contains much sparser information, with VUS and missing data dominating the map. Of note, *BRCA1* is one of the most well-studied genes in medical genetics, if not the most well studied. Thus, for most other genes the difference in information content would be even more pronounced, as there would be even sparser clinical information, but the MAVE map would still be saturated. Further, because of the saturation nature of the MAVE map, there is no bias in selecting variants to include in the assay: all variants in the target locus receive a functional score.

**Fig. 2**
Higher VUS prevalence found in individuals of non-European-like genetic ancestry across medical specialties. Box plots corresponding to VUS allele prevalence (x-axis) in each gene (dot) for individuals of non-European-like (blue) versus European-like (orange) genetic ancestry for the corresponding medical specialty (y-axis) as best visualized in *All of Us* v7 for all coding variants. Genes with zero alleles for allele prevalence for either individuals of European-like or non-European-like genetic ancestry are omitted from the visualization to maintain a reasonable scale. However, genes with zero alleles for only one category of either individuals of European-like or non-European-like genetic ancestry are included in the Bonferroni-corrected, signed rank, matched pairs Wilcoxon statistical test. The Bonferroni-corrected p values associated with these comparisons are annotated as follows: “ns” indicates not significant, * indicates 1.19e-04 < p ≤ 2.38e-04, ** indicates 5.95e-05 < p ≤ 1.19e-04, *** indicates 5.95e-06 < p ≤ 5.95e-05, and **** indicates p ≤ 5.95e-06. Across all medical specialties and categories shown, VUS are observed to be statistically significantly increased in individuals of non-European-like genetic ancestry compared to individuals of European-like genetic ancestry.
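The annotation scheme in this caption amounts to a threshold lookup. The following is a minimal sketch, assuming only the cutoffs quoted in the caption; the names `FIG2_THRESHOLDS` and `significance_label` are illustrative and are not taken from the authors' code.

```python
# Upper bounds (inclusive) copied from the Fig. 2 caption, strictest first.
# The names below are hypothetical; only the numeric cutoffs come from the caption.
FIG2_THRESHOLDS = [
    (5.95e-06, "****"),
    (5.95e-05, "***"),
    (1.19e-04, "**"),
    (2.38e-04, "*"),
]


def significance_label(p_value, thresholds=FIG2_THRESHOLDS):
    """Map a p value to the star annotation used in the figure."""
    for upper_bound, label in thresholds:
        if p_value <= upper_bound:
            return label
    return "ns"  # larger than every cutoff: not significant


if __name__ == "__main__":
    for p in (1e-07, 1e-05, 1e-04, 2e-04, 0.01):
        print(p, significance_label(p))
```

Another figure's cutoffs can be supplied through the `thresholds` argument, provided they are listed from strictest to most lenient.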

**Fig. 3**
Disparity in VUS prevalence is present even in the absence of missense variants. **a** Pie charts representing the variant spectrum of VUS for all genes within the particular medical specialty in gnomAD v3.1.2. The most prevalent VUS variant type, missense variants (light blue), accounts for at least 84% of VUS in any given specialty across all three databases. **b** Effect size with 95% confidence interval (plotted and denoted on the right) shown for the differences between VUS prevalence in individuals of non-European-like versus European-like genetic ancestry, as measured by the rank biserial coefficient from the signed rank, matched pairs Wilcoxon test with a Bonferroni correction, as best visualized in gnomAD v3.1.2 (non v2). The total number of alleles from individuals of non-European-like versus European-like genetic ancestry is indicated on the left. Effect sizes in black were calculated from all coding variants, while effect sizes in blue were calculated from all coding variants excluding missense variants corresponding to the medical specialty (y-axis). Thresholds as determined by Funder and Ozer [36] for quantifying the magnitude of the effect size difference are plotted as vertical dashed lines. Across medical specialties and categories, the disparity in VUS prevalence between individuals of non-European-like versus European-like genetic ancestry is not just statistically significant but very large. Further, the statistically significant disparity in VUS prevalence is still intact and medium to large even with the exclusion of missense VUS (~85–90% of all VUS) across the medical specialties. **c** Box plots corresponding to VUS allele prevalence (x-axis) in genes (dots) for individuals of non-European-like (blue) versus European-like (orange) genetic ancestry for the corresponding variant type (y-axis) across gnomAD v3.1.2 (non v2) for all coding variants in the set of curated clinical genes (GenCC). The total number of alleles from individuals of non-European-like (right) versus European-like (left) genetic ancestry is indicated under each variant type in parentheses. Genes (y-axis) with zero alleles for the corresponding variant type for allele prevalence for either individuals of European-like or non-European-like genetic ancestry are omitted from the visualization to maintain a reasonable scale. However, genes with zero alleles for only one category of either individuals of European-like or non-European-like genetic ancestry are included in the Bonferroni-corrected, signed rank, matched pairs Wilcoxon statistical test. The Bonferroni-corrected p values associated with these comparisons are annotated as follows: “ns” indicates not significant, * indicates 1.52e-04 < p ≤ 3.03e-04, ** indicates 7.58e-05 < p ≤ 1.52e-04, *** indicates 7.58e-06 < p ≤ 7.58e-05, and **** indicates p ≤ 7.58e-06. Also refer to Additional file 1: Tables S13–15. Overall, we observe a statistically significant increase in VUS in individuals of non-European-like genetic ancestry compared to individuals of European-like genetic ancestry for missense, synonymous, splice region, and inframe variants.
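As a rough illustration of the effect size named in this caption, a matched-pairs rank biserial coefficient can be computed from Wilcoxon signed ranks as r = (W+ − W−) / (W+ + W−), where W+ and W− are the rank sums of the positive and negative paired differences. The sketch below is not the authors' pipeline. It assumes that zero differences are dropped and that tied absolute differences receive average ranks, which may differ from the handling used in the paper, and the per-gene prevalence values are invented.

```python
def rank_biserial(first, second):
    """Matched-pairs rank biserial coefficient from Wilcoxon signed ranks.

    Assumptions (not confirmed by the source): pairs with zero difference
    are discarded and tied absolute differences share their average rank.
    """
    diffs = [a - b for a, b in zip(first, second) if a != b]
    if not diffs:
        return 0.0

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return (w_plus - w_minus) / (w_plus + w_minus)


if __name__ == "__main__":
    # Invented per-gene VUS allele counts for two ancestry groups.
    non_european_like = [5, 1, 4]
    european_like = [2, 3, 4]
    print(rank_biserial(non_european_like, european_like))
```

A value of 1 means every non-zero paired difference favors the first group, and a value of -1 means every one favors the second.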

**Fig. 4**
Comparison of counts of unique variants found in only one genetic ancestry group. Grouped bar graphs corresponding to unique coding variant counts (y-axis) for **a** VUS, **b** B/LB, **c** CI, **d** ND, and **e** P/LP variants found either only in individuals of European-like (orange) genetic ancestry or only in individuals of non-European-like (blue) genetic ancestry across the medical specialties (x-axis) in *All of Us* v7. The Bonferroni-corrected p values from the chi-square test of independence associated with these comparisons are annotated along with the estimated statistical power. Also refer to Additional file 1: Tables S2–4. Across all medical specialties and categories shown, VUS, B/LB, CI, and ND variants were found at a statistically significantly higher prevalence in individuals of non-European-like genetic ancestry. Conversely, P/LP variants were found at a statistically significantly higher prevalence in individuals of European-like genetic ancestry.

**Fig. 5**
MAVE data can reclassify non-European-like VUS at a statistically significantly higher rate compared to European-like VUS. **a** The presence of VUS was statistically significantly higher in the non-European-like superpopulation group than in the European-like group. However, after using MAVE data for reclassification in the ClinGen VCEP frameworks, there was no statistically significant VUS disparity detected. **b** Sankey flow diagrams depicting VUS reclassification (read from left to right) for individuals of European-like (left) versus non-European-like (right) genetic ancestry before reclassification (No MAVE) and after reclassification (With MAVE). The examined VUS for *BRCA1*, *TP53*, and *PTEN* are the total VUS alleles summed from all three databases (*All of Us* v7, gnomAD v2.1.1, and gnomAD v3.1.2 (non v2)) corresponding to the coding region saturated by the MAVE. The VUS were reclassified as either Likely Benign (LB; light blue), Benign (B; dark blue), or Likely Pathogenic (LP; red), or remained as Variants of Uncertain Significance (VUS; gray). Reclassification was conducted using an automated pipeline based on the ClinGen Variant Curation Expert Panel gene-specific variant interpretation guidelines for each gene, with the amendment of using clinically calibrated MAVE data for the functional evidence codes. **c** Bar graphs for each evidence code category (x-axis) used in VUS reclassification across *BRCA1*, *TP53*, and *PTEN* for all three databases, *All of Us* v7, gnomAD v2.1.1, and gnomAD v3.1.2 (non v2). Blue bars represent alleles from individuals of non-European-like genetic ancestry, whereas orange bars represent alleles from individuals of European-like genetic ancestry. Shading represents essential codes, that is, codes which, if removed from the set of evidence codes used to reclassify the VUS, would cause the variant to regress back to VUS. MAVE evidence codes were used the most based on total allele count for both individuals of non-European-like and European-like genetic ancestry. However, computational predictor and allele frequency codes were more essential for individuals of European-like genetic ancestry. PP3, PP3_Moderate, and BP4 correspond to the computational predictor codes. PS3, PS3_Moderate, BS3, BS3_Moderate, and BS3_Supporting correspond to the MAVE evidence codes. BA1, BS1, and BS1_Supporting correspond to the allele frequency codes. The aggregate analysis for essential codes for the computational predictors is reflective of the cumulative contribution of several commonly used predictors as prescribed by the respective ClinGen VCEP (*BRCA1* relies on BayesDel no-AF, *TP53* relies on both aGVGD and BayesDel, and *PTEN* relies on REVEL).
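The definition of an essential code in this caption (a code whose removal makes the variant regress to VUS) can be expressed as a leave-one-out check. The sketch below is only an illustration of that idea. The point weights and cutoffs in `TOY_POINTS` and `toy_classify` are invented placeholders, not the gene-specific ClinGen VCEP rules, and the function names are hypothetical.

```python
# Invented weights for illustration only. Real ClinGen VCEP combining rules
# are gene specific and are NOT reproduced here.
TOY_POINTS = {
    "PS3": 4, "PP3": 1,
    "BS3": -4, "BS3_Moderate": -2, "BP4": -1, "BS1_Supporting": -1,
}


def toy_classify(codes):
    """Hypothetical stand-in classifier: sum the toy points and apply cutoffs."""
    total = sum(TOY_POINTS[code] for code in codes)
    if total <= -4:
        return "LB"
    if total >= 4:
        return "LP"
    return "VUS"


def essential_codes(codes, classify=toy_classify):
    """Return the codes whose individual removal sends the variant back to VUS."""
    codes = frozenset(codes)
    if classify(codes) == "VUS":
        return set()  # nothing was reclassified, so no code can be essential
    return {code for code in codes if classify(codes - {code}) == "VUS"}


if __name__ == "__main__":
    # With these toy weights, only the functional code is essential here.
    print(essential_codes({"BS3", "BP4"}))
    # Here every code is needed to reach the toy Likely Benign cutoff.
    print(essential_codes({"BS3_Moderate", "BP4", "BS1_Supporting"}))
```

Passing a different `classify` callable lets the same leave-one-out logic run against any rule set that maps a set of evidence codes to a classification.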

References

1. Mata DA, Rotenstein LS, Ramos MA, Jena AB. Disparities according to genetic ancestry in the use of precision oncology assays. N Engl J Med. 2023;388:281–3.
2. Fatumo S, et al. A roadmap to increase diversity in genomic studies. Nat Med. 2022;28:243–50.
3. Borrell LN, et al. Race and genetic ancestry in medicine — a time for reckoning with racism. N Engl J Med. 2021;384:474–80.
4. Martin AR, et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019;51:584–91.
5. Collins FS, Doudna JA, Lander ES, Rotimi CN. Human molecular genetics and genomics — important advances and exciting possibilities. N Engl J Med. 2021;384:1–4.

Grants and funding

UM1 HG011969/HG/NHGRI NIH HHS/United States
