[Preprint]. 2024 Jun 30:2023.06.20.545702.
doi: 10.1101/2023.06.20.545702.

Mapping MAVE data for use in human genomics applications



Jeremy A Arbesfeld et al. bioRxiv.

Update in

  • Mapping MAVE data for use in human genomics applications.
    Arbesfeld JA, Da EY, Stevenson JS, Kuzma K, Paul A, Farris T, Capodanno BJ, Grindstaff SB, Riehle K, Saraiva-Agostinho N, Safer JF, Casper J, Haeussler M, Milosavljevic A, Foreman J, Firth HV, Hunt SE, Iqbal S, Cline MS, Rubin AF, Wagner AH. Genome Biol. 2025 Jun 25;26(1):179. doi: 10.1186/s13059-025-03647-x. PMID: 40563119.

Abstract

The large-scale experimental measures of variant function submitted to MaveDB have the potential to provide key information for resolving variants of uncertain significance, but reporting results relative to the assayed target sequences, rather than to human reference sequences, limits their downstream utility. The Atlas of Variant Effects Alliance mapped multiplexed assays of variant effect (MAVE) data to human reference sequences, creating a robust set of machine-readable homology mappings. This method processed approximately 2.5 million protein and genomic variants in MaveDB, successfully mapped 98.61% of the examined variants, and disseminated the mapped data to resources such as the UCSC Genome Browser and the Ensembl Variant Effect Predictor.

Keywords: deep mutational scanning; global alliance for genomics and health; massively parallel reporter assays; multiplexed assays of variant effect; variation representation specification.


Conflict of interest statement

Competing interests: The authors declare that they have no competing interests.

Figures

Fig 1. Mapping MAVE variants to the Human Genome
An overview of the MAVE variant mapping method. MAVE variants are described with respect to custom, user-submitted target sequences, but the absence of linkages to versioned human reference sequences limits the interoperability of MAVE data with human genomics applications (left). To overcome this limitation, we have developed a method to map MAVE variants to their corresponding human reference sequences (middle). Through the use of VRS, we are able to represent MAVE variants with respect to both assayed target sequences and versioned human reference sequences, creating robust homology maps (middle). The precise representation of MAVE variants using VRS ultimately facilitates the integration of MAVE data into downstream clinical and research applications (right).
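The dual representation described in Fig 1 can be pictured as a pair of VRS-style Allele objects that describe the same substitution once on the assayed target sequence and once on the human reference sequence. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: field names loosely follow the VRS 2.0 Allele/SequenceLocation schema, the sequence identifiers are hypothetical placeholders rather than computed refget digests, the `offset` between the two sequences is assumed to be known already, and `protein_allele` and `homology_pair` are illustrative helpers.

```python
from typing import Any, Dict


def protein_allele(sequence_id: str, position: int, alt: str) -> Dict[str, Any]:
    """Build a VRS-style Allele for a single-residue substitution.

    `position` is a 1-based residue number. VRS locations use interbase
    (0-based, half-open) coordinates, so residue p spans [p - 1, p).
    """
    if position < 1:
        raise ValueError("residue positions are 1-based")
    return {
        "type": "Allele",
        "location": {
            "type": "SequenceLocation",
            # Hypothetical identifier; a real object would carry a refget digest.
            "sequenceReference": {
                "type": "SequenceReference",
                "refgetAccession": sequence_id,
            },
            "start": position - 1,
            "end": position,
        },
        "state": {"type": "LiteralSequenceExpression", "sequence": alt},
    }


def homology_pair(target_id: str, reference_id: str, target_pos: int,
                  offset: int, alt: str) -> Dict[str, Dict[str, Any]]:
    """Describe one MAVE variant on both the assayed target and the reference.

    `offset` (assumed known) is added to a target residue number to obtain
    the corresponding reference residue number.
    """
    return {
        "target": protein_allele(target_id, target_pos, alt),
        "reference": protein_allele(reference_id, target_pos + offset, alt),
    }


# Placeholder identifiers for illustration only.
pair = homology_pair("SQ.TARGET_PLACEHOLDER", "SQ.REFERENCE_PLACEHOLDER", 1, 3, "V")
print(pair["reference"]["location"]["start"], pair["reference"]["location"]["end"])  # prints: 3 4
```

Keeping both alleles side by side is what makes the mapping a homology map: a consumer can always trace a reference-level variant back to the exact residue that was assayed.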
Fig 2. MAVE Assay-Specific Sequence Challenges
A depiction of several key features of MAVE sequences that necessitate a mapping strategy to human reference sequences. (A) MAVE variants are described using the MAVE-HGVS nomenclature system, which describes variants relative to a user-submitted target sequence. Because these variants are defined on assay-specific target sequences, they must be mapped to a human reference sequence before an accession and other important contextual information can be attached to each variant. (B) MAVE target sequences are often not identical to the human reference because of features of the genetic system used in the assay. In the example shown, a synonymous nucleotide substitution between the target and reference sequences optimizes translation of the sequence in the assay. (C) MAVE sequences can contain assay-specific functional elements that do not align to the human genome. (D) MAVE protein variants may occur in a contiguous region of the MAVE target sequence yet correspond to changes that span exon boundaries in the human genome.
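Panel D arises because a codon that is contiguous on the target sequence can be split across two exons in the genome. The Python below is a minimal sketch of locating the genomic positions of a codon, assuming a plus-strand gene whose coding exon segments are given in transcript order as 1-based inclusive coordinates; the exon coordinates are hypothetical, minus-strand genes would need reversed handling, and the helpers are illustrative rather than part of the described workflow, which relies on BLAT and UTA.

```python
from typing import List, Tuple

# A coding exon segment as 1-based, inclusive genomic coordinates (plus strand assumed).
Exon = Tuple[int, int]


def cds_to_genomic(cds_pos: int, exons: List[Exon]) -> int:
    """Map a 1-based coding-sequence position to a 1-based genomic position."""
    if cds_pos < 1:
        raise ValueError("CDS positions are 1-based")
    remaining = cds_pos
    for start, end in exons:
        length = end - start + 1
        if remaining <= length:
            return start + remaining - 1
        remaining -= length  # skip past this exon and continue in the next one
    raise ValueError(f"CDS position {cds_pos} lies beyond the coding exons")


def codon_to_genomic(codon_number: int, exons: List[Exon]) -> List[int]:
    """Return the genomic positions of the three nucleotides of a 1-based codon."""
    first = 3 * (codon_number - 1) + 1
    return [cds_to_genomic(first + i, exons) for i in range(3)]


def spans_exon_boundary(codon_number: int, exons: List[Exon]) -> bool:
    """True if the codon's three nucleotides are not contiguous in the genome."""
    positions = codon_to_genomic(codon_number, exons)
    return positions[-1] - positions[0] != 2


# Hypothetical two-exon coding region: exon 1 holds 5 coding bases, exon 2 holds 11.
exons = [(100, 104), (200, 210)]
print(codon_to_genomic(2, exons), spans_exon_boundary(2, exons))  # prints: [103, 104, 200] True
```

Here codon 2 is a single residue on the target protein, but its genomic footprint is split between positions 103 to 104 and position 200, which is why a protein variant cannot always be represented as one contiguous genomic interval.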
Fig 3. MaveDB Score Set Breakdown/Summary Statistics
A summary of the MAVE data from MaveDB that was used to validate the mapping method. (A) All score set entries in MaveDB are assigned an organism attribute (e.g., Homo sapiens, Saccharomyces cerevisiae). Score sets whose listed target organism was Homo sapiens (n = 209) were selected for testing the mapping algorithm, and additional breakdowns describing the selected human score sets are presented. Made with SankeyMATIC. (B) MAVE experiments in MaveDB (n = 159) can be conducted in non-human cellular contexts, including yeast, bacteria, mice, and bacteriophage (40, 41). Experiments that do not report a cellular context are coded as “N/A” (n = 5).
Fig 4. Variant Mapping Algorithm Workflow
A depiction of the MAVE variant mapping workflow. For a given entry in MaveDB whose listed target organism is Homo sapiens, the provided MAVE sequence is aligned to GRCh38 using BLAT (21), which returns data including the chromosome number, gene symbol, and a set of genomic coordinates (1). If a score set describes a protein-coding element, the output data can be supplied as a query to the Universal Transcript Archive (UTA) database, which allows a RefSeq protein accession to be derived and an offset between the target and reference sequences to be computed (2). With a RefSeq sequence selected and the offset calculated, the assayed variants in a MaveDB variant matrix are described with respect to their unique human reference sequence using the GA4GH Variation Representation Specification (VRS) (3). The resulting VRS objects are then annotated with descriptive metadata and integrated into score-set-specific JSON files. Lastly, the JSON files are gzipped and uploaded to a publicly accessible S3 bucket for downstream integration.
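Step (2) hinges on an offset that shifts target-relative residue numbers onto the selected RefSeq protein. The Python below is a minimal sketch of that idea only: it swaps in a naive exact-substring search for the BLAT/UTA steps (so it fails on non-reference-identical targets), handles only simple MAVE-HGVS protein substitutions such as `p.Ala1Val` (not `Ter`, synonymous, or multi-residue changes), and uses a toy protein and the hypothetical accession `NP_HYPOTHETICAL.1`. The helpers `find_offset` and `lift_substitution` are illustrative names.

```python
import re

# Three-letter to one-letter amino acid codes for the 20 standard residues.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Matches simple protein substitutions such as "p.Ala1Val".
PROTEIN_SUB = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def find_offset(target_protein: str, reference_protein: str) -> int:
    """Offset such that target residue i (1-based) is reference residue i + offset.

    Naive stand-in for alignment: the target must occur verbatim in the reference.
    """
    index = reference_protein.find(target_protein)
    if index == -1:
        raise ValueError("target not found verbatim in reference; an aligner is needed")
    return index


def lift_substitution(hgvs: str, offset: int, reference_protein: str,
                      accession: str) -> str:
    """Rewrite a target-relative protein substitution onto a reference accession."""
    match = PROTEIN_SUB.match(hgvs)
    if match is None:
        raise ValueError(f"unsupported expression: {hgvs!r}")
    ref_aa, pos, alt_aa = match.group(1), int(match.group(2)), match.group(3)
    if ref_aa not in THREE_TO_ONE or alt_aa not in THREE_TO_ONE:
        raise ValueError(f"unrecognized residue in {hgvs!r}")
    ref_pos = pos + offset
    # Sanity check: the reference residue at the lifted position must match.
    in_range = 1 <= ref_pos <= len(reference_protein)
    if not in_range or reference_protein[ref_pos - 1] != THREE_TO_ONE[ref_aa]:
        raise ValueError(f"{hgvs!r} does not match the reference at {ref_pos}")
    return f"{accession}:p.{ref_aa}{ref_pos}{alt_aa}"


reference = "MKTAYIAKQR"  # toy reference protein
target = "AYIAK"          # toy assayed target region
offset = find_offset(target, reference)
print(lift_substitution("p.Ala1Val", offset, reference, "NP_HYPOTHETICAL.1"))
# prints: NP_HYPOTHETICAL.1:p.Ala4Val
```

The reference-residue check is a cheap guard against a wrong offset: if the lifted position does not carry the residue named in the MAVE-HGVS expression, the mapping is rejected rather than silently reported.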
Fig 5. Downstream Integrations of MaveDB Data
The mapping of MAVE data to the human genome permits downstream data integrations in various human genomics applications. (A) MAVE scores are visible as heatmaps for available genes in the Genomics 2 Proteins Portal (located in the red circle). (B) MAVE data has been added as a track hub in the UCSC Genome Browser. MAVE protein variant positions are mapped to their corresponding genomic coordinates, and the score, chromosome band, genomic size, and strand are also reported for each variant. (C) MAVE scores and a link to the associated score set are reported, when available, for queried variants in the Ensembl Variant Effect Predictor (located in the red circle). (D) The nucleotide change, protein change, experiment accession, PubMed ID, assay-specific variant effect score, variant accession, and publish date are included for MAVE data displayed in DECIPHER, with links to the experimental details and score set in MaveDB. Example displayed: https://www.deciphergenomics.org/sequence-variant/17-43045712-T-C/annotation/functional.
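Panel B reports each mapped variant at genomic coordinates together with its score, size, and strand. As an illustration of the coordinate bookkeeping involved, the Python below formats one mapped variant as a BED-like line. This is a sketch under assumptions: the actual file format of the MaveDB track hub is not described here, the choice of a BED6+1 layout (BED score column set to 0, raw MAVE score in an extra seventh column) is hypothetical, and the chromosome, coordinates, and label in the usage line are made up. Chromosome band annotation is not attempted.

```python
def mave_bed_line(chrom: str, start: int, end: int, name: str,
                  strand: str, mave_score: float) -> str:
    """Format one mapped MAVE variant as a BED6+1 line (hypothetical layout).

    `start` and `end` are 1-based, inclusive genomic coordinates. BED uses
    0-based, half-open coordinates, so only the start is shifted by one.
    The BED score column is an integer display value, so it is set to 0 and
    the raw MAVE score is written to an extra seventh column instead.
    """
    if strand not in {"+", "-", "."}:
        raise ValueError(f"invalid strand: {strand!r}")
    if start < 1 or end < start:
        raise ValueError("expected 1-based inclusive coordinates with start <= end")
    fields = [chrom, str(start - 1), str(end), name, "0", strand, f"{mave_score:g}"]
    return "\t".join(fields)


# Hypothetical codon-sized variant: genomic size is end - start + 1 = 3 bp.
print(mave_bed_line("chr17", 101, 103, "p.Ala4Val", "+", -1.25))
```

Storing the score in its own column keeps the experimental value intact instead of forcing it into the 0 to 1000 integer range of the standard BED score field.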

References

    1. Henrie A, Hemphill SE, Ruiz-Schultz N, Cushman B, DiStefano MT, Azzariti D, et al. ClinVar Miner: Demonstrating Utility of a Web-Based Tool for Viewing and Filtering ClinVar Data. Hum Mutat. 2018 Aug;39(8):1051.
    2. Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 2014 Jan 1;42(Database issue):D980.
    3. Brnich SE, Abou Tayoun AN, Couch FJ, Cutting GR, Greenblatt MS, Heinen CD, et al. Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework. Genome Med. 2019 Dec 31;12(1):1–12.
    4. Tareen A, Kooshkbaghi M, Posfai A, Ireland WT, McCandlish DM, Kinney JB. MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect. Genome Biol. 2022 Apr 15;23(1):98.
    5. Fowler DM, Stephany JJ, Fields S. Measuring the activity of protein variants on a large scale using deep mutational scanning. Nat Protoc. 2014 Sep;9(9):2267–84.
