Bioinformatics. 2024 Feb 1;40(2):btae073.
doi: 10.1093/bioinformatics/btae073.

PanEffect: a pan-genome visualization tool for variant effects in maize

Carson M Andorf et al. Bioinformatics.

Abstract

Summary: Understanding the effects of genetic variants is crucial for accurately predicting traits and functional outcomes. Recent approaches have used artificial intelligence and protein language models to score all possible missense variant effects at the proteome level for a single genome, but a reliable tool is needed to explore these effects at the pan-genome level. To address this gap, we introduce a new tool called PanEffect. We implemented PanEffect at MaizeGDB to enable a comprehensive examination of the potential effects of coding variants across 50 maize genomes. The tool allows users to visualize over 550 million possible amino acid substitutions in the B73 maize reference genome and to observe the effects of the 2.3 million natural variations in the maize pan-genome. Each variant effect score, calculated with the Evolutionary Scale Modeling (ESM) protein language model, expresses the log-likelihood ratio between the B73 reference residue and each variant residue in the pan-genome. These scores are displayed as heatmaps whose color scale ranges from benign outcomes to potential functional consequences. In addition, PanEffect displays secondary structures and functional domains alongside the variant effects, providing further functional and structural context. With PanEffect, researchers now have a platform to explore protein variants and identify genetic targets for crop enhancement.

Availability and implementation: The PanEffect code is freely available on GitHub (https://github.com/Maize-Genetics-and-Genomics-Database/PanEffect). A maize implementation of PanEffect and underlying datasets are available at MaizeGDB (https://www.maizegdb.org/effect/maize/).
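To make the log-likelihood ratio score in the summary concrete, the following is a minimal Python sketch, not PanEffect's or ESM's own code. It assumes a protein language model has already produced a log-probability for each of the 20 standard amino acids at every position of the reference protein; the input format, the function name `llr_scores`, and the toy probabilities are hypothetical. It uses the common convention LLR = log P(variant) − log P(reference), so negative values mean the model considers the variant less likely than the reference residue. Because each residue has 19 possible missense alternatives, scoring every position of every reference protein is what produces proteome-scale substitution counts.

```python
import math

# The 20 standard amino acids; nonstandard residues (e.g. "X") are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def llr_scores(ref_seq, log_probs):
    """Score every possible missense substitution in ref_seq.

    ASSUMPTION (hypothetical input format): log_probs[i][aa] is a model's
    log-probability of amino acid `aa` at 0-based position i of ref_seq.
    Returns {(pos, ref_aa, alt_aa): LLR}, with pos 1-based and
    LLR = log P(alt) - log P(ref), so negative values mean the model finds
    the variant less likely than the reference residue.
    """
    scores = {}
    for i, ref_aa in enumerate(ref_seq):
        if ref_aa not in AMINO_ACIDS:
            continue
        for alt_aa in AMINO_ACIDS:
            if alt_aa != ref_aa:
                scores[(i + 1, ref_aa, alt_aa)] = log_probs[i][alt_aa] - log_probs[i][ref_aa]
    return scores


# Toy stand-in for model output: the reference residue gets probability 0.5
# and the other 19 amino acids share the remaining 0.5 equally.
seq = "MH"
toy = [{aa: math.log(0.5 if aa == ref else 0.5 / 19) for aa in AMINO_ACIDS} for ref in seq]
scores = llr_scores(seq, toy)
print(len(scores))                      # 38 (19 alternatives x 2 positions)
print(round(scores[(2, "H", "P")], 3))  # -2.944, i.e. -ln(19)
```

With real model output, the probabilities would differ by position and residue, and the resulting scores could then be binned into color levels for a heatmap; the thresholds for such bins are not given here.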


Conflict of interest statement

None declared.

Figures

Figure 1.
Overview of the “Variant effects across the pan-genome” view. The figure shows a snapshot of this view for gene model Zm00001eb055490 (Glutamate decarboxylase). Top to bottom: Pfam domains; predicted secondary structures; a heatmap of variant effect scores for naturally occurring variations in the maize pan-genome; and a zoomed-in view of a region where a few proteins have high variant effect scores. Each amino acid variant is color-coded by its level of potential functional effect, and each genome is color-coded by heterotic group. Insertions and deletions between the reference and target proteins are shown as gray vertical and horizontal regions and labeled with a “-”. This example shows two areas of strong effect. Arrow A highlights a stretch of strong variation, starting at position 206 and overlapping the Pfam domain, that is found in five maize lines used in Chinese breeding programs and in the W22 genome but is absent from the other heterotic groups. Arrow B points to position 413, where a histidine-to-proline substitution is predicted to have a strong effect in the genomes A188, K0326Y, Mo17, and Jing724.
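The comparison described in the caption, in which each target protein is set against the B73 reference and indels appear as “-” regions, can be sketched as follows: walk a pairwise alignment column by column, skip columns where either side is a gap, and record the remaining mismatches as substitutions at their reference positions. This is a minimal Python sketch assuming the proteins are already aligned; the function name `aligned_substitutions`, the genome labels, and the toy sequences are hypothetical and are not PanEffect's code or real maize data.

```python
def aligned_substitutions(ref_aln, target_aln):
    """List amino acid substitutions between two aligned protein sequences.

    Returns (ref_pos, ref_aa, alt_aa) tuples with ref_pos 1-based in the
    ungapped reference. Columns with a gap ("-") on either side are
    insertions or deletions and are skipped rather than scored.
    """
    if len(ref_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    subs = []
    ref_pos = 0
    for r, t in zip(ref_aln, target_aln):
        if r != "-":
            ref_pos += 1  # advance only on reference residues
        if r == "-" or t == "-":
            continue  # indel column: nothing to score
        if r != t:
            subs.append((ref_pos, r, t))
    return subs


# Hypothetical toy alignments keyed by genome name (not real maize data).
ref = "MKH-LAH"
targets = {"genome_1": "MRHQ-AP", "genome_2": "MKH-LAH"}
for name, aln in targets.items():
    print(name, aligned_substitutions(ref, aln))
# genome_1 [(2, 'K', 'R'), (6, 'H', 'P')]
# genome_2 []
```

Each returned (position, reference residue, variant residue) tuple is the key needed to look up a precomputed per-substitution score for the reference protein, which is one straightforward way to fill a genome-by-position heatmap.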
