Plant J. 2025 Sep;123(6):e70483.
doi: 10.1111/tpj.70483.

Fishing for a reelGene: evaluating gene models with evolution and machine learning

Aimee J Schulz et al. Plant J. 2025 Sep.

Abstract

Assembled genomes and their associated annotations have transformed our study of gene function. However, each new annotated assembly generates new gene models. Inconsistencies between annotations likely arise from biological and technical causes, including pseudogene misclassification, transposon activity, and intron retention from sequencing of unspliced transcripts. To evaluate gene model predictions, we developed reelGene, a pipeline of machine learning models focused on (1) transcription boundaries, (2) mRNA integrity, and (3) protein structure. The first two models leverage sequence characteristics and evolutionary conservation across related taxa to learn the grammar of conserved transcription boundaries and mRNA sequences, while the third uses the conserved evolutionary grammar of protein sequences to predict whether a gene can produce a protein. Evaluating 1.8 million transcript models in Zea mays ssp. mays (maize), reelGene classified 28% as incorrectly annotated or non-functional. We find that reelGene classifies 92.2% of genes in the maize proteome and 99.2% of genes within the maize classical gene list as functional. reelGene also provides a way to further investigate genome biology. For instance, reelGene indicates that 10.3% of dispensable genes in B73 are functional, and within retained duplicate genes, reelGene identifies a 30% bias toward the retention of the M1 subgenome when one copy is functional and the other is non-functional. As an annotation-evaluating tool, reelGene is directly applicable to species of the Andropogoneae tribe, including other important crops like sorghum and miscanthus. As a community resource, reelGene has been integrated into MaizeGDB both as a browser track and as an individual Shiny App, allowing researchers to evaluate gene model accuracy and further investigate genome biology.

Keywords: evolution; gene annotation; gene models; genome biology; machine learning; maize.
