Review
Plant Cell Rep. 2024 Aug 5;43(8):208.
doi: 10.1007/s00299-024-03294-9.

Advancing plant biology through deep learning-powered natural language processing

Shuang Peng et al. Plant Cell Rep.

Abstract

The application of deep learning methods, specifically Large Language Models (LLMs), in plant biology holds significant promise for generating novel knowledge on plant cell systems. The LLM framework shows exceptional potential, particularly with the development of Protein Language Models (PLMs), which allow in-depth analyses of nucleic acid and protein sequences. This analytical capacity helps to discern intricate patterns and relationships within biological data, including multi-scale information encoded in DNA or protein sequences. The contribution of PLMs extends beyond sequence-pattern and structure–function recognition; it also supports genetic improvement in agriculture. Integrating deep learning approaches into the plant sciences offers opportunities for major breakthroughs in basic research across multi-scale plant traits. Consequently, the strategic application of deep learning methodologies, particularly leveraging the potential of LLMs, will undoubtedly play a pivotal role in advancing plant sciences, plant production and plant uses, and in propelling the trajectory toward sustainable agroecological and agro-food transitions.

Keywords: Biology; DNA; Deep learning; Large language models; Plant sciences; Proteins.
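The abstract says that PLMs analyse nucleic acid and protein sequences. To make that concrete, the sketch below shows how a sequence might be turned into model input. It is a minimal illustration under stated assumptions and is not the authors' method. The vocabulary, token ids and helper names (`tokenize_protein`, `kmer_tokens`, `mask_tokens`) are hypothetical. Proteins are tokenized one residue per token. DNA is split into overlapping k-mers, and k=3 is an arbitrary choice. The masking step is a simplified BERT-style masked-language-modelling objective that omits BERT's 80/10/10 replacement rule.

```python
import random

# Hypothetical residue-level vocabulary: special tokens plus the 20 standard
# amino acids. Real protein language models define their own vocabularies.
SPECIAL = ["<pad>", "<cls>", "<eos>", "<mask>", "<unk>"]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VOCAB = {tok: i for i, tok in enumerate(SPECIAL + list(AMINO_ACIDS))}


def tokenize_protein(seq: str) -> list[int]:
    """Map each residue to one token id, wrapped in <cls> ... <eos>."""
    ids = [VOCAB.get(aa, VOCAB["<unk>"]) for aa in seq.upper()]
    return [VOCAB["<cls>"]] + ids + [VOCAB["<eos>"]]


def kmer_tokens(dna: str, k: int = 3) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    dna = dna.upper()
    return [dna[i:i + k] for i in range(len(dna) - k + 1)]


def mask_tokens(ids: list[int], rate: float = 0.15, seed: int = 0):
    """Simplified masked-LM corruption: replace a random fraction of
    non-special tokens with <mask>.

    Returns (masked_ids, labels). labels holds the original id at masked
    positions and -100 elsewhere, a common convention for positions the
    loss should ignore.
    """
    rng = random.Random(seed)
    special_ids = {VOCAB[t] for t in SPECIAL}
    masked, labels = list(ids), [-100] * len(ids)
    for i, tok in enumerate(ids):
        if tok not in special_ids and rng.random() < rate:
            labels[i] = tok
            masked[i] = VOCAB["<mask>"]
    return masked, labels


if __name__ == "__main__":
    print(kmer_tokens("ATGCGT", k=3))  # ['ATG', 'TGC', 'GCG', 'CGT']
    ids = tokenize_protein("MKTAYIAKQR")
    masked, labels = mask_tokens(ids, rate=0.15, seed=42)
    print(ids)
    print(masked)
```

A model trained to recover the masked tokens from their context is pushed to learn dependencies between positions. That is one way such models pick up the sequence patterns the abstract refers to. Production models add learned embeddings and a transformer encoder on top of this input pipeline.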
