Language models for biological research: a primer
- PMID: 39122951
- DOI: 10.1038/s41592-024-02354-y
Abstract
Language models are playing an increasingly important role in many areas of artificial intelligence (AI) and computational biology. In this primer, we discuss the ways in which language models, both those based on natural language and those based on biological sequences, can be applied to biological research. The primer is intended primarily for biologists who want to use these cutting-edge AI technologies in their own research. We provide guidance on best practices and key resources for adapting language models for biology.
© 2024. Springer Nature America, Inc.
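The primer surveys how pretrained language models can be adapted for biology. As a minimal, hedged illustration of one common pattern it covers, the sketch below extracts a fixed-length embedding of a protein sequence from a publicly released protein language model (an ESM-2 checkpoint) through the Hugging Face transformers library. The specific checkpoint, library, and pooling choice are assumptions made for this example, not recommendations taken from the article.

```python
# A minimal sketch (not from the primer): embedding a protein sequence
# with a pretrained protein language model via Hugging Face transformers.
import torch
from transformers import AutoTokenizer, AutoModel

# Small ESM-2 checkpoint chosen here for speed; larger checkpoints follow the same API.
model_name = "facebook/esm2_t6_8M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # example amino acid sequence
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the per-residue representations (including special tokens, for simplicity)
# into a single fixed-length vector for downstream use.
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)  # torch.Size([1, 320]) for this checkpoint
```

Embeddings obtained this way are typically passed to a lightweight downstream model (for example, a logistic regression or a small neural network) for sequence-level prediction tasks; the choice of pooling and downstream model is task dependent.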
