Review
Proteomics. 2023 Dec;23(23-24):e2300011.
doi: 10.1002/pmic.202300011. Epub 2023 Jun 29.

Leveraging transformers-based language models in proteome bioinformatics

Nguyen Quoc Khanh Le. Proteomics. 2023 Dec.

Abstract

In recent years, the rapid growth of biological data has increased interest in using bioinformatics to analyze and interpret them. Proteomics, which studies the structure, function, and interactions of proteins, is a crucial area of bioinformatics. Applying natural language processing (NLP) techniques to proteomics is an emerging approach that combines machine learning and text mining to analyze biological data. Recently, transformer-based NLP models have gained significant attention for their ability to process variable-length input sequences in parallel, using self-attention mechanisms to capture long-range dependencies. In this review, we discuss recent advances in transformer-based NLP models in proteome bioinformatics and examine their advantages, limitations, and potential applications for improving the accuracy and efficiency of various tasks. We also highlight the challenges and future directions of using these models in proteome bioinformatics research. Overall, this review offers insights into the potential of transformer-based NLP models to revolutionize proteome bioinformatics.

Keywords: bioinformatics; deep learning; drug discovery; explainable artificial intelligence; natural language processing; protein expression; protein function prediction; transformer attention.
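
The self-attention mechanism mentioned in the abstract can be illustrated with a minimal sketch. This is not code from the review: the one-hot residue encoding, the identity query/key/value projections, the function names, and the toy sequence "MKVLAM" are all assumptions chosen for brevity. Real protein language models use learned embeddings and learned projection matrices. The sketch shows only how every position scores every other position in one pass, so that distant residues can influence each other directly.

```python
import math

# The 20 standard amino acids (one-letter codes); used here only to build
# a toy one-hot embedding, which is an assumption of this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """Encode a residue as a 20-dimensional one-hot vector."""
    return [1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Returns (weights, outputs): weights[i][j] is how strongly position i
    attends to position j; outputs[i] is the weighted sum of all embeddings.
    """
    dim = len(embeddings[0])
    scale = math.sqrt(dim)
    weights = []
    for query in embeddings:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in embeddings]
        weights.append(softmax(scores))
    outputs = [[sum(w * vec[j] for w, vec in zip(row, embeddings))
                for j in range(dim)]
               for row in weights]
    return weights, outputs


# Works for any sequence length; no recurrence over positions is needed.
sequence = "MKVLAM"
weights, outputs = self_attention([one_hot(r) for r in sequence])
```

In this toy setup the first residue (M) gives the last residue (also M) the same weight it gives itself, and more than it gives its immediate neighbor K, which is the sense in which attention is independent of sequence distance.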
