Sci Data. 2025 Jul 14;12(1):1213.
doi: 10.1038/s41597-025-05528-1.

A comprehensive dataset of therapeutic peptides on multi-function property and structure information

Baichuan Xiao et al. Sci Data. 2025.

Abstract

This paper presents a comprehensive dataset of 58,583 experimentally validated therapeutic peptides with annotated structure information. The peptides are grouped into 47 categories by function or therapeutic property, such as antimicrobial or glucose-regulatory activity; 21,130 of them are multi-function peptides and 54,722 have structural annotations. We believe this dataset can be useful for research on therapeutic peptides, especially for developing computational tools for therapeutic peptide discovery and for further exploring the 'sequence-structure-function' relationship of therapeutic peptides.

Conflict of interest statement

Competing interests: The authors declare that they have no competing interests.

Figures

Fig. 1
Technical process for data compilation. The pipeline consists of four major steps (from left to right): (1) Indication scope (green box), describing the 47 specific function classes covered by this study; (2) Data source (purple box), listing the 33 databases or datasets used as sources for the dataset; (3) Data compiling (orange box), comprising 21,130 multifunctional peptides and 54,722 tertiary structure files; (4) Normalization (pink box), describing the quality measures applied in this study, such as duplication checks, outlier removal, completeness validation, and consensus validation.
Fig. 2
Dataset comprising 15 major categories and 47 subcategories. Therapeutic peptides are classified into 15 major categories, each with specific subcategories.
Fig. 3
Distribution of peptide lengths, origins, and function categories. (a–c) Distribution of functional categories, sequence length, and origin of therapeutic peptides.
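
The normalization step named in the Fig. 1 caption (duplication checks, outlier removal, completeness validation) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the record layout, the function names `normalize_records` and `multifunction`, the standard-residue filter, and the length bounds are all assumptions chosen for illustration, and consensus validation is not modeled.

```python
from collections import defaultdict

# The 20 standard amino acids; treating anything else as an outlier
# is an assumption of this sketch, not a rule stated in the source.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def normalize_records(records, min_len=2, max_len=50):
    """Merge (sequence, function) pairs pooled from several sources.

    Returns a dict mapping each retained sequence to its set of
    function labels. The length bounds are hypothetical defaults.
    """
    merged = defaultdict(set)
    for sequence, function in records:
        seq = (sequence or "").strip().upper()
        label = (function or "").strip().lower()
        # Completeness check: skip records missing a sequence or a label.
        if not seq or not label:
            continue
        # Outlier removal: non-standard residues or out-of-range length.
        if not set(seq) <= STANDARD_AA:
            continue
        if not (min_len <= len(seq) <= max_len):
            continue
        # Duplicate check: identical sequences collapse into one entry
        # and their labels are unioned.
        merged[seq].add(label)
    return dict(merged)


def multifunction(merged):
    """Keep only sequences annotated with more than one function."""
    return {seq: labels for seq, labels in merged.items() if len(labels) > 1}


if __name__ == "__main__":
    # Toy records with made-up sequences and labels.
    toy = [
        ("KLAKLAKKLA", "antimicrobial"),
        ("klaklakkla", "anticancer"),
        ("ACDX", "antimicrobial"),  # non-standard residue X
        ("", "antiviral"),          # missing sequence
    ]
    merged = normalize_records(toy)
    print(len(merged), "unique peptides;", len(multifunction(merged)), "multi-function")
```

Under this scheme, a peptide counts as multi-function when the same sequence is reported under different function labels by different sources, which is why duplicates are merged rather than dropped.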
