PLoS One. 2025 Mar 26;20(3):e0319208.
doi: 10.1371/journal.pone.0319208. eCollection 2025.

PUNCH2: Explore the strategy for intrinsically disordered protein predictor


Di Meng et al. PLoS One. 2025.

Abstract

Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) lack stable three-dimensional structures, which makes them difficult to predict computationally. This study introduces PUNCH2 and PUNCH2-light, predictors designed to address this challenge through curated datasets, feature extraction from multiple embeddings, and optimized neural architectures. By integrating experimental data from the PDB (PDB_missing) with fully disordered sequences from DisProt (DisProt_FD), we improved model performance and robustness. Three embedding strategies (One-Hot, MSA-based, and PLM-based) were evaluated: ProtTrans was the most effective single embedding, and combined embeddings achieved the best results. The predictors employ a 12-layer convolutional network (CNN_L12_narrow) that balances accuracy and computational efficiency. PUNCH2 combines One-Hot, ProtTrans, and MSA-Transformer embeddings, while PUNCH2-light omits the MSA-based embedding to run faster. Both predictors are competitive with other methods on the CAID2 benchmark and rank as the top two predictors in the CAID3 competition. These tools provide efficient and accurate disorder prediction to support IDP research.
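The abstract distinguishes PUNCH2 (One-Hot + ProtTrans + MSA-Transformer) from PUNCH2-light (no MSA-based embedding). Below is a minimal sketch of how per-residue features from several embeddings can be concatenated into one input matrix. It assumes a 21-slot one-hot alphabet (20 canonical residues plus an "unknown" slot) and uses zero-filled placeholder matrices in place of real ProtTrans and MSA-based embeddings; the function names and embedding sizes are hypothetical, not taken from the paper.

```python
from typing import List, Sequence

# Hypothetical alphabet choice: 20 canonical residues plus one "unknown" slot.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence: str) -> List[List[float]]:
    """Encode each residue as a 21-dim vector; index 20 flags non-canonical residues."""
    rows = []
    for aa in sequence.upper():
        row = [0.0] * (len(AMINO_ACIDS) + 1)
        row[AA_INDEX.get(aa, len(AMINO_ACIDS))] = 1.0
        rows.append(row)
    return rows


def concat_features(*matrices: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate L x D_i per-residue matrices into one L x sum(D_i) matrix."""
    lengths = {len(m) for m in matrices}
    if len(lengths) != 1:
        raise ValueError(f"matrices disagree on sequence length: {lengths}")
    n_residues = lengths.pop()
    return [[v for m in matrices for v in m[i]] for i in range(n_residues)]


# Usage with placeholder embeddings (real ones would come from the PLM / MSA models).
seq = "MKVLAAGIVG"
prottrans = [[0.0] * 1024 for _ in seq]  # placeholder PLM embedding, L x 1024
msa_emb = [[0.0] * 768 for _ in seq]     # placeholder MSA-based embedding, L x 768
light_input = concat_features(one_hot(seq), prottrans)          # PUNCH2-light-style input
full_input = concat_features(one_hot(seq), prottrans, msa_emb)  # PUNCH2-style input
print(len(full_input), len(light_input[0]), len(full_input[0]))  # 10 1045 1813
```

Dropping the MSA-based matrix only shrinks the per-residue width; the sequence length is unchanged, so the same downstream network shape works for both variants apart from its first layer.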


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. IDR data collection process.
Finally, the IDR_Training dataset was searched against the primary benchmarking dataset (Disorder_PDB) with MMseqs2 at an identity threshold of 0.3, and redundant sequences were excluded from IDR_Training.
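The redundancy-removal step in Fig 1 can be sketched as a filter over search hits. This assumes the MMseqs2 results have already been parsed into hypothetical `(training_id, benchmark_id, identity)` tuples with identity as a fraction, and reads "identity=0.3" as "drop any training sequence with a hit at or above 30% identity". Coverage or E-value cutoffs the authors may have applied are not modeled.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical hit format: (training_id, benchmark_id, identity as a fraction in [0, 1]).
Hit = Tuple[str, str, float]


def remove_redundant(training_ids: Iterable[str], hits: Iterable[Hit],
                     max_identity: float = 0.3) -> List[str]:
    """Keep training IDs with no benchmark hit at or above max_identity."""
    redundant: Set[str] = {query for query, _target, identity in hits
                           if identity >= max_identity}
    return [seq_id for seq_id in training_ids if seq_id not in redundant]


training = ["T1", "T2", "T3"]
hits = [("T1", "B7", 0.85), ("T2", "B3", 0.21)]  # T3 had no hit at all
kept = remove_redundant(training, hits)  # ["T2", "T3"]
```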
Fig 2. Dataset representation.
IDRs from PDB_missing (clstr30) serve as representative subsets for training, while DisProt_FD supplements them with fully disordered sequences.
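To illustrate how the two sources in Fig 2 can yield per-residue targets, the sketch below assumes (as the name PDB_missing suggests) that residues missing from solved structures are labeled disordered and all others ordered, while every residue of a DisProt_FD entry is labeled disordered. This is a simplification with hypothetical helper names; the paper's exact labeling rules, such as the treatment of expression tags or residues observed in some structures but not others, are not given here.

```python
from typing import Iterable, List


def pdb_missing_labels(length: int, missing_positions: Iterable[int]) -> List[int]:
    """1 = disordered (residue missing from the structure), 0 = ordered; positions are 0-based."""
    labels = [0] * length
    for pos in missing_positions:
        if not 0 <= pos < length:
            raise IndexError(f"position {pos} outside a sequence of length {length}")
        labels[pos] = 1
    return labels


def fully_disordered_labels(length: int) -> List[int]:
    """A DisProt_FD-style entry: every residue is labeled disordered."""
    return [1] * length


print(pdb_missing_labels(8, [0, 1, 6, 7]))  # [1, 1, 0, 0, 0, 0, 1, 1]
print(fully_disordered_labels(5))           # [1, 1, 1, 1, 1]
```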
Fig 3. The structure of general CNN-based predictors.
N is the total number of convolutional layers, and i indexes the i-th convolutional layer.
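The general predictor in Fig 3 can be sketched in pure Python as a stack of N "same"-padded 1D convolutions over the residue axis, with ReLU between layers and a sigmoid on a single-channel output. The channel widths, kernel size 5, activations, and random weights are assumptions for illustration; a practical implementation would use a deep learning framework and trained parameters.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]          # L x C: residues x channels
Kernel = List[List[List[float]]]    # weight[out_channel][offset][in_channel]
Layer = Tuple[Kernel, List[float]]  # (weights, per-output-channel bias)


def conv1d_same(x: Matrix, weight: Kernel, bias: List[float]) -> Matrix:
    """1D convolution along the sequence with zero padding; odd kernels keep length L."""
    length, c_in = len(x), len(x[0])
    k_size = len(weight[0])
    half = k_size // 2
    out = []
    for i in range(length):
        row = []
        for o, w_o in enumerate(weight):
            acc = bias[o]
            for k in range(k_size):
                j = i + k - half  # neighbouring residue covered by offset k
                if 0 <= j < length:
                    acc += sum(w_o[k][c] * x[j][c] for c in range(c_in))
            row.append(acc)
        out.append(row)
    return out


def random_layer(c_in: int, c_out: int, k: int, rng: random.Random) -> Layer:
    """Small random weights stand in for trained parameters."""
    weight = [[[rng.gauss(0.0, 0.1) for _ in range(c_in)] for _ in range(k)]
              for _ in range(c_out)]
    return weight, [0.0] * c_out


def cnn_predict(x: Matrix, layers: List[Layer]) -> List[float]:
    """Layers 1..N-1 use ReLU; layer N has one output channel and a sigmoid."""
    h = x
    for i, (weight, bias) in enumerate(layers, start=1):
        h = conv1d_same(h, weight, bias)
        if i < len(layers):
            h = [[max(0.0, v) for v in row] for row in h]
    return [1.0 / (1.0 + math.exp(-row[0])) for row in h]


rng = random.Random(0)
channels = [21, 16, 16, 1]  # hypothetical widths for N = 3 layers
layers = [random_layer(a, b, 5, rng) for a, b in zip(channels, channels[1:])]
x = [[rng.random() for _ in range(21)] for _ in range(12)]  # 12 residues, 21 features
probs = cnn_predict(x, layers)  # one disorder probability per residue
```

"Same" padding keeps one output per residue at every depth, which is what lets the final layer emit a per-residue disorder score regardless of N.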
Fig 4. 2-stage CBRCNN structure for IDR prediction.
The two stages can be trained and evaluated separately.
Fig 5. Comparison of CNN_L3_wide and CNN_L11_narrow using the ProtTrans embedding.
The architectures have approximately 157K and 180K parameters, respectively.
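Parameter counts such as the approximately 157K and 180K in Fig 5 follow from layer shapes: a 1D convolution with C_in input channels, C_out output channels, and kernel size k has C_out * C_in * k weights plus C_out biases. The helper below applies that formula. The two configurations in the usage lines are hypothetical and are not the paper's CNN_L3_wide or CNN_L11_narrow; the 1024-dim input is a placeholder, and normalization-layer parameters are not counted.

```python
from typing import Sequence


def conv1d_params(c_in: int, c_out: int, kernel: int, bias: bool = True) -> int:
    """Weights (c_out * c_in * kernel) plus one bias per output channel."""
    return c_out * c_in * kernel + (c_out if bias else 0)


def stack_params(channels: Sequence[int], kernel: int) -> int:
    """Total for a plain conv stack channels[0] -> channels[1] -> ... -> channels[-1]."""
    return sum(conv1d_params(a, b, kernel) for a, b in zip(channels, channels[1:]))


wide = stack_params([1024, 48, 48, 1], kernel=3)           # hypothetical 3-layer "wide" stack
narrow = stack_params([1024] + [16] * 10 + [1], kernel=3)  # hypothetical 11-layer "narrow" stack
print(wide, narrow)
```

Because the first layer's weights scale with the input embedding width, it dominates the total when that width is large, so extra narrow layers are comparatively cheap: in the hypothetical narrow stack above, the nine 16-to-16 layers together hold fewer parameters than the first layer alone.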
Fig 6. Two-stage CNN structure. Stage 1 corresponds to the best-performing architecture (CNN_L11_narrow), while Stage 2 adds a standalone CNN structure.
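The two-stage arrangement in Figs 4 and 6 amounts to function composition: stage 1 maps features to per-residue probabilities, and stage 2 refines those probabilities using neighboring positions. The sketch below assumes that stage 2 sees only the stage-1 probabilities and is a single one-channel convolution followed by a sigmoid; both assumptions are ours, and `toy_stage1` is a hypothetical stand-in for a trained stage-1 CNN.

```python
import math
from typing import Callable, List, Sequence

Features = Sequence[Sequence[float]]
Stage1 = Callable[[Features], List[float]]


def stage2_refine(probs: Sequence[float], kernel: Sequence[float], bias: float) -> List[float]:
    """Hypothetical stage 2: one-channel zero-padded 1D conv over stage-1 probabilities, then sigmoid."""
    half = len(kernel) // 2
    out = []
    for i in range(len(probs)):
        acc = bias
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(probs):
                acc += w * probs[j]
        out.append(1.0 / (1.0 + math.exp(-acc)))
    return out


def two_stage_predict(features: Features, stage1: Stage1,
                      kernel: Sequence[float], bias: float) -> List[float]:
    """Stages are separate callables, so each can be trained or evaluated on its own."""
    return stage2_refine(stage1(features), kernel, bias)


def toy_stage1(feats: Features) -> List[float]:
    """Stand-in for a trained stage-1 CNN: thresholds the first feature."""
    return [0.9 if f[0] > 0.5 else 0.1 for f in feats]


feats = [[1.0], [0.0], [1.0]]
refined = two_stage_predict(feats, toy_stage1, kernel=[2.0, 2.0, 2.0], bias=-1.5)
```

With this kernel and bias, the isolated low stage-1 score in the middle is pulled above 0.5 by its high-scoring neighbors, the kind of local consistency a second stage can learn.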
Fig 7. Performance on CAID2 and CAID3.
ROC and PR curves of the predictors on Disorder_PDB (panels a and b, from CAID2) and Disorder_PDB_3 (panels c and d, from CAID3).
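ROC and PR curves like those in Fig 7 are commonly summarized by the area under the ROC curve and by average precision. Below is a pure-Python sketch of both over pooled per-residue labels (1 = disordered), following the same definitions as scikit-learn's `roc_auc_score` and `average_precision_score`. CAID's own evaluation protocol (for example, how unannotated residues are handled) may differ and is not reproduced here.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney) identity; tied scores get average ranks."""
    pairs = sorted(zip(scores, labels))
    ranks: List[float] = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based average rank of the tie group
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both disordered and ordered residues")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AP: sum over distinct thresholds of (recall gain) * precision."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one disordered residue")
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:  # tied scores form one threshold
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


y_true = [1, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(y_true, y_score), average_precision(y_true, y_score))  # 0.75 and roughly 0.833
```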
