J Chem Inf Model. 2025 Sep 22;65(18):9623-9638.
doi: 10.1021/acs.jcim.5c01076. Epub 2025 Sep 4.

A Machine Learning Model for the Proteome-Wide Prediction of Lipid-Interacting Proteins


Jonathan Chiu-Chun Chou et al. J Chem Inf Model.

Abstract

Lipids are essential metabolites that play critical roles in multiple cellular pathways. As with many primary metabolites, mutations that disrupt lipid synthesis can be lethal. Proteins involved in lipid synthesis, trafficking, and modification are targets for therapeutic intervention in infectious disease and metabolic disorders. The ability to rapidly detect these proteins can accelerate their evaluation as targets for deranged lipid pathologies. However, it remains challenging to identify lipid binding motifs in proteins because the rules that govern protein engagement with specific lipids are poorly understood. As such, new bioinformatic tools that reveal conserved features in lipid binding proteins are necessary. Here, we present the Structure-based Lipid-interacting Pocket Predictor (SLiPP), an algorithm that leverages machine learning to detect cavities in protein structures that are capable of binding lipids. SLiPP uses a Random Forest classifier and operates at scale to predict lipid binding pockets with an accuracy of 96.8% and an F1 score of 86.9% when tested against a set of 8,380 pockets embedded within proteins. Our analyses revealed that the algorithm relies on hydrophobicity-related features to distinguish lipid binding pockets from those that bind other ligands. SLiPP is fast and does not require substantial computational resources. Use of the algorithm to detect lipid binding proteins in various proteomes produced hits annotated or verified as bona fide lipid binding proteins. Additionally, SLiPP identified many new putative lipid binders in well-studied proteomes. Because of its ability to identify novel lipid binding proteins, SLiPP can spur the discovery of new and "targetable" lipid-sensitive pathways.
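The abstract reports both an accuracy and an F1 score, and the two can differ noticeably when one class is much rarer than the other. The sketch below shows how each is computed from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not the SLiPP test-set results, and the function name is an assumption.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # F1 is the harmonic mean of precision and recall,
    # which simplifies to 2*TP / (2*TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical, imbalanced counts (NOT taken from the paper):
# 110 true lipid-binding pockets among 1,000 pockets in total.
acc, f1 = accuracy_and_f1(tp=80, fp=10, fn=30, tn=880)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

Because true negatives count toward accuracy but not toward F1, a classifier evaluated on an imbalanced set can score higher on accuracy than on F1, which is one reason to report both.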


Figures

Figure 1. (A) Schematic description of data curation and the machine learning (ML) workflow. (B) Schematic description of pocket extraction via the fpocket algorithm. Description of the workflows to establish (C) and to use (D) SLiPP.
Figure 2. PCA analyses of pockets using the 17 physicochemical properties detected by dpocket. (A, B) Score plots of the first two principal components, which describe 40.9% and 18.3% of the variance, respectively. The data points were colored by class labels (A) and ligand identity (B). (C) A plot showing the contribution of each property to the first two principal components.
Figure 3. (A) Assessment of machine learning algorithms used for the classifier model. The performance was assessed with 25 random seeds. Boxes span the first to third quartiles, while the whiskers extend over the whole range of the data except for outliers. Outliers were defined as data points lying more than 1.5 times the interquartile range below the first quartile or above the third quartile. (B) Optimization of data sets for the classifier. The performance was assessed with an independent test data set.
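The outlier rule in the Figure 3 caption can be sketched as follows. This is a minimal illustration that assumes the "inclusive" quartile convention of Python's `statistics.quantiles`; plotting libraries may compute quartiles slightly differently, and the scores below are made up for the example.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return values more than k * IQR below Q1 or above Q3."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical per-seed scores (not from the paper).
scores = [0.80, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.50]
print(tukey_outliers(scores))
```

In a box plot drawn this way, the whiskers stop at the most extreme points that are not flagged, and the flagged points are drawn individually.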
Figure 4. (A) Prediction results of the E. coli, yeast, and human proteomes. The prediction scores are ranked from high to low. The dotted line indicates the prediction threshold with probability >0.5. Gene ontology analyses of the top 10 biological process (B) and molecular function GO terms (C) in yeast and human (D, E). The size of the dot indicates the number of genes for the GO term while the color indicates the false discovery rate (FDR).
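The ranking and thresholding described for Figure 4A can be sketched as below. The protein identifiers and probabilities are hypothetical placeholders, and the function name is an assumption; only the ordering (high to low) and the probability >0.5 cutoff come from the caption.

```python
def rank_and_call(scores, threshold=0.5):
    """Sort proteins by predicted probability (high to low) and flag hits.

    `scores` maps a protein identifier to its predicted probability.
    A protein is called a hit when its probability exceeds `threshold`.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, prob, prob > threshold) for name, prob in ranked]


# Hypothetical identifiers and scores, for illustration only.
example = {"protA": 0.12, "protB": 0.93, "protC": 0.50, "protD": 0.67}
for name, prob, is_hit in rank_and_call(example):
    print(f"{name}\t{prob:.2f}\t{'hit' if is_hit else '-'}")
```

Note that with a strict ">" comparison, a score of exactly 0.5 is not called a hit.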
Figure 5. SLiPP accurately predicts the lipid binding pockets within the proteins. PDB structures are colored in cyan, while AlphaFold models are colored in tan. The ligands in the PDB structures are presented as pink stick models, and the SLiPP-predicted lipid binding pockets are shown as pink blobs. The middle panels show the PDB structures aligned with the AlphaFold models to demonstrate the accuracy of the SLiPP prediction.
Figure 6. Identification of ADCK5 as a novel lipid binding protein. (A) AlphaFold model of ADCK5. The blue surface shows the SLiPP-predicted lipid-binding pocket, while the red surface shows the catalytic site (D360) of ADCK5. (B) Protein lipid overlay assay of ADCK5. TG = triglyceride, DG = diacylglycerol, PA = phosphatidic acid, PS = phosphatidylserine, PE = phosphatidylethanolamine, PC = phosphatidylcholine, PG = phosphatidylglycerol, CL = cardiolipin, PI = phosphatidylinositol, PIP = phosphatidylinositol 4-phosphate, PIP2 = phosphatidylinositol 4,5-bisphosphate, PIP3 = phosphatidylinositol 3,4,5-trisphosphate, Chol = cholesterol, SM = sphingomyelin, BTL = brain total lipid extract. (C) First derivative of F350/F330 of ADCK5 across temperature upon addition of BTL. The dotted line indicates the transition temperature of ADCK5. (D) ATPase activity of ADCK5, measured as the phosphate released upon ATP hydrolysis. (E–H) Volcano plots of lipid enrichment by ADCK5 following its incubation with brain total lipid extract. The fold change was calculated relative to the no-protein control. Vertical lines represent a fold change threshold of 1.5 while horizontal lines represent an adjusted p-value threshold of 0.05. Red points are the highly enriched features that could not be annotated; these are labeled with their m/z values. Blue points are annotated but not experimentally verified as binding with physiologically relevant equilibrium dissociation constants.
Figure 7. The importance of each pocket property was assessed by (A) the decrease in impurity and (B) the decrease in F1 score when the feature is permuted. The permutation was repeated 10 times, with error bars indicating the standard deviation across the 10 repeats. (C) Violin plots of hydrophobicity scores with different ligand occupancies. The white dot represents the median, and the box spans the first to third quartiles. (D) Score plots from PCA analyses with the addition of heme binding pockets to the full data set; heme binding pockets are shown as purple dots with white borders.
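The permutation approach in Figure 7B can be illustrated with a small sketch: shuffle one feature column, re-score the model, and record the drop in F1, repeating 10 times to get a mean and standard deviation. Everything here is an assumption for illustration: the toy threshold rule stands in for the trained classifier, and the feature names `hydrophobicity` and `volume` and their values are hypothetical.

```python
import random
import statistics


def f1_score(y_true, y_pred):
    """F1 for binary labels, computed as 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def permutation_importance(predict, rows, labels, feature, repeats=10, seed=0):
    """Mean and standard deviation of the F1 drop when `feature` is shuffled."""
    rng = random.Random(seed)
    baseline = f1_score(labels, [predict(r) for r in rows])
    drops = []
    for _ in range(repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)  # break the link between this feature and the label
        shuffled = [dict(r, **{feature: v}) for r, v in zip(rows, column)]
        drops.append(baseline - f1_score(labels, [predict(r) for r in shuffled]))
    return statistics.mean(drops), statistics.stdev(drops)


def toy_predict(row):
    # Stand-in for a trained model: uses only the hypothetical hydrophobicity feature.
    return 1 if row["hydrophobicity"] > 0.5 else 0


# Hypothetical pockets: (hydrophobicity, volume) with 1 = lipid binding.
rows = [{"hydrophobicity": h, "volume": v}
        for h, v in [(0.9, 300), (0.8, 450), (0.7, 500),
                     (0.2, 350), (0.1, 400), (0.3, 480)]]
labels = [1, 1, 1, 0, 0, 0]

for name in ("hydrophobicity", "volume"):
    mean_drop, sd_drop = permutation_importance(toy_predict, rows, labels, name)
    print(f"{name}: mean F1 drop={mean_drop:.3f} (sd={sd_drop:.3f})")
```

A feature the model never uses, such as `volume` in this toy rule, shows a drop of exactly zero, whereas shuffling a feature the model depends on degrades the score.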


