[Preprint]. 2024 May 22:2024.04.15.589616.
doi: 10.1101/2024.04.15.589616.

Protein codes promote selective subcellular compartmentalization

Henry R Kilgore et al. bioRxiv.

Update in

  • Protein codes promote selective subcellular compartmentalization.
    Kilgore HR, Chinn I, Mikhael PG, Mitnikov I, Van Dongen C, Zylberberg G, Afeyan L, Banani SF, Wilson-Hawken S, Lee TI, Barzilay R, Young RA. Science. 2025 Mar 7;387(6738):1095-1101. doi: 10.1126/science.adq2634. Epub 2025 Feb 6. PMID: 39913643

Abstract

Cells have evolved mechanisms to distribute ~10 billion protein molecules to subcellular compartments where diverse proteins involved in shared functions must efficiently assemble. Here, we demonstrate that proteins with shared functions share amino acid sequence codes that guide them to compartment destinations. A protein language model, ProtGPS, was developed that predicts with high performance the compartment localization of human proteins excluded from the training set. ProtGPS successfully guided generation of novel protein sequences that selectively assemble in targeted subcellular compartments. ProtGPS also identified pathological mutations that change this code and lead to altered subcellular localization of proteins. Our results indicate that protein sequences contain not only a folding code, but also a previously unrecognized code governing their distribution in specific cellular compartments.

Conflict of interest statement

Competing Interests: R.A.Y. is a founder and shareholder of Syros Pharmaceuticals, Camp4 Therapeutics, Omega Therapeutics, Dewpoint Therapeutics and Paratus Sciences, and has consulting or advisory roles at Precede Biosciences and Novo Nordisk. R.B. has consulting or advisory roles at Dewpoint Therapeutics, J&J, Amgen, Outcomes4Me, Immunai and Firmenich. H.R.K. is a consultant of Dewpoint Therapeutics. I.C. is a consultant of Paratus Sciences. The remaining authors declare no competing interests.

Figures

Fig. 1. ProtGPS classifies protein compartment with high performance.
A. Graphical depiction of some cellular compartments found in eukaryotic cells; compartments in bold were studied in this work. B. Bar graph showing the number of protein sequences gathered from UniProt and the CD-CODE database and used in the development of ProtGPS. C. Schematic showing the approach to developing ProtGPS. D. Bar graph showing the area under the receiver operating characteristic curve for ProtGPS classification of withheld test data (15% of total).
Fig. 2. Generative modeling creates novel proteins that concentrate in a desired condensate.
A. Schematic showing the use of naturally constrained Markov chain Monte Carlo (MCMC) to generate proteins and assay them in live cells (see supporting information for more details). B. Live cell image of a colon cancer cell (HCT-116) tagged at the endogenous NPM1 locus with GFP and expressing the nucleolus-targeted protein NUC1-mCherry; scale: 10 microns. C. Live cell confocal micrographs of NUCX-mCherry proteins in HCT-116 cells expressing NPM1-GFP from the endogenous locus; scale: 10 microns. D. Dot plots showing the measured partition ratios of NUCX (K_x = I_nucleolus / I_nucleoplasm) and SPLX-mCherry (K_x = I_SRSF2 / I_nucleoplasm, or K_x = I_SRSF2 / I_cytoplasm where indicated by *) proteins relative to the NLS-mCherry control protein; the dotted line is the average value for the NLS-mCherry protein. See Table S3 for more information. E. Live cell images and quantification showing the relationship between the measured nucleolar partition ratios (K_x = I_nucleolus / I_nucleoplasm) of proteins on the NUC6-mCherry trajectory and their computed probability of partitioning.
Fig. 3. Pathogenic mutations are predicted to alter protein compartmentalization.
A. Schematic of information flow: pathogenic ClinVar variants caused by single point or truncation mutations were classified with ProtGPS to determine whether the detected protein code was changed in the pathogenic variant. B. (Left) Dot plot showing the change in Shannon entropy of the compartment prediction due to single point or truncation mutation. (Right) Histogram showing the Wasserstein distance between the wild-type and mutant protein compartment probabilities. C. Live cell images of mESCs ectopically expressing wild-type and truncated pathogenic variants fused to GFP; the Wasserstein distance for each mutant is given as w; scale: 10 microns.
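The quantities named in the captions (partition ratio, Shannon entropy change, Wasserstein distance) can be illustrated with a minimal sketch. This is not the authors' code. The function names and the four-compartment probability vectors below are hypothetical toy values, not data from the paper. The Wasserstein calculation assumes compartments sit at unit-spaced positions on a line, which may differ from the ground metric the authors used.

```python
import math


def partition_ratio(i_compartment, i_reference):
    """K_x as a ratio of mean intensities, e.g. I_nucleolus / I_nucleoplasm."""
    if i_reference <= 0:
        raise ValueError("reference intensity must be positive")
    return i_compartment / i_reference


def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def wasserstein_1d(p, q):
    """1-Wasserstein distance between two distributions on positions 0..n-1.

    Assumption (ours, not the paper's): compartments are treated as
    unit-spaced points on a line, so the distance is the sum of absolute
    differences between the two cumulative distributions.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    cum_p = cum_q = total = 0.0
    for a, b in zip(p, q):
        cum_p += a
        cum_q += b
        total += abs(cum_p - cum_q)
    return total


# Hypothetical compartment probabilities for a wild-type and a mutant protein.
wild_type = [0.5, 0.25, 0.25, 0.0]
mutant = [0.25, 0.25, 0.25, 0.25]

delta_entropy = shannon_entropy(mutant) - shannon_entropy(wild_type)
w = wasserstein_1d(wild_type, mutant)
k_x = partition_ratio(300.0, 100.0)  # toy mean intensities

print(f"entropy change: {delta_entropy:.2f} bits")
print(f"Wasserstein distance w: {w:.2f}")
print(f"partition ratio K_x: {k_x:.1f}")
```

In this toy case the mutant prediction is spread more evenly across compartments, so its entropy rises, and the nonzero w reflects probability mass shifting between compartments.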
