J Proteome Res. 2021 Apr 2;20(4):1826-1834.
doi: 10.1021/acs.jproteome.0c00407. Epub 2020 Oct 7.

Spritz: A Proteogenomic Database Engine


Anthony J Cesnik et al. J Proteome Res. 2021.

Abstract

Proteoforms are the workhorses of the cell, and subtle differences between their amino acid sequences or post-translational modifications (PTMs) can change their biological function. To most effectively identify and quantify proteoforms in genetically diverse samples by mass spectrometry (MS), it is advantageous to search the MS data against a sample-specific protein database that is tailored to the sample being analyzed, in that it contains the correct amino acid sequences and relevant PTMs for that sample. To this end, we have developed Spritz (https://smith-chem-wisc.github.io/Spritz/), an open-source software tool for generating protein databases annotated with sequence variations and PTMs. We provide a simple graphical user interface for Windows and scripts that can be run on any operating system. Spritz automatically sets up and executes approximately 20 tools, which enable the construction of a proteogenomic database from only raw RNA sequencing data. Sequence variations that are discovered in RNA sequencing data upon comparison to the Ensembl reference genome are annotated on proteins in these databases, and PTM annotations are transferred from UniProt. Modifications can also be discovered and added to the database using bottom-up mass spectrometry data and global PTM discovery in MetaMorpheus. We demonstrate that such sample-specific databases allow the identification of variant peptides, modified variant peptides, and variant proteoforms by searching bottom-up and top-down proteomic data from the Jurkat human T lymphocyte cell line and demonstrate the identification of phosphorylated variant sites with phosphoproteomic data from the U2OS human osteosarcoma cell line.
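The core idea of a sample-specific database can be illustrated with a minimal sketch: apply one single amino acid variant to a reference protein, digest both sequences in silico, and keep the peptides that exist only in the variant. This is not Spritz's implementation. The function names, the toy sequence, and the A4V variant are hypothetical, and the digestion rule (cleave after K or R unless followed by P, no missed cleavages) is a common simplification of trypsin specificity.

```python
import re


def apply_substitution(protein: str, position: int, ref: str, alt: str) -> str:
    """Apply a single amino acid variant at a 1-based position."""
    found = protein[position - 1]
    if found != ref:
        raise ValueError(f"expected {ref} at position {position}, found {found}")
    return protein[:position - 1] + alt + protein[position:]


def tryptic_peptides(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


# Hypothetical toy reference sequence and variant (A4V), for illustration only.
reference = "MKTAYIAKQRQISFVK"
variant = apply_substitution(reference, 4, "A", "V")

# Peptides that can only be identified if the variant is in the search database.
variant_only = set(tryptic_peptides(variant)) - set(tryptic_peptides(reference))
print(variant_only)
```

Peptides such as the one in `variant_only` are absent from a reference-only database, which is why a search against that database cannot identify them.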

Keywords: PTMs; RNA-Seq; modifications; proteoform; proteogenomics; sample-specific; sequence variations; top-down; transcriptomics.


Figures

Figure 1:
Spritz is a pipeline of approximately 20 tools (Table S1) that produces sample-specific protein databases annotated with modifications and sequence variations. Spritz automatically downloads the genome fasta file, gene model file, and UniProt XML protein database for the organism. If sequence read archives (SRAs) are publicly available for the sample of interest, Spritz will also download those raw RNA-Seq data. MetaMorpheus is used in this work to search both bottom-up and top-down mass spectrometry proteomics data. Bottom-up proteomics data are used to discover additional sites of modification to help enable the identification of variant proteoform sequences.
Figure 2:
Coding variations were applied to proteins generated from the Ensembl reference genome including the amino acid substitutions listed here (green), while synonymous variations were excluded (grey).
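The distinction between coding and synonymous variations in this caption can be sketched with the standard genetic code: a codon change is synonymous when both codons translate to the same amino acid. The snippet below is an illustrative assumption, not the annotation method used by Spritz, and `classify_codon_change` is a hypothetical helper name.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"      # would be excluded from the protein database
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return f"missense {ref_aa}->{alt_aa}"  # an amino acid substitution


print(classify_codon_change("GAA", "GAG"))  # both codons encode E
print(classify_codon_change("GAA", "GTA"))  # E becomes V
```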
Figure 3:
A) Several types of variants were called by GATK, annotated in the protein database, included in the search space, and identified by the bottom-up proteomic analysis of Jurkat lysate. B) Comparison of the number of variant peptides and C) variant sites identified by Spritz analysis (blue) and by the previous analysis (red). The uniquely identified variant peptides or sites for each protease are shown in black, whereas the cumulative totals of variant peptides or sites are shown in blue or red. As reported previously, the use of multiple proteases enhances sequence coverage, leading to additional variant identifications, as shown in B and C.
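The per-protease unique counts and cumulative totals described for panels B and C come down to set arithmetic. The sketch below assumes "uniquely identified" means found with that protease and no other. The protease names and peptide labels are hypothetical placeholders, not data from the paper.

```python
def summarize_by_protease(ids: dict[str, set[str]]) -> list[tuple[str, int, int]]:
    """For each protease, in order, return (name, unique count, running cumulative count)."""
    cumulative: set[str] = set()
    rows = []
    for protease, peptides in ids.items():
        # Everything identified by any other protease.
        others = set().union(*(p for name, p in ids.items() if name != protease))
        unique = peptides - others
        cumulative |= peptides
        rows.append((protease, len(unique), len(cumulative)))
    return rows


# Hypothetical identifications, for illustration only.
toy_ids = {
    "trypsin": {"pepA", "pepB", "pepC"},
    "chymotrypsin": {"pepB", "pepD"},
    "LysC": {"pepA", "pepE", "pepF"},
}

for name, n_unique, n_cumulative in summarize_by_protease(toy_ids):
    print(f"{name}: unique={n_unique}, cumulative={n_cumulative}")
```

The cumulative count grows whenever a protease contributes identifications that no earlier protease produced, which is the effect of added sequence coverage that the caption describes.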
Figure 4:
Modifications detected on variant peptides in A) Jurkat multi-protease data and B) U2OS phosphoproteomic data.
Figure 5:
Manually constructed proteoform family examples containing variant proteoforms. Each pink square represents a gene, each pink diamond represents a transcript (e.g., variant, isoform), and each purple circle represents a proteoform identification. A) Proteoform family containing proteoforms generated by translation of both reference and variant transcript sequences. All proteoforms are related to a single gene and therefore belong to the same proteoform family. B) Proteoform family for a histone containing two SAAVs and many modifications.
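The gene, transcript, and proteoform hierarchy in these diagrams can be sketched as a nested grouping in which every proteoform that traces back to the same gene lands in the same family. This is a simplified illustration rather than the procedure used to construct the figure, and all identifiers below are hypothetical.

```python
from collections import defaultdict


def build_families(
    transcript_to_gene: dict[str, str],
    proteoform_to_transcript: dict[str, str],
) -> dict[str, dict[str, list[str]]]:
    """Group proteoforms by transcript, and transcripts by gene."""
    families: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for proteoform, transcript in proteoform_to_transcript.items():
        gene = transcript_to_gene[transcript]
        families[gene][transcript].append(proteoform)
    # Convert nested defaultdicts to plain dicts for predictable behavior.
    return {gene: dict(by_tx) for gene, by_tx in families.items()}


# Hypothetical identifiers: one gene with a reference and a variant transcript.
transcript_to_gene = {"tx_reference": "GENE1", "tx_variant": "GENE1"}
proteoform_to_transcript = {
    "unmodified_reference": "tx_reference",
    "acetylated_reference": "tx_reference",
    "unmodified_variant": "tx_variant",
}

families = build_families(transcript_to_gene, proteoform_to_transcript)
print(families)
```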

