BMC Bioinformatics. 2010 Oct 7;11 Suppl 6(Suppl 6):S7.
doi: 10.1186/1471-2105-11-S6-S7.

PEPPI: a peptidomic database of human protein isoforms for proteomics experiments


Ao Zhou et al. BMC Bioinformatics.

Abstract

Background: Protein isoform generation, which may derive from alternative splicing, genetic polymorphism, and posttranslational modification, is an essential means by which eukaryotic cells achieve molecular diversity. Previous studies have shown that protein isoforms play critical roles in disease diagnosis, risk assessment, sub-typing, prognosis, and treatment outcome prediction. Understanding the types, presence, and abundance of different protein isoforms under different cellular and physiological conditions is a major task in functional proteomics and may pave the way to molecular biomarker discovery for human diseases. In tandem mass spectrometry (MS/MS)-based proteomics, mass spectrometry (MS) search software can identify peptide peaks that exactly match protein sequence records in the proteomics database. However, because protein isoforms are poorly annotated and poorly covered in proteomics databases, high-throughput identification of protein isoforms, particularly those arising from alternative splicing and genetic polymorphism, has not been possible.
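To make this database dependence concrete, the sketch below digests a protein in silico with trypsin (cleaving after K or R unless the next residue is P) and computes each peptide's monoisotopic mass. A search engine can only assign a spectrum to a peptide produced this way from some database entry, so a peptide unique to an unlisted isoform goes unidentified. The protein sequence and the isoform peptide are hypothetical, and real search engines also handle missed cleavages, modifications, and fragment ions, which this sketch omits.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide for the free N- and C-termini


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def monoisotopic_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


# Hypothetical canonical protein stored in a search database.
database_peptides = set(tryptic_digest("MKPLRAGKT"))
```

A hypothetical isoform-specific peptide such as `AGWK` is not in `database_peptides`, so its spectrum cannot be matched no matter how accurately its mass is measured.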

Results: We present the PEPtidomics Protein Isoform Database (PEPPI, http://bio.informatics.iupui.edu/peppi), a comprehensive database of computationally synthesized human peptides that can be used to identify protein isoforms derived from either alternatively spliced mRNA transcripts or SNP variations. We collected genome, pre-mRNA alternative splicing, and SNP information from Ensembl. We synthesized in silico isoform transcripts that cover all exons and all theoretically possible exon-exon and exon-intron junctions, as well as all of their variants derived from known SNPs. In three case studies, we further demonstrated that the database can help researchers discover and characterize new protein isoform biomarkers from experimental proteomics data.
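A minimal sketch of how a peptide spanning a junction might be synthesized in silico: join the 3' end of one sequence to the 5' end of the next, translate the result in all three forward reading frames, and keep, per frame, the stop-free peptide that crosses the junction. The 30-nt flank, the stop-codon handling, and the function names are illustrative assumptions, not PEPPI's documented pipeline. The same function would cover an exon-intron junction by passing intron sequence as `downstream`, and SNP variants could be produced by substituting alleles before translation.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate the complete codons of an uppercase ACGT string; '*' marks a stop."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def junction_peptides(upstream: str, downstream: str, flank: int = 30) -> dict[int, str]:
    """For reading frames 0-2, return the stop-free peptide spanning the junction
    between the 3' end of `upstream` and the 5' end of `downstream` (hypothetical helper)."""
    left, right = upstream[-flank:], downstream[:flank]
    if len(left) < 3 or len(right) < 3:
        raise ValueError("each side needs at least three bases")
    joined = left + right
    junction = len(left)  # index of the first downstream base in `joined`
    peptides = {}
    for frame in range(3):
        protein = translate(joined[frame:])
        codon = (junction - frame) // 3  # codon holding the first downstream base
        if codon >= len(protein) or protein[codon] == "*":
            continue
        start = protein.rfind("*", 0, codon) + 1  # just after the previous stop, or 0
        end = protein.find("*", codon)
        end = len(protein) if end == -1 else end
        # If the junction lies exactly between codons, also require an upstream residue.
        if (junction - frame) % 3 == 0 and start == codon:
            continue
        peptides[frame] = protein[start:end]
    return peptides
```

For example, joining the made-up exons `ATGGCT` and `GGTAAA` yields `MAGK`, `WLV`, and `GW` in frames 0, 1, and 2; frame 2 stops at the `TAA` codon.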

Conclusions: We developed a new tool for the proteomics community to characterize protein isoforms in MS-based proteomics experiments. Because the PEPPI database catalogues every peptide configuration, users can study genetic variations and alternative splicing events at the proteome level. They can also batch-download peptide sequences in FASTA format and use them to search MS/MS spectra derived from human samples. The database can help generate novel hypotheses about molecular risk factors and molecular mechanisms of complex diseases, leading to the identification of potentially highly specific protein isoform biomarkers.
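Reading a batch-downloaded FASTA file back into an analysis pipeline takes only a few lines. The reader below is a generic sketch that keys each sequence by the first word of its header; the example records are hypothetical, because the exact header layout of PEPPI downloads is not described here.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (identifier, sequence) pairs; the identifier is the first word of the header."""
    identifier, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            header = line[1:].split()
            identifier, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if identifier is not None:
        yield identifier, "".join(chunks)


# Hypothetical download; real PEPPI headers may be laid out differently.
example = io.StringIO(">PEPTIDE_A junction peptide\nMAGK\nWLV\n>PEPTIDE_B\nGW\n")
peptides = dict(read_fasta(example))
```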


Figures

Figure 1
Web Interface Structure. (A) Search Home: the main search page, which accepts five types of query string: Ensembl gene ID, IPI protein accession number, peptide sequence, PEPPI region ID, and peptide ID. (B) Gene View: search result page visualizing the peptide regions within a gene. (C) Region View: search result page displaying the peptides within a peptide region. (D) Peptide View: PEPPI peptide information page. (E) Protein View: search result page listing the PEPPI peptides mapped to an IPI protein. (F) Sequence Search: search result page listing the PEPPI peptides mapped to a query peptide sequence.
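The Search Home page must decide which of the five query types a string represents. A rough pattern-based dispatch is sketched below. The regular expressions are assumptions based on common identifier formats (human Ensembl gene IDs, IPI accessions such as IPI00023636, and the PEP-prefixed IDs that appear in Figure 5). Because the formats of PEPPI region and peptide IDs are not given here, the sketch does not tell them apart.

```python
import re

# Hypothetical patterns: human Ensembl gene IDs (ENSG + 11 digits), IPI accessions
# (IPI + 8 digits, optional version), and the "PEP" + digits form seen in Figure 5.
QUERY_PATTERNS = [
    ("ensembl_gene_id", re.compile(r"ENSG\d{11}")),
    ("ipi_accession", re.compile(r"IPI\d{8}(?:\.\d+)?")),
    ("peppi_id", re.compile(r"PEP\d+")),  # region vs. peptide IDs not distinguished here
    ("peptide_sequence", re.compile(r"[ACDEFGHIKLMNPQRSTVWY]+")),
]


def classify_query(query: str) -> str:
    """Return the query type of a search string, trying the patterns in order."""
    text = query.strip().upper()
    for kind, pattern in QUERY_PATTERNS:
        if pattern.fullmatch(text):
            return kind
    raise ValueError(f"unrecognized query: {query!r}")
```

The order matters: identifiers contain digits, so they never match the amino-acid-only pattern, while a sequence such as `PEPTIDE` falls through to the last rule.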
Figure 2
Gene View. (A) Gene scale: shows the chromosome coordinates and strand of the gene, from which users can read the approximate positions of the peptide regions. (B) Peptide regions: shows the peptide regions, of five types, within the current gene to which the peptides belong. The color of each peptide region indicates its reading frame (ORF; red: 0, green: 1, blue: 2). Solid bars indicate exons and blank bars indicate introns. A curve connecting two exons indicates that they are spliced together.
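Assume the frame of a region is the number of bases of an unfinished codon that precede its first base in the spliced transcript (a quantity comparable to Ensembl's exon phase; the caption does not define the ORF number precisely). Under that assumption it can be computed by accumulating exon lengths modulo 3, as in this sketch with a hypothetical helper name:

```python
def exon_start_frames(exon_lengths: list[int]) -> list[int]:
    """For consecutive coding exons of one transcript, return how many bases of the
    codon in progress precede each exon's first base (0, 1 or 2), assuming the
    first exon starts exactly on a codon boundary."""
    frames, coding_length = [], 0
    for length in exon_lengths:
        frames.append(coding_length % 3)
        coding_length += length
    return frames
```

With made-up exons of 100, 52, and 90 nt the frames are 0, 1, and 2. Skipping the 52-nt middle exon, whose length is not a multiple of 3, moves the last exon from frame 2 to frame 1, which is how a single splicing event can change every downstream peptide.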
Figure 3
Region View. (A) Different colors in the cDNA and peptide sequences distinguish two different exons, or an exon and an intron. An amino acid letter shown in red overlaps the splice site. (B) Green and light cyan backgrounds mark SNPs in the cDNA and peptide sequences. (C) Clicking a SNP in the sequence displays its details, with a link to the dbSNP database.
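Which amino acid overlaps the splice site follows from integer arithmetic on cDNA coordinates. In the sketch below, `junction` is the 0-based index of the first base after the splice site and `frame` is the offset at which translation starts; both conventions and the function name are assumptions. The function returns None when the junction falls exactly between two codons, in which case no single residue overlaps the splice site.

```python
from typing import Optional


def splice_site_residue(junction: int, frame: int) -> Optional[int]:
    """Return the 0-based index of the residue whose codon contains bases from
    both sides of the splice site, or None if the junction lies between codons."""
    offset = junction - frame  # bases translated before the junction
    if offset <= 0 or offset % 3 == 0:
        return None
    return offset // 3
```

For a junction at base 6, translation in frame 0 puts the junction on a codon boundary (None), whereas frames 1 and 2 both place it inside codon 1.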
Figure 4
Peptide View. (A) The cDNA and peptide sequences to which the current PEPPI peptide is mapped, using the same color scheme as the Region View. Clicking a SNP in the sequence displays detailed information about it. (B) The protein mapping list displays all proteins mapped to the current PEPPI peptide, with the peptide sequence highlighted within each protein sequence.
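The protein mapping list and the highlighting both need the positions of the peptide inside each protein. An exact substring scan, sketched below with made-up accessions and sequences, records every occurrence, including overlapping ones. For a database of PEPPI's size, an index such as a suffix array or an Aho-Corasick automaton would be the usual replacement for this linear scan.

```python
def map_peptide(peptide: str, proteins: dict[str, str]) -> dict[str, list[int]]:
    """Return {accession: [0-based start positions]} for proteins containing the peptide."""
    hits = {}
    for accession, sequence in proteins.items():
        positions = []
        start = sequence.find(peptide)
        while start != -1:
            positions.append(start)
            start = sequence.find(peptide, start + 1)  # +1 allows overlapping hits
        if positions:
            hits[accession] = positions
    return hits


def highlight(sequence: str, peptide: str, positions: list[int]) -> str:
    """Lowercase the protein and uppercase every matched span."""
    chars = list(sequence.lower())
    for p in positions:
        for i in range(p, p + len(peptide)):
            chars[i] = chars[i].upper()
    return "".join(chars)


# Hypothetical protein records.
PROTEINS = {"PROT_A": "MKTAYIAKQR", "PROT_B": "MAGKWLV", "PROT_C": "AKAKA"}
```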
Figure 5
Identifying the Genomic Origin of MS-Detected Peptides and the Related Alternative Splicing Event. (A) The HIP-2 search result page for protein IPI00023636, displaying the evidence peptides detected in MS experiments. (B) The PEPPI sequence search result page for peptide 1, indicating that the query peptide is produced from an exon-exon combination region. The corresponding PEPPI peptide maps to 4 proteins. (C) The sequence search result page for peptide 2, indicating that the peptide comes from a single exon and maps to 5 proteins. (D) The search result for wild-type MP2K7_HUMAN, showing the regions mapped by the peptides. Peptide 1 crosses the splice site between two exons (PEP000841690 and PEP000841692); peptide 2 is produced from a single exon, PEP000841692. (E) The search result for the third isoform of MP2K7_HUMAN, which is mapped by peptide 2 but not by peptide 1, because the insertion of a cassette exon (PEP000841691) changes the protein sequence.
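The reasoning in panels D and E can be phrased as a compatibility test: describe each isoform by its ordered PEPPI regions and call a peptide compatible with an isoform if the regions it covers appear there as an adjacent run. The region IDs come from the caption; the isoform region lists are simplified assumptions that keep only the three regions involved and place the cassette exon between the other two.

```python
def covers_consecutively(isoform_regions: list[str], peptide_regions: list[str]) -> bool:
    """True if peptide_regions occur as an adjacent run inside isoform_regions."""
    n = len(peptide_regions)
    return any(
        isoform_regions[i:i + n] == peptide_regions
        for i in range(len(isoform_regions) - n + 1)
    )


def compatible_isoforms(peptide_regions: list[str], isoforms: dict[str, list[str]]) -> list[str]:
    """Names of the isoforms that could have produced a peptide covering these regions."""
    return sorted(
        name for name, regions in isoforms.items()
        if covers_consecutively(regions, peptide_regions)
    )


# Region IDs from the caption; the isoform layouts are simplified and partly assumed.
ISOFORMS = {
    "wild_type": ["PEP000841690", "PEP000841692"],
    "isoform_3": ["PEP000841690", "PEP000841691", "PEP000841692"],
}
PEPTIDE_1 = ["PEP000841690", "PEP000841692"]  # spans the exon-exon junction
PEPTIDE_2 = ["PEP000841692"]                  # lies within one exon
```

Peptide 1 is compatible only with the wild type, while peptide 2 is compatible with both, matching the pattern described in panels D and E.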
Figure 6
Overlap of Peptides/Genes Identified by Four Search Databases. (A) Peptides identified from two or more samples. (B) Proteins identified from two or more samples.
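The overlap counts only peptides identified in at least two samples. The sketch below applies that filter per database, then computes pairwise overlaps and the set shared by all databases. The database labels, sample names, and peptides are placeholders, not the databases or data used in the study.

```python
from collections import Counter
from itertools import combinations


def reproducible(per_sample: dict[str, set[str]], min_samples: int = 2) -> set[str]:
    """Peptides identified in at least `min_samples` samples."""
    counts = Counter(p for peptides in per_sample.values() for p in peptides)
    return {p for p, n in counts.items() if n >= min_samples}


def overlaps(per_database: dict[str, set[str]]) -> tuple[dict[tuple[str, str], int], set[str]]:
    """Pairwise overlap sizes and the set of peptides shared by every database."""
    pairwise = {
        (a, b): len(per_database[a] & per_database[b])
        for a, b in combinations(sorted(per_database), 2)
    }
    shared = set.intersection(*per_database.values()) if per_database else set()
    return pairwise, shared


# Placeholder search results: database -> sample -> identified peptides.
results = {
    "DB_A": {"s1": {"MAGK", "WLV"}, "s2": {"MAGK", "GW"}},
    "DB_B": {"s1": {"MAGK", "GW"}, "s2": {"MAGK", "GW"}},
    "DB_C": {"s1": {"WLV"}, "s2": {"MAGK", "WLV"}},
}
filtered = {db: reproducible(samples) for db, samples in results.items()}
pairwise, shared = overlaps(filtered)
```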
Figure 7
Data Generation Process. The data generation process was divided into three steps: (A) deriving genome data from Ensembl; (B) pre-processing the data to select protein-coding genes and SNPs within coding exons; and (C) generating peptide regions and PEPPI peptides. The resulting datasets are shown in red.
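Step (B) keeps only SNPs inside coding exons. With the coding exons of one chromosome held as sorted, non-overlapping intervals, each SNP can be tested with a binary search, as sketched below. The 1-based inclusive coordinates follow Ensembl's convention; the function name, intervals, and positions are made up.

```python
from bisect import bisect_right


def snps_in_coding_exons(snps: list[int], exons: list[tuple[int, int]]) -> list[int]:
    """Keep SNP positions inside any exon; exons are sorted, non-overlapping
    (start, end) intervals in 1-based inclusive coordinates on one chromosome."""
    starts = [start for start, _ in exons]
    kept = []
    for pos in snps:
        i = bisect_right(starts, pos) - 1  # last exon starting at or before pos
        if i >= 0 and pos <= exons[i][1]:
            kept.append(pos)
    return kept
```

Each lookup costs O(log n) in the number of exons, which matters when millions of SNPs are screened against a genome's worth of exons.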
Figure 8
UML Diagram of the Database Backend. Datasets derived by the data generation pipeline are shown in red, and datasets derived from other databases are shown in black.
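The UML diagram itself is not reproduced in this text. Purely as an illustration, the entities named in the captions (genes, peptide regions, PEPPI peptides, and their IPI protein mappings) could be stored in relational tables like the hypothetical SQLite schema below. Every table and column name is invented and does not describe PEPPI's actual backend; SNP tables are omitted for brevity.

```python
import sqlite3

# Hypothetical schema; names are illustrative, not PEPPI's.
SCHEMA = """
CREATE TABLE gene (
    ensembl_gene_id TEXT PRIMARY KEY,
    chromosome      TEXT NOT NULL,
    strand          INTEGER NOT NULL CHECK (strand IN (-1, 1))
);
CREATE TABLE peptide_region (
    region_id       TEXT PRIMARY KEY,
    ensembl_gene_id TEXT NOT NULL REFERENCES gene(ensembl_gene_id),
    region_type     TEXT NOT NULL,
    frame           INTEGER NOT NULL CHECK (frame IN (0, 1, 2))
);
CREATE TABLE peptide (
    peptide_id TEXT PRIMARY KEY,
    region_id  TEXT NOT NULL REFERENCES peptide_region(region_id),
    sequence   TEXT NOT NULL
);
CREATE TABLE peptide_protein (
    peptide_id    TEXT NOT NULL REFERENCES peptide(peptide_id),
    ipi_accession TEXT NOT NULL,
    position      INTEGER NOT NULL,
    PRIMARY KEY (peptide_id, ipi_accession, position)
);
"""


def proteins_for_peptide(conn: sqlite3.Connection, sequence: str) -> list[str]:
    """IPI accessions mapped to any stored peptide with this sequence."""
    rows = conn.execute(
        "SELECT DISTINCT pp.ipi_accession FROM peptide AS p "
        "JOIN peptide_protein AS pp ON p.peptide_id = pp.peptide_id "
        "WHERE p.sequence = ? ORDER BY pp.ipi_accession",
        (sequence,),
    )
    return [row[0] for row in rows]
```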
