PLoS One. 2010 Jan 28;5(1):e8949.
doi: 10.1371/journal.pone.0008949.

An integrated mass-spectrometry pipeline identifies novel protein coding-regions in the human genome

Danny A Bitton et al. PLoS One. 2010.

Abstract

Background: Most protein mass spectrometry (MS) experiments rely on searches against a database of known or predicted proteins, limiting their ability as a gene discovery tool.

Results: Using a search against an in silico translation of the entire human genome, combined with a series of annotation filters, we identified 346 putative novel peptides [false discovery rate (FDR) < 5%] in an MS dataset derived from two human breast epithelial cell lines. A subset of these was then successfully validated by a different MS technique. Two of these correspond to novel isoforms of heterogeneous nuclear ribonucleoproteins (hnRNPs), while the rest correspond to novel loci.

Conclusions: MS technology can be used for ab initio gene discovery in human data, which, since it is based on different underlying assumptions, identifies protein-coding genes not found by other techniques. As MS technology continues to evolve, such approaches will become increasingly powerful.
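The "in silico translation of the entire human genome" mentioned in the Results can be illustrated with a six-frame translation, in which a DNA sequence is translated in three reading frames on each strand. The following is a minimal sketch only, not the authors' implementation: the function names, the frame labels (`+1` to `-3`), the use of the standard genetic code, and the `X` placeholder for codons containing ambiguous bases are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(dna):
    """Map hypothetical frame labels '+1'..'+3', '-1'..'-3' to translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

In a search of this kind, each translated frame would typically be split at stop codons (`*`) and digested in silico before being matched against spectra; those steps are omitted here.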

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. A pipeline to identify peptides originating from uncharacterised proteins using LC-MS/MS data.
Data are first searched with ProteinPilot to identify peptides, and the identifications are then filtered according to genome annotation. A subset of the predicted novel peptides was then confirmed by additional MS/MS.
Figure 2. Location and nature of novel exon-3′ extending peptide in HNRNPL.
Top: Location of peptide relative to exons. (Blue rectangle: gene; brown rectangles: transcripts; red/white rectangles: exons; red: coding, white: UTR). Bottom: alignment between NP_001128232.1 (hnRNPL isoform a, Rattus norvegicus) and HNRPL_HUMAN, showing the location of the candidate peptide and the retained intron found in the rat, but not the human, sequence.
Figure 3. Location and nature of novel intergenic peptide relative to Genscan prediction.
Top: the peptide identified by the pipeline is classified as intronic, but is within the Genscan prediction GENSCAN00000020420. Bottom: the predicted protein is similar to hnRNPA1 (RA1L3_HUMAN; BLAST; Expect = 1e−33; 73% Identity).
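The figure captions above describe peptides being filtered and classified by their position relative to annotated genes (for example, a peptide "classified as intronic"). The sketch below shows one simple way such a classification could be done with interval overlap. It is an illustration under stated assumptions, not the filter used in the paper: the `Gene` record, the `classify_peptide` function, the three labels, the closed 1-based coordinates, and the decision to ignore strand and reading frame are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    """Hypothetical annotation record; coordinates are closed and 1-based."""
    name: str
    start: int
    end: int
    exons: List[Tuple[int, int]]


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def classify_peptide(pep_start, pep_end, genes):
    """Label a peptide's genomic span as 'exonic', 'intronic' or 'intergenic'.

    Assumption: strand and reading frame are ignored, and any overlap with an
    annotated exon counts as 'exonic'.
    """
    inside_gene = False
    for gene in genes:
        if not overlaps(pep_start, pep_end, gene.start, gene.end):
            continue
        inside_gene = True
        if any(overlaps(pep_start, pep_end, s, e) for s, e in gene.exons):
            return "exonic"
    return "intronic" if inside_gene else "intergenic"


if __name__ == "__main__":
    # Made-up coordinates for demonstration only.
    demo = [Gene("demo_gene", 100, 500, [(100, 150), (300, 350), (450, 500)])]
    for span in [(120, 140), (200, 230), (600, 630)]:
        print(span, classify_peptide(*span, demo))
```

A peptide that only partly overlaps an exon (such as the exon-extending peptide in Figure 2) would be labelled "exonic" by this sketch; distinguishing such cases would need an additional boundary check.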
