Mol Cell Proteomics. 2011 Jan;10(1):M110.002527.

doi: 10.1074/mcp.M110.002527. Epub 2010 Oct 28.

Proteogenomic analysis of polymorphisms and gene annotation divergences in prokaryotes using a clustered mass spectrometry-friendly database

Gustavo A de Souza¹, Magnus Ø Arntzen, Suereta Fortuin, Anita C Schürch, Hiwa Målen, Christopher R E McEvoy, Dick van Soolingen, Bernd Thiede, Robin M Warren, Harald G Wiker

PMID: 21030493
PMCID: PMC3013451
DOI: 10.1074/mcp.M110.002527

Gustavo A de Souza et al. Mol Cell Proteomics. 2011 Jan.

Affiliation

¹ The Gade Institute, Section for Microbiology and Immunology, University of Bergen, N-5021 Bergen, Norway.

Abstract

Precise annotation of genes or open reading frames is still a difficult task, and annotations diverge even when they are generated from the same genomic sequence. This affects downstream proteomic studies and also compromises the characterization of clinical isolates, whose many specific genetic variations may not be represented in the selected database. We recently developed software called multistrain mass spectrometry prokaryotic database builder (MSMSpdbb) that merges protein databases from several sources in a proteomics-friendly manner and can be applied to any prokaryotic organism. We generated a database for the Mycobacterium tuberculosis complex (using three strains of Mycobacterium bovis and five of M. tuberculosis), and analyzed data collected from two laboratory strains and two clinical isolates of M. tuberculosis. We identified 2561 proteins, of which 24 were present in M. tuberculosis H37Rv samples but not annotated in the M. tuberculosis H37Rv genome. We were also able to identify 280 nonsynonymous single amino acid polymorphisms and confirm 367 translational start sites. As a proof of concept, we applied the database to whole-genome DNA sequencing data of one of the clinical isolates, which allowed the validation of 116 predicted single amino acid polymorphisms and the annotation of 131 N-terminal start sites. Moreover, we identified regions not present in the original M. tuberculosis H37Rv sequence, indicating strain divergence or errors in the reference sequence. In conclusion, we demonstrated the potential of using a merged database to better characterize laboratory or clinical bacterial strains.
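How a merged database can expose a single amino acid polymorphism can be illustrated with a minimal Python sketch. This is not the MSMSpdbb implementation: the function names and the toy sequences are hypothetical, and the digestion rule is the common simplification of trypsin specificity (cleave after K or R unless the next residue is P, no missed cleavages). The sketch digests a reference protein and a strain variant in silico and keeps only the variant peptides that are absent from the reference, since those are the peptides whose identification would support the polymorphism.

```python
def tryptic_digest(sequence):
    """In silico trypsin digest: cleave after K or R unless followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal remainder when it does not end in K or R.
        peptides.append(sequence[start:])
    return peptides


def variant_only_peptides(reference, variant, min_length=6):
    """Peptides of `variant` that the reference digest does not contain.

    `min_length` is an assumed cutoff; very short peptides are rarely
    informative in a database search.
    """
    reference_peptides = set(tryptic_digest(reference))
    return [
        peptide
        for peptide in tryptic_digest(variant)
        if peptide not in reference_peptides and len(peptide) >= min_length
    ]


# Toy sequences (invented for illustration): the variant carries one T -> S change.
reference = "MSDLKAAGTELVRPEQWLNDGTKVVAAR"
variant = "MSDLKAAGSELVRPEQWLNDGTKVVAAR"
extra_peptides = variant_only_peptides(reference, variant)
```

Only the peptide spanning the substitution is returned, so a merged entry needs to carry one extra peptide per polymorphism rather than a second full-length sequence.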

Figures

**Fig. 1.**
**TSS choice validation for protein Rv0390.** A, The entry in FASTA format is shown. Underlined regions delimit the expected tryptic peptides of two predicted TSS choices, a valine and a methionine (bold, underlined). These tryptic peptides were artificially added at the end of the entry after the letter code “O” (bold). B, Fragmentation profile of peptide SYAGDITPLQAWEMLSDNPR.

**Fig. 2.**
**Identification of regions predicted as noncoding.** A, The entry in FASTA format is shown, with the predicted N-terminal tryptic peptide underlined. The sequence in bold is present in a region initially predicted as noncoding in all eight genomes used in this work. B, The fragmentation pattern of sequence MEGDAGAGQLNPADANK is shown, indicating that this region is indeed coding. This amino acid sequence is not present in any other gene of the database.

**Fig. 3.**
**Identification of *M. tuberculosis* H37Rv unannotated genes.** A, Schematic representation of the genomic region containing the gene MT2297 from the *M. tuberculosis* CDC1551 strain. Black boxes indicate gene annotation. In *M. tuberculosis* H37Rv and *M. tuberculosis* F11 genomes, the gene is not annotated but the genomic region is nonetheless present. B, Fragmentation pattern of peptide ADLYAAVDAMR from MT2297, present in *M. tuberculosis* H37Rv fractions.

**Fig. 4.**
**Missing region of the *M. tuberculosis* H37Rv genome.** A, Alignment of selected gene sequences from *M. tuberculosis* CDC1551, H37Ra, and H37Rv genomes, illustrating a deletion region in *M. tuberculosis* H37Rv that includes genes MT2420/MRA_2374 to MT2422/MRA_2376. Interestingly, *M. tuberculosis* H37Rv and *M. tuberculosis* H37Ra share the same ancestor, but the *M. tuberculosis* H37Ra genome sequence shares more similarities with the *M. tuberculosis* CDC1551 genome than with the original *M. tuberculosis* H37Rv genome sequence. B, Fragmentation pattern of peptide AQAAALEAEHQAIVR from MT2420, found in *M. tuberculosis* H37Rv (ATCC27294) whole cell lysates, indicating that the deletion reported in the original *M. tuberculosis* H37Rv sequencing effort is incorrect. MS/MS information tables (MaxQuant output) are openly available at www.proteomecommons.org under the Hash code: dXuxNwU84QKYzzkLfmpU8Mcv6p277wRTOWXjRuWEH/WkkdAyYT/DeWm3ILF43l3lLZF7MMchNwPBwWa6G16fo6KhRrIAAAAAAAAC/w==. All RAW files used in this work can be downloaded using the Hash code: EIH2o0QZ9mMIXgurpLpJ34rgf1PQHXKOIa0EUOX0NIZ+bJdOOsdkXvcCQ9N5ZUqtlAEDZ/TQaoPn/uTOvpR5SPQuAyAAAAAAAAB0Cw==.
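The entry layout described in the Fig. 1 caption, where tryptic peptides for alternative translational start sites are appended to the end of the sequence after the letter code "O", can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names, the toy sequence, and the choice to place one "O" before each appended peptide are assumptions, and the simplified trypsin rule (cleave after K or R unless followed by P) is used.

```python
SEPARATOR = "O"  # letter code named in the Fig. 1 caption


def n_terminal_tryptic_peptide(sequence, start):
    """Peptide from `start` up to and including the first tryptic cleavage site."""
    for i in range(start, len(sequence)):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if sequence[i] in "KR" and next_residue != "P":
            return sequence[start:i + 1]
    # No cleavage site downstream: the peptide runs to the C terminus.
    return sequence[start:]


def append_tss_peptides(sequence, alternative_starts):
    """Append the N-terminal peptide of each alternative start after a separator.

    Assumption: one separator precedes every appended peptide.
    """
    extras = [n_terminal_tryptic_peptide(sequence, s) for s in alternative_starts]
    return sequence + "".join(SEPARATOR + peptide for peptide in extras)


# Toy entry (invented): the longest prediction starts at V (index 0) and an
# alternative start is predicted at the M at index 7, inside the first peptide.
toy_sequence = "VSTLAGDMEQDPRLLSK"
merged_entry = append_tss_peptides(toy_sequence, [7])
```

In this toy case a digest of the full-length sequence yields only VSTLAGDMEQDPR, so the shorter peptide MEQDPR becomes searchable only because it was appended; identifying it would support the downstream start site.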

Cited by

Mycobacterium tuberculosis Rv3628 drives Th1-type T cell immunity via TLR2-mediated activation of dendritic cells and displays vaccine potential against the hyper-virulent Beijing K strain.
Kim WS, Kim JS, Cha SB, Kim H, Kwon KW, Kim SJ, Han SJ, Choi SY, Cho SN, Park JH, Shin SJ. Oncotarget. 2016 May 3;7(18):24962-82. doi: 10.18632/oncotarget.8771. PMID: 27097115. Free PMC article.
Genome annotation improvements from cross-phyla proteogenomics and time-of-day differences in malaria mosquito proteins using untargeted quantitative proteomics.
Imrie L, Le Bihan T, O'Toole Á, Hickner PV, Dunn WA, Weise B, Rund SSC. PLoS One. 2019 Jul 29;14(7):e0220225. doi: 10.1371/journal.pone.0220225. eCollection 2019. PMID: 31356616. Free PMC article.
Mycobacterium tuberculosis Rv0927c Inhibits NF-κB Pathway by Downregulating the Phosphorylation Level of IκBα and Enhances Mycobacterial Survival.
Xia A, Li X, Quan J, Chen X, Xu Z, Jiao X. Front Immunol. 2021 Aug 31;12:721370. doi: 10.3389/fimmu.2021.721370. eCollection 2021. PMID: 34531869. Free PMC article.
Introducing the ESAT-6 free IGRA, a companion diagnostic for TB vaccines based on ESAT-6.
Ruhwald M, de Thurah L, Kuchaka D, Zaher MR, Salman AM, Abdel-Ghaffar AR, Shoukry FA, Michelsen SW, Soborg B, Blauenfeldt T, Mpagama S, Hoff ST, Agger EM, Rosenkrands I, Aagard C, Kibiki G, El-Sheikh N, Andersen P. Sci Rep. 2017 Apr 7;7:45969. doi: 10.1038/srep45969. PMID: 28387329. Free PMC article.
Empowering Shotgun Mass Spectrometry with 2DE: A HepG2 Study.
Kiseleva O, Ponomarenko E, Poverennaya E. Int J Mol Sci. 2020 May 27;21(11):3813. doi: 10.3390/ijms21113813. PMID: 32471280. Free PMC article.
