Review

Nat Protoc. 2021 Aug;16(8):3737-3760.

doi: 10.1038/s41596-021-00566-6. Epub 2021 Jul 9.

Tutorial: best practices and considerations for mass-spectrometry-based protein biomarker discovery and validation

Ernesto S Nakayasu¹, Marina Gritsenko², Paul D Piehowski², Yuqian Gao², Daniel J Orton², Athena A Schepmoes², Thomas L Fillmore², Brigitte I Frohnert³, Marian Rewers³, Jeffrey P Krischer⁴, Charles Ansong², Astrid M Suchy-Dicey⁵, Carmella Evans-Molina⁶, Wei-Jun Qian², Bobbie-Jo M Webb-Robertson² ⁷, Thomas O Metz⁸

Affiliations

¹ Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA. ernesto.nakayasu@pnnl.gov.
² Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA.
³ Barbara Davis Center for Diabetes, School of Medicine, University of Colorado, Aurora, CO, USA.
⁴ Morsani College of Medicine, University of South Florida, Tampa, FL, USA.
⁵ Elson S. Floyd College of Medicine, Washington State University, Seattle, WA, USA.
⁶ Center for Diabetes and Metabolic Diseases and the Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN, USA.
⁷ Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
⁸ Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA. thomas.metz@pnnl.gov.

PMID: 34244696
PMCID: PMC8830262
DOI: 10.1038/s41596-021-00566-6

Ernesto S Nakayasu et al. Nat Protoc. 2021 Aug.

Abstract

Mass-spectrometry-based proteomic analysis is a powerful approach for discovering new disease biomarkers. However, certain critical steps of study design such as cohort selection, evaluation of statistical power, sample blinding and randomization, and sample/data quality control are often neglected or underappreciated during experimental design and execution. This tutorial discusses important steps for designing and implementing a liquid-chromatography-mass-spectrometry-based biomarker discovery study. We describe the rationale, considerations and possible failures in each step of such studies, including experimental design, sample collection and processing, and data collection. We also provide guidance for major steps of data processing and final statistical analysis for meaningful biological interpretations along with highlights of several successful biomarker studies. The provided guidelines from study design to implementation to data interpretation serve as a reference for improving rigor and reproducibility of biomarker development studies.

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

**Fig. 1 | Phases of biomarker development studies.**
Biomarker development is usually divided into three phases: discovery, verification and validation. In the discovery phase, a small number of samples are submitted for in-depth proteomics analysis, in which thousands of proteins are measured to identify biomarker candidates. Larger cohorts are often analyzed in the subsequent phases, increasing the statistical power. Biomarker candidates are also downselected in each developmental phase based on how accurately they predict the disease or condition. In some cases, a combination of proteins rather than an individual protein is tested as a biomarker. In the verification phase, biomarker candidates undergo additional proteomics analysis to verify both their identities and their expression in the same or similar samples as in the discovery phase. A few of the most promising candidates are tested in the validation phase to determine their performance for clinical use.

**Fig. 2 | Considerations for each step of the discovery-phase workflow.**
The main consideration points for each step of the workflow are shown. The example shown is for blood plasma analysis; other sample types may require additional or fewer steps. For tissue analysis, the immunodepletion step should be replaced by a tissue lysis step, the details of which are documented in the text.

**Fig. 3 | Monitoring instrument performance with standard samples.**
In our laboratory, we use a tryptic digest of the bacterium *Shewanella oneidensis* as a standard sample to check LC-MS/MS performance. This standard is run before and after each batch of samples. **a**, Number of identified peptides in *S. oneidensis* runs. Note the slow decay in the number of identified peptides, which is almost unnoticeable between consecutive runs but has a major cumulative effect over time. The number of peptide identifications recovered after the instrument was cleaned. **b,c**, Chromatograms from analysis of *S. oneidensis* before and after instrument cleaning, respectively. The comparison shows the cumulative reduction in instrument performance over time.

**Fig. 4 | Identification of unexpected peptide modifications with data QC analysis.**
**a**, Total-ion chromatograms from three LC-MS/MS runs of corresponding high-pH reversed-phase chromatography fractions from different multiplexed sets of isobaric-tagged samples. The runs were analyzed by QC-ART, and the flagged run is highlighted. The highlighted region has a different peak profile compared with the unflagged runs. **b**, A selected m/z range of the region highlighted in a. The analysis revealed a shift of 15.99 Da, corresponding to the mass of an oxidation, on the peptide GQYCYELDEK. This peptide contains no methionine, the residue usually considered for oxidation during peptide identification. **c**, Workflow of the MS-GF+ database searches to identify new oxidized residues. The searches considered oxidation on any residue and used Ascore to confirm the site of modification. **d**, Normalized counts of oxidized amino acid residues. **e,f**, Average number of peptide (e) and protein (f) identifications per fraction of reanalyzed data. The blue bars represent the database search performed considering methionine oxidation as the only possible modification, whereas the red bars represent the search that considered methionine, cysteine, tryptophan and tyrosine oxidations. This shows that QC analysis can find not only runs with drift in sample preparation and instrument performance, but also runs that have distinct profiles due to unexpected posttranslational modifications. The asterisks represent P ≤ 0.05 by t-test. Reproduced from ref. with permission from the American Society for Biochemistry and Molecular Biology.

**Fig. 5 | Considerations for each step of the validation-phase workflow.**
The main consideration points for each step of the workflow are shown.

Key references using this review

1. Zhang Q et al. J. Exp. Med. 210, 191–203 (2013). doi: 10.1084/jem.20111843
2. Carnielli CM et al. Nat. Commun. 9, 3598 (2018). doi: 10.1038/s41467-018-05696-2
3. Tofte N et al. Lancet Diabetes Endocrinol. 8, 301–312 (2020). doi: 10.1016/S2213-8587(20)30026-7
4. Zhang Z et al. Cancer Res. 64, 5882–5890 (2004). doi: 10.1158/0008-5472.CAN-04-0746
