Review. Nat Protoc. 2021 Aug;16(8):3737-3760.
doi: 10.1038/s41596-021-00566-6. Epub 2021 Jul 9.

Tutorial: best practices and considerations for mass-spectrometry-based protein biomarker discovery and validation

Ernesto S Nakayasu et al. Nat Protoc. 2021 Aug.

Abstract

Mass-spectrometry-based proteomic analysis is a powerful approach for discovering new disease biomarkers. However, certain critical steps of study design such as cohort selection, evaluation of statistical power, sample blinding and randomization, and sample/data quality control are often neglected or underappreciated during experimental design and execution. This tutorial discusses important steps for designing and implementing a liquid-chromatography-mass-spectrometry-based biomarker discovery study. We describe the rationale, considerations and possible failures in each step of such studies, including experimental design, sample collection and processing, and data collection. We also provide guidance for major steps of data processing and final statistical analysis for meaningful biological interpretations along with highlights of several successful biomarker studies. The provided guidelines from study design to implementation to data interpretation serve as a reference for improving rigor and reproducibility of biomarker development studies.
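Sample blinding and randomization, which the abstract names as commonly neglected steps, can be sketched in a few lines. The following is a minimal illustration only, not the authors' procedure: the function names, the two-group cohort and the batch count are hypothetical, and a real study would normally also balance covariates such as age, sex and collection site.

```python
import random
from collections import defaultdict


def randomize_into_batches(samples, n_batches, seed=0):
    """Assign (sample_id, group) pairs to batches so that each group is
    spread as evenly as possible, then shuffle run order within a batch.
    Hypothetical helper for illustration."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)

    batches = [[] for _ in range(n_batches)]
    slot = 0
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)  # random order within the group
        for sample_id in ids:
            # Round-robin assignment keeps group counts balanced per batch.
            batches[slot % n_batches].append(sample_id)
            slot += 1

    for batch in batches:
        rng.shuffle(batch)  # randomize run order within the batch
    return batches


def blind_labels(sample_ids, seed=0):
    """Map each sample id to a random neutral code so that the analyst
    cannot infer the group from the label. Hypothetical helper."""
    rng = random.Random(seed)
    codes = [f"S{i:03d}" for i in range(1, len(sample_ids) + 1)]
    rng.shuffle(codes)
    return dict(zip(sample_ids, codes))


# Hypothetical cohort: 6 cases and 6 controls split across 3 batches.
cohort = [(f"case_{i}", "case") for i in range(6)] + \
         [(f"ctrl_{i}", "control") for i in range(6)]
batches = randomize_into_batches(cohort, n_batches=3, seed=42)
codes = blind_labels([sample_id for sample_id, _ in cohort], seed=42)
```

Fixing the seed makes the assignment reproducible, and the key linking codes to sample identities would be kept away from the analyst until the data analysis is locked.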

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

Fig. 1 | Phases of biomarker development studies.
Biomarker development is usually divided into three phases: discovery, verification and validation. In the discovery phase, a small number of samples is submitted for in-depth proteomics analysis, in which thousands of proteins are measured to identify biomarker candidates. Larger cohorts of samples are often analyzed in the subsequent phases, increasing the statistical power. Biomarker candidates are also downselected in each developmental phase based on how accurately they predict the disease or condition. In some cases, a combination of proteins rather than an individual protein is tested as a biomarker. In the verification phase, biomarker candidates undergo additional proteomics analysis to verify both their identities and their expression in the same or similar samples as in the discovery phase. A few of the most promising candidates are tested in the validation phase to determine their performance for clinical use.
Fig. 2 | Considerations for each step of the discovery-phase workflow.
The main consideration points for each step of the workflow are shown. Note that an example for blood plasma analysis is shown, but other sample types may have some additional or fewer steps in the workflow. For tissue analysis, the immunodepletion step should be replaced by a tissue lysis step, the details of which are documented in the text.
Fig. 3 | Monitoring instrument performance with standard samples.
In our laboratory, we use a tryptic digest of the bacterium Shewanella oneidensis as a standard sample to check the LC-MS/MS performance. This standard is run before and after each batch of samples. a, Number of identified peptides in S. oneidensis runs. Note a slow decay in the number of identified peptides, which is almost unnoticeable in consecutive runs but has a major effect across time. The number of peptide identifications was reestablished after cleaning the instrument. b,c, Chromatograms from analysis of S. oneidensis before and after instrument cleaning, respectively. This shows the cumulative reduction in instrument performance across time.
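The slow decay described in this caption is hard to see between consecutive runs, so it helps to compare each standard run with an early baseline and to track the overall trend. The sketch below is one possible way to do that. The peptide counts, the three-run baseline and the 15% threshold are made-up values for illustration, not figures from the paper.

```python
def trend_slope(values):
    """Least-squares slope of values against run index (0, 1, 2, ...)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def needs_attention(peptide_counts, baseline_runs=3, max_drop=0.15):
    """Flag the instrument when the latest standard run has lost more than
    max_drop (as a fraction) of the identifications seen in the first
    baseline_runs runs. The threshold is an assumed value."""
    baseline = sum(peptide_counts[:baseline_runs]) / baseline_runs
    latest = peptide_counts[-1]
    return (baseline - latest) / baseline > max_drop


# Hypothetical peptide identification counts from successive standard runs.
standard_runs = [20000, 19900, 19800, 19500, 19000, 18000, 16500]
slope = trend_slope(standard_runs)      # negative: identifications decline
flag = needs_attention(standard_runs)   # cumulative drop exceeds 15%
```

Comparing against a baseline rather than the previous run is the point of the design: each run-to-run change can be small while the cumulative loss is large.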
Fig. 4 | Identification of unexpected peptide modifications with data QC analysis.
a, Total-ion chromatograms from three LC-MS/MS runs of corresponding high-pH reversed-phase chromatography fractions from different multiplexed sets of isobaric-tagged samples. The runs were analyzed by QC-ART, and the flagged run is highlighted. The highlighted region has a different peak profile compared with the unflagged runs. b, A selected m/z range of the region highlighted in a. The analysis revealed a shift of 15.99 Da, corresponding to the mass of an oxidation, on the peptide GQYCYELDEK. This peptide contains no methionine, the residue whose oxidation is commonly considered during peptide identification. c, Workflow of the MS-GF+ database searches to identify new oxidized residues. The searches considered oxidation of any residue and used Ascore to confirm the site of modification. d, Normalized counts of oxidized amino acid residues. e,f, Average number of peptide (e) and protein (f) identifications per fraction of reanalyzed data. The blue bars represent the database search performed with methionine oxidation as the only possible modification, whereas the red bars represent the search that considered methionine, cysteine, tryptophan and tyrosine oxidations. This shows that QC analysis can find not only runs with drift in sample preparation and instrument performance, but also runs that have distinct profiles due to unexpected posttranslational modifications. The asterisks represent P ≤ 0.05 by t-test. Reproduced from ref. with permission from the American Society for Biochemistry and Molecular Biology.
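The 15.99 Da shift in panel b can be checked with simple arithmetic: the monoisotopic mass of one oxygen atom is about 15.9949 Da, and for an ion of charge z the shift observed on the m/z axis is that mass divided by z. The sketch below illustrates the check. The function name, the m/z values and the 0.01 Da tolerance are assumptions chosen for the example, not values from the paper.

```python
OXIDATION_DA = 15.994915  # monoisotopic mass of one oxygen atom, in Da


def matches_oxidation(mz_unmodified, mz_modified, charge, tol_da=0.01):
    """Return True if the m/z difference between two peaks of the same
    charge corresponds to the addition of one oxygen atom.
    The tolerance is an assumed value; choose it from the instrument's
    mass accuracy."""
    delta_mass = (mz_modified - mz_unmodified) * charge
    return abs(delta_mass - OXIDATION_DA) <= tol_da


# Hypothetical doubly charged peak pair: the oxidized form appears
# 15.994915 / 2 higher on the m/z axis than the unmodified form.
unmodified_mz = 600.0
oxidized_mz = unmodified_mz + OXIDATION_DA / 2
is_oxidation = matches_oxidation(unmodified_mz, oxidized_mz, charge=2)
```

A mass shift alone does not identify the modified residue, which is why the workflow in panel c relies on database searching and site localization.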
Fig. 5 | Considerations for each step of the validation-phase workflow.
The main consideration points for each step of the workflow are shown.
