Review
Mass Spectrom Rev. 2024 Nov-Dec;43(6):1237-1254.
doi: 10.1002/mas.21849. Epub 2023 May 4.

Statistical approaches applicable in managing OMICS data: Urinary proteomics as exemplary case

De-Wei An et al. Mass Spectrom Rev. 2024 Nov-Dec.

Abstract

With urinary proteomic profiling (UPP) as an exemplary omics technology, this review describes a workflow for the analysis of omics data in large study populations. The proposed workflow includes: (i) planning omics studies and sample size considerations; (ii) preparing the data for analysis; (iii) preprocessing the UPP data; (iv) the basic statistical steps required for data curation; (v) the selection of covariables; (vi) relating continuously distributed or categorical outcomes to a series of single markers (e.g., sequenced urinary peptide fragments identifying the parental proteins); (vii) showing the added diagnostic or prognostic value of the UPP markers over and beyond classical risk factors; and (viii) pathway analysis to identify targets for personalized intervention in disease prevention or treatment. Additionally, two short sections address multiomics studies and machine learning, respectively. In conclusion, the analysis of adverse health outcomes in relation to omics biomarkers rests on the same statistical principles as the analysis of any other data collected in large population or patient cohorts. The large number of biomarkers that have to be considered simultaneously requires planning ahead how the study database will be structured, curated, and imported into statistical software packages, and how analysis results will be triaged for clinical relevance and presented.
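
Step (vi) of the workflow relates an outcome to many single markers, so each marker contributes its own p-value and the results have to be triaged with multiplicity in mind. The abstract does not state which correction the review applies. The sketch below uses the Benjamini-Hochberg false discovery rate procedure as one common choice, and the peptide names, p-values, and the 0.05 threshold are hypothetical values chosen for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_markers(p_by_marker, fdr=0.05):
    """Keep the markers whose adjusted p-value is at or below the FDR level."""
    names = list(p_by_marker)
    adjusted = benjamini_hochberg([p_by_marker[name] for name in names])
    return {name: q for name, q in zip(names, adjusted) if q <= fdr}


# Hypothetical per-peptide p-values from single-marker association models.
p_by_marker = {
    "peptide_A": 0.001,
    "peptide_B": 0.008,
    "peptide_C": 0.039,
    "peptide_D": 0.041,
    "peptide_E": 0.60,
}

selected = select_markers(p_by_marker, fdr=0.05)
print(sorted(selected))
```

In this toy input, peptide_C and peptide_D have unadjusted p-values below 0.05 but are not retained after adjustment, which is the kind of triage the workflow anticipates when many markers are tested at once.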

Keywords: multidimensional classifiers; proteomics; statistical methods; urinary proteomics.


