Statistical Contributions to Bioinformatics: Design, Modeling, Structure Learning, and Integration
- PMID: 29129969
- PMCID: PMC5679480
- DOI: 10.1177/1471082X17698255
Abstract
The advent of high-throughput multi-platform genomics technologies providing whole-genome molecular summaries of biological samples has revolutionized biomedical research. These technologies yield highly structured big data, whose analysis poses significant quantitative challenges. The field of bioinformatics has emerged to deal with these challenges, and comprises many quantitative and biological scientists working together to effectively process these data and extract the treasure trove of information they contain. Statisticians, with their deep understanding of variability and uncertainty quantification, play a key role in these efforts. In this article, we attempt to summarize some of the key contributions of statisticians to bioinformatics, focusing on four areas: (1) experimental design and reproducibility, (2) preprocessing and feature extraction, (3) unified modeling, and (4) structure learning and integration. In each of these areas, we highlight some key contributions and try to elucidate the key statistical principles underlying these methods and approaches. Our goals are to demonstrate major ways in which statisticians have contributed to bioinformatics, to encourage statisticians to get involved early in methods development as new technologies emerge, and to stimulate future methodological work based on the statistical principles elucidated in this article and utilizing all available information to uncover new biological insights.
Keywords: Bioinformatics; Epigenetics; Experimental Design; Genomics; Preprocessing; Proteomics; Regularization; Reproducible Research; Statistical Modeling.