Stat Modelling. 2017;17(4-5):245-289.
doi: 10.1177/1471082X17698255. Epub 2017 Jun 15.

Statistical Contributions to Bioinformatics: Design, Modeling, Structure Learning, and Integration

Jeffrey S Morris et al. Stat Modelling. 2017.

Abstract

The advent of high-throughput multi-platform genomics technologies providing whole-genome molecular summaries of biological samples has revolutionized biomedical research. These technologies yield highly structured big data, whose analysis poses significant quantitative challenges. The field of bioinformatics has emerged to deal with these challenges, and comprises many quantitative and biological scientists working together to effectively process these data and extract the treasure trove of information they contain. Statisticians, with their deep understanding of variability and uncertainty quantification, play a key role in these efforts. In this article, we attempt to summarize some of the key contributions of statisticians to bioinformatics, focusing on four areas: (1) experimental design and reproducibility, (2) preprocessing and feature extraction, (3) unified modeling, and (4) structure learning and integration. In each of these areas, we highlight some key contributions and try to elucidate the key statistical principles underlying these methods and approaches. Our goals are to demonstrate major ways in which statisticians have contributed to bioinformatics, to encourage statisticians to get involved early in methods development as new technologies emerge, and to stimulate future methodological work that builds on the statistical principles elucidated in this article and utilizes all available information to uncover new biological insights.

Keywords: Bioinformatics; Epigenetics; Experimental Design; Genomics; Preprocessing; Proteomics; Regularization; Reproducible Research; Statistical Modeling.

Figures

Figure 1. Illustration of Types of Multi-platform Genomics Data and Their Interrelationships

Figure 2. Heatmap of Ovarian Cancer Data: Heatmap of mass spectra from 216 samples in Petricoin et al. (2002) run on Ciphergen H4 ProteinChip (top) and Ciphergen WCX2 (bottom).
