Review
Gastroenterology. 2017 Jan;152(1):53-67.e3.
doi: 10.1053/j.gastro.2016.09.065. Epub 2016 Oct 20.

Using Big Data to Discover Diagnostics and Therapeutics for Gastrointestinal and Liver Diseases

Benjamin Wooden et al. Gastroenterology. 2017 Jan.

Abstract

Technologies such as genome sequencing, gene expression profiling, proteomic and metabolomic analyses, electronic medical records, and patient-reported health information have produced large amounts of data from various populations, cell types, and disorders (big data). However, these data must be integrated and analyzed if they are to produce models or concepts about physiological function or mechanisms of pathogenesis. Many of these data are available to the public, allowing researchers anywhere to search for markers of specific biological processes or therapeutic targets for specific diseases or patient types. We review recent advances in the fields of computational and systems biology and highlight opportunities for researchers to use big data sets in the fields of gastroenterology and hepatology to complement traditional means of diagnostic and therapeutic discovery.

Keywords: Big Data; Drug Repurposing; Precision Medicine; Translational Bioinformatics.

Conflict of interest statement

The authors have no relevant conflicts.

Figures

Figure 1. Big data-driven discovery in gastroenterology and hepatology
Big data-driven discovery may provide new approaches to long-standing or emerging unmet needs in gastrointestinal and liver diseases (left panel). Data from multiple domains, collected systematically and/or automatically from patients and from publicly or privately available databases, are integrated into a rich, heterogeneous dataset (middle panel). Mining this assembled dataset with specialized methods (translational bioinformatics) can yield diagnostic devices, tools, and/or therapeutics more efficiently (right panel).
Figure 2. Advantages of the big data-driven approach
In the traditional, biological hypothesis-driven approach for a specific disease (upper panel), candidate biomarkers and therapeutic targets go through lengthy and costly serial preclinical validations. Clinical evaluation is performed without accounting for genetic and environmental variation among enrolled patients, so a therapeutic benefit in a subset of patients can be missed. As a result, clinical translation is less efficient and more costly. In contrast, the big data-driven approach (lower panel) incorporates different data types, including both molecular and clinical information, and computationally derives candidate biomarkers and therapeutic targets/drugs without relying on prior hypotheses. Preclinical and clinical validation can then be performed in parallel by incorporating computational cross-species analysis, substantially reducing the time and cost of biomarker/therapeutic development. Candidates may also be targeted to a specific niche subpopulation of patients, further reducing the likelihood of translational failure.
Figure 3. Big data-driven biomarker discovery
Biomarker candidates may be identified either by analyzing newly collected samples or by in silico analysis of existing data from public and/or private big data repositories (left). Biomarker validation has traditionally been a costly process requiring assay development and prospective clinical evaluation, with patients followed according to a strict protocol. By drawing on big data resources, in silico validation can establish a candidate biomarker's clinical utility in multiple patient cohorts without costly and lengthy prospective clinical trials. Only well-validated biomarkers are advanced to assay development and clinical evaluation, reducing the risk of failing to demonstrate clinical utility (right).
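The in silico validation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: each cohort drawn from an existing repository is assumed to be already reduced to hypothetical (biomarker value, disease status) pairs, the candidate is scored in every cohort by the ROC AUC (computed through the Mann-Whitney formulation), and it is advanced only if every cohort clears an assumed threshold of 0.7. The cohort names, values, and threshold are illustrative placeholders.

```python
"""Sketch: in silico validation of a candidate biomarker across cohorts.

Assumption: each cohort is a list of (biomarker_value, has_disease) pairs
already extracted from a data repository. All data below are hypothetical.
"""
from typing import Dict, List, Tuple

Sample = Tuple[float, bool]


def auc(samples: List[Sample]) -> float:
    """ROC AUC via the Mann-Whitney statistic: the probability that a random
    case has a higher biomarker value than a random control (ties count 0.5)."""
    cases = [value for value, diseased in samples if diseased]
    controls = [value for value, diseased in samples if not diseased]
    if not cases or not controls:
        raise ValueError("each cohort needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def validate(cohorts: Dict[str, List[Sample]],
             min_auc: float = 0.7) -> Tuple[Dict[str, float], bool]:
    """Score the biomarker in every cohort; advance it only if all pass.
    The 0.7 cut-off is an assumed, illustrative threshold."""
    scores = {name: auc(samples) for name, samples in cohorts.items()}
    advance = all(score >= min_auc for score in scores.values())
    return scores, advance


# Hypothetical cohorts standing in for data from public/private repositories.
cohorts = {
    "cohort_A": [(5.1, True), (4.8, True), (2.0, False), (2.5, False)],
    "cohort_B": [(3.0, True), (1.0, True), (2.0, False), (0.5, False)],
}

if __name__ == "__main__":
    scores, advance = validate(cohorts)
    print(scores, "advance:", advance)
```

Requiring every cohort to pass, rather than averaging, reflects the caption's emphasis on establishing utility in multiple independent cohorts before costly assay development; a real pipeline would also need calibration, confidence intervals, and cohort-level covariate checks.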
Figure 4. Big data-driven therapeutic discovery
The hypothesis-free "signature inversion" approach to therapeutic discovery for inflammatory bowel disease (IBD) is shown as an example of big data-driven drug discovery (e.g., Dudley et al.). A disease signature (a set of genes dysregulated in a coordinated manner in IBD patients) is first identified (left; genes A, B, and C are up-regulated, and genes D, E, and F are down-regulated). This IBD disease signature is then used to query a database of drug perturbation gene signatures for compounds that shift genes A-F in the opposite direction (i.e., suppress expression of genes A, B, and C, and induce expression of genes D, E, and F) and are therefore expected to antagonize the IBD disease signature. The computational identification of compounds requires no mechanistic understanding of the underlying gene dysregulation. Subsequent experimental validation can confirm the predicted therapeutic effect and seek to uncover the mechanism(s) of action before further preclinical and clinical development (right). Because the screening uses data derived from approved drugs with known toxicity profiles, clinical testing can omit phase I and proceed directly to phase II.
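The signature-inversion query in Figure 4 can be illustrated with a toy ranking. This sketch is a simplified stand-in, not the Kolmogorov-Smirnov-based connectivity score used by resources such as the Connectivity Map, and not necessarily the scoring used by Dudley et al.: the disease signature encodes genes A-F as +1 (up-regulated) or -1 (down-regulated), each drug signature is a set of hypothetical log fold changes, and a drug is scored by the negated mean of (disease direction × drug fold change) over shared genes, so higher scores mean stronger reversal. Drug names and values are invented for illustration.

```python
"""Sketch: rank drugs by how strongly they invert a disease signature.

Simplified reversal score (an assumption, not CMap's KS-based statistic):
    score = -mean(direction[g] * drug_logfc[g]) over genes in both signatures.
All drug names and fold changes below are hypothetical.
"""
from typing import Dict, List, Optional, Tuple


def reversal_score(disease_sig: Dict[str, int],
                   drug_sig: Dict[str, float]) -> Optional[float]:
    """Positive when the drug moves genes opposite to the disease."""
    shared = [g for g in disease_sig if g in drug_sig]
    if not shared:
        return None  # no overlapping genes, so the drug cannot be scored
    total = sum(disease_sig[g] * drug_sig[g] for g in shared)
    return -total / len(shared)


def rank_drugs(disease_sig: Dict[str, int],
               drug_db: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (drug, score) pairs, strongest predicted reversal first."""
    scored = []
    for name, sig in drug_db.items():
        score = reversal_score(disease_sig, sig)
        if score is not None:
            scored.append((name, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


# Disease signature from Figure 4: A, B, C up (+1); D, E, F down (-1).
ibd_signature = {"A": 1, "B": 1, "C": 1, "D": -1, "E": -1, "F": -1}

# Hypothetical drug perturbation signatures (log fold changes).
drug_db = {
    "drug_X": {"A": -1.0, "B": -0.5, "C": -1.0, "D": 1.0, "E": 0.5, "F": 1.0},
    "drug_Y": {"A": 1.0, "B": 1.0, "C": 1.0, "D": -1.0, "E": -1.0, "F": -1.0},
    "drug_Z": {"A": -0.2, "D": 0.0},
}

if __name__ == "__main__":
    for name, score in rank_drugs(ibd_signature, drug_db):
        print(f"{name}: {score:.3f}")
```

Here drug_X ranks first because it moves all six genes opposite to the disease signature, while drug_Y, which mimics the disease, ranks last. As the caption notes, no mechanistic knowledge enters the score; top-ranked compounds would still need experimental validation.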

References

    1. Collins F, Green E, Guttmacher A, et al. A vision for the future of genomics research. Nature. 2003;422:835–847.
    2. Costa FF. Big data in biomedicine. Drug Discov Today. 2014;19:433–440.
    3. Cook CE, Bergman MT, Finn RD, et al. The European Bioinformatics Institute in 2016: Data growth and integration. Nucleic Acids Res. 2016;44:D20–D26.
    4. Kolesnikov N, Hastings E, Keays M, et al. ArrayExpress update-simplifying data submissions. Nucleic Acids Res. 2015;43:D1113–D1116.
    5. Barrett T, Wilhite SE, Ledoux P, et al. NCBI GEO: Archive for functional genomics data sets - Update. Nucleic Acids Res. 2013;41:D991–D995.