Review

Gastroenterology. 2017 Jan;152(1):53-67.e3.

doi: 10.1053/j.gastro.2016.09.065. Epub 2016 Oct 20.

Using Big Data to Discover Diagnostics and Therapeutics for Gastrointestinal and Liver Diseases

Benjamin Wooden¹, Nicolas Goossens², Yujin Hoshida³, Scott L Friedman¹

Affiliations

¹ Division of Liver Diseases, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York.
² Division of Liver Diseases, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York; Division of Gastroenterology and Hepatology, Department of Medical Specialties, Geneva University Hospital, Geneva, Switzerland.
³ Division of Liver Diseases, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York. Electronic address: yujin.hoshida@mssm.edu.

PMID: 27773806
PMCID: PMC5193106
DOI: 10.1053/j.gastro.2016.09.065

Benjamin Wooden et al. Gastroenterology. 2017 Jan.

Abstract

Technologies such as genome sequencing, gene expression profiling, proteomic and metabolomic analyses, electronic medical records, and patient-reported health information have produced large amounts of data from various populations, cell types, and disorders (big data). However, these data must be integrated and analyzed if they are to produce models or concepts about physiological function or mechanisms of pathogenesis. Many of these data are available to the public, allowing researchers anywhere to search for markers of specific biological processes or therapeutic targets for specific diseases or patient types. We review recent advances in the fields of computational and systems biology and highlight opportunities for researchers to use big data sets in the fields of gastroenterology and hepatology to complement traditional means of diagnostic and therapeutic discovery.

Keywords: Big Data; Drug Repurposing; Precision Medicine; Translational Bioinformatics.

Conflict of interest statement

The authors have no relevant conflicts.

Figures

**Figure 1. Big data-driven discovery in gastroenterology and hepatology**
Big data-driven discovery may provide new approaches to long-standing or emerging unmet needs in gastrointestinal and liver diseases (left panel). Data from multiple domains, collected systematically and/or automatically from patients and from publicly or privately available databases, are integrated into a rich, heterogeneous dataset (middle panel). Mining the assembled big data with specialized methodologies (translational bioinformatics) yields diagnostic devices, tools, and/or therapeutics more efficiently (right panel).

**Figure 2. Advantages of the big data-driven approach**
In the traditional, biological hypothesis-driven approach for a specific disease (upper panel), candidate biomarkers and therapeutic targets go through lengthy and costly serial preclinical validations. Clinical evaluation is performed without incorporating genetic and environmental variations among enrolled patients, so a therapeutic benefit in a subset of patients can be missed. As a result, clinical translation is less efficient and more costly. In contrast, the big data-driven approach (lower panel) incorporates different data types, including both molecular and clinical information, and computationally derives candidate biomarkers and therapeutic targets/drugs without relying on any prior hypotheses. Subsequent preclinical and clinical validation can be performed in parallel by incorporating computational cross-species analysis, substantially reducing the time and cost of biomarker/therapeutic development. Candidates may additionally be targeted to a specific niche patient subpopulation, further reducing the likelihood of translational failure.

**Figure 3. Big data-driven biomarker discovery**
Biomarker candidates may be identified from either analysis of newly collected samples or *in silico* analysis of existing data from public and/or private big data repositories (left). Biomarker validation has traditionally been a costly process requiring assay development and prospective clinical evaluation with patients followed according to a strict protocol. By incorporating big data resources, *in silico* validation of a candidate biomarker can establish its clinical utility in multiple patient cohorts without conducting costly and lengthy prospective clinical trials. Only well-validated biomarkers are advanced to subsequent assay development and clinical evaluation, with reduced risk of failure to demonstrate clinical utility (right).

**Figure 4. Big data-driven therapeutic discovery**
The hypothesis-free, “signature inversion” approach to therapeutic discovery is illustrated for inflammatory bowel disease (IBD) as an example of big data-driven drug discovery (e.g., Dudley *et al.*). A disease signature, i.e., a set of genes dysregulated in a coordinated manner in IBD patients, is first identified (left; genes A, B, and C are up-regulated, and genes D, E, and F are down-regulated). With the IBD disease signature, a database of drug perturbation gene signatures is queried to identify compounds that modulate genes A-F in the opposite direction (i.e., suppress expression of genes A, B, and C, and induce expression of genes D, E, and F) and are thereby expected to antagonize the IBD disease signature. No mechanistic understanding of the associated gene dysregulation is needed for the computational compound identification. Subsequent experimental validation can confirm the predicted therapeutic effect and seek to uncover mechanism(s) of action before proceeding to further preclinical and clinical development (right). Because the screening is performed using data derived from approved drugs with known toxicity profiles, clinical testing can omit phase I and move immediately to phase II.
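The query step described in the Figure 4 caption can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the gene names A-F follow the caption, while the compound names, the log fold-change values, and the mean-difference score are hypothetical and are not the scoring method of Dudley *et al.* or of any particular perturbation database.

```python
from statistics import mean


def reversal_score(up_genes, down_genes, drug_profile):
    """Score how strongly a drug opposes a disease signature.

    drug_profile maps gene -> log fold change after drug treatment.
    A negative score means the drug lowers the disease-up genes and
    raises the disease-down genes, i.e. it inverts the signature.
    (Hypothetical mean-difference score, chosen for simplicity.)
    """
    up = [drug_profile[g] for g in up_genes if g in drug_profile]
    down = [drug_profile[g] for g in down_genes if g in drug_profile]
    if not up or not down:
        raise ValueError("drug profile covers no genes on one side of the signature")
    return mean(up) - mean(down)


def rank_compounds(up_genes, down_genes, drug_profiles):
    """Return (compound, score) pairs, most signature-inverting first."""
    scores = {
        name: reversal_score(up_genes, down_genes, profile)
        for name, profile in drug_profiles.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Disease signature from the caption: A, B, C up; D, E, F down.
disease_up = ["A", "B", "C"]
disease_down = ["D", "E", "F"]

# Made-up drug perturbation profiles (log fold changes), for illustration only.
drug_profiles = {
    "compound_1": {"A": -1.5, "B": -1.0, "C": -2.0, "D": 1.0, "E": 1.5, "F": 2.0},
    "compound_2": {"A": 1.0, "B": 0.5, "C": 1.5, "D": -1.0, "E": -0.5, "F": -1.5},
    "compound_3": {"A": 0.0, "B": 0.5, "C": -0.5, "D": 0.0, "E": -0.5, "F": 0.5},
}

ranking = rank_compounds(disease_up, disease_down, drug_profiles)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

In this toy data, `compound_1` suppresses A-C and induces D-F, so it ranks first as the candidate that antagonizes the signature; `compound_2` mimics the disease pattern and ranks last. As the caption notes, the ranking requires no mechanistic knowledge, which is why experimental validation still has to follow.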

Cited by

A Review of the Role and Challenges of Big Data in Healthcare Informatics and Analytics.
Awrahman BJ, Aziz Fatah C, Hamaamin MY. Comput Intell Neurosci. 2022 Sep 29;2022:5317760. doi: 10.1155/2022/5317760. PMID: 36210978. Review.
From Reductionistic Approach to Systems Immunology Approach for the Understanding of Tumor Microenvironment.
Koelsch N, Manjili MH. Int J Mol Sci. 2023 Jul 28;24(15):12086. doi: 10.3390/ijms241512086. PMID: 37569461. Review.
Random gene sets in predicting survival of patients with hepatocellular carcinoma.
Itzel T, Spang R, Maass T, Munker S, Roessler S, Ebert MP, Schlitt HJ, Herr W, Evert M, Teufel A. J Mol Med (Berl). 2019 Jun;97(6):879-888. doi: 10.1007/s00109-019-01764-2. Epub 2019 Apr 17. PMID: 31001651.
Innovations in Genomics and Big Data Analytics for Personalized Medicine and Health Care: A Review.
Hassan M, Awan FM, Naz A, deAndrés-Galiana EJ, Alvarez O, Cernea A, Fernández-Brillet L, Fernández-Martínez JL, Kloczkowski A. Int J Mol Sci. 2022 Apr 22;23(9):4645. doi: 10.3390/ijms23094645. PMID: 35563034. Review.
Clinico-histological and molecular features of hepatocellular carcinoma from nonalcoholic fatty liver disease.
Fujiwara N, Nakagawa H. Cancer Sci. 2023 Oct;114(10):3825-3833. doi: 10.1111/cas.15925. Epub 2023 Aug 7. PMID: 37545384. Review.
