PLoS One. 2017 Apr 7;12(4):e0172187.

doi: 10.1371/journal.pone.0172187. eCollection 2017.

Combining clinical and genomics queries using i2b2 - Three methods

Shawn N Murphy^{1,2,3}, Paul Avillach^{2,4}, Riccardo Bellazzi^{5,6,7}, Lori Phillips^{1}, Matteo Gabetta^{5,8}, Alal Eran^{2,4}, Michael T McDuffie^{2,4}, Isaac S Kohane^{2,4}

Affiliations

¹ Research IS and Computing, Partners HealthCare, Charlestown, Massachusetts, United States of America.
² Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, United States of America.
³ Laboratory of Computer Science, Massachusetts General Hospital, Boston, Massachusetts, United States of America.
⁴ Children's Hospital Informatics Program, Boston Children's Hospital, Boston, Massachusetts, United States of America.
⁵ Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.
⁶ IRCCS Fondazione S. Maugeri, Pavia, Italy.
⁷ Centre for Health Technologies, University of Pavia, Pavia, Italy.
⁸ Biomeris s.r.l, Via Ferrata, Pavia, Italy.

PMID: 28388645
PMCID: PMC5384666
DOI: 10.1371/journal.pone.0172187

Shawn N Murphy et al. PLoS One. 2017.

Abstract

We are fortunate to be living in an era of twin biomedical data surges: a burgeoning representation of human phenotypes in the medical records of our healthcare systems, and high-throughput sequencing making rapid technological advances. The difficulty representing genomic data and its annotations has almost by itself led to the recognition of a biomedical "Big Data" challenge, and the complexity of healthcare data only compounds the problem to the point that coherent representation of both systems on the same platform seems insuperably difficult. We investigated the capability for complex, integrative genomic and clinical queries to be supported in the Informatics for Integrating Biology and the Bedside (i2b2) translational software package. Three different data integration approaches were developed: The first is based on Sequence Ontology, the second is based on the tranSMART engine, and the third on CouchDB. These novel methods for representing and querying complex genomic and clinical data on the i2b2 platform are available today for advancing precision medicine.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Fig 1. Overview of the three different approaches.**
1) Using i2b2 by adding patient facts that have concepts coded per the Genome Sequence Ontology, 2) using i2b2/tranSMART by adding patient facts represented by a unique ontology allowing greater variant exploration, 3) using i2b2 by generating a patient set from phenotypes contained in the i2b2 Star Schema database and then using an alternate NoSQL-NGS variant storage to complete the genomic part of the query.

**Fig 2. Classical i2b2 user interface for use case 1.**
Which individuals with a lower mode of HLA-DQB1 protein levels (i.e., HLA-DQB1 log protein ratio < 0) have missense or nonsense mutations in that gene? The available ontologies are displayed on the left side and the phenotypic and genotypic concepts used to build the query are shown on the right.

**Fig 3**
**Panel A: Designing a query in the i2b2/tranSMART interface using phenotypic and genomic variables**. Use case 1: Which individuals with a lower mode of HLA-DQB1 protein levels (i.e., HLA-DQB1 log protein ratio < 0) have missense or nonsense mutations in that gene? **Panel B: Results of a query in the i2b2/tranSMART interface using phenotypic and genomic variables**. Use case 1: Which individuals with a lower mode of HLA-DQB1 protein levels (i.e., HLA-DQB1 log protein ratio < 0) have missense or nonsense mutations in that gene?

**Fig 4. Display of counts per population in two subgroups in i2b2/tranSMART (use case 3).**

**Fig 5. System components and their inter-relationships.**
The Data annotation/upload process requires the user to provide one or more VCF files that are functionally annotated with ANNOVAR and used to create one JSON document for each variant belonging to a single patient; these JSONs are stored inside CouchDB to be queried by the BigQ-NGS Cell. On the client-side, the BigQ-NGS Plugin allows the user to create a genetic query with drag-and-drop interactions within the i2b2 Webclient; afterwards the plugin communicates with the cell to run the query and collect the results that are shown to the user.
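The "one JSON document for each variant belonging to a single patient" step in Fig 5 can be pictured with a minimal Python sketch. It assumes hypothetical field names (`patient_id`, `gene`, `effect`), a hypothetical composite `_id` format, and made-up variant rows; the paper's actual document schema and upload code are not shown on this page, and nothing below contacts a real CouchDB instance.

```python
import json

def variant_documents(patient_id, annotated_variants):
    """Yield one JSON-ready dict per annotated variant of a single patient."""
    for v in annotated_variants:
        yield {
            # CouchDB keys every document by "_id"; this composite format is an assumption.
            "_id": f"{patient_id}:{v['chrom']}:{v['pos']}:{v['ref']}>{v['alt']}",
            "patient_id": patient_id,
            "chrom": v["chrom"],
            "pos": v["pos"],
            "ref": v["ref"],
            "alt": v["alt"],
            "gene": v["gene"],
            "effect": v["effect"],
        }

# Made-up rows standing in for ANNOVAR-annotated VCF records (positions are not real).
example_rows = [
    {"chrom": "6", "pos": 1001, "ref": "G", "alt": "A", "gene": "HLA-DQB1", "effect": "missense"},
    {"chrom": "6", "pos": 1002, "ref": "C", "alt": "T", "gene": "HLA-DQB1", "effect": "synonymous"},
]

docs = list(variant_documents("patient-001", example_rows))
for doc in docs:
    print(json.dumps(doc, sort_keys=True))
```

Keeping one small document per variant per patient means a later query can filter on a single attribute at a time, which is the shape of query the caption attributes to the cell.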

**Fig 6. Screenshot of BigQ-NGS Plugin with user interactions highlighted.**
(1) The user creates a query by dragging and dropping different blocks inside the plugin’s workspace. Each block represents a query on a single attribute that will be performed by the NoSQL-NGS Cell. After the blocks are connected to each other, the query is defined. (2) A patient set, previously created with a standard i2b2 query, is dragged and dropped on the Patient Result Set Drop (PRS Drop) block to define the patients whose exomes will be queried. (3) By double-clicking the standard query blocks (in yellow), it is possible to specify their query logic and query parameters. (4) Afterwards, the query process can start, and each block executes its query sequentially, calling the NoSQL-NGS Cell. (5) When all blocks have performed their query, the user can visualize the results by double-clicking the Patient Result Set Table (PRS Table) block.
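The block-by-block execution described in Fig 6 amounts to successive filters over the variant documents of a chosen patient set. The sketch below is a hedged illustration of that idea in plain Python; `run_blocks`, the `(attribute, allowed_values)` block shape, the effect labels, and the sample documents are all hypothetical and are not the plugin's or the cell's real interface.

```python
def run_blocks(patient_set, variant_docs, blocks):
    """Keep documents of the selected patients, then apply each block in order."""
    hits = [d for d in variant_docs if d["patient_id"] in patient_set]
    for attribute, allowed_values in blocks:
        # One block corresponds to one query on a single attribute.
        hits = [d for d in hits if d.get(attribute) in allowed_values]
    return hits

# Made-up variant documents, one per variant per patient.
variant_docs = [
    {"patient_id": "p1", "gene": "HLA-DQB1", "effect": "missense"},
    {"patient_id": "p2", "gene": "HLA-DQB1", "effect": "synonymous"},
    {"patient_id": "p3", "gene": "HLA-DQB1", "effect": "nonsense"},
    {"patient_id": "p4", "gene": "BRCA1", "effect": "missense"},
]

# Patient set assumed to come from a prior phenotype query
# (for use case 1: HLA-DQB1 log protein ratio < 0).
patient_set = {"p1", "p2", "p4"}

# Genomic part of use case 1: missense or nonsense variants in HLA-DQB1.
blocks = [
    ("gene", {"HLA-DQB1"}),
    ("effect", {"missense", "nonsense"}),
]

matching_patients = sorted(
    {d["patient_id"] for d in run_blocks(patient_set, variant_docs, blocks)}
)
print(matching_patients)
```

In this sketch `patient_set` plays the role of the PRS Drop block and `matching_patients` the role of the PRS Table block.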

Cited by

Enabling Precision Medicine in Cancer Care Through a Molecular Data Warehouse: The Moffitt Experience.
Eschrich SA, Teer JK, Reisman P, Siegel E, Challa C, Lewis P, Fellows K, Malpica E, Carvajal R, Gonzalez G, Cukras S, Betin-Montes M, Aden-Buie G, Avedon M, Manning D, Tan AC, Fridley BL, Gerke T, Van Looveren M, Blake A, Greenman J, Rollison DE. JCO Clin Cancer Inform. 2021 May;5:561-569. doi: 10.1200/CCI.20.00175. PMID: 33989014. Free PMC article.
The Association of Black Cardiologists (ABC) Cardiovascular Implementation Study (CVIS): A Research Registry Integrating Social Determinants to Support Care for Underserved Patients.
Ofili EO, Schanberg LE, Hutchinson B, Sogade F, Fergus I, Duncan P, Hargrove J, Artis A, Onyekwere O, Batchelor W, Williams M, Oduwole A, Onwuanyi A, Ojutalayo F, Cross JA, Seto TB, Okafor H, Pemu P, Immergluck L, Foreman M, Mensah EA, Quarshie A, Mubasher M, Baker A, Ngare A, Dent A, Malouhi M, Tchounwou P, Lee J, Hayes T, Abdelrahim M, Sarpong D, Fernandez-Repollet E, Sodeke SO, Hernandez A, Thomas K, Dennos A, Smith D, Gbadebo D, Ajuluchikwu J, Kong BW, McCollough C, Weiler SR, Natter MD, Mandl KD, Murphy S. Int J Environ Res Public Health. 2019 May 10;16(9):1631. doi: 10.3390/ijerph16091631. PMID: 31083298. Free PMC article.
Research data warehouse best practices: catalyzing national data sharing through informatics innovation.
Murphy SN, Visweswaran S, Becich MJ, Campion TR, Knosp BM, Melton-Meaux GB, Lenert LA. J Am Med Inform Assoc. 2022 Mar 15;29(4):581-584. doi: 10.1093/jamia/ocac024. PMID: 35289371. Free PMC article.
Phenotyping to Facilitate Accrual for a Cardiovascular Intervention.
Wagholikar KB, Fischer CM, Goodson AP, Herrick CD, Maclean TE, Smith KV, Fera L, Gaziano TA, Dunning JR, Bosque-Hamilton J, Matta L, Toscano E, Richter B, Ainsworth L, Oates MF, Aronson S, MacRae CA, Scirica BM, Desai AS, Murphy SN. J Clin Med Res. 2019 Jun;11(6):458-463. doi: 10.14740/jocmr3830. Epub 2019 May 10. PMID: 31143314. Free PMC article.
Integrating Genomics and Clinical Data for Statistical Analysis by Using GEnome MINIng (GEMINI) and Fast Healthcare Interoperability Resources (FHIR): System Design and Implementation.
Gruendner J, Wolf N, Tögel L, Haller F, Prokosch HU, Christoph J. J Med Internet Res. 2020 Oct 7;22(10):e19879. doi: 10.2196/19879. PMID: 33026356. Free PMC article.
