Review. 2017 Feb 15;18(2):412.
doi: 10.3390/ijms18020412.

Big Data Analytics for Genomic Medicine

Karen Y He et al. Int J Mol Sci.

Abstract

Genomic medicine aims to build individualized strategies for diagnostic and therapeutic decision-making from patients' genomic information. Big Data analytics uncovers hidden patterns, unknown correlations, and other insights by examining large and varied data sets. Although integrating and manipulating diverse genomic data and comprehensive electronic health records (EHRs) on a Big Data infrastructure poses challenges, it also offers a feasible opportunity to develop an efficient and effective approach for identifying clinically actionable genetic variants for individualized diagnosis and therapy. In this paper, we review the challenges of manipulating large-scale next-generation sequencing (NGS) data and diverse clinical data derived from EHRs for genomic medicine. We introduce possible solutions to the challenges of manipulating, managing, and analyzing genomic and clinical data in order to implement genomic medicine. We also present a practical Big Data toolset for identifying clinically actionable genetic variants using high-throughput NGS data and EHRs.

Keywords: Big Data analytics; clinically actionable genetic variants; electronic health records; healthcare; next-generation sequencing.

Conflict of interest statement

Dongliang Ge and Max M. He are employed by, and may hold stock and/or stock options in, BioSciKin Co., Ltd. This does not alter our adherence to the journal’s policies. The other authors declare no conflict of interest.

Figures

Figure 1
The SRA database growth in the past eight years.
Figure 2
The approximate file sizes of different NGS data formats and the running times for generating files in each format. BWA: Burrows-Wheeler Aligner; GATK: Genome Analysis Toolkit; BAM: the binary version of the Sequence Alignment/Map format; FASTQ: a text-based format for storing nucleotide sequences and their quality scores; VCF: Variant Call Format.
Figure 3
The basic framework of SeqHBase for identifying clinically actionable genetic variants.
