Nat Genet. 2017 Oct 27;49(11):1560-1563.
doi: 10.1038/ng.3968.

Analysis commons, a team approach to discovery in a big-data environment for genetic epidemiology


Jennifer A Brody et al. Nat Genet. 2017.

Abstract

The exploding volume of whole-genome sequence (WGS) and multi-omics data requires new approaches for analysis. As one solution, we have created a cloud-based Analysis Commons, which brings together genotype and phenotype data from multiple studies in a setting that is accessible by multiple investigators. This framework addresses many of the challenges of multi-center WGS analyses, including data sharing mechanisms, phenotype harmonization, integrated multi-omics analyses, annotation, and computational flexibility. In this setting, the computational pipeline facilitates a sequence-to-discovery analysis workflow illustrated here by an analysis of plasma fibrinogen levels in 3996 individuals from the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) WGS program. The Analysis Commons represents a novel model for transforming WGS resources from a massive quantity of phenotypic and genomic data into knowledge of the determinants of health and disease risk in diverse human populations.


Figures

Figure 1. Analysis Commons Design
The Analysis Commons is a cloud-computing environment that combines data from multiple sources and provides analysis access to a wide range of analysts and developers. Each study uploads phenotype, genotype, or other -omics data. Both genetic and phenotypic data are harmonized and pooled into joint datasets. Analysts can choose from multiple analytic pipelines for association analysis, as well as for QC, annotation, and results visualization. A large number of analysts at dispersed sites can access the analytic tools through a web interface or by batch processing through a command-line interface. In addition, analysts can run ad hoc analyses, and developers can test and implement new methods, by accessing the underlying data resources directly.
Figure 2. Plasma Fibrinogen Association Results
(A) Top single variant association results fall within a region on chromosome 4 containing the fibrinogen subunits. A regional association plotting application computes the linkage disequilibrium with the top signal (diamond) and plots the –log10 p values and genes within a specified window. (B) Rare variants (MAF < 5%) were filtered to those with high CADD phred scores and aggregated into genomic windows covering 50 kb.
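
The filtering and windowing step described in panel (B) can be illustrated with a minimal Python sketch. It is not the pipeline used in the paper. The `Variant` record, the `window_variants` function, the CADD cutoff of 10 (the caption says only "high"), and the use of fixed, non-overlapping windows are all assumptions made for illustration; only the MAF < 5% filter and the 50 kb window size come from the caption.

```python
from collections import defaultdict
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant record used only for this illustration."""
    chrom: str
    pos: int            # 1-based genomic position
    maf: float          # minor allele frequency
    cadd_phred: float   # CADD Phred-scaled score


WINDOW_BP = 50_000   # window size stated in the caption
MAF_MAX = 0.05       # rare-variant threshold stated in the caption
CADD_MIN = 10.0      # ASSUMPTION: the caption does not give a numeric cutoff


def window_variants(variants, window_bp=WINDOW_BP, maf_max=MAF_MAX, cadd_min=CADD_MIN):
    """Keep rare, high-CADD variants and group them into fixed windows.

    Returns a dict mapping (chrom, window_start) to the list of variants
    in that window. Windows are non-overlapping here, which is an
    assumption; a sliding-window design would need a different grouping.
    """
    windows = defaultdict(list)
    for v in variants:
        if v.maf < maf_max and v.cadd_phred >= cadd_min:
            # 1-based start coordinate of the window containing v.pos
            start = ((v.pos - 1) // window_bp) * window_bp + 1
            windows[(v.chrom, start)].append(v)
    return dict(windows)


if __name__ == "__main__":
    demo = [
        Variant("4", 10, 0.01, 25.0),       # kept, window starting at 1
        Variant("4", 50_001, 0.02, 15.0),   # kept, window starting at 50001
        Variant("4", 20, 0.10, 40.0),       # dropped: not rare
    ]
    for key, members in sorted(window_variants(demo).items()):
        print(key, len(members))
```

Each resulting group would then be passed to an aggregate association test; the choice of test is outside the scope of this sketch.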
