Review
Nat Rev Genet. 2010 Jul;11(7):476-86.
doi: 10.1038/nrg2795.

Next-generation genomics: an integrative approach

R David Hawkins et al. Nat Rev Genet. 2010 Jul.

Abstract

Integrating results from diverse experiments is an essential process in our effort to understand the logic of complex systems, such as development, homeostasis and responses to the environment. With the advent of high-throughput methods, including genome-wide association (GWA) studies, chromatin immunoprecipitation followed by sequencing (ChIP-seq) and RNA sequencing (RNA-seq), acquisition of genome-scale data has never been easier. Epigenomics, transcriptomics, proteomics and genomics each provide an insightful, and yet one-dimensional, view of genome function; integrative analysis promises a unified, global view. However, the large amount of information and diverse technology platforms pose multiple challenges for data access and processing. This Review discusses emerging issues and strategies related to data integration in the era of next-generation genomics.

Figures

Figure 1. Annotating the genome through detecting transcription factor binding sites and histone modification states
Promoters can be mapped by the localization of general transcription machinery and transcription factors (TFs) such as RNA polymerase II (Pol II) or TAF1, or by the localization of H3K4me3. The bodies of transcribed genes and noncoding RNAs are marked by H3K36me3. Enhancers can be found by distal TF binding sites or by H3K4me1. This modification often coincides with H3K4me2, which has been shown to be necessary to recruit pioneering transcription factors to enhancer elements. In addition, H3K4me1 sites overlap acetylated histone lysines, in agreement with acetylation islands outside of promoters identifying functional enhancer elements. Insulators are bound by CTCF. Nucleosomes are shown as cylinders and example histone tails are in grey. Different TFs are shown in different colours. Factors bound to the insulator include CTCF and subunits of cohesin.
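The mark-to-element associations in this caption can be written down as a small lookup, shown here as a minimal sketch only. The function name `annotate_region`, the `distal` flag and the presence/absence treatment of marks are illustrative assumptions, not part of the Review; real annotation relies on peak calling and quantitative signal rather than a yes/no test.

```python
# Hypothetical rule table based on the Figure 1 caption.
# Assumption: each mark is treated as simply present or absent in a region.

def annotate_region(marks, distal=False):
    """Return candidate element labels for one genomic region.

    marks  -- iterable of mark/factor names detected in the region,
              e.g. {"H3K4me3", "Pol II"}; "TF" stands for any bound
              transcription factor (an illustrative placeholder).
    distal -- True if the region lies away from a known promoter.
    """
    marks = set(marks)
    labels = []
    # Promoters: general transcription machinery or H3K4me3.
    if marks & {"H3K4me3", "Pol II", "TAF1"}:
        labels.append("promoter")
    # Bodies of transcribed genes and noncoding RNAs: H3K36me3.
    if "H3K36me3" in marks:
        labels.append("transcribed body")
    # Enhancers: H3K4me1, or a TF binding site at a distal location.
    if "H3K4me1" in marks or (distal and "TF" in marks):
        labels.append("enhancer")
    # Insulators: CTCF binding.
    if "CTCF" in marks:
        labels.append("insulator")
    return labels or ["unannotated"]


if __name__ == "__main__":
    print(annotate_region({"H3K4me3", "Pol II"}))
    print(annotate_region({"TF"}, distal=True))
```

A region can receive more than one label; the sketch reports every candidate and leaves the choice between them to downstream evidence.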
Figure 2. Identification of regulatory SNPs (rSNPs)
The sequence of a transcription factor (TF) binding site is shown with the position of an A/T polymorphism. By integrating chromatin signatures of enhancers or TF binding sites with SNP data, SNPs falling within the region would be predicted as rSNPs. These could then be correlated to changes in gene expression.
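The integration step described in this caption amounts to an interval lookup: keep the SNPs whose position falls inside a predicted enhancer or TF binding region. The following is a minimal sketch under stated assumptions. The function name `candidate_rsnps`, the tuple layouts, the half-open `[start, end)` coordinates, the requirement that regions on a chromosome do not overlap, and the SNP identifiers in the usage example are all hypothetical and not taken from the Review.

```python
from bisect import bisect_right

def candidate_rsnps(snps, regions):
    """Return IDs of SNPs that fall inside any regulatory region.

    snps    -- list of (chrom, pos, snp_id)
    regions -- list of (chrom, start, end), half-open [start, end);
               assumed non-overlapping within each chromosome.
    """
    # Group regions by chromosome and sort them by start coordinate.
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], intervals)

    hits = []
    for chrom, pos, snp_id in snps:
        if chrom not in index:
            continue
        starts, intervals = index[chrom]
        # Last region whose start is <= pos, if there is one.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            hits.append(snp_id)
    return hits


if __name__ == "__main__":
    # Made-up coordinates and identifiers, for illustration only.
    regions = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
    snps = [("chr1", 150, "snpA"), ("chr1", 200, "snpB"), ("chr1", 500, "snpC")]
    print(candidate_rsnps(snps, regions))
```

The returned SNPs are only candidates; as the caption notes, they would still need to be correlated with changes in gene expression.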
Figure 3. Data Visualization
The UCSC Genome Browser is a tool for viewing genomic datasets. A vast amount of data is available for viewing through this browser. This example from the browser shows numerous data types, in K562 cells, from the ENCODE Consortium. A randomly selected gene, KATNAL1, illustrates several points that can be identified by using this tool. The promoter has a typical chromatin structure (peak of H3K4me3 between the bimodal peaks of H3K4me1), is bound by Pol II, and is DNase hypersensitive. The gene is transcribed, as indicated by RNA-seq data as well as H3K36me3 localization. The gene lies between two CTCF-bound sites that could be tested for insulator activity. An intronic H3K4me1 peak (highlighted) predicts an enhancer element, corroborated by the DNase hypersensitive site (DHS) peak. There is a broad repressive domain of H3K27me3 downstream, which could have an open chromatin structure in another cell type.
Figure 4. Flow chart for data analysis
This example shows a workflow for ChIP-seq data analysis that can be done by bench scientists using current resources. A similar strategy could be used for other types of NGS data. Blue boxes show steps that can be performed using Galaxy. Integration or cross-sectioning of data can often be done in the UCSC browser or by joining lists in Galaxy (purple box). Downstream steps such as known-motif analysis and gene ontology (GO) analysis can be achieved with online or stand-alone tools (red boxes). Galaxy can also be used to establish analytical pipelines for calling SNPs that could then be integrated into sequencing-based data such as ChIP-seq.

