2009 Jan 12:2009:869093.
doi: 10.4061/2009/869093.

Data integration in genetics and genomics: methods and challenges

Jemila S Hamid et al. Hum Genomics Proteomics.

Abstract

Rapid technological advances have made available many types of genomic and proteomic data that differ in size, format, and structure. Among them are gene expression, single nucleotide polymorphism, copy number variation, and protein-protein/gene-gene interaction data. Each of these data types provides a different, partly independent and complementary, view of the whole genome. However, understanding the functions of genes, proteins, and other aspects of the genome requires more information than any single dataset provides. Integrating data from different sources is, therefore, an important part of current research in genomics and proteomics. Data integration also plays an important role in combining clinical, environmental, and demographic data with high-throughput genomic data. Nevertheless, the concept of data integration is not well defined in the literature, and it may mean different things to different researchers. In this paper, we first propose a conceptual framework for integrating genetic, genomic, and proteomic data. The framework captures fundamental aspects of data integration and is built around the key steps in genetic, genomic, and proteomic data fusion. Second, we review some of the most commonly used current methods and approaches for combining genomic data, with a focus on the statistical aspects.

Figures

Figure 1. Conceptual framework for data integration in genetics and genomics.
Figure 2. An illustrative flowchart for finding disease-causing genes by integrating heterogeneous data.
