Review
2015 Sep 2;22(1):9.
doi: 10.1186/s40709-015-0032-5. eCollection 2015 Dec.

Data integration in biological research: an overview

Vasileios Lapatas et al. J Biol Res (Thessalon).

Abstract

Data sharing, integration and annotation are essential to ensure the reproducibility of the analysis and interpretation of experimental findings. These activities are often perceived as a role that bioinformaticians and computer scientists must take on with little or no input from experimental biologists. On the contrary, biological researchers, being the producers and often the end users of such data, have a major role in enabling biological data integration. The quality and usefulness of data integration depend on the existence and adoption of standards, shared formats, and mechanisms that allow biological researchers to submit and annotate data so that they are easily searchable, conveniently linked and consequently usable for further biological analysis and discovery. Here, we provide background on what data integration is from a computational science point of view, how it has been applied to biological research, which key aspects contributed to its success, and its future directions.

Keywords: Bioinformatics; Data driven; Data integration; Open sciences; Standards.

Figures

Fig. 1. Data integration methodologies. This figure illustrates six major types of data integration methodologies in biology.
Fig. 2. Current state. This figure illustrates a simplified view of the current state of biological data and tools.
Fig. 3. Ideal state. This figure illustrates a simplified view of an ideal state of biological data and tools.
Fig. 4. Selected parts of a FASTQ file. In this format, declaration lines start with two different characters (“@” and “+”) corresponding to different data types (the raw sequence and the sequence quality values, respectively).
Fig. 5. Selected parts of the GenBank entry DQ408531. The complete entry can be found at http://www.ncbi.nlm.nih.gov/nuccore/DQ408531
Fig. 6. Selected parts of the UniProt entry P01308 in XML format. The complete entry can be found at http://www.uniprot.org/uniprot/P01308.xml
Fig. 7. Selected parts of a SAM file.
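
The FASTQ layout described in the Fig. 4 caption, where an “@” line introduces the raw sequence and a “+” line introduces its quality values, can be illustrated with a minimal Python sketch. This is not code from the article: the names `FastqRecord`, `parse_fastq` and `phred_scores` are hypothetical, the sketch assumes the common four-line record layout with unwrapped sequences, and the quality decoding assumes the Phred+33 (Sanger) character encoding.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    """One FASTQ entry (hypothetical container, not from the article)."""
    identifier: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records, assuming four lines per record and no line wrapping."""
    rows = [line.rstrip("\r\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("incomplete FASTQ record")
    for i in range(0, len(rows), 4):
        header, sequence, separator, quality = rows[i:i + 4]
        # "@" declares the raw sequence that follows on the next line.
        if not header.startswith("@"):
            raise ValueError(f"expected '@' declaration, got {header!r}")
        # "+" declares the quality string that follows on the next line.
        if not separator.startswith("+"):
            raise ValueError(f"expected '+' declaration, got {separator!r}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str) -> List[int]:
    """Decode quality characters, assuming the Phred+33 encoding."""
    return [ord(char) - 33 for char in quality]


example = "@read1\nACGT\n+\nII!I\n"
records = list(parse_fastq(example.splitlines()))
for record in records:
    print(record.identifier, record.sequence, phred_scores(record.quality))
```

Pairing each base with its own quality value in this way is the kind of shared, self-describing structure that lets different tools read the same file without custom conversion.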
