J Biomed Inform. 2007 Jun;40(3):252-69.
doi: 10.1016/j.jbi.2006.09.001. Epub 2006 Sep 24.

Anatomy of data integration

Olga Brazhnik et al. J Biomed Inform. 2007 Jun.

Abstract

Producing reliable information is the ultimate goal of data processing. The ocean of data generated by advances in science and technology calls for integrating data from heterogeneous sources that differ in their purposes, business rules, underlying models, and enabling technologies. Reference models, the Semantic Web, standards, ontologies, and other technologies enable fast and efficient merging of heterogeneous data, but the reliability of the resulting information is largely determined by how well the data represent reality. In this paper, we propose an initial framework for assessing the informational value of data. The framework covers data dimensions; aligning data quality with business practices; identifying authoritative sources and integration keys; merging models; and uniting updates of varying frequency as well as overlapping or gapped data sets.
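
To make the idea of identifying integration keys more concrete, here is a minimal Python sketch (not the authors' method) that keeps a candidate data element as a join key only if its values pass a validity check in most records of both sources, echoing the DOB example in Fig. 8. The record layout, the field names (`mrn`, `last_name`, `dob`), the validity rules, and the 0.95 threshold are all hypothetical assumptions chosen for illustration.

```python
from datetime import date
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Validator = Callable[[object], bool]

# Hypothetical validity rules; real rules would come from each source's business rules.
def valid_text(value: object) -> bool:
    return isinstance(value, str) and value.strip() != ""

def valid_dob(value: object) -> bool:
    # Assumed plausible range for a date of birth.
    return isinstance(value, date) and date(1900, 1, 1) <= value <= date.today()

def validity(records: List[Record], field: str, is_valid: Validator) -> float:
    """Fraction of records whose `field` value passes `is_valid`."""
    if not records:
        return 0.0
    return sum(is_valid(r.get(field)) for r in records) / len(records)

def choose_keys(a: List[Record], b: List[Record],
                validators: Dict[str, Validator], threshold: float = 0.95) -> List[str]:
    """Keep only candidate fields that are valid in at least `threshold` of both sources."""
    return [field for field, check in validators.items()
            if validity(a, field, check) >= threshold
            and validity(b, field, check) >= threshold]

def merge_on(a: List[Record], b: List[Record], keys: List[str]) -> List[Record]:
    """Inner join of `a` and `b` on `keys`; `b` wins on overlapping non-key fields."""
    if not keys:
        return []  # no trustworthy key: refuse to merge rather than cross-join
    index: Dict[Tuple[object, ...], List[Record]] = {}
    for r in b:
        index.setdefault(tuple(r.get(k) for k in keys), []).append(r)
    return [{**r, **match}
            for r in a
            for match in index.get(tuple(r.get(k) for k in keys), [])]

# Hypothetical data: the second patient's DOB is invalid in one source and missing in the other.
hospital = [
    {"mrn": "A1", "last_name": "Smith", "dob": date(1970, 5, 1), "dx": "influenza"},
    {"mrn": "A2", "last_name": "Jones", "dob": date(1066, 1, 1), "dx": "asthma"},
]
lab = [
    {"mrn": "A1", "last_name": "Smith", "dob": date(1970, 5, 1), "result": "positive"},
    {"mrn": "A2", "last_name": "Jones", "dob": None, "result": "negative"},
]
validators = {"mrn": valid_text, "last_name": valid_text, "dob": valid_dob}

keys = choose_keys(hospital, lab, validators)
merged = merge_on(hospital, lab, keys)
print(keys, len(merged))  # prints: ['mrn', 'last_name'] 2
```

Here DOB is valid in only half the records of each source, so it is dropped from the key and both patients match on `mrn` and `last_name`; joining on all three fields would match only one. A real pipeline would also need authoritative-source rules for conflicting non-key values, which this sketch resolves naively by letting the second source win.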

Figures

Fig. 1. Illustration of the main modalities of data integration.
Fig. 2. Dimensions of clinical data.
Fig. 3. 3D patient record.
Fig. 4. Information pipeline.
Fig. 5. A conceptual data model (CDM) defines the main concepts included in and excluded from the study.
Fig. 6. Meaningful integration is possible only between data sources with overlapping focal data elements (DEs).
Fig. 7. The process of identifying authoritative data sources and focal DEs.
Fig. 8. In this example, the date of birth (DOB) has many invalid entries, so excluding DOB increases the chance of successful integration.
Fig. 9. Data-gathering agents provide only part of the data pool. If agents act on servers that receive updates from multiple sites, there is often no information on which sites did and did not provide updates.
Fig. 10. While agents provide near-real-time updates, the data pool is complete only as of the update that brought it to fullness. For example, suppose flat-file updates arrive on the first of each month, data pulls run on Mondays, and agents deliver data in near real time. Then on Friday the 27th, a consistent summary can be produced only as of the first of the month, 26 days earlier, when the data pool was last complete.
Fig. 11. Semantic equivalency should be established between confirmed cases in diverse disease models.
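
The timing rule in the Fig. 10 caption reduces to taking the oldest of each feed's most recent delivery. The Python sketch below reproduces the caption's example under the assumption that every feed delivers exactly on schedule with no processing delay; the feed functions are hypothetical, and 27 June 2025 is used only because it falls on a Friday the 27th.

```python
from datetime import date, timedelta
from typing import Callable, Dict

# Each hypothetical feed maps "today" to the date of its most recent delivery.
Feed = Callable[[date], date]

def monthly_flat_file(today: date) -> date:
    # Flat-file updates arrive on the first of each month.
    return today.replace(day=1)

def weekly_pull(today: date) -> date:
    # Data pulls run on Mondays; date.weekday() returns 0 for Monday.
    return today - timedelta(days=today.weekday())

def near_real_time_agents(today: date) -> date:
    # Agents are assumed to be current as of today.
    return today

def complete_as_of(today: date, feeds: Dict[str, Feed]) -> date:
    """Latest date for which every feed has delivered: the minimum over all feeds."""
    return min(feed(today) for feed in feeds.values())

feeds = {"flat_file": monthly_flat_file, "pull": weekly_pull, "agents": near_real_time_agents}
today = date(2025, 6, 27)  # a Friday
as_of = complete_as_of(today, feeds)
print(as_of, (today - as_of).days)  # prints: 2025-06-01 26
```

Taking the minimum is what makes the summary consistent: mixing the agents' current data with a month-old flat-file snapshot would describe no single point in time. The caveat in Fig. 9 still applies, since an agent feed can look current even when some sites have not reported.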

