BMC Bioinformatics. 2007 Sep 26;8:362.
doi: 10.1186/1471-2105-8-362.

Design and implementation of a generalized laboratory data model

Michael C Wendl et al. BMC Bioinformatics. 2007.

Abstract

Background: Investigators in the biological sciences continue to exploit laboratory automation methods and have dramatically increased the rates at which they can generate data. In many environments, the methods themselves also evolve in a rapid and fluid manner. These observations point to the importance of robust information management systems in the modern laboratory. Designing and implementing such systems is non-trivial and it appears that in many cases a database project ultimately proves unserviceable.

Results: We describe a general modeling framework for laboratory data and its implementation as an information management system. The model utilizes several abstraction techniques, focusing especially on the concepts of inheritance and meta-data. Traditional approaches commingle event-oriented data with regular entity data in ad hoc ways. Instead, we define distinct regular entity and event schemas, but fully integrate these via a standardized interface. The design allows straightforward definition of a "processing pipeline" as a sequence of events, obviating the need for separate workflow management systems. A layer above the event-oriented schema integrates events into a workflow by defining "processing directives", which act as automated project managers of items in the system. Directives can be added or modified in an almost trivial fashion, i.e., without the need for schema modification or re-certification of applications. Association between regular entities and events is managed via simple "many-to-many" relationships. We describe the programming interface, as well as techniques for handling input/output, process control, and state transitions.
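The separation of regular entities, events, and processing directives described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class names (`Entity`, `Event`, `Directive`), the `run_pipeline` helper, the status strings, and the step names are all hypothetical, chosen only to show how a directive held as data can define a pipeline as a sequence of events, with each event linked to its input and output entities.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """A regular (non-event) item, e.g. a DNA sample. Hypothetical shape."""
    entity_id: int
    entity_type: str


@dataclass
class Event:
    """One processing step instance, linked to entities through
    list-valued associations (standing in for many-to-many link tables)."""
    event_id: int
    step: str
    inputs: List[Entity] = field(default_factory=list)
    outputs: List[Entity] = field(default_factory=list)
    status: str = "scheduled"  # assumed state name, not taken from the paper


@dataclass
class Directive:
    """A processing directive: plain data naming the ordered steps."""
    name: str
    steps: List[str]

    def next_step(self, completed: Optional[str]) -> Optional[str]:
        """Return the step after `completed`, or the first step if None."""
        if completed is None:
            return self.steps[0] if self.steps else None
        i = self.steps.index(completed)
        return self.steps[i + 1] if i + 1 < len(self.steps) else None


def run_pipeline(directive: Directive, item: Entity) -> List[Event]:
    """Walk an item through every step of the directive, recording one
    event per step; each event's output becomes the next event's input."""
    events: List[Event] = []
    current = item
    step = directive.next_step(None)
    while step is not None:
        output = Entity(current.entity_id + 1, f"{step}_output")
        events.append(Event(len(events) + 1, step, [current], [output], "completed"))
        current = output
        step = directive.next_step(step)
    return events


# Changing the pipeline means editing the directive's data, not the classes.
directive = Directive("medical_sequencing", ["pcr", "sequencing", "analysis"])
events = run_pipeline(directive, Entity(1, "genomic_sample"))
```

In this sketch, reordering or adding steps only changes the `steps` list, which mirrors the claim that directives can be modified without touching the schema.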

Conclusion: The implementation described here has served as the Washington University Genome Sequencing Center's primary information system for several years. It handles all transactions underlying a throughput rate of about 9 million sequencing reactions of various kinds per month and has handily weathered a number of major pipeline reconfigurations. The basic data model can be readily adapted to other high-volume processing environments.

Figures

Figure 1
Prototypical schemas for laboratory machines: (a) direct model for an instrument, (b) using inheritance to sub-class instrument types, (c) using meta-data. Diagrams show entity type relationships and primary and foreign keys (marked 'PK' and 'FK', respectively). The "bird's foot" symbols are a standard notation indicating that single instances from one type associate with multiple instances in the other. The "arrow" notation indicates inheritance.
Figure 2
DNA as an abstract type ('dna') having hierarchical sub-types. Relationships are modeled on the basic inheritance concept with meta-data describing the hierarchy. Two sub-types are shown, 'genomic_sample' and 'pcr_product'. Each prescribes additional sub-type-specific attributes.
Figure 3
Possible state transitions for an instance of an event.
Figure 4
Direct modeling schema for medical sequencing projects.
Figure 5
Description of a medical sequencing pipeline. Boxes represent entity instances (objects), while arrow colors represent the following: event flow (black), output from an event (green), input to an event (blue), directives governing an event (red).
Figure 6
Object layout for a medical sequencing pipeline. Concrete entity types (outer-most ring) inherit from five abstract base types (middle ring). The object is in the inner-most ring. Entity types are color-coded: manifestations of DNA (red), directives (magenta), manifestations of sequence data (green), events (black), and lab instruments (blue).
Figure 7
Core layout of the LIMS, showing main abstract entity types.
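
The three schema styles named in the Figure 1 caption (direct model, inheritance, meta-data) can be contrasted in a short sketch. The figure itself is not reproduced here, so every class and attribute name below (`Instrument`, `Sequencer`, `capillary_count`, `InstrumentType`, `GenericInstrument`) is an assumption made for illustration and not the schema shown in the paper.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


# (a) Direct model: one fixed record shape for every instrument.
@dataclass
class Instrument:
    serial_number: str
    model: str


# (b) Inheritance: a sub-type adds its own attributes to the base type.
@dataclass
class Sequencer(Instrument):
    capillary_count: int  # hypothetical sub-type-specific attribute


# (c) Meta-data: instrument types and their allowed attributes are rows
# of data, so a new type needs no new class (or table).
@dataclass
class InstrumentType:
    name: str
    attribute_names: Tuple[str, ...]


@dataclass
class GenericInstrument:
    serial_number: str
    instrument_type: InstrumentType
    attributes: Dict[str, str]

    def __post_init__(self) -> None:
        # Reject attributes that the type's meta-data does not declare.
        unknown = set(self.attributes) - set(self.instrument_type.attribute_names)
        if unknown:
            raise ValueError(f"undeclared attributes: {sorted(unknown)}")


sequencer_type = InstrumentType("sequencer", ("capillary_count",))
by_inheritance = Sequencer("SN-001", "example-model", 96)
by_metadata = GenericInstrument("SN-001", sequencer_type, {"capillary_count": "96"})
```

The trade-off in this sketch is the usual one: inheritance gives typed attributes but needs a code or schema change per new type, while the meta-data form accepts new types as data at the cost of weaker typing.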
