MARK-AGE data management: Cleaning, exploration and visualization of data

Jennifer Baur¹, Maria Moreno-Villanueva¹, Tobias Kötter², Thilo Sindlinger¹, Alexander Bürkle³, Michael R Berthold², Michael Junk⁴

Affiliations

¹ Chair for Molecular Toxicology, University of Konstanz, 78457 Konstanz, Germany.
² Chair for Bioinformatics and Information Mining, University of Konstanz, 78457 Konstanz, Germany.
³ Chair for Molecular Toxicology, University of Konstanz, 78457 Konstanz, Germany. Electronic address: alexander.buerkle@uni-konstanz.de.
⁴ Department for Mathematics and Statistics, University of Konstanz, 78457 Konstanz, Germany.

PMID: 26004801
DOI: 10.1016/j.mad.2015.05.007

Free article

Jennifer Baur et al. Mech Ageing Dev. 2015 Nov;151:38-44. doi: 10.1016/j.mad.2015.05.007. Epub 2015 May 21.

Abstract

Databases are organized collections of data and are necessary for investigating a wide spectrum of research questions. When evaluating data, analysts should be aware of possible data quality problems that can compromise the validity of results. Data cleaning, which deals with the identification and correction of errors in order to improve data quality, is therefore an essential part of the data management process. In our cross-sectional study, biomarkers of ageing as well as analytical, anthropometric and demographic data from about 3000 volunteers were collected in the MARK-AGE database. Although several preventive strategies were applied before data entry, errors such as miscoding, missing values and batch problems could not be avoided completely. Such errors can result in misleading information and affect the validity of the data analysis performed. Here we present an overview of the methods we applied for dealing with errors in the MARK-AGE database. In particular, we describe our strategies for detecting missing values, outliers and batch effects and explain how they can be handled to improve data quality. Finally, we report on the tools used for data exploration and data sharing between MARK-AGE collaborators.

Keywords: Batch effects; Data cleaning; Data visualization; Missing data; Outliers.
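
The abstract names three kinds of checks (missing values, outliers, batch effects) without giving details. As a rough illustration only, the following sketch shows one generic way such checks could look in Python. It is not the authors' method: the function names, the use of None for missing entries, the Tukey fence rule with k = 1.5, the median-based batch comparison and the toy data are all assumptions made for this example.

```python
from statistics import median, quantiles


def missing_fraction(values):
    """Fraction of entries that are None (None is assumed to mean 'missing')."""
    return sum(v is None for v in values) / len(values)


def iqr_outliers(values, k=1.5):
    """Indices of non-missing values outside Tukey's fences.

    The fences are [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is a common convention,
    not a value taken from the paper.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values)
            if v is not None and not low <= v <= high]


def batch_median_shift(values, batches):
    """Median of each batch minus the overall median.

    A large shift for one batch is a hint (not proof) of a batch effect.
    """
    pairs = [(b, v) for b, v in zip(batches, values) if v is not None]
    overall = median(v for _, v in pairs)
    by_batch = {}
    for b, v in pairs:
        by_batch.setdefault(b, []).append(v)
    return {b: median(vs) - overall for b, vs in by_batch.items()}


# Hypothetical measurements of one biomarker, recorded in two batches.
values = [5.1, 4.9, None, 5.3, 5.0, 42.0, 6.1, 6.0, None, 6.2]
batches = ["A"] * 5 + ["B"] * 5

print("missing fraction:", missing_fraction(values))
print("outlier indices:", iqr_outliers(values))
print("batch shifts:", batch_median_shift(values, batches))
```

Medians and quartiles are used here because they are less sensitive to the very outliers being searched for than means and standard deviations would be. Flagged values would still need manual review before being corrected or excluded.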
