A Review of Microarray Datasets: Where to Find Them and Specific Characteristics
- PMID: 31115885
- DOI: 10.1007/978-1-4939-9442-7_4
Abstract
The advent of DNA microarray datasets has stimulated a new line of research in both bioinformatics and machine learning. This type of data captures gene expression differences in tissue and cell samples, information that can be useful for diagnosing disease or for distinguishing specific types of tumor. Classifying microarray data is a difficult challenge for machine learning researchers because of the high number of features and the small sample sizes. This chapter reviews the microarray databases most frequently used in the literature. We also alert the interested reader to problematic data characteristics in this domain, such as data imbalance, data complexity, and so-called dataset shift.
Keywords: Dataset shift; High dimensionality; Microarray data; Unbalanced data.
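Two of the characteristics named in the abstract, high dimensionality relative to sample size and unbalanced classes, can be quantified with simple ratios. The following is a minimal sketch under stated assumptions, not a method taken from the chapter: the function name `dataset_summary`, the toy matrix, and the labels are all hypothetical and serve only as illustration.

```python
from collections import Counter


def dataset_summary(X, y):
    """Summarize an expression matrix X (rows = samples, columns = genes)
    and its class labels y. Hypothetical helper for illustration only."""
    n_samples = len(X)
    if n_samples == 0 or n_samples != len(y):
        raise ValueError("X and y must be non-empty and of equal length")
    n_features = len(X[0])
    counts = Counter(y)
    return {
        "n_samples": n_samples,
        "n_features": n_features,
        # Number of genes measured per available sample.
        "features_per_sample": n_features / n_samples,
        # Size of the largest class divided by the size of the smallest.
        "imbalance_ratio": max(counts.values()) / min(counts.values()),
    }


# Made-up toy data: 6 samples, 1000 genes, 4 "tumor" vs 2 "normal".
X = [[0.0] * 1000 for _ in range(6)]
y = ["tumor", "tumor", "tumor", "tumor", "normal", "normal"]
summary = dataset_summary(X, y)
print(summary)
```

A features-per-sample value far above 1 and an imbalance ratio well above 1 indicate the kind of dataset the abstract describes as difficult to classify.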