A simple strategy for sample annotation error detection in cytometry datasets
- PMID: 34967113
- DOI: 10.1002/cyto.a.24525
Abstract
Mislabeling samples or data with the wrong participant information can affect study integrity and lead investigators to draw inaccurate conclusions. Quality control to prevent these types of errors is commonly embedded into the analysis of genomic datasets, but a similar identification strategy is not standard for cytometric data. Here, we present a method for detecting sample identification errors in cytometric data using expression of human leukocyte antigen (HLA) class I alleles. We measured HLA-A*02 and HLA-B*07 expression in three longitudinal samples from each of 41 participants using a 33-marker CyTOF panel designed to identify major immune cell types. Three of 123 samples (2.4%) showed HLA allele expression that did not match their longitudinal pairs. Furthermore, the cytometric signatures of these same three samples did not match qPCR HLA class I allele data, suggesting that they were accurately identified as mismatches. We conclude that this technique is useful for detecting sample-labeling errors in cytometric analyses of longitudinal data. This technique could also be used in conjunction with another method, such as GWAS or PCR, to detect errors in cross-sectional data. We suggest that widespread adoption of this or similar techniques will improve the quality of clinical studies that utilize cytometry.
Keywords: cytometry; human leukocyte antigen; quality control; reproducible research; sample mix-up; sample swap.
© 2021 International Society for Advancement of Cytometry.
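The longitudinal consistency check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each sample has already been reduced to a positive/negative call for HLA-A*02 and HLA-B*07 (the abstract does not say how positivity is called), and the function name `flag_mismatches`, the sample identifiers, and the majority rule are all hypothetical choices made for illustration.

```python
from collections import Counter

def flag_mismatches(calls):
    """Flag samples whose HLA profile disagrees with their participant's majority.

    calls maps a participant ID to a list of (sample_id, profile) pairs, where
    profile is a tuple of booleans (HLA-A*02 positive, HLA-B*07 positive).
    The boolean calls are assumed to come from an upstream gating step.
    """
    flagged = []
    for participant, samples in calls.items():
        counts = Counter(profile for _, profile in samples)
        consensus, n = counts.most_common(1)[0]
        if n <= len(samples) / 2:
            # No profile holds a strict majority: every sample is suspect.
            flagged.extend(sample_id for sample_id, _ in samples)
            continue
        # Otherwise flag only the samples that deviate from the majority profile.
        flagged.extend(
            sample_id for sample_id, profile in samples if profile != consensus
        )
    return flagged

# Hypothetical data: two participants, three longitudinal samples each.
example = {
    "P01": [("P01_T1", (True, False)),
            ("P01_T2", (True, False)),
            ("P01_T3", (True, False))],
    "P02": [("P02_T1", (False, True)),
            ("P02_T2", (True, True)),
            ("P02_T3", (False, True))],
}

print(flag_mismatches(example))  # ['P02_T2']
```

With only two alleles, a check of this kind cannot detect a swap between participants who share the same HLA-A*02/HLA-B*07 profile, which is one reason a flagged sample would still be confirmed against an independent genotype source such as the qPCR data mentioned above.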
