Genome Biol. 2024 Aug 1;25(1):205.
doi: 10.1186/s13059-024-03349-w.

MAMS: matrix and analysis metadata standards to facilitate harmonization and reproducibility of single-cell data

Irzam Sarfraz et al. Genome Biol.

Abstract

Many datasets are being produced by consortia that seek to characterize healthy and diseased tissues at single-cell resolution. While biospecimen and experimental information is often captured, detailed metadata standards related to data matrices and analysis workflows are currently lacking. To address this, we develop the matrix and analysis metadata standards (MAMS) to serve as a resource for data centers, repositories, and tool developers. We define metadata fields for matrices and parameters commonly utilized in analytical workflows and develop the rmams package to extract MAMS from single-cell objects. Overall, MAMS promotes the harmonization, integration, and reproducibility of single-cell data across platforms.

Conflict of interest statement

A.S. is an employee of Flagship Labs 84, Inc., which is a subsidiary of Flagship Pioneering.

Figures

Fig. 1
Overview of matrix classes included in MAMS. Feature and observation matrices (FOMs) contain biological data at different stages of processing including reduced dimensional representations. Feature annotation matrices (FEA) and observation annotation matrices (OBS) store annotations such as additional IDs or labels, quality control metrics, and cluster labels. The observation neighborhood graph (ONG) and feature neighborhood graph (FNG) classes store information related to the correlation, similarity, or distance between pairs of observations or features, respectively. The observation ID (OID) and feature ID (FID) classes are used to store unique identifiers for individual observations and features, respectively. The record (REC) class is a special set of fields for storing information related to data and tool provenance.
Fig. 2
Matrices produced during a simple analysis workflow for single-cell RNA-seq data. Several steps are often performed in analysis workflows for scRNA-seq data generated with high-throughput devices. The observations are filtered to exclude empty droplets and poor-quality cells. Quality control metrics can be stored in an OBS annotation data frame. Preprocessing of the data matrix includes steps for normalization and standardization of features (e.g., z-scoring). From the scaled data, a subset of highly variable genes is used as input into principal component analysis (PCA). The reduced dimensional space of the PCA is used as input into 2D embedding tools such as t-SNE and UMAP as well as clustering algorithms such as k-means and Leiden.
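The sequence of matrices described in this caption can be sketched with NumPy alone. This is a minimal illustration, not the authors' workflow: the function name `simple_workflow`, the thresholds (`min_counts`, `n_hvg`, `n_pcs`), the library-size scaling factor, and the variance-based choice of highly variable genes are all assumptions made for the example. The 2D embedding and clustering steps are omitted because they would normally rely on dedicated tools.

```python
import numpy as np


def simple_workflow(counts, min_counts=100, n_hvg=50, n_pcs=10):
    """Produce the intermediate matrices of a basic scRNA-seq workflow.

    counts: raw count matrix with features as rows and observations as columns.
    All parameter names and defaults are illustrative assumptions.
    """
    # Filter observations: drop columns whose total count is too low
    # (a crude stand-in for removing empty droplets and poor-quality cells).
    totals = counts.sum(axis=0)
    keep = totals >= min_counts
    filtered = counts[:, keep]

    # Quality control metrics for the retained observations; in MAMS terms
    # these would live in an observation annotation (OBS) data frame.
    obs = {
        "total_counts": totals[keep],
        "n_features": (filtered > 0).sum(axis=0),
    }

    # Normalize each observation to a common library size, then log-transform.
    size = filtered.sum(axis=0, keepdims=True)
    lognorm = np.log1p(filtered / size * 1e4)

    # Standardize each feature (z-score across observations).
    mean = lognorm.mean(axis=1, keepdims=True)
    sd = lognorm.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # avoid division by zero for constant features
    scaled = (lognorm - mean) / sd

    # Pick the most variable features (by variance of the log-normalized data)
    # and subset the scaled matrix to them.
    variances = lognorm.var(axis=1)
    hvg = np.argsort(variances)[::-1][:n_hvg]
    scaled_hvg = scaled[hvg]

    # PCA via SVD of the centered matrix: rows of `pca` are observations,
    # columns are principal components.
    _, s, vt = np.linalg.svd(scaled_hvg, full_matrices=False)
    pca = vt[:n_pcs].T * s[:n_pcs]

    return {
        "filtered": filtered,
        "obs": obs,
        "lognorm": lognorm,
        "scaled": scaled,
        "hvg": hvg,
        "pca": pca,
    }
```

Each entry in the returned dictionary corresponds to one matrix that a metadata record would need to describe: three feature-by-observation matrices at different processing stages, one annotation table, and one reduced dimensional representation.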
Fig. 3
Example of MAMS list format. As the ability to implement and store matrix- and analysis-related metadata is variable across software platforms and data objects, we created a simple list-like structure to capture relevant MAMS fields for each matrix. This structure can be stored in configuration file formats like JSON and YAML or in general metadata or unstructured slots within data objects. Each dataset will have its own entry within the list and each class of matrix has an entry within the list for each dataset. Each matrix is denoted with a unique ID and MAMS fields are denoted with key-value pairs under each matrix. The additional fields specified within this implementation, including filepath and accessor, can be used to point to matrices stored in any flat file format or within a data object.
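The nesting described in this caption (dataset, then matrix class, then matrix ID, then key-value fields) might look like the following sketch. Only the class abbreviations and the `filepath` and `accessor` field names come from the text; the dataset name, matrix IDs, file name, accessor strings, the `processing` key, and both helper functions are hypothetical placeholders chosen for illustration, and the actual MAMS field names may differ.

```python
import json

# Hypothetical MAMS-style list: dataset -> matrix class -> matrix ID -> fields.
mams = {
    "dataset1": {
        "FOM": {
            "fom1": {
                "filepath": "dataset1.h5ad",            # hypothetical file
                "accessor": "adata.layers['counts']",   # hypothetical accessor
                "processing": "counts",                 # assumed field name
            },
            "fom2": {
                "filepath": "dataset1.h5ad",
                "accessor": "adata.X",
                "processing": "lognormalized",
            },
        },
        "OBS": {
            "obs1": {"filepath": "dataset1.h5ad", "accessor": "adata.obs"},
        },
        "FEA": {
            "fea1": {"filepath": "dataset1.h5ad", "accessor": "adata.var"},
        },
    }
}


def list_matrices(mams_list, dataset, matrix_class):
    """Return the sorted matrix IDs recorded for one class within one dataset."""
    return sorted(mams_list.get(dataset, {}).get(matrix_class, {}))


def to_json(mams_list):
    """Serialize the structure to JSON text suitable for a config file."""
    return json.dumps(mams_list, indent=2, sort_keys=True)
```

Because the structure is plain nested key-value data, the same content could be written to YAML or placed in an unstructured metadata slot of a data object without changing its shape.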
