Bioinformatics. 2025 Mar 4;41(3):btaf114.
doi: 10.1093/bioinformatics/btaf114.

mastR: an R package for automated identification of tissue-specific gene signatures in multi-group differential expression analysis

Jinjin Chen et al. Bioinformatics. 2025.

Abstract

Motivation: Biomarker discovery is important and offers insight into potential underlying mechanisms of disease. While existing biomarker identification methods primarily focus on single-cell RNA sequencing (scRNA-seq) data, there remains a need for automated methods designed for labeled bulk RNA-seq data from sorted cell populations or experiments. Current methods require manual curation of results or of statistical thresholds and may not account for tissue background expression. Here we address these limitations with an automated marker identification method for labeled bulk RNA-seq data that explicitly considers background expression.

Results: We developed mastR, a novel tool for accurate marker identification using transcriptomic data. It leverages robust statistical pipelines such as edgeR and limma to perform pairwise comparisons between groups, and aggregates the results using a rank-product-based permutation test. A signal-to-noise ratio approach is implemented to minimize background signals. We assessed the performance of mastR-derived NK cell signatures against published curated signatures and found that the mastR-derived signature performs as well as, if not better than, the published signatures. We further demonstrated the utility of mastR on simulated scRNA-seq data and compared its marker selection performance with that of Seurat.
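The aggregation step described above can be illustrated with a minimal sketch. This is not mastR's implementation (mastR is an R package); it is a generic Python illustration of a rank product with a permutation null, assuming a genes-by-comparisons matrix of DE statistics in which larger values mean stronger up-regulation in the target group. The function names `rank_product` and `rank_product_pvalues` are hypothetical.

```python
import numpy as np


def rank_product(scores):
    """Geometric mean of per-comparison ranks (rank 1 = largest score).

    scores: array of shape (n_genes, n_comparisons). Assumed layout,
    not taken from mastR.
    """
    # Double argsort converts scores to 0-based ranks within each column.
    ranks = (-scores).argsort(axis=0).argsort(axis=0) + 1
    return np.exp(np.log(ranks).mean(axis=1))


def rank_product_pvalues(scores, n_perm=1000, seed=0):
    """Empirical p-values from a pooled permutation null.

    Each comparison (column) is shuffled independently, which breaks any
    consistency of a gene's rank across comparisons.
    """
    rng = np.random.default_rng(seed)
    observed = rank_product(scores)
    n_comp = scores.shape[1]
    null = []
    for _ in range(n_perm):
        permuted = np.column_stack(
            [rng.permutation(scores[:, j]) for j in range(n_comp)]
        )
        null.append(rank_product(permuted))
    null = np.sort(np.concatenate(null))
    # Fraction of null rank products at least as small as the observed one,
    # with a +1 correction so that no p-value is exactly zero.
    counts = np.searchsorted(null, observed, side="right")
    return (counts + 1) / (null.size + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    demo = rng.normal(size=(100, 4))
    demo[0, :] = 50.0  # one gene that is top-ranked in every comparison
    pvals = rank_product_pvalues(demo, n_perm=200)
    print("most significant gene index:", int(pvals.argmin()))
```

A gene that ranks consistently high across all comparisons gets a small rank product, and the permutation null indicates how unlikely such consistency would be by chance.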

Availability and implementation: mastR is freely available from https://bioconductor.org/packages/release/bioc/html/mastR.html. A vignette and guide are available at https://davislaboratory.github.io/mastR. All statistical analyses were carried out using R (version ≥4.3.0) and Bioconductor (version ≥3.17).

Figures

Figure 1.
Schematic of the mastR workflow. The workflow of mastR can be divided into four main sections: (A) build the marker pool; (B) identify signature markers; (C) refine the signature by filtering based on background expression; and (D) visualize and assess signature performance. The mastR workflow recommends integrating markers from multiple sources (e.g. PanglaoDB, MSigDB) to form an initial set of markers. mastR then generates a design matrix based on the given “Group” and “Batch” factors to be used during data processing and DE analysis. The data processing includes an edgeR data filtering and normalization pipeline, and a limma-voom-treat-based linear modeling DE approach to compare the target group with all other groups. mastR then computes each marker’s RP score from the rank product across the DE comparisons and a bootstrapped permutation null distribution, which is used for feature selection across multiple comparisons. The selected features are constrained to their intersection with the initial set of markers. mastR allows genes to be filtered based on their SNR against a background dataset, removing features with inherent expression in a specific context or disease. mastR then provides visualization functions to assess the performance of the signature.
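The background-filtering idea in step (C) can be sketched as follows. The text here does not give the SNR formula mastR uses, so this example assumes a common definition, the difference in mean expression divided by the sum of standard deviations, applied to log-expression matrices. The function names, gene names and the threshold value are all hypothetical.

```python
import numpy as np


def snr(signal, background, eps=1e-8):
    """Per-gene signal-to-noise ratio between two expression matrices.

    signal, background: arrays of shape (n_genes, n_samples) on a log scale.
    Assumed definition: (mean_signal - mean_background) / (sd_signal + sd_background).
    """
    mu_s, mu_b = signal.mean(axis=1), background.mean(axis=1)
    sd_s = signal.std(axis=1, ddof=1)
    sd_b = background.std(axis=1, ddof=1)
    # eps guards against division by zero when a gene has no variance.
    return (mu_s - mu_b) / (sd_s + sd_b + eps)


def filter_by_snr(genes, signal, background, threshold=1.0):
    """Keep genes whose SNR against the background exceeds the threshold."""
    keep = snr(signal, background) > threshold
    return [g for g, k in zip(genes, keep) if k]


if __name__ == "__main__":
    genes = ["GENE_A", "GENE_B"]  # placeholder gene names
    signal = np.array([[10.0, 11.0, 12.0], [5.0, 5.5, 6.0]])
    background = np.array([[1.0, 2.0, 3.0], [5.0, 6.0, 5.5]])
    print(filter_by_snr(genes, signal, background))
```

In this toy input the second gene is expressed at the same level in the background, so it is dropped, while the first gene stands out clearly and is kept.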
