Brief Bioinform. 2025 May 1;26(3):bbaf227.
doi: 10.1093/bib/bbaf227.

mKmer: an unbiased K-mer embedding of microbiomic single-microbe RNA sequencing data

Fangyu Mo et al. Brief Bioinform. 2025.

Abstract

The advanced single-microbe RNA sequencing (smRNA-seq) technique addresses the pressing need to understand the complexity and diversity of microbial communities, as well as the distinct microbial states defined by different gene expression profiles. Current analyses of smRNA-seq data heavily rely on the integrity of reference genomes within the queried microbiota. However, establishing a comprehensive collection of microbial reference genomes or gene sets remains a significant challenge for most real-world microbial ecosystems. Here, we developed an unbiased embedding algorithm utilizing K-mer signatures, named mKmer, which bypasses gene or genome alignment to enable species identification for individual microbes and downstream functional enrichment analysis. By substituting gene features in the canonical cell-by-gene matrix with highly conserved K-mers, we demonstrate that mKmer outperforms gene-based methods in clustering and motif inference tasks using benchmark datasets from crop soil and human gut microbiomes. Our method provides a reference genome-free analytical framework for advancing smRNA-seq studies.
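The core substitution described above, replacing gene features in the cell-by-gene matrix with K-mers, can be illustrated with a minimal Python sketch. It is not the mKmer implementation: the function names `kmers` and `cell_by_kmer_matrix`, the `(barcode, sequence)` input format, and the pre-chosen `features` list are all assumptions made for illustration, and UMI deduplication and reverse-complement handling are left out.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq made up only of A, C, G, T."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield kmer


def cell_by_kmer_matrix(reads, k, features):
    """Count selected K-mers per cell barcode.

    reads    -- iterable of (barcode, sequence) pairs (hypothetical input format)
    k        -- K-mer length
    features -- ordered list of K-mers used as matrix columns
                (stands in for the highly conserved K-mers)
    Returns (barcodes, matrix), where matrix[i][j] is the count of
    features[j] in the reads of barcodes[i].
    """
    wanted = set(features)
    counts = defaultdict(Counter)
    for barcode, seq in reads:
        cell = counts[barcode]  # registers the barcode even if nothing matches
        for kmer in kmers(seq, k):
            if kmer in wanted:
                cell[kmer] += 1
    barcodes = sorted(counts)
    matrix = [[counts[b][f] for f in features] for b in barcodes]
    return barcodes, matrix
```

The resulting matrix has the same shape as a cell-by-gene matrix, one row per barcode and one column per feature, so in principle it can be passed to the same clustering workflow.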

Keywords: K-mer; K-motif; HCK; reference genome-free; smRNA-seq.

Figures

Figure 1
Overview of the mKmer method. mKmer comprises seven modules for msmRNA-seq data analysis, including read selection, species annotation, selection of K and HCKs (highly conserved K-mers), cell-by-HCK matrix construction, and functional analysis. Established tools such as UMI-tools, Jellyfish, and Seurat are also used in the mKmer pipeline.
Figure 2
Frequency of K-mer depths by different K sizes for msmRNA-seq datasets from soybean soil (A) and human gut (B) samples. The X-axis represents the number of times K-mers are detected, and the Y-axis indicates the number of such K-mers.
Figure 3
K-mer rank plots for calling HCKs and barcode rank plots of the smRNA-seq data. In the K-mer rank plots, the X-axis represents the position (x) of K-mers sorted in descending order by detection count, and the Y-axis indicates the average detection count of the top x K-mers. In the barcode rank plots, the Y-axis represents the UMI count. (A) K-mer rank plot of a soybean soil smRNA-seq (K = 13). (B) Barcode rank plot of the soybean soil smRNA-seq. (C) K-mer rank plot of a human gut smRNA-seq (K = 12). (D) Barcode rank plot of the human gut smRNA-seq.
Figure 4
An example of HCKs. A conserved region coding for a motif of the bacterial gene HSP60 (top), and the counts of its 12-mers obtained by scanning Staphylococcus, displayed as a line chart to show the distribution of HCKs (bottom). The motif’s names (nucleotide/protein) from the MEME database are shown at the top. In the middle, the 12-mer composition of the conserved region is shown (S. warneri as reference). The count of each 12-mer is given at the end of that 12-mer.
Figure 5
Benchmark results of two samples by cell-by-gene matrix (A and C) and mKmer (B and D). UMAP clustering and species annotation of the msmRNA-seq data from the soybean soil (A, B) and human gut (C, D) samples. The numbers in parentheses are the number of cells of each strain.
Figure 6
A case study by mKmer. (A, B) UMAP clustering and species annotation of the pretreatment (A) and posttreatment (B) gut msmRNA-seq data of a cancer patient using K-mers. (C) Integrated UMAP clustering, by mKmer, of the bacterial species Phocaeicola dorei in the patient’s gut before (left) and after (right) treatment. (D) Functional annotation of P. dorei in the gut of treated patients by KmerGOp.
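The axes described in the captions of Figures 2 and 3 can be computed from a table of K-mer detection counts, as in the minimal sketch below. The function names `depth_spectrum` and `rank_curve` and the dictionary input are assumptions for illustration; how mKmer reads the HCK cutoff from the rank curve is not stated here and is not modeled.

```python
from collections import Counter
from itertools import accumulate


def depth_spectrum(kmer_counts):
    """Map each detection count to the number of K-mers detected that many times.

    Corresponds to the Figure 2 axes: X = times detected, Y = number of such K-mers.
    kmer_counts -- dict of K-mer -> detection count (hypothetical input format)
    """
    return Counter(kmer_counts.values())


def rank_curve(kmer_counts):
    """Return the average detection count of the top x K-mers for x = 1..n.

    Corresponds to the Figure 3 K-mer rank plot: K-mers are sorted in
    descending order by detection count, and element x-1 of the result
    is the mean count of the first x of them.
    """
    ordered = sorted(kmer_counts.values(), reverse=True)
    running_totals = accumulate(ordered)
    return [total / x for x, total in enumerate(running_totals, start=1)]
```

Because the counts are sorted in descending order, the running average never increases as x grows.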
