Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2025 May 8;21(5):e1012476.
doi: 10.1371/journal.pcbi.1012476. eCollection 2025 May.

Optimal transport reveals dynamic gene regulatory networks via gene velocity estimation

Affiliations

Optimal transport reveals dynamic gene regulatory networks via gene velocity estimation

Wenjun Zhao et al. PLoS Comput Biol. .

Abstract

Inferring gene regulatory networks from gene expression data is an important and challenging problem in the biology community. We propose OTVelo, a methodology that takes time-stamped single-cell gene expression data as input and predicts gene regulation across two time points. It is known that the rate of change of gene expression, which we will refer to as gene velocity, provides crucial information that enhances such inference; however, this information is not always available due to the limitations in sequencing depth. Our algorithm overcomes this limitation by estimating gene velocities using optimal transport. We then infer gene regulation using time-lagged correlation and Granger causality via regularized linear regression. Instead of providing an aggregated network across all time points, our method uncovers the underlying dynamical mechanism across time points. We validate our algorithm on 13 simulated datasets with both synthetic and curated networks and demonstrate its efficacy on 9 experimental data sets.

PubMed Disclaimer

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Overview of our pipeline:
(A) The input of our algorithm are time-stamped single-cell gene expression data. At each time point, a different cell population is sampled and their gene expression level measured. (B) We use optimal transport (OT) to predict past and future states (represented by the expression levels of all genes) of each cell. (C) For each cell, we approximate the gene velocity (the rate of change of expression levels) of each gene using the finite difference of predicted past and future states. (D) To infer gene regulation, we use either time-lagged correlation analysis or Granger causality of gene velocities separately for each pair of adjacent time points. (E) Since the analysis is run independently for each time interval, we obtain temporal GRNs, which are represented by graphs that evolve over time. (F) These graphs can be consolidated into a single global gene regulatory network.
Fig 2
Fig 2. Comparison between optimal transport-based cell velocities (left) and RNA velocities (right) computed from scVelo [18] and projected onto 2D UMAP coordinates.
To create data points for optimal transport, we use pseudotime computed from scVelo (indicated by color) and divide it uniformly according to quantiles.
Fig 3
Fig 3. Results on simulated data from Harissa [11].
First row: Networks used for subsequent tests, where each network starts with a stimulus, with blue edges indicating activation and red edges indicating inhibition, reproduced according to Ventre et al. [12] after excluding self-loops. Second row: first two principal components of cells and velocity field inferred by our methodology. The third and fourth rows show performance of 6 algorithms, measured by AUPRC and AUPRC on signed prediction. Note that GENIE3 and GRNBoost2 cannot predict the sign of regulation and are removed from the fourth row. The gray dashed line indicates the baseline that correspond to a random classifier.
Fig 4
Fig 4. AUPRC for predicting directed edges for the two OTVelo methods, profiled on 10-gene tree-structure networks with different data collection parameters.
(a) Performance as a function of the number of cells per time point, while keeping the same time points. (b) Performance as a function of the length of the measurement period, while keeping the same gap between time points and the same total number of cells. (c) Performance as a function of the gap between time points, while keeping the same final time point and the same total number of cells.
Fig 5
Fig 5. Results on curated networks based on simulation from BoolODE as used in BEELINE [16].
The top row shows the four curated networks that correspond to actual biological processes (red edges indicate inhibition, and blue edges indicate activation), reproduced from BEELINE [16] after removing self-loops. The second row shows the velocity field after projection onto the first two PCA modes, and the third and fourth row contain the AUPRC ratios for, respectively, unsigned and signed inference; note that GENIE3 and GRNBoost2 are removed from the signed AUPRC panel due to their inability to predict the type of regulation. The dashed gray line is the baseline that corresponds to a random classifier.
Fig 6
Fig 6. Results on curated network models with different levels of dropouts, quantified by AUPRC on unsigned (top row) and signed (bottom row) predictions.
Each method is represented by three box plots that correspond (from left to right) to 0%, 50%, and 70% dropout rates.
Fig 7
Fig 7. Performance of our algorithm for the “Split" and “Combined" setups described in the main text as quantified via unsigned (top row) and signed (bottom row) AUPRC ratios on curated datasets with 0%, 50%, and 70% dropout rates (from left to right boxplot).
Fig 8
Fig 8. Performance of OTVelo on unbalanced datasets with uniform and reweighed marginals.
Top: velocity field for HSC with uniform weights (left) and reweighed marginals (right). Bottom: AUPRC ratios for uniform weights and reweighed marginals assigned based on the proportion of cells in each branch.
Fig 9
Fig 9. Early precision ratio of five real datasets compared against ground truth based on cell-type specific ChIP-seq, using either 500 or 1000 top variable genes, combined with significantly varying TFs.
Since mHSC includes three branches, we perform the task on each branch separately as in BEELINE and plot them on the same graph in the first column. HARISSA and CARDAMOM were not included in the second row due to prohibitive computational time.
Fig 10
Fig 10. Breakdown of the GRNs inferred by OTVelo-Granger from the scRNA-seq scGEM dataset [34], where only edges with weights in the top 1% in each time interval are visualized.
Fig 11
Fig 11. ROC (panel A) and precision-recall curve (panel B) for the mouse ERS dataset [36].
OTVelo-Granger has the best AUPRC. Both OTVelo methods have accurate prediction for 5-10 edges that are assigned largest weights, indicated by the short plateau in the upper left corner of panel (B).
Fig 12
Fig 12. Dynamic GRNs inferred from OTVelo-Granger for the mouse ERS dataset [36].
Fig 13
Fig 13. Results for the drosophila neuroectoderm dataset [37].
(A) First two UMAP coordinates of single-cell data along with the field learned via optimal transport. (B) Histogram of the in-out degree ratio as defined in (9). (C) First two UMAP coordinates colored according to annotated cell types. (D) Degree ratio of four genes and their velocity profile over the cell spaces: cas and sv are enriched in neural progenitors and sensory progenitors that primarily exist in earlier time points, resulting in a ratio of 0.84<1; cac and Syt1 are two markers that are enriched when neurons mature, so they are assigned ratios greater than one, as indicated in their captions.
Fig 14
Fig 14. In-out degree as defined in (9) after thresholding the correlation matrix.
Fig 15
Fig 15. Computational time for each algorithm as functions of (1) number of genes, with a fixed number of 1200 cells per time point (left); or (2) number of cells per time point, with a fixed number of 100 genes.

Update of

Similar articles

Cited by

References

    1. Liu S, Trapnell C. Single-cell transcriptome sequencing: recent advances and remaining challenges. F1000Res. 2016;5:F1000 Faculty Rev-182. doi: 10.12688/f1000research.7223.1 - DOI - PMC - PubMed
    1. Street K, Risso D, Fletcher RB, Das D, Ngai J, Yosef N, et al.. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics. 2018;19(1):477. doi: 10.1186/s12864-018-4772-0 - DOI - PMC - PubMed
    1. Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P. Inferring regulatory networks from expression data using tree-based methods. PLoS One. 2010;5(9):e12776. doi: 10.1371/journal.pone.0012776 - DOI - PMC - PubMed
    1. Aibar S, González-Blas CB, Moerman T, Huynh-Thu VA, Imrichova H, Hulselmans G, et al.. SCENIC: single-cell regulatory network inference and clustering. Nat Methods. 2017;14(11):1083–6. doi: 10.1038/nmeth.4463 - DOI - PMC - PubMed
    1. Specht AT, Li J. LEAP: constructing gene co-expression networks for single-cell RNA-sequencing data using pseudotime ordering. Bioinformatics. 2017;33(5):764–6. doi: 10.1093/bioinformatics/btw729 - DOI - PMC - PubMed

LinkOut - more resources