Genes (Basel). 2023 Jan 20;14(2):269.
doi: 10.3390/genes14020269.

GReNaDIne: A Data-Driven Python Library to Infer Gene Regulatory Networks from Gene Expression Data

Pauline Schmitt et al. Genes (Basel).

Abstract

Context: Inferring gene regulatory networks (GRNs) from high-throughput gene expression data is a challenging task for which different strategies have been developed. Nevertheless, no single method performs best in all settings, and each method has its own advantages, intrinsic biases, and application domains. Thus, in order to analyze a dataset, users should be able to test different techniques and choose the most appropriate one. This step can be particularly difficult and time-consuming, since most methods' implementations are made available independently, possibly in different programming languages. An open-source library offering different inference methods within a common framework is therefore expected to be a valuable toolkit for the systems biology community.

Results: In this work, we introduce GReNaDIne (Gene Regulatory Network Data-driven Inference), a Python package that implements 18 data-driven, machine-learning-based GRN inference methods. It also includes eight general-purpose preprocessing techniques, suitable for both RNA-seq and microarray data analysis, as well as four normalization techniques dedicated to RNA-seq. In addition, the package makes it possible to combine the results of different inference methods into robust and efficient ensembles. The package has been successfully assessed on the DREAM5 challenge benchmark datasets. The open-source GReNaDIne Python package is freely available in a dedicated GitLab repository, as well as on the Python Package Index (PyPI), the official third-party software repository for Python. The latest documentation of the GReNaDIne library is also available on Read the Docs, an open-source software documentation hosting platform.

Contribution: The GReNaDIne tool represents a technological contribution to the field of systems biology. The package can be used to infer GRNs from high-throughput gene expression data with different algorithms within the same framework. To analyze their datasets, users can apply a battery of preprocessing and postprocessing tools, choose the most suitable inference method from the GReNaDIne library, and even combine the outputs of different methods to obtain more robust results. The output format of GReNaDIne is compatible with well-known complementary refinement tools such as pySCENIC.

Keywords: Python; bioinformatics; ensemble learning; gene expression; gene regulatory network inference; machine learning; systems biology.

Conflict of interest statement

All authors declare no competing interests, either financial or nonfinancial.

Figures

Figure 1
The GReNaDIne GRN inference workflow is organized in three modules: (a) gene expression preprocessing, including RNA-seq normalization, standardization, and discretization techniques; (b) data-driven GRN inference scoring methods, including techniques based on MI and correlation scores, methods based on regression algorithms, and techniques based on classification algorithms, together with integration schemes that combine the results of different methods into ensembles; and (c) postprocessing tools for selecting regulatory edges, and GRN evaluation methods. The GRN inference workflow of GReNaDIne requires only a gene expression matrix as input and, optionally, a list of regulatory genes (e.g., TFs).
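The workflow above takes only an expression matrix and, optionally, a list of candidate regulators. As an illustration of the correlation/MI family of scoring methods, the following minimal sketch scores every regulator-target pair by the absolute Pearson correlation of their profiles and ranks the resulting edges. It uses only the Python standard library. `pearson` and `infer_grn_correlation` are hypothetical helpers written for this example, not part of the GReNaDIne API, and representing the matrix as a dict of gene profiles is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)


def infer_grn_correlation(expr, regulators=None):
    """Hypothetical helper: score each (regulator, target) pair by |Pearson r|.

    expr: dict mapping gene name -> list of expression values, one per sample.
    regulators: optional list of candidate regulators (e.g. TFs); defaults to all genes.
    Returns (regulator, target, score) tuples, highest score first.
    """
    genes = list(expr)
    regs = list(regulators) if regulators is not None else genes
    edges = [
        (tf, tg, abs(pearson(expr[tf], expr[tg])))
        for tf in regs
        for tg in genes
        if tf != tg  # ignore self-loops
    ]
    edges.sort(key=lambda e: e[2], reverse=True)
    return edges


# Toy expression matrix: 3 genes x 4 samples (made-up values).
expr = {
    "TF1": [1.0, 2.0, 3.0, 4.0],
    "G1": [2.0, 4.0, 6.0, 8.0],
    "G2": [4.0, 1.0, 3.0, 2.0],
}
edges = infer_grn_correlation(expr, regulators=["TF1"])
for tf, tg, score in edges:
    print(f"{tf} -> {tg}: {score:.2f}")
```

Restricting the candidate regulators shrinks the search space from all gene pairs to regulator-target pairs, which is why the optional TF list is useful in practice.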
Figure 2
Cluster maps representing the gain in (a) AUROC and (b) AUPR values for each inference method (rows), without any preprocessing, on each benchmark dataset (columns), with respect to the AUROC or AUPR reference score of the DREAM5 community approach. The family of each method is color-coded in the left column: regression in green, classification in red, and correlation/MI in blue. The GReNaDIne inference methods achieved results comparable to, and in some cases better than, those of the DREAM5 community approach. The intrinsic biases and advantages of each method make it suitable for particular datasets: the methods performed differently on each dataset, and no single method was best on all of them.
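The AUROC and AUPR scores in this figure evaluate a ranked list of predicted edges against a gold standard. As a sketch of how such scores can be computed, the code below obtains AUROC from the Mann-Whitney statistic and approximates AUPR by non-interpolated average precision. These are illustrative assumptions: the official DREAM5 evaluation may interpolate the precision-recall curve differently, the function names are hypothetical, and treating "gain" as a simple difference from a reference score is our reading, since the caption does not define it.

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney statistic: the probability that a true edge
    outscores a false one (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one true and one false edge")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def aupr(scores, labels):
    """AUPR approximated by average precision: the mean precision at the rank
    of each true edge (tied scores keep their input order)."""
    n_pos = sum(1 for y in labels if y)
    if n_pos == 0:
        raise ValueError("need at least one true edge")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            total += tp / rank
    return total / n_pos


# Predicted edge scores and gold-standard labels (made-up values).
scores = [0.9, 0.8, 0.7, 0.6]
labels = [True, False, True, False]
roc, pr = auroc(scores, labels), aupr(scores, labels)

# Hypothetical reference score; "gain" is assumed here to be a plain difference.
reference_auroc = 0.70
gain = roc - reference_auroc
print(f"AUROC={roc:.3f} AUPR={pr:.3f} gain={gain:+.3f}")
```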
Figure 3
Cluster maps representing the (a) AUROC and (b) AUPR values for each combination of inference method (rows) and preprocessing technique (columns); column I represents the identity (i.e., no preprocessing applied). The family of each method is color-coded in the left column: regression in green, classification in red, and correlation/MI in blue. Preprocessing techniques that give genes comparable expression levels (i.e., row z-score, EFD, and row K-means) led to better performance on average.
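The preprocessing techniques that performed best here all act per gene (row-wise), putting genes on comparable scales. The sketch below illustrates two of them on a toy matrix: a row z-score and an equal-frequency discretization. We read "EFD" as equal-frequency discretization and use the population standard deviation; both are assumptions, row K-means is not shown, and the helper names are hypothetical rather than GReNaDIne functions.

```python
import statistics


def row_zscore(row):
    """Standardize one gene's profile to mean 0 and population std 1."""
    mu = statistics.fmean(row)
    sd = statistics.pstdev(row)
    if sd == 0:
        return [0.0] * len(row)  # constant gene: nothing to standardize
    return [(v - mu) / sd for v in row]


def equal_frequency_discretize(row, n_bins=3):
    """Assign each sample to one of n_bins bins holding (roughly) equal numbers
    of samples, based on the value's rank within the gene (ties keep position order)."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    bins = [0] * len(row)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(row)
    return bins


# Toy matrix: genes on very different scales (made-up values).
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "G2": [10.0, 30.0, 20.0, 60.0, 50.0, 40.0],
}
z = {g: row_zscore(v) for g, v in expr.items()}
d = {g: equal_frequency_discretize(v, n_bins=3) for g, v in expr.items()}
print(d)
```

Both transforms discard each gene's absolute scale, so a highly expressed gene no longer dominates scores computed across genes.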
Figure 4
(a) AUROC and (b) AUPR scores obtained by ensembles of inference methods (i.e., BRS•SVM•Ens and BRS•SVM•Ens•Corr), by single methods, and by the DREAM5 community approach. The GReNaDIne ensembles that combined BRSr, an SVM-based method, and a method based on ensembles of trees or linear regressors (ensemble termed BRSr•SVM•Ens), optionally with an additional correlation- or MI-based method (ensemble termed BRSr•SVM•Ens•Corr), proved efficient and robust across datasets, outperforming single methods as well as the robust DREAM5 community method in terms of both AUROC and AUPR.
Figure 5
Boxplots representing the average (a) AUROC and (b) AUPR obtained by the SVM•BRS•Ens ensembles with different integration schemes on all DREAM5 datasets. The integration schemes are arranged in ascending order of their average AUROC and AUPR scores. All integration schemes performed reasonably well, but rank-TF and Z-score-TF tended to yield lower scores than Z-score-TG, rank-full, Z-score-full, and rank-TG. We therefore suggest using the latter integration schemes.
