J Comput Biol. 2019 Oct;26(10):1113-1129.

doi: 10.1089/cmb.2019.0036. Epub 2019 Apr 22.

Integration of Multiple Data Sources for Gene Network Inference Using Genetic Perturbation Data

Xiao Liang¹, William Chad Young², Ling-Hong Hung³, Adrian E Raftery⁴, Ka Yee Yeung³

Affiliations

¹ Department of Computer Science, Virginia Tech, Blacksburg, Virginia.
² Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington.
³ School of Engineering and Technology, University of Washington, Tacoma, Washington.
⁴ Department of Statistics, University of Washington, Seattle, Washington.

PMID: 31009236
PMCID: PMC6786343
DOI: 10.1089/cmb.2019.0036

Xiao Liang et al. J Comput Biol. 2019 Oct.

Abstract

The inference of gene networks from large-scale human genomic data is challenging because it is difficult to identify the correct regulators for each gene in a high-dimensional search space. We present a Bayesian approach that integrates external data sources with knockdown data from human cell lines to infer gene regulatory networks. In particular, we assemble multiple data sources, including gene expression data, genome-wide binding data, gene ontology, and known pathways, and use a supervised learning framework to compute prior probabilities of regulatory relationships. We show that our integrated method improves the accuracy of inferred gene networks and extends some previous Bayesian frameworks in both theory and application. We apply our method to two human cell lines, the skin melanoma cell line A375 and the lung cancer cell line A549, to illustrate its capabilities. Our results show that the improvement in performance can vary from cell line to cell line, and that different external data sources may need to serve as prior knowledge to obtain better accuracy in different cell lines.
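
The abstract states that a supervised learning framework turns several external data sources into prior probabilities of regulatory relationships, but this page does not name the classifier. The sketch below swaps in plain logistic regression purely for illustration, with four hypothetical per-pair features (binding score, GO similarity, shared-pathway indicator, co-expression) and a made-up training set; the trained model scores unlabelled regulator-target pairs, and its outputs play the role of priors.

```python
import math
from typing import List, Sequence

# Hypothetical evidence features per candidate regulator-target pair:
# [binding score, GO similarity, shared pathway (0/1), co-expression].


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_prior(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability of a regulatory relationship; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit logistic-regression weights by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_prior(w, xi) - yi  # derivative of the log-loss w.r.t. z
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


# Toy labelled pairs (1 = known regulatory relationship, 0 = none known).
X_train = [[1.0, 0.9, 1, 0.8], [0.9, 0.7, 1, 0.6], [0.8, 0.8, 0, 0.7],
           [0.1, 0.2, 0, 0.1], [0.0, 0.3, 0, 0.2], [0.2, 0.1, 1, 0.0]]
y_train = [1, 1, 1, 0, 0, 0]
weights = train_logistic(X_train, y_train)

# Score unlabelled candidate pairs to obtain their priors.
strong = predict_prior(weights, [0.95, 0.8, 1, 0.7])
weak = predict_prior(weights, [0.05, 0.1, 0, 0.1])
print(f"strong-evidence prior: {strong:.3f}, weak-evidence prior: {weak:.3f}")
```

Any probabilistic classifier could fill this role; logistic regression is used here only because its outputs are already probabilities and it fits in a few lines.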

Keywords: data integration; gene regulation; machine learning; systems biology.

Conflict of interest statement

The authors declare there are no competing financial interests.

Figures

**FIG. 1.**
An overview of the approach. We first build a supervised framework for a selected set of target–gene regulatory pairs using external knowledge derived from the literature and existing data sets. Then, we apply machine learning methods to predict the regulatory relationships across all target–gene regulatory pairs for the landmark genes in the LINCS L1000 project. The predicted regulatory relationships are used as the prior probabilities in our Bayesian approach to predict the posterior probabilities.
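
To illustrate how a prior probability from the supervised step can be combined with evidence from the perturbation data, the sketch below applies Bayes' rule in odds form for a single edge: posterior odds = prior odds × Bayes factor. This is a simplification, not the paper's full Bayesian procedure; the Bayes factor is an assumed input standing in for whatever evidence the knockdown data provide about that edge.

```python
def posterior_from_prior(prior: float, bayes_factor: float) -> float:
    """Update an edge's prior probability with a Bayes factor
    (posterior odds = prior odds * Bayes factor)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if bayes_factor < 0.0:
        raise ValueError("Bayes factor must be non-negative")
    weighted = prior * bayes_factor
    return weighted / (weighted + (1.0 - prior))


# The same evidence (Bayes factor 3) moves a weak and a neutral prior differently.
for prior in (0.05, 0.5):
    print(prior, round(posterior_from_prior(prior, 3.0), 3))
```

The point of the example is that an informative prior changes which edges cross a given posterior cutoff even when the data-driven evidence is identical.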

**FIG. 2.**
Histograms of the expected number of regulators per target gene predicted using knockdown data in cell line A375. **(A)** Shows the histogram of the expected number of regulators per target gene without the sampling bias correction. **(B)** Shows the histogram of the expected number of regulators per target gene after applying the sampling bias correction to the prior.

**FIG. 3.**
Histograms of the expected number of regulators per target gene predicted using knockdown data in cell line A549. **(A)** Shows the histogram of the expected number of regulators per target gene without the sampling bias correction. **(B)** Shows the histogram of the expected number of regulators per target gene after applying the sampling bias correction to the prior.
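
The expected number of regulators per target gene is the sum of that gene's incoming edge probabilities. The captions do not describe the sampling bias correction itself, so the sketch below assumes a common remedy for classifiers trained on a sample whose positive fraction differs from the population: rescaling the predicted odds by the ratio of population odds to sample odds. The training rate (0.5), the assumed population rate (0.01), and the gene names are hypothetical.

```python
from typing import Dict


def correct_for_sampling(p: float, sample_rate: float, true_rate: float) -> float:
    """Rescale a classifier probability learned on a training set whose
    positive fraction (sample_rate) differs from the assumed population
    fraction (true_rate), by adjusting the predicted odds."""
    for name, v in (("p", p), ("sample_rate", sample_rate), ("true_rate", true_rate)):
        if not 0.0 < v < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    odds = p / (1.0 - p)
    factor = (true_rate / (1.0 - true_rate)) / (sample_rate / (1.0 - sample_rate))
    adjusted = odds * factor
    return adjusted / (1.0 + adjusted)


def expected_regulators(priors: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """priors[target][regulator] is an edge probability; the expected number
    of regulators of a target is the sum of its incoming edge probabilities."""
    return {target: sum(regs.values()) for target, regs in priors.items()}


# Hypothetical priors from a classifier trained on a balanced (50% positive) set.
raw = {"G1": {"TF_A": 0.5, "TF_B": 0.5, "TF_C": 0.5},
       "G2": {"TF_A": 0.9, "TF_B": 0.1, "TF_C": 0.2}}
corrected = {t: {r: correct_for_sampling(p, 0.5, 0.01) for r, p in regs.items()}
             for t, regs in raw.items()}
print(expected_regulators(raw))
print(expected_regulators(corrected))
```

The rescaling is monotone, so it preserves the ranking of candidate regulators while shrinking the expected number of regulators per target toward what the assumed population rate implies.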

**FIG. 4.**
Precision–recall curves for cell line A549 using different data assessed with TRANSFAC and JASPAR. The results are improved by external data integration with or without MCDC. MCDC, model-based clustering with data correction.
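
A precision–recall curve of this kind ranks candidate edges by their inferred probability and, at each threshold, compares the predicted edges with a reference set such as one derived from TRANSFAC and JASPAR. The sketch below assumes the reference has already been reduced to a set of (regulator, target) pairs restricted to the candidate pairs; the edge names and scores are made up.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def precision_recall_curve(scores: Dict[Edge, float],
                           reference: Set[Edge]) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for every distinct score, from
    the highest threshold to the lowest. Tied scores enter together, and
    reference edges that never get a score count as missed for recall."""
    if not reference:
        raise ValueError("reference set must not be empty")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    curve: List[Tuple[float, float, float]] = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][1]
        while i < len(ranked) and ranked[i][1] == threshold:
            if ranked[i][0] in reference:
                tp += 1
            else:
                fp += 1
            i += 1
        curve.append((threshold, tp / (tp + fp), tp / len(reference)))
    return curve


scores = {("TF_A", "G1"): 0.9, ("TF_B", "G1"): 0.8,
          ("TF_C", "G2"): 0.4, ("TF_A", "G2"): 0.2}
reference = {("TF_A", "G1"), ("TF_C", "G2")}
for threshold, precision, recall in precision_recall_curve(scores, reference):
    print(f"t={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```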

**FIG. 5.**
Inferred directed edges at a posterior probability cutoff of 0.5 from the gene network generated by integrating external data with the knockdown data and MCDC-corrected untreated data. Each node represents a gene and each edge represents a regulatory interaction between the two genes. The width of each edge is in proportion to the inferred posterior probability that the regulatory relationship exists for the corresponding gene pair.

**FIG. 6.**
True positive edges at a posterior probability cutoff of 0.5 from the gene network generated by integrating external data with the knockdown data and MCDC-corrected untreated data. These true positive edges are the edges from Figure 5 that are also supported by our assessment criteria. Each node represents a gene and each edge represents a regulatory interaction between the two genes. The width of each edge is in proportion to the inferred posterior probability that the regulatory relationship exists for the corresponding gene pair.
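
Figures 5 and 6 amount to two filters over the inferred posteriors: keep the directed edges at or above the 0.5 cutoff, then keep those that also appear in the reference set. The sketch below assumes the cutoff is inclusive and that edge widths are a linear scaling of the posterior with a hypothetical maximum width; the edges and values are made up.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def edges_at_cutoff(posteriors: Dict[Edge, float], cutoff: float = 0.5) -> Dict[Edge, float]:
    """Directed edges whose posterior probability meets the cutoff (inclusive)."""
    return {e: p for e, p in posteriors.items() if p >= cutoff}


def true_positive_edges(posteriors: Dict[Edge, float], reference: Set[Edge],
                        cutoff: float = 0.5) -> Dict[Edge, float]:
    """Edges passing the cutoff that also appear in the reference set."""
    return {e: p for e, p in edges_at_cutoff(posteriors, cutoff).items() if e in reference}


def edge_widths(edges: Dict[Edge, float], max_width: float = 5.0) -> Dict[Edge, float]:
    """Drawing width proportional to the posterior probability."""
    return {e: max_width * p for e, p in edges.items()}


posteriors = {("TF_A", "G1"): 0.9, ("TF_B", "G1"): 0.5, ("TF_C", "G2"): 0.3}
reference = {("TF_A", "G1"), ("TF_C", "G2")}
inferred = edges_at_cutoff(posteriors)
confirmed = true_positive_edges(posteriors, reference)
print(edge_widths(confirmed))
```

Note that ("TF_C", "G2") is in the reference but falls below the cutoff, so it appears in neither filtered network; it would count against recall in a precision-recall assessment.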

**FIG. 7.**
Precision–recall curves for cell line A375 using different data assessed with TRANSFAC and JASPAR. The results are improved by external data integration with or without MCDC.

Cited by

Drug target inference by mining transcriptional data using a novel graph convolutional network framework.
Zhong F, Wu X, Yang R, Li X, Wang D, Fu Z, Liu X, Wan X, Yang T, Fan Z, Zhang Y, Luo X, Chen K, Zhang S, Jiang H, Zheng M. Protein Cell. 2022 Apr;13(4):281-301. doi: 10.1007/s13238-021-00885-0. Epub 2021 Oct 22. PMID: 34677780.

Network inference in systems biology: recent developments, challenges, and applications.
Saint-Antoine MM, Singh A. Curr Opin Biotechnol. 2020 Jun;63:89-98. doi: 10.1016/j.copbio.2019.12.002. Epub 2020 Jan 9. PMID: 31927423. Review.

Deep learning-based multimodal spatial transcriptomics analysis for cancer.
Rajdeo P, Aronow B, Surya Prasath VB. Adv Cancer Res. 2024;163:1-38. doi: 10.1016/bs.acr.2024.08.001. Epub 2024 Aug 22. PMID: 39271260. Review.
