GSNFS: Gene subnetwork biomarker identification of lung cancer expression data
- PMID: 28117655
- PMCID: PMC5260788
- DOI: 10.1186/s12920-016-0231-4
Abstract
Background: Gene expression has been used to identify disease gene biomarkers, but there are ongoing challenges. Single-gene or gene-set biomarkers do not provide sufficient understanding of complex disease mechanisms or of the relationships among the genes involved. Network-based methods have thus been considered for inferring the interactions within a group of genes to further study the disease mechanism. Recently, the Gene-Network-based Feature Set (GNFS), which is capable of handling case-control and multiclass expression data for gene biomarker identification, has been proposed, partly taking network topology into account. However, its performance relies on a greedy search for building subnetworks and thus requires further improvement. In this work, we establish a new approach named Gene Sub-Network-based Feature Selection (GSNFS) by implementing the GNFS framework with two proposed searching and scoring algorithms, namely gene-set-based (GS) search and parent-node-based (PN) search, to identify subnetworks. An additional dataset is used to validate the results.
Methods: The two proposed searching algorithms of the GSNFS method for subnetwork expansion consider the degree of connectivity and the scoring scheme used to build subnetworks and their topology. In each iteration of expansion, the neighbour genes of the current subnetwork whose expression data improve the overall subnetwork score are recruited. While the GS search calculates the subnetwork score using an activity score of the current subnetwork and the gene expression values of its neighbours, the PN search uses the expression value of the corresponding parent of each neighbour gene. Four lung cancer expression datasets were used for subnetwork identification. In addition, the use of pathway data and protein-protein interaction data as network data, in order to consider the interactions among significant genes, was discussed. Classification was performed to compare the performance of the identified gene subnetworks with three subnetwork identification algorithms.
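The expansion step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the scoring function, so the sketch assumes an absolute Welch t-statistic between two classes, assumes that a subnetwork's activity is the per-sample mean expression of its member genes, and interprets the PN search as scoring each neighbour together with its parent gene only. The names `t_score`, `activity`, `expand`, and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_score(values, labels):
    """Absolute Welch t-statistic between class 1 and class 0.

    Assumption: the abstract does not name the scoring function.
    """
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    diff = abs(mean(a) - mean(b))
    if se == 0:
        return float("inf") if diff > 0 else 0.0
    return diff / se


def activity(genes, expr):
    """Per-sample mean expression of the given genes (assumed activity score)."""
    n_samples = len(expr[genes[0]])
    return [mean(expr[g][i] for g in genes) for i in range(n_samples)]


def expand(seed, network, expr, labels, mode="GS"):
    """Grow a subnetwork from `seed`, recruiting every neighbour that improves the score."""
    members = [seed]
    parent = {seed: None}  # member gene -> gene through which it was reached
    improved = True
    while improved:
        improved = False
        current = t_score(activity(members, expr), labels)
        # Neighbours of the current subnetwork, each with the member it attaches to.
        frontier = {}
        for m in members:
            for nb in network.get(m, ()):
                if nb not in parent and nb in expr:
                    frontier.setdefault(nb, m)
        recruited = []
        for nb, par in frontier.items():
            if mode == "GS":
                # GS: whole-subnetwork activity plus the neighbour's expression.
                trial = activity(members + [nb], expr)
                reference = current
            else:
                # PN: the neighbour is judged against its parent gene only.
                trial = activity([par, nb], expr)
                reference = t_score(expr[par], labels)
            if t_score(trial, labels) > reference:
                recruited.append((nb, par))
        for nb, par in recruited:
            members.append(nb)
            parent[nb] = par
            improved = True
    return members


# Toy data: 3 control samples (0) followed by 3 case samples (1).
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "A": [1.0, 2.0, 3.0, 3.0, 4.0, 5.0],
    "B": [3.0, 2.0, 1.5, 5.0, 4.0, 3.0],
    "C": [5.0, 1.0, 3.0, 1.0, 5.0, 3.0],
    "D": [4.0, 1.0, 2.0, 1.0, 3.0, 2.0],
}
network = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}

gs_subnetwork = expand("A", network, expr, labels, mode="GS")
pn_subnetwork = expand("A", network, expr, labels, mode="PN")
```

In this toy example both modes recruit gene B, whose expression sharpens the class separation of the seed, and reject the uninformative genes C and D. Because every improving neighbour is recruited in an iteration rather than only the single best one, such a search can admit more genes per step than a strictly greedy expansion.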
Results: The two searching algorithms resulted in better classification and gene/gene-set agreement compared to the original greedy search of the GNFS method. The lung cancer subnetworks identified using the proposed searching algorithms improved cross-dataset validation and increased the consistency of findings between two independent datasets. A homogeneity measurement of the datasets was conducted to assess dataset compatibility in cross-dataset validation. The lung cancer dataset with higher homogeneity showed a better result when using the GS search, while the dataset with low homogeneity showed a better result when using the PN search. The 10-fold cross-dataset validation on the independent lung cancer datasets showed higher classification performance for the proposed algorithms than for the greedy search in the original GNFS method.
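The idea behind cross-dataset validation, training a classifier on subnetwork features from one dataset and evaluating it on an independent dataset, can be sketched as follows. This is an illustration under stated assumptions only: the abstract does not name the classifier or detail the 10-fold procedure, so a simple nearest-centroid classifier is swapped in, and the function names and feature values (standing in for subnetwork activity scores) are hypothetical.

```python
from statistics import mean


def centroids(features, labels):
    """Per-class mean feature vector (nearest-centroid model; assumed classifier)."""
    result = {}
    for cls in set(labels):
        rows = [f for f, y in zip(features, labels) if y == cls]
        result[cls] = [mean(col) for col in zip(*rows)]
    return result


def predict(cents, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist2(cls):
        return sum((a - b) ** 2 for a, b in zip(cents[cls], row))
    return min(sorted(cents), key=dist2)


def cross_dataset_accuracy(train_X, train_y, test_X, test_y):
    """Fit on one dataset, report accuracy on an independent dataset."""
    cents = centroids(train_X, train_y)
    hits = sum(predict(cents, row) == y for row, y in zip(test_X, test_y))
    return hits / len(test_y)


# Hypothetical subnetwork activity features (rows = samples, columns = subnetworks).
dataset1_X = [[1.0, 0.5], [1.2, 0.4], [3.0, 2.1], [3.2, 2.0]]
dataset1_y = [0, 0, 1, 1]
dataset2_X = [[0.9, 0.6], [3.1, 1.9], [2.9, 2.2]]
dataset2_y = [0, 1, 1]

acc_1_to_2 = cross_dataset_accuracy(dataset1_X, dataset1_y, dataset2_X, dataset2_y)
acc_2_to_1 = cross_dataset_accuracy(dataset2_X, dataset2_y, dataset1_X, dataset1_y)
```

Evaluating in both directions, as in the last two lines, gives a rough view of how well features selected on one dataset transfer to the other, which is the kind of compatibility the homogeneity measurement is meant to explain.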
Conclusions: The proposed searching algorithms recruit a higher number of genes in the subnetwork expansion step than the greedy algorithm. As a result, the subnetworks identified by the GSNFS method improved in classification performance and in gene/gene-set level agreement, depending on the homogeneity of the datasets used in the analysis. Some of the common genes obtained from the four datasets using the different searching algorithms are known to play a role in lung cancer. The improvements in classification performance and gene/gene-set level agreement, together with the biological relevance of the findings, indicate the effectiveness of the GSNFS method for gene subnetwork identification using expression data.