Bioinformatics. 2015 Jun 15;31(12):i311-9.

doi: 10.1093/bioinformatics/btv255.

FERAL: network-based classifier with application to breast cancer outcome prediction

Amin Allahyar¹, Jeroen de Ridder¹

PMID: 26072498
PMCID: PMC4765883
DOI: 10.1093/bioinformatics/btv255

Amin Allahyar et al. Bioinformatics. 2015.

Affiliation

¹ Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands.

Abstract

Motivation: Breast cancer outcome prediction based on gene expression profiles is an important strategy for personalizing patient care. To improve the performance of the initial molecular classifiers and the consistency of the markers they discover, network-based outcome prediction methods (NOPs) have been proposed. Despite the initial claims, recent studies revealed that neither performance nor consistency can be improved using these methods. NOPs typically rely on the construction of meta-genes by averaging the expression of several genes connected in a network that encodes protein interactions or pathway information. In this article, we expose several fundamental issues in NOPs that impede prediction power and the consistency of discovered markers, and that obscure biological interpretation.
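
The averaging step described above can be illustrated with a minimal sketch. The gene names, expression values and network below are hypothetical toy data, and `average_meta_gene` is an illustrative helper, not code from the FERAL distribution.

```python
from statistics import mean

def average_meta_gene(expression, network, seed):
    """Average the expression of a seed gene and its network neighbors, per sample."""
    members = [seed] + sorted(network.get(seed, ()))
    # Keep only genes that were actually measured.
    members = [g for g in members if g in expression]
    n_samples = len(expression[seed])
    return [mean(expression[g][i] for g in members) for i in range(n_samples)]

# Hypothetical toy data: gene -> expression values over three samples.
expression = {
    "A": [1.0, 2.0, 3.0],
    "B": [3.0, 2.0, 1.0],
    "C": [2.0, 2.0, 2.0],
}
network = {"A": {"B", "C"}}  # hypothetical interaction partners of gene A

meta_gene = average_meta_gene(expression, network, "A")
```

In this toy example genes A and B vary in opposite directions, so the averaged meta-gene is constant across samples. This is one way in which a plain average can discard signal carried by the individual genes.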

Results: To overcome these issues, we propose FERAL, a network-based classifier that hinges upon the Sparse Group Lasso (SGL), which performs simultaneous selection of marker genes and training of the prediction model. An important feature of FERAL, and a significant departure from existing NOPs, is that it uses multiple operators to summarize genes into meta-genes. This gives the classifier the opportunity to select the most relevant meta-gene for each gene set. Extensive evaluation revealed that the discovered markers are markedly more stable across independent datasets. Moreover, interpretation of the marker genes detected by FERAL reveals valuable mechanistic insight into the etiology of breast cancer.
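
To give a feel for how a Sparse Group Lasso can select groups and individual features at the same time, the sketch below implements the proximal operator of the standard penalty λ1·‖β‖₁ + λ2·‖β‖₂ for a single group: element-wise soft-thresholding followed by group-wise shrinkage. The parameterization and the function names are assumptions for illustration; the abstract does not state which formulation or solver FERAL uses.

```python
import math

def soft_threshold(x, t):
    """Shrink x toward zero by t; values within [-t, t] become zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def sgl_prox_group(beta, l1, l2):
    """Proximal step of l1*||beta||_1 + l2*||beta||_2 for one group of coefficients."""
    # Step 1: the L1 part zeroes out weak coefficients inside the group.
    shrunk = [soft_threshold(b, l1) for b in beta]
    norm = math.sqrt(sum(b * b for b in shrunk))
    # Step 2: the group part removes the whole group if what remains is weak.
    if norm <= l2:
        return [0.0] * len(beta)
    scale = 1.0 - l2 / norm
    return [scale * b for b in shrunk]

# Hypothetical coefficient groups (e.g. the genes and meta-genes of one gene set).
strong_group = sgl_prox_group([3.0, -0.5, 4.0], l1=1.0, l2=1.0)
weak_group = sgl_prox_group([0.5, -0.6], l1=0.1, l2=1.0)
```

In the first call the weak middle coefficient is zeroed while the group survives; in the second the entire group is dropped. These two effects correspond to within-group and group-level selection.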

Availability and implementation: All code is available for download at: http://homepage.tudelft.nl/53a60/resources/FERAL/FERAL.zip.

Figures

**Fig. 1.**
Overview of the proposed model (FERAL). (a) Current models follow a similar path in which several nearby genes (according to a given network) are selected and then integrated using an average operator, resulting in a meta-gene. These meta-genes are then ranked based on a pre-defined scoring function, and the top candidates are presented to the final classifier. (b) Instead of being limited to average-based meta-genes, FERAL computes several meta-genes using different operators and employs the Sparse Group Lasso (SGL) to select the most appropriate meta-gene for each specific gene set while simultaneously performing selection, integration and classification.

**Fig. 2.**
Evaluation of different integration operators. (a) Visualization of the consistency in the direction of association with the target label for connected gene pairs in the I2D network. The x-axis represents the magnitude of difference, defined as $|C_{a} - C_{b}| \times \operatorname{sgn}(C_{a} \times C_{b})$, where $C_{x}$ denotes the correlation between gene $x$ and the target label and $\operatorname{sgn}$ is the sign function. The y-axis is the correlation between the two genes (see Supplementary Section S3 for details). (b) Performance comparison between 11 operators (from left to right): average, average of differences between the seed gene and its interactors (implemented in Taylor), variance, minimum, maximum, median, regression, lasso, DA2, Decision Tree (DT) and support vector machine with an RBF kernel. To generate each violin plot, 5000 randomly selected seed genes and their 9 closest neighbors according to the I2D network were integrated into a meta-gene using one of the operators, and the predictive performance (AUC) was determined. The y-axis represents the improvement, expressed as the log ratio of the AUC obtained with the meta-gene to the highest AUC of the individual genes. This comparison shows that other operators are able to provide similar or even better performance compared with the average operator. Interestingly, adjusting the direction of genes before taking the average can improve the performance considerably.
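
The two quantities plotted in Fig. 2 can be written out as a short sketch. The function names are illustrative, and the natural logarithm is an assumption, since the caption does not state the base of the log ratio.

```python
import math

def signed_difference(c_a, c_b):
    """|C_a - C_b| * sgn(C_a * C_b) for the label correlations of two connected genes."""
    product = c_a * c_b
    sign = (product > 0) - (product < 0)  # sign function: -1, 0 or 1
    return abs(c_a - c_b) * sign

def auc_log_ratio(meta_gene_auc, single_gene_aucs):
    """Log ratio of the meta-gene AUC to the best AUC among the individual genes.

    Assumption: natural logarithm; the caption does not specify the base.
    """
    return math.log(meta_gene_auc / max(single_gene_aucs))

# Hypothetical correlations with the target label.
same_direction = signed_difference(0.5, 0.25)       # both genes positively associated
opposite_direction = signed_difference(0.5, -0.25)  # genes associated in opposite directions
no_gain = auc_log_ratio(0.8, [0.8, 0.6])            # meta-gene matches the best single gene
```

A negative `signed_difference` marks a connected pair whose members are associated with the label in opposite directions, and a positive `auc_log_ratio` means the meta-gene outperformed every gene it was built from.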

**Fig. 3.**
**Schematic of the training and testing procedures of FERAL.** (a) In the first step, 10 genes are selected using the given network. (b) The corresponding genes in the expression dataset are selected and normalized using the z-score. (c) Meta-genes are computed using the expression profiles of the gene set and the target label (in the case of a supervised integration). The expression of the individual genes is retained within the gene set. (d) The SGL is trained using the training samples. (e) Test samples are used to assess the prediction performance (in terms of AUC) in the current fold.
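
Steps (b) and (c) of this schematic might look roughly as follows. This is a sketch under stated assumptions: it uses the population standard deviation for the z-score, covers only a few unsupervised operators (the supervised ones named in Fig. 2 are omitted), and `build_group_features` and the `meta_` prefix are hypothetical names.

```python
from statistics import mean, median, pstdev, pvariance

def z_score(values):
    """Standardize one gene across samples (assumption: population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# A few unsupervised integration operators, applied per sample across the gene set.
OPERATORS = {
    "average": mean,
    "median": median,
    "minimum": min,
    "maximum": max,
    "variance": pvariance,
}

def build_group_features(gene_set):
    """Return one feature group: the normalized genes plus one meta-gene per operator."""
    normalized = {gene: z_score(values) for gene, values in gene_set.items()}
    n_samples = len(next(iter(normalized.values())))
    features = dict(normalized)  # the individual genes are retained in the group
    for name, operator in OPERATORS.items():
        features["meta_" + name] = [
            operator([normalized[gene][i] for gene in normalized])
            for i in range(n_samples)
        ]
    return features

# Hypothetical gene set with two genes measured on three samples.
features = build_group_features({"g1": [1.0, 2.0, 3.0], "g2": [6.0, 4.0, 2.0]})
```

Each such feature group would then be handed to the group-sparse classifier, which can keep an individual gene, one of the meta-genes, or nothing from the group.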

**Fig. 4.**
**Performance evaluation (AUC).** Performance of the methods under study for the PPI network (I2D), a co-expression network (Co-Expr) and a random network (Random). We also added the result obtained when a classical Lasso is employed (Single). Error bars denote the 95% confidence interval. The heatmaps indicate the P value of the paired t-test for pairwise comparisons of the AUCs of the individual CV folds. (a) Sub-type stratified CV. (b) Sampled leave-one-study-out CV.
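
The evaluation described in this caption rests on two standard computations, sketched below with the standard library only: the AUC of a set of prediction scores, and the paired t statistic between the per-fold AUCs of two methods. The fold values are hypothetical, and turning the statistic into a P value additionally requires the t distribution with n − 1 degrees of freedom, which is not included here.

```python
import math
from statistics import mean, stdev

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def paired_t_statistic(aucs_a, aucs_b):
    """t statistic of the fold-wise AUC differences between two methods."""
    diffs = [a - b for a, b in zip(aucs_a, aucs_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical test-fold predictions and labels (1 = poor outcome, 0 = good outcome).
fold_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])

# Hypothetical per-fold AUCs for two methods evaluated on the same folds.
t_value = paired_t_statistic([0.74, 0.71, 0.78, 0.69], [0.70, 0.70, 0.73, 0.68])
```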

**Fig. 5.**
Stability measurement (using Fisher’s exact test) for three networks: I2D, Co-Expr and a random network. The original version of the standard methods produced a much lower overlap between folds due to pre-ranking of meta-genes. Similarly, Lasso produced a low overlap due to random selection of correlated features. FERAL obtained a higher gene set stability across folds for the I2D and Co-Expr networks.
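
One common way to score the overlap between the marker sets of two folds with Fisher's exact test is the one-sided hypergeometric tail probability, sketched below. How the test is applied in the paper is not described in this caption, so the setup (two folds, a fixed gene universe, a one-sided test) and the name `overlap_p_value` are assumptions.

```python
from math import comb

def overlap_p_value(markers_a, markers_b, n_universe):
    """One-sided Fisher's exact test: probability of an overlap at least this large by chance.

    markers_a, markers_b: sets of selected genes from two folds.
    n_universe: total number of genes that could have been selected (assumed known).
    """
    size_a, size_b = len(markers_a), len(markers_b)
    observed = len(markers_a & markers_b)
    total = comb(n_universe, size_b)
    # Sum the hypergeometric probabilities of every overlap >= the observed one.
    tail = sum(
        comb(size_a, k) * comb(n_universe - size_a, size_b - k)
        for k in range(observed, min(size_a, size_b) + 1)
    )
    return tail / total

# Hypothetical marker sets from two cross-validation folds, drawn from 10 genes.
p_value = overlap_p_value({"g1", "g2", "g3"}, {"g2", "g3", "g4"}, n_universe=10)
```

A small value indicates that the two folds share more markers than expected from random selection, that is, a more stable marker set.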

**Fig. 6.**
**Gene enrichment.** (a) Gene enrichment of the top genes for each method when the I2D network is employed. The values on top of each group represent the number of genes in each gene set. A notably increased enrichment is obtained using the gene sets produced by FERAL. (b) Top 15 gene enrichments found by BiNGO when applied to the top 400 genes provided by FERAL.

**Fig. 7.**
**Frequently identified gene sets by FERAL.** The bars represent the median coefficient across folds, normalized to the range $[-1, 1]$. Background colors indicate the correlation with the target label, ranging from positive (blue) to negative (red).
