Robust rank aggregation for gene list integration and meta-analysis

Raivo Kolde¹, Sven Laur, Priit Adler, Jaak Vilo

PMID: 22247279
PMCID: PMC3278763
DOI: 10.1093/bioinformatics/btr709

Bioinformatics. 2012 Feb 15;28(4):573-80. doi: 10.1093/bioinformatics/btr709. Epub 2012 Jan 12.

Affiliation

¹ Institute of Computer Science, University of Tartu, Liivi 2-314, 50409 Tartu, Estonia.

Abstract

Motivation: The continued progress in developing technological platforms, availability of many published experimental datasets, as well as different statistical methods to analyze those data have allowed approaching the same research question using various methods simultaneously. To get the best out of all these alternatives, we need to integrate their results in an unbiased manner. Prioritized gene lists are a common result presentation method in genomic data analysis applications. Thus, the rank aggregation methods can become a useful and general solution for the integration task.

Results: Standard rank aggregation methods are often ill-suited for biological settings where the gene lists are inherently noisy. As a remedy, we propose a novel robust rank aggregation (RRA) method. Our method detects genes that are ranked consistently better than expected under the null hypothesis of uncorrelated inputs and assigns a significance score to each gene. The underlying probabilistic model makes the algorithm parameter free and robust to outliers, noise and errors. Significance scores also provide a rigorous way to keep only the statistically relevant genes in the final list. These properties make our approach robust and compelling for many settings.

Availability: All the methods are implemented as a GNU R package RobustRankAggreg, freely available at the Comprehensive R Archive Network http://cran.r-project.org/.
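The scoring idea in the Results paragraph can be illustrated with a minimal sketch. It assumes the order-statistic formulation suggested by the β_k,n and ρ notation in the Fig. 1 caption: each gene's ranks are normalized to (0, 1], sorted, and the k-th smallest value is compared against what k-th order statistics of n uniform values would give. The function names `beta_scores` and `rho_score` are hypothetical, the multiplication by n is a Bonferroni-style correction assumed here rather than stated in the abstract, and none of this is the API of the RobustRankAggreg package.

```python
from math import comb


def beta_scores(norm_ranks):
    """Return one score per order statistic of the normalized ranks.

    Under the null model (ranks uniform on [0, 1]), the probability that
    the k-th smallest of n values is <= x equals P(Binomial(n, x) >= k).
    """
    r = sorted(norm_ranks)
    n = len(r)
    scores = []
    for k, x in enumerate(r, start=1):
        tail = sum(comb(n, l) * x**l * (1 - x) ** (n - l) for l in range(k, n + 1))
        scores.append(tail)
    return scores


def rho_score(norm_ranks):
    """Smallest beta score, multiplied by n (assumed correction), capped at 1."""
    scores = beta_scores(norm_ranks)
    return min(1.0, min(scores) * len(scores))


# A gene ranked near the top of all 20 lists versus one spread uniformly.
top_gene = [0.01] * 20
uniform_gene = [(i + 0.5) / 20 for i in range(20)]
print(rho_score(top_gene), rho_score(uniform_gene))
```

In this sketch a gene that sits near the top of every list receives a very small score, while a gene scattered evenly across the lists is capped at 1, mirroring the two example genes described for Fig. 1.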

Figures

**Fig. 1.**
Visual description of RRA. (A) Shows an example of 20 ranked lists, with the positions of two genes highlighted. The first gene is placed near the top of the lists, while the second is distributed uniformly. (B) Shows in detail how the β_k,n scores change and how the ρ score is found.

**Fig. 2.**
Results from a simulation study. (A) Shows significance scores calculated with different methods on 10 lists that contained 50 planted elements. The number of true and false positives was computed at an FDR level of 0.05. Both methods based on order statistics (RRA and Stuart) separate planted elements from noise better than the average rank. Still, the Stuart method produces many false positives and thus cannot be used for deciding the significance of genes. (B) Shows ROC curves of different methods on noisy data (10 lists with signal, 30 random). Methods based on order statistics outperform the average rank considerably. (C) Shows the number of true positives found at different levels of noise. At each level, we simulated 10 datasets. RRA shows much higher resistance to noise than the average rank. The Stuart method was excluded from (C) as it failed to distinguish planted elements from noise.

**Fig. 3.**
The proportion of planted elements that were correctly identified by RRA given different numbers of top elements available in input rankings. The gray line shows the proportion of planted elements in the inputs. We can see that the number of correctly identified elements starts to drop only after almost the whole list is dropped. Therefore, by using partial instead of full rankings we usually lose very little information.

**Fig. 4.**
Predicting genes to a GO category based on the knockouts of its transcription factors. A gene name on the x-axis corresponds to a knockout and each bubble represents the Fisher's exact test P-value, showing the enrichment of the knock-out affected genes in the GO category. The horizontal line shows the same enrichment P-value for the aggregated list. The size of the bubble corresponds to the number of regulated genes in the knockout and the color shows if the P-value is significant. The P-values show that the aggregated list is more enriched in the genes related to the corresponding process than most of the inputs.

**Fig. 5.**
AUC scores when predicting transcription factor targets based on gene co-expression. The gray dots represent the individual results, and the black dots and plus signs represent the results aggregated with the RRA and Stuart methods. These values show that in the presence of a signal in the inputs, aggregation methods pick it up and outperform most of the inputs. When the signal is low in the input (AUC ∼0.5), aggregated results are not considerably better. The results for the RRA and Stuart methods are almost identical, since they use very similar criteria for aggregation.
