JMIR Serious Games. 2014 Jul 29;2(2):e7.
doi: 10.2196/games.3350.

The cure: design and evaluation of a crowdsourcing game for gene selection for breast cancer survival prediction

Benjamin M Good et al. JMIR Serious Games. 2014.

Abstract

Background: Molecular signatures for predicting breast cancer prognosis could greatly improve care through personalization of treatment. Computational analyses of genome-wide expression datasets have identified such signatures, but these signatures leave much to be desired in terms of accuracy, reproducibility, and biological interpretability. Methods that take advantage of structured prior knowledge (eg, protein interaction networks) show promise in helping to define better signatures, but most knowledge remains unstructured. Crowdsourcing via scientific discovery games is an emerging methodology that has the potential to tap into human intelligence at scales and in modes unheard of before.

Objective: The main objective of this study was to test the hypothesis that knowledge linking expression patterns of specific genes to breast cancer outcomes could be captured from players of an open, Web-based game. We envisioned capturing knowledge both from the player's prior experience and from their ability to interpret text related to candidate genes presented to them in the context of the game.

Methods: We developed and evaluated an online game called The Cure that captured information from players regarding genes for use as predictors of breast cancer survival. Information gathered from game play was aggregated using a voting approach and used to create rankings of genes. The top genes from these rankings were evaluated by annotation enrichment analysis, by comparison to prior predictor gene sets, and by using them to train and test machine learning systems for predicting 10-year survival.
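The voting step described in the Methods can be illustrated with a minimal sketch. It assumes that each game contributes one vote per gene in the player's final hand and that genes are ranked by total votes; the abstract does not specify the exact voting scheme, so the function name `rank_genes_by_votes`, the tie-breaking rule, and the example hands are all hypothetical.

```python
from collections import Counter

def rank_genes_by_votes(games, top_n=None):
    """Rank genes by the number of games in which a player selected them."""
    votes = Counter()
    for hand in games:
        # Count each gene at most once per game, so one game is one vote.
        votes.update(set(hand))
    # Sort by descending vote count; break ties alphabetically for determinism.
    ranking = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return ranking[:top_n] if top_n is not None else ranking

# Hypothetical hands; the gene symbols are placeholders for illustration only.
games = [
    ["BRCA1", "ESR1", "TP53"],
    ["ESR1", "ERBB2", "TP53"],
    ["ESR1", "DICER1", "BRCA1"],
]
ranking = rank_genes_by_votes(games)
top_two = rank_genes_by_votes(games, top_n=2)
```

A weighted variant (for example, weighting votes by a player's self-reported expertise) would only change how `votes` is incremented; whether the study used such weights is not stated in the abstract.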

Results: Between its launch in September 2012 and September 2013, The Cure attracted more than 1000 registered players, who collectively played nearly 10,000 games. Gene sets assembled through aggregation of the collected data showed significant enrichment for genes known to be related to key concepts such as cancer, disease progression, and recurrence. In terms of the predictive accuracy of models trained using this information, these gene sets provided comparable performance to gene sets generated using other methods, including those used in commercial tests. The Cure is available on the Internet.
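One common way to test a gene set for annotation enrichment is a one-sided hypergeometric test. The sketch below is an assumption about how such a test could be computed, not a description of the analysis actually used in the study; the function name and the example numbers are hypothetical.

```python
from math import comb

def enrichment_p_value(population, annotated, selected, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    population: total number of genes considered
    annotated:  genes in the population carrying the annotation
    selected:   size of the gene set being tested
    overlap:    genes in the set that carry the annotation
    """
    max_overlap = min(annotated, selected)
    total = comb(population, selected)
    # Sum the probability mass of every outcome at least as extreme as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical example: all 5 selected genes fall among the 5 annotated
# genes in a population of 20.
p = enrichment_p_value(population=20, annotated=5, selected=5, overlap=5)
```

In practice, a p-value like this would be computed for many annotation terms and then corrected for multiple testing.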

Conclusions: The principal contribution of this work is to show that crowdsourcing games can be developed as a means to address problems involving domain knowledge. While most prior work on scientific discovery games and crowdsourcing in general takes as a premise that contributors have little or no expertise, here we demonstrated a crowdsourcing system that succeeded in capturing expert knowledge.

Keywords: Web applications; artificial intelligence; breast neoplasms; collaborative and social computing systems and tools; computer games; crowdsourcing; feature selection; gene expression; supervised learning; survival analysis.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1
The Cure game. The figure shows a game in progress in which both players have completed 2 of the 5 turns. Players alternate turns, taking a gene card from the board and adding it to their hand. The player with the highest score after 5 turns is the winner. The tabbed display provides gene annotations ("Ontology", "Rifs") and views of decision trees constructed by the system using the selected genes. The scores reflect the predictive power of the selected genes. The system produces these scores by using data associated with the selected genes to train and test a decision tree classifier. The scores are the accuracy of these inferred classifiers.
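The scoring idea in the Figure 1 caption (train a classifier on the data for the selected genes, then report its accuracy) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a one-level decision tree (a stump) and a single train/test split, whereas the game's actual tree learner and evaluation protocol are not described here. All function names and data values are hypothetical.

```python
def fit_stump(rows, labels):
    """Fit a one-level decision tree by maximizing training accuracy.

    rows:   list of expression vectors, one value per selected gene
    labels: list of 0/1 outcomes, one per row
    Returns (gene_index, threshold, label_if_low, label_if_high).
    """
    best, best_correct = None, -1
    for gene in range(len(rows[0])):
        for threshold in sorted({row[gene] for row in rows}):
            for low, high in ((0, 1), (1, 0)):
                correct = sum(
                    (low if row[gene] <= threshold else high) == y
                    for row, y in zip(rows, labels)
                )
                if correct > best_correct:
                    best_correct = correct
                    best = (gene, threshold, low, high)
    return best

def predict(stump, row):
    gene, threshold, low, high = stump
    return low if row[gene] <= threshold else high

def score_hand(train_rows, train_labels, test_rows, test_labels):
    """Score a hand as the held-out accuracy of a stump trained on its genes."""
    stump = fit_stump(train_rows, train_labels)
    hits = sum(predict(stump, r) == y for r, y in zip(test_rows, test_labels))
    return hits / len(test_labels)

# Hypothetical expression values for a hand of two genes.
train_rows = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [0.8, 2.0]]
train_labels = [0, 0, 1, 1]
test_rows = [[0.15, 3.0], [0.85, 3.0], [0.3, 3.0]]
test_labels = [0, 1, 0]
stump = fit_stump(train_rows, train_labels)
score = score_hand(train_rows, train_labels, test_rows, test_labels)
```

Under this scheme, a hand containing a gene whose expression separates the two outcome groups earns a higher score than a hand of uninformative genes.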
Figure 2
The Gene Rifs tab showing information about the Dicer gene. Gene Rifs provide textual descriptions of gene function extracted from abstracts. These can be used to gain insights into the possible connections between the gene and breast cancer prognosis, and thus can help players to intelligently select genes in the game.
Figure 3
The board selection view. Stars indicate boards the active player has completed, circles indicate boards that have been completed by a sufficient number of different players, and numbers indicate open boards. The pink progress bar indicates how close the community is to finishing the board.
Figure 4
New player registrations per month, with academic degree. The figure shows the fluctuations in both the size and the demographics of the player population over time.
Figure 5
Games played per player. The majority of players only played a few games, while some players played several hundred games.
Figure 6
Overlap of game-derived gene sets.
Figure 7
Overlap of "expert" gene set derived from game data (in green) with prior published predictor gene sets. RFRS: Random Forest Relapse Score.
Figure 8
Evaluation of the accuracy of models trained to predict 10-year survival using gene sets derived from the game and prior gene sets from the breast cancer literature. Lauss: literature survey [27]. van 't Veer: datasets [3]. RFRS: Random Forest Relapse Score.
Figure 9
Ages of players.
Figure 10
Levels of breast cancer knowledge among players.

References

    Bray F, Ren JS, Masuyer E, Ferlay J. Global estimates of cancer prevalence for 27 sites in the adult population in 2008. Int J Cancer. 2013 Mar 1;132(5):1133–1145. doi: 10.1002/ijc.27711.
    van 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002 Jan 31;415(6871):530–536. doi: 10.1038/415530a.
    Margolin AA, Bilal E, Huang E, Norman TC, Ottestad L, Mecham BH, Sauerwine B, Kellen MR, Mangravite LM, Furia MD, Vollan HK, Rueda OM, Guinney J, Deflaux NA, Hoff B, Schildwachter X, Russnes HG, Park D, Vang VO, Pirtle T, Youseff L, Citro C, Curtis C, Kristensen VN, Hellerstein J, Friend SH, Stolovitzky G, Aparicio S, Caldas C, Børresen-Dale AL. Systematic analysis of challenge-driven improvements in molecular prognostic models for breast cancer. Sci Transl Med. 2013 Apr 17;5(181):181re1. doi: 10.1126/scitranslmed.3006112. http://stm.sciencemag.org/cgi/pmidlookup?view=long&pmid=23596205
    Griffith OL, Pepin F, Enache OM, Heiser LM, Collisson EA, Spellman PT, Gray JW. A robust prognostic signature for hormone-positive node-negative breast cancer. Genome Med. 2013;5(10):92. doi: 10.1186/gm496. http://www.genomemedicine.com/content/5/10/92
    Daemen A, Griffith OL, Heiser LM, Wang NJ, Enache OM, Sanborn Z, Pepin F, Durinck S, Korkola JE, Griffith M, Hur JS, Huh N, Chung J, Cope L, Fackler MJ, Umbricht C, Sukumar S, Seth P, Sukhatme VP, Jakkula LR, Lu Y, Mills GB, Cho RJ, Collisson EA, van't Veer LJ, Spellman PT, Gray JW. Modeling precision treatment of breast cancer. Genome Biol. 2013;14(10):R110. doi: 10.1186/gb-2013-14-10-r110. http://genomebiology.com/content/14/10/R110
