PLoS One. 2012;7(3):e31362.
doi: 10.1371/journal.pone.0031362. Epub 2012 Mar 7.

Phylo: a citizen science approach for improving multiple sequence alignment

Alexander Kawrykow et al. PLoS One. 2012.

Abstract

Background: Comparative genomics, or the study of the relationships of genome structure and function across different species, offers a powerful tool for studying evolution, annotating genomes, and understanding the causes of various genetic disorders. However, aligning multiple sequences of DNA, an essential intermediate step for most types of analyses, is a difficult computational task. In parallel, citizen science, an approach that takes advantage of the fact that the human brain is exquisitely tuned to solving specific types of problems, is becoming increasingly popular. There, instances of hard computational problems are dispatched to a crowd of non-expert human game players and solutions are sent back to a central server.

Methodology/principal findings: We introduce Phylo, a human-based computing framework applying "crowd sourcing" techniques to solve the Multiple Sequence Alignment (MSA) problem. The key idea of Phylo is to convert the MSA problem into a casual game that can be played by ordinary web users with minimal prior knowledge of the biological context. We applied this strategy to improve the alignment of the promoters of disease-related genes from up to 44 vertebrate species. Since the launch in November 2010, we have received more than 350,000 solutions submitted by more than 12,000 registered users. Our results show that the submitted solutions contributed to improving the accuracy of up to 70% of the alignment blocks considered.
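
To make the idea of scoring an alignment concrete, here is a minimal sketch of a sum-of-pairs column score, a common textbook way to rate a multiple alignment. It is not Phylo's scoring function, which the abstract does not describe. The function name `sum_of_pairs_score`, the `match`, `mismatch`, and `gap` values, and the toy sequences are all assumptions made for illustration.

```python
from itertools import combinations


def sum_of_pairs_score(alignment, match=1, mismatch=-1, gap=-2):
    """Score a multiple alignment by summing pairwise scores in every column.

    The scoring parameters are hypothetical illustration values,
    not the ones used by Phylo.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")

    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # two gaps facing each other contribute nothing
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


# Toy example: moving one gap in the third row changes the score.
before = ["ACGT-", "A-GTT", "ACG-T"]
after = ["ACGT-", "A-GTT", "ACGT-"]
print(sum_of_pairs_score(before), sum_of_pairs_score(after))
```

In a game setting of the kind described above, a player's move would amount to shifting gaps within a row, and a score of this sort could tell the player whether the move improved the alignment.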

Conclusions/significance: We demonstrate that, combined with classical algorithms, crowd computing techniques can be successfully used to help improve the accuracy of MSA. More importantly, we show that an NP-hard computational problem can be embedded in a casual game that can be easily played by people without significant scientific training. This suggests that citizen science approaches can be used to exploit the billions of "human-brain peta-flops" of computation that are spent every day playing games. Phylo is available at: http://phylo.cs.mcgill.ca.

Conflict of interest statement

Competing Interests: Luis Sarmenta is an employee of Nokia and Jerome Waldispuhl received a donation from Nokia. This does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.

Figures

Figure 1. Phylo crowd-sourcing system for local improvement of multiple genome alignments.
Figure 2. Statistics on the number of players.
The top figure shows the number of puzzles played by registered and anonymous players during the first seven months of Phylo. The bottom figure shows the number of registered players with respect to the number of puzzles they solved.
Figure 3. Statistics on the performance of players as a function of the number of sequences in the puzzle.
(a) Average Phylo score of original alignments (red) and average best score obtained (yellow). (b) Success rate per level: Average number of times a puzzle has been played (red), and average number of times a player reaches the final stage of a puzzle (yellow).
