eHive: an artificial intelligence workflow system for genomic analysis
- PMID: 20459813
- PMCID: PMC2885371
- DOI: 10.1186/1471-2105-11-240
Abstract
Background: The Ensembl project updates its comparative genomics resources with each of its several releases per year. During each release cycle, approximately two weeks are allocated to generate all the genomic alignments and protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and expect this number to continue to grow.
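To make the quadratic growth concrete, the short sketch below counts unordered species pairs. It assumes, purely for illustration, that every pair of species needs one comparison; the abstract says only that the growth is approximately quadratic, and the function name `pairwise_comparisons` is hypothetical.

```python
from math import comb


def pairwise_comparisons(n_species: int) -> int:
    """Number of unordered species pairs, n * (n - 1) / 2.

    Assumption for illustration only: one comparison per pair of species.
    """
    return comb(n_species, 2)


if __name__ == "__main__":
    for n in (25, 50, 100):
        print(n, pairwise_comparisons(n))
```

Under this assumption, doubling the species count from 50 to 100 roughly quadruples the number of pairs (1225 to 4950), which is why a fixed two-week window becomes harder to meet as species are added.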
Results: We present eHive, a new fault-tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network-distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system, a MySQL database serves as the central blackboard, and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real-world scenarios.
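The blackboard pattern described above can be sketched in a few lines. This is a minimal illustration, not eHive's implementation: it uses Python and an in-memory SQLite database in place of the Perl agent and MySQL blackboard, and the table names, column names, status values, retry limit and functions (`job`, `dataflow_rule`, `MAX_RETRIES`, `claim_job`, `run_agent`, `demo`) are all hypothetical. It runs as a single process, so it does not address the locking that several concurrent agents would need.

```python
import sqlite3

# Hypothetical schema for illustration; these are not eHive's actual tables.
SCHEMA = """
CREATE TABLE job (
    job_id   INTEGER PRIMARY KEY,
    analysis TEXT NOT NULL,
    input    TEXT NOT NULL,
    status   TEXT NOT NULL DEFAULT 'READY',
    retries  INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE dataflow_rule (
    from_analysis TEXT NOT NULL,
    to_analysis   TEXT NOT NULL
);
"""
MAX_RETRIES = 2  # assumed limit: a job is attempted at most 1 + MAX_RETRIES times


def open_blackboard():
    """Create the shared blackboard (in-memory SQLite stands in for MySQL)."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def claim_job(db):
    """Mark the oldest READY job as CLAIMED and return it, or None if none is READY."""
    with db:
        row = db.execute(
            "SELECT job_id, analysis, input, retries FROM job "
            "WHERE status = 'READY' ORDER BY job_id LIMIT 1"
        ).fetchone()
        if row is not None:
            db.execute("UPDATE job SET status = 'CLAIMED' WHERE job_id = ?", (row[0],))
    return row


def run_agent(db, runnables):
    """Agent loop: query the blackboard and run jobs until nothing is READY."""
    while (job := claim_job(db)) is not None:
        job_id, analysis, job_input, retries = job
        try:
            output = runnables[analysis](job_input)
        except Exception:
            # Fault tolerance: put the job back for another attempt, up to the limit.
            status = "READY" if retries < MAX_RETRIES else "FAILED"
            with db:
                db.execute(
                    "UPDATE job SET status = ?, retries = ? WHERE job_id = ?",
                    (status, retries + 1, job_id),
                )
            continue
        with db:
            db.execute("UPDATE job SET status = 'DONE' WHERE job_id = ?", (job_id,))
            # Dataflow: each rule turns this job's output into a downstream job.
            targets = db.execute(
                "SELECT to_analysis FROM dataflow_rule WHERE from_analysis = ?",
                (analysis,),
            ).fetchall()
            for (target,) in targets:
                db.execute(
                    "INSERT INTO job (analysis, input) VALUES (?, ?)", (target, output)
                )


def demo():
    """Two alignment jobs flow into tree jobs; the first attempt fails once."""
    db = open_blackboard()
    attempts = {"n": 0}

    def align(pair):
        attempts["n"] += 1
        if attempts["n"] == 1:
            raise RuntimeError("simulated transient failure")
        return pair + ".aln"

    def build_tree(alignment):
        return alignment + ".tree"

    with db:
        db.execute("INSERT INTO dataflow_rule VALUES ('align', 'tree')")
        db.executemany(
            "INSERT INTO job (analysis, input) VALUES ('align', ?)",
            [("human-mouse",), ("human-rat",)],
        )
    run_agent(db, {"align": align, "tree": build_tree})
    return db.execute(
        "SELECT analysis, input, status, retries FROM job ORDER BY job_id"
    ).fetchall()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The point of the design is that all state lives in the database: the agent holds no job queue of its own, so an agent that dies loses at most the job it was running, and new work appears simply because a finished job inserted rows according to the dataflow rules.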
Conclusions: eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.