BMC Bioinformatics. 2018 Nov 29;19(1):457.
doi: 10.1186/s12859-018-2446-1.

Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics


Shakuntala Baichoo et al. BMC Bioinformatics. 2018.

Abstract

Background: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Health and Heredity in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust that aims to use genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure, such as portable and reproducible bioinformatics workflows for use in heterogeneous African computing environments. Processing and analysing genomic data is an example of a big data application that requires complex, interdependent data analysis workflows. Such workflows take primary and secondary input data through several computationally intensive processing steps, each using different software packages, with the outputs of some steps forming the inputs of others. Implementing workflows that are scalable, reproducible, portable and easy to use is therefore particularly challenging.
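The point that the outputs of some steps form the inputs of others means a workflow can be modelled as a directed acyclic graph of steps. The minimal sketch below assumes hypothetical step names loosely modelled on a variant-calling pipeline (they are not the actual H3ABioNet workflow definitions). It uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to group steps into batches whose inputs are all available and that could therefore run in parallel. Workflow engines such as Nextflow and CWL runners perform this kind of dependency scheduling themselves; the sketch only illustrates the idea.

```python
from graphlib import TopologicalSorter

# Hypothetical steps (illustrative only, not the published workflow):
# each step maps to the set of steps whose outputs it consumes.
STEPS = {
    "quality_control": set(),
    "alignment": {"quality_control"},
    "sort_and_mark_duplicates": {"alignment"},
    "variant_calling": {"sort_and_mark_duplicates"},
    "variant_filtering": {"variant_calling"},
    "coverage_report": {"sort_and_mark_duplicates"},
}


def execution_batches(steps):
    """Return lists of steps; all steps in one list have their inputs
    ready and could run concurrently. Raises graphlib.CycleError if the
    dependencies contain a cycle."""
    sorter = TopologicalSorter(steps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for deterministic output
        batches.append(ready)
        sorter.done(*ready)  # mark these steps as finished
    return batches


if __name__ == "__main__":
    for i, batch in enumerate(execution_batches(STEPS), start=1):
        print(f"batch {i}: {', '.join(batch)}")
```

Here `coverage_report` and `variant_calling` both depend only on `sort_and_mark_duplicates`, so they land in the same batch, which is where a scheduler could exploit parallel compute resources.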

Results: H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups and collaborators from the US and Europe. Two of the workflows are built using the Common Workflow Language (CWL) framework and two using Nextflow. All of the workflows are containerized with Docker for improved portability and reproducibility, and are publicly available for use by members of the H3Africa consortium and the international research community.

Conclusion: The H3ABioNet workflows were implemented to offer ease of use for the end user and high levels of reproducibility and portability, while following modern, state-of-the-art bioinformatics data processing protocols. The H3ABioNet workflows will serve the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt to their respective needs. The H3ABioNet workflows will help develop bioinformatics capacity and assist genomics research within Africa, and will serve to increase the scientific output of H3Africa and its Pan-African Bioinformatics Network.

Keywords: Africa; Bioinformatics; Docker; Genomics; Pipeline; Reproducibility; Workflows.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

Michael R. Crusoe, in his role as CWL Community Engineer, has had his salary supported in the past by grants from Seven Bridges Genomics, Inc to his employers. The other authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Workflow A: whole-genome/exome NGS data analysis
Fig. 2
Workflow B: 16S rDNA diversity analysis
Fig. 3
Workflow C: genome-wide association studies
Fig. 4
Workflow D: SNP imputation

Similar articles

  • Development of Bioinformatics Infrastructure for Genomics Research.
    Mulder NJ, Adebiyi E, Adebiyi M, Adeyemi S, Ahmed A, Ahmed R, Akanle B, Alibi M, Armstrong DL, Aron S, Ashano E, Baichoo S, Benkahla A, Brown DK, Chimusa ER, Fadlelmola FM, Falola D, Fatumo S, Ghedira K, Ghouila A, Hazelhurst S, Isewon I, Jung S, Kassim SK, Kayondo JK, Mbiyavanga M, Meintjes A, Mohammed S, Mosaku A, Moussa A, Muhammd M, Mungloo-Dilmohamud Z, Nashiru O, Odia T, Okafor A, Oladipo O, Osamor V, Oyelade J, Sadki K, Salifu SP, Soyemi J, Panji S, Radouani F, Souiai O, Tastan Bishop Ö; H3ABioNet Consortium, as members of the H3Africa Consortium. Mulder NJ, et al. Glob Heart. 2017 Jun;12(2):91-98. doi: 10.1016/j.gheart.2017.01.005. Epub 2017 Mar 13. Glob Heart. 2017. PMID: 28302555 Free PMC article. Review.
  • Organizing and running bioinformatics hackathons within Africa: The H3ABioNet cloud computing experience.
    Ahmed AE, Mpangase PT, Panji S, Baichoo S, Souilmi Y, Fadlelmola FM, Alghali M, Aron S, Bendou H, De Beste E, Mbiyavanga M, Souiai O, Yi L, Zermeno J, Armstrong D, O'Connor BD, Mainzer LS, Crusoe MR, Meintjes A, Van Heusden P, Botha G, Joubert F, Jongeneel CV, Hazelhurst S, Mulder N. Ahmed AE, et al. AAS Open Res. 2019 Aug 7;1:9. doi: 10.12688/aasopenres.12847.2. eCollection 2018. AAS Open Res. 2019. PMID: 32382696 Free PMC article.
  • H3ABioNet, a sustainable pan-African bioinformatics network for human heredity and health in Africa.
    Mulder NJ, Adebiyi E, Alami R, Benkahla A, Brandful J, Doumbia S, Everett D, Fadlelmola FM, Gaboun F, Gaseitsiwe S, Ghazal H, Hazelhurst S, Hide W, Ibrahimi A, Jaufeerally Fakim Y, Jongeneel CV, Joubert F, Kassim S, Kayondo J, Kumuthini J, Lyantagaye S, Makani J, Mansour Alzohairy A, Masiga D, Moussa A, Nash O, Ouwe Missi Oukem-Boyer O, Owusu-Dabo E, Panji S, Patterton H, Radouani F, Sadki K, Seghrouchni F, Tastan Bishop Ö, Tiffin N, Ulenga N; H3ABioNet Consortium. Mulder NJ, et al. Genome Res. 2016 Feb;26(2):271-7. doi: 10.1101/gr.196295.115. Epub 2015 Dec 1. Genome Res. 2016. PMID: 26627985 Free PMC article.
  • Assessing computational genomics skills: Our experience in the H3ABioNet African bioinformatics network.
    Jongeneel CV, Achinike-Oduaran O, Adebiyi E, Adebiyi M, Adeyemi S, Akanle B, Aron S, Ashano E, Bendou H, Botha G, Chimusa E, Choudhury A, Donthu R, Drnevich J, Falola O, Fields CJ, Hazelhurst S, Hendry L, Isewon I, Khetani RS, Kumuthini J, Kimuda MP, Magosi L, Mainzer LS, Maslamoney S, Mbiyavanga M, Meintjes A, Mugutso D, Mpangase P, Munthali R, Nembaware V, Ndhlovu A, Odia T, Okafor A, Oladipo O, Panji S, Pillay V, Rendon G, Sengupta D, Mulder N. Jongeneel CV, et al. PLoS Comput Biol. 2017 Jun 1;13(6):e1005419. doi: 10.1371/journal.pcbi.1005419. eCollection 2017 Jun. PLoS Comput Biol. 2017. PMID: 28570565 Free PMC article.
  • H3Africa and the African life sciences ecosystem: building sustainable innovation.
    Dandara C, Huzair F, Borda-Rodriguez A, Chirikure S, Okpechi I, Warnich L, Masimirembwa C. Dandara C, et al. OMICS. 2014 Dec;18(12):733-9. doi: 10.1089/omi.2014.0145. OMICS. 2014. PMID: 25454511 Free PMC article. Review.
