Computational pan-genomics: status, promises and challenges
- PMID: 27769991
- PMCID: PMC5862344
- DOI: 10.1093/bib/bbw089
Abstract
Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
Keywords: data structures; haplotypes; pan-genome; read mapping; sequence graph.
© The Author 2016. Published by Oxford University Press.
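The abstract's shift from string references to graph references can be made concrete with a small sketch. The code below is an illustrative toy, not a method taken from the article: the class `SequenceGraph`, its methods and the helper `build_snp_example` are hypothetical names, and the four-node example with a single-base variant is an assumption chosen for brevity. Nodes carry sequence, edges connect adjacent nodes, and each haplotype is stored as a path through the graph.

```python
from collections import defaultdict


class SequenceGraph:
    """Toy sequence graph: nodes carry sequence, paths spell haplotypes."""

    def __init__(self):
        self.nodes = {}                 # node id -> DNA string
        self.edges = defaultdict(set)   # node id -> set of successor node ids
        self.paths = {}                 # haplotype name -> list of node ids

    def add_node(self, node_id, sequence):
        self.nodes[node_id] = sequence

    def add_path(self, name, node_ids):
        # Store the walk and add an edge for every consecutive pair of nodes.
        for a, b in zip(node_ids, node_ids[1:]):
            self.edges[a].add(b)
        self.paths[name] = list(node_ids)

    def spell(self, name):
        # Concatenate node labels along a stored path.
        return "".join(self.nodes[n] for n in self.paths[name])


def build_snp_example():
    # Hypothetical data: two haplotypes that differ by one base.
    g = SequenceGraph()
    g.add_node(1, "ACG")
    g.add_node(2, "T")    # allele carried by the "ref" haplotype
    g.add_node(3, "A")    # allele carried by the "alt" haplotype
    g.add_node(4, "TGA")
    g.add_path("ref", [1, 2, 4])
    g.add_path("alt", [1, 3, 4])
    return g
```

In this sketch a single linear reference would keep only the `ref` walk, whereas the graph keeps both alleles as alternative branches between shared flanking nodes, so `spell("ref")` and `spell("alt")` recover both haplotype strings from one structure.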
Similar articles
- PanTools: representation, storage and exploration of pan-genomic data. Bioinformatics. 2016 Sep 1;32(17):i487-i493. doi: 10.1093/bioinformatics/btw455. PMID: 27587666
- Pan-Genome Storage and Analysis Techniques. Methods Mol Biol. 2018;1704:29-53. doi: 10.1007/978-1-4939-7463-4_2. PMID: 29277862 Review.
- seq-seq-pan: building a computational pan-genome data structure on whole genome alignment. BMC Genomics. 2018 Jan 15;19(1):47. doi: 10.1186/s12864-017-4401-3. PMID: 29334898 Free PMC article.
- Simplitigs as an efficient and scalable representation of de Bruijn graphs. Genome Biol. 2021 Apr 6;22(1):96. doi: 10.1186/s13059-021-02297-z. PMID: 33823902 Free PMC article.
- Plant pan-genomics: recent advances, new challenges, and roads ahead. J Genet Genomics. 2022 Sep;49(9):833-846. doi: 10.1016/j.jgg.2022.06.004. Epub 2022 Jun 21. PMID: 35750315 Review.
Cited by
- ProPan: a comprehensive database for profiling prokaryotic pan-genome dynamics. Nucleic Acids Res. 2023 Jan 6;51(D1):D767-D776. doi: 10.1093/nar/gkac832. PMID: 36169225 Free PMC article.
- Pangenomics in Microbial and Crop Research: Progress, Applications, and Perspectives. Genes (Basel). 2022 Mar 27;13(4):598. doi: 10.3390/genes13040598. PMID: 35456404 Free PMC article. Review.
- iPTMnet RESTful API for Post-translational Modification Network Analysis. Methods Mol Biol. 2022;2499:187-204. doi: 10.1007/978-1-0716-2317-6_10. PMID: 35696082 Free PMC article. Review.
- A review of the pangenome: how it affects our understanding of genomic variation, selection and breeding in domestic animals? J Anim Sci Biotechnol. 2023 May 5;14(1):73. doi: 10.1186/s40104-023-00860-1. PMID: 37143156 Free PMC article. Review.
- Unitig-centered pan-genome machine learning approach for predicting antibiotic resistance and discovering novel resistance genes in bacterial strains. Comput Struct Biotechnol J. 2024 Apr 16;23:1864-1876. doi: 10.1016/j.csbj.2024.04.035. eCollection 2024 Dec. PMID: 38707536 Free PMC article.