Navigating bottlenecks and trade-offs in genomic data analysis
- PMID: 36476810
- PMCID: PMC10204111
- DOI: 10.1038/s41576-022-00551-z
Abstract
Genome sequencing and analysis allow researchers to decode the functional information hidden in DNA sequences as well as to study cell-to-cell variation within a cell population. Traditionally, the primary bottleneck in genomic analysis pipelines has been the sequencing itself, which has been much more expensive than the computational analyses that follow. However, an important consequence of the continued drive to expand the throughput of sequencing platforms at lower cost is that analytical pipelines often struggle to keep up with the sheer amount of raw data produced. Computational cost and efficiency have thus become increasingly important. Recent methodological advances, such as data sketching, accelerators and domain-specific libraries/languages, promise to address these modern computational challenges. However, despite being more efficient, these innovations come with a new set of trade-offs. Some are expected, such as accuracy versus memory and expense versus time; others are more subtle, including the human expertise needed to use non-standard programming interfaces and set up complex infrastructure. In this Review, we discuss how to navigate these new methodological advances and their trade-offs.
© 2022. Springer Nature Limited.
Conflict of interest statement
Competing interests
The authors declare no competing interests.
Related links
- BEETL-fastq: https://github.com/BEETL/BEETL