Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp
- PMID: 38868435
- PMCID: PMC10989850
- DOI: 10.1002/imt2.107
Abstract
A large amount of sequencing data is generated and processed every day as sequencing technology continues to evolve and sequencing applications expand. One consequence of this data explosion is the increasing cost and complexity of data processing. Preprocessing of FASTQ data, which includes removing adapter contamination, filtering low-quality reads, and correcting erroneous bases, is an indispensable but resource-intensive part of sequencing data analysis. Therefore, although many software tools have been developed to address this problem, bioinformatics scientists and engineers are still pursuing faster, simpler, and more energy-efficient software. Several years ago, the author developed fastp, an ultrafast all-in-one FASTQ data preprocessor with many modern features. The software has been well received by many bioinformatics users and has been continuously maintained and updated. Since its first publication, fastp has been greatly improved, becoming even faster and more powerful. For instance, the duplication evaluation module has been improved, and a new deduplication module has been added. This study introduces the new features of fastp and demonstrates how they were designed and implemented.
Keywords: FASTQ; adapter; duplication; filtering; preprocessing; quality control.
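The abstract names two of the core operations, filtering low-quality reads and deduplication, without showing what they involve. The following is a minimal Python sketch of both, assuming plain four-line FASTQ with Phred+33 quality strings. The helper names (`read_fastq`, `passes_quality`, `filter_and_dedup`, `fastq_text`) and all thresholds (minimum base quality 15, at most 40% low-quality bases, at most 5 `N` bases, minimum length 15 bp) are illustrative assumptions, not fastp's code or API. Deduplication here uses an exact Python set of sequences, a simpler and more memory-hungry stand-in that is not the method fastp uses.

```python
import io
from dataclasses import dataclass
from typing import Iterable, Iterator, TextIO


@dataclass
class FastqRecord:
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from plain four-line FASTQ (no line-wrapped sequences)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def passes_quality(rec: FastqRecord, min_qual: int = 15, max_low_frac: float = 0.40,
                   max_n: int = 5, min_len: int = 15, phred_offset: int = 33) -> bool:
    """Hypothetical read filter; the default thresholds are assumptions for this sketch."""
    if len(rec.seq) < min_len:
        return False
    if rec.seq.upper().count("N") > max_n:
        return False
    # Count bases whose Phred score (ASCII code minus offset) is below min_qual.
    low = sum(1 for c in rec.qual if ord(c) - phred_offset < min_qual)
    return low <= max_low_frac * len(rec.qual)


def filter_and_dedup(records: Iterable[FastqRecord]) -> Iterator[FastqRecord]:
    """Drop low-quality reads, then keep only the first copy of each exact sequence."""
    seen = set()  # exact sequences already emitted; memory grows with unique reads
    for rec in records:
        if not passes_quality(rec):
            continue
        if rec.seq in seen:
            continue
        seen.add(rec.seq)
        yield rec


def fastq_text(name: str, seq: str, qual: str) -> str:
    """Hypothetical helper that formats one FASTQ record as text."""
    return f"@{name}\n{seq}\n+\n{qual}\n"


example = io.StringIO(
    fastq_text("r1", "ACGT" * 5, "I" * 20)                     # Q40 everywhere: kept
    + fastq_text("r2", "ACGT" * 5, "I" * 20)                   # exact duplicate of r1: dropped
    + fastq_text("r3", "GGCC" * 5, "#" * 20)                   # every base Q2: dropped
    + fastq_text("r4", "N" * 6 + "ACGT" * 3 + "AC", "I" * 20)  # 6 Ns > 5: dropped
    + fastq_text("r5", "TTGCA", "IIIII")                       # 5 bp < 15 bp: dropped
    + fastq_text("r6", "TTTTGGGGCCCCAAAA", "I" * 16)           # distinct and clean: kept
)
kept = [rec.name for rec in filter_and_dedup(read_fastq(example))]
print(kept)  # ['r1', 'r6']
```

Filtering before deduplication means a low-quality copy of a sequence never blocks a later clean copy of the same sequence. Adapter trimming and base correction, also mentioned in the abstract, are left out of this sketch.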
© 2023 The Authors. iMeta published by John Wiley & Sons Australia, Ltd on behalf of iMeta Science.
Conflict of interest statement
The author declares no conflict of interest.
Similar articles
- fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018 Sep 1;34(17):i884-i890. doi: 10.1093/bioinformatics/bty560. PMID: 30423086.
- fastQ_brew: module for analysis, preprocessing, and reformatting of FASTQ sequence data. BMC Res Notes. 2017 Jul 12;10(1):275. doi: 10.1186/s13104-017-2616-7. PMID: 28701181.
- FastqPuri: high-performance preprocessing of RNA-seq data. BMC Bioinformatics. 2019 May 3;20(1):226. doi: 10.1186/s12859-019-2799-0. PMID: 31053060.
- PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets. Cancer Inform. 2015 May 12;13(Suppl 1):167-76. doi: 10.4137/CIN.S13890. eCollection 2014. PMID: 25983538. Review.
- The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 2010 Apr;38(6):1767-71. doi: 10.1093/nar/gkp1137. Epub 2009 Dec 16. PMID: 20015970. Review.
Cited by
- Metabolome and Transcriptome Combined Reveal the Main Floral Volatile Compounds and Key Regulatory Genes of Castanea mollissima. Plants (Basel). 2024 Oct 14;13(20):2865. doi: 10.3390/plants13202865. PMID: 39458813.
- Integrated large-scale metagenome assembly and multi-kingdom network analyses identify sex differences in the human nasal microbiome. Genome Biol. 2024 Oct 8;25(1):257. doi: 10.1186/s13059-024-03389-2. PMID: 39380016.
- Microglia replacement by ER-Hoxb8 conditionally immortalized macrophages provides insight into Aicardi-Goutières Syndrome neuropathology. bioRxiv [Preprint]. 2025 May 15:2024.09.18.613629. doi: 10.1101/2024.09.18.613629. PMID: 39345609. Preprint.
- Target gene regulatory network of miR-497 in angiosarcoma. bioRxiv [Preprint]. 2023 Sep 25:2023.09.24.559218. doi: 10.1101/2023.09.24.559218. Update in: Mol Cancer Res. 2024 Sep 4;22(9):879-890. doi: 10.1158/1541-7786.MCR-23-1075. PMID: 37808715. Preprint.
- A near-complete assembly of the Houttuynia cordata genome provides insights into the regulatory mechanism of flavonoid biosynthesis in Yuxingcao. Plant Commun. 2024 Oct 14;5(10):101075. doi: 10.1016/j.xplc.2024.101075. Epub 2024 Sep 2. PMID: 39228129.