Bioinformatics. 2024 Aug 2;40(8):btae477.
doi: 10.1093/bioinformatics/btae477.

AssemblyQC: a Nextflow pipeline for reproducible reporting of assembly quality

Usman Rashid et al. Bioinformatics.

Abstract

Summary: Genome assembly projects have grown exponentially due to breakthroughs in sequencing technologies and assembly algorithms. Evaluating the quality of genome assemblies is critical to ensure the reliability of downstream analysis and interpretation. To fulfil this task, we have developed the AssemblyQC pipeline that performs file-format validation, contaminant checking, contiguity measurement, gene- and repeat-space completeness quantification, telomere inspection, taxonomic assignment, synteny alignment, scaffold examination through Hi-C contact-map visualization, and assessments of completeness, consensus quality and phasing through k-mer analysis. It produces a comprehensive HTML report with method descriptions, tables, and visualizations.
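As an illustration of what a contiguity measurement can look like, the sketch below computes N50, a widely used contiguity statistic, from the sequence lengths in a FASTA file. It is a minimal example under stated assumptions, not the pipeline's own implementation: the abstract does not name the metrics AssemblyQC reports, and the function names `read_fasta_lengths` and `n50` are hypothetical.

```python
def read_fasta_lengths(lines):
    """Return the length of each sequence in FASTA-formatted lines.

    Hypothetical helper for illustration; not part of AssemblyQC.
    """
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header closes the previous record and opens a new one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
```

For a real assembly, `read_fasta_lengths` could be given an open file handle, since iterating over a text file yields its lines; a larger N50 indicates that more of the assembly sits in long sequences.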

Availability and implementation: The pipeline uses Nextflow for workflow orchestration and adheres to the best practices established by the nf-core community. It offers a reproducible, scalable, and portable method to assess the quality of genome assemblies. The code is available on GitHub: https://github.com/Plant-Food-Research-Open/assemblyqc.

Conflict of interest statement

None declared.

Figures

Figure 1. Pipeline flowchart.
