G3 (Bethesda). 2020 Apr 9;10(4):1193-1196.
doi: 10.1534/g3.119.400864.

LongQC: A Quality Control Tool for Third Generation Sequencing Long Read Data

Yoshinori Fukasawa et al. G3 (Bethesda). 2020.

Erratum in

  • CORRIGENDUM. G3 (Bethesda). 2020 Nov 5;10(11):4295. doi: 10.1534/g3.120.401778. PMID: 33154025.

Abstract

We propose LongQC, an easy-to-use, automated quality control tool for genomic datasets generated by third generation sequencing (TGS) technologies such as Oxford Nanopore Technologies (ONT) and single-molecule real-time (SMRT) sequencing from Pacific Biosciences (PacBio). Its key statistics were optimized for long read data, and LongQC covers all major TGS platforms. LongQC computes and visualizes these statistics automatically and quickly.

Keywords: Long read; Oxford Nanopore; PacBio; Quality control; Third generation sequencers.

Figures

Figure 1
Schematic diagram of nonsense reads and example plots for the E. coli genome. (A) Blue rectangles represent normal reads derived from large molecules such as genomic DNA, and the orange rectangle shows a nonsense read. Nonsense reads have no coverage because their sequence is essentially random or their error rate is even higher than that of normal reads. (B) Whisker plots of standardized per-read coverage for two challenging and two normal datasets. Standardized per-read coverage is obtained by subtracting the mean of the per-read coverage values and dividing by their standard deviation. Blue lines mark 3 standard deviations. (C) Read length histograms for the same datasets.
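The standardization in panel (B) is a z-score transform. A minimal sketch follows, assuming the per-read coverage values have already been computed (how LongQC derives them is not covered here). The function names are hypothetical, not LongQC's API, and both the use of the population standard deviation and the symmetric ±3 SD band are assumptions; the caption only states that the blue lines mark 3 standard deviations.

```python
import statistics


def standardize_coverage(per_read_coverage):
    """Center per-read coverage on its mean and scale by its standard deviation.

    Assumption: population SD (pstdev); the caption does not name the estimator.
    """
    mean = statistics.mean(per_read_coverage)
    sd = statistics.pstdev(per_read_coverage)
    if sd == 0:
        raise ValueError("per-read coverage has zero variance")
    return [(c - mean) / sd for c in per_read_coverage]


def outside_three_sd(z_scores, k=3.0):
    """Return indices of reads whose standardized coverage lies beyond +/- k SD.

    Hypothetical helper: treating the blue lines as a symmetric cutoff is an assumption.
    """
    return [i for i, z in enumerate(z_scores) if abs(z) > k]


# Toy data: 20 well-covered reads plus one read with no coverage.
coverage = [10] * 20 + [0]
z = standardize_coverage(coverage)
print(outside_three_sd(z))  # → [20]
```

In this toy input the zero-coverage read, which mimics a nonsense read, has a standardized coverage of about -4.47, so it is the only read outside the ±3 SD band; the well-covered reads sit at about +0.22.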
Figure 2
Effect of the E. coli filter on the ONT A. thaliana dataset. Top panels were generated from the original dataset, and bottom panels show the same plots after E. coli read removal. (A, D) Distributions of per-read coverage. (B, E) GC content distributions. (C, F) Length distributions. Yellow boxes highlight the spikes that disappeared after E. coli read removal.
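The GC content and length panels summarize simple per-read statistics. A minimal sketch of how such distributions could be computed, assuming reads are already loaded as plain sequence strings (FASTQ parsing omitted). The function names and bin widths are hypothetical, not LongQC's API, and counting GC over all bases (including N) is an assumption.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a read, case-insensitive.

    Assumption: every symbol (including N) counts in the denominator.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def read_stats(reads):
    """Return a (length, GC fraction) pair for each read."""
    return [(len(r), gc_content(r)) for r in reads]


def histogram(values, bin_width):
    """Count values in fixed-width bins, keyed by the bin's lower edge."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


reads = ["ACGT", "GGGGCCCC", "AAAAAAAAAA", "acgtn"]  # toy reads
stats = read_stats(reads)
lengths = [length for length, _ in stats]
print(histogram(lengths, bin_width=5))  # → {0: 1, 5: 2, 10: 1}
gc_percent = [100 * gc for _, gc in stats]
print(histogram(gc_percent, bin_width=10))
```

In histograms built this way, a contaminant population such as the E. coli reads would show up as an unusually tall bin, comparable to the spikes highlighted by the yellow boxes.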

