BMC Bioinformatics. 2018 Jan 22;19(1):16.
doi: 10.1186/s12859-018-2024-6.

MutScan: fast detection and visualization of target mutations by scanning FASTQ data


Shifu Chen et al. BMC Bioinformatics. 2018.

Abstract

Background: Some clinical genetic tests, such as cancer testing using circulating tumor DNA (ctDNA), require sensitive detection of known target mutations. However, conventional next-generation sequencing (NGS) data analysis pipelines typically involve several filtering steps, which may cause key low-frequency mutations to be missed. Key mutations detected by bioinformatics pipelines should also be validated. This is typically done with alignment visualization tools such as IGV or GenomeBrowse, but these tools are too resource-intensive to be practical for validating mutations in ultra-deep sequencing data.

Results: We developed MutScan to address the problems of sensitive detection and efficient validation of target mutations. MutScan uses highly optimized string-searching algorithms to scan input FASTQ files and collect all reads that support target mutations. The supporting reads collected for each target mutation are piled up and visualized using web technologies such as HTML and JavaScript. Techniques such as rolling hashes and Bloom filters accelerate scanning, so that MutScan can detect and visualize target mutations quickly.
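The two acceleration techniques named above can be illustrated with a minimal Python sketch. This is not MutScan's implementation: the 2-bit rolling encoding (one simple form of rolling hash), the `BloomFilter` class, its size and hash count, and the function names are all assumptions chosen for illustration. Only k = 16 comes from the Fig. 1 caption.

```python
import hashlib

K = 16  # k-mer length stated in the Fig. 1 caption; all other values are assumptions
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}


def rolling_kmer_keys(seq, k=K):
    """Yield (offset, key) per k-mer; each key is updated in O(1) from the previous one."""
    mask = (1 << (2 * k)) - 1
    key, valid = 0, 0
    for i, base in enumerate(seq):
        code = ENCODING.get(base)
        if code is None:  # 'N' or another ambiguous base: restart the window
            key, valid = 0, 0
            continue
        key = ((key << 2) | code) & mask  # drop the oldest base, append the new one
        valid += 1
        if valid >= k:
            yield i - k + 1, key


class BloomFilter:
    """Hypothetical minimal Bloom filter: false positives possible, false negatives not."""

    def __init__(self, n_bits=1 << 20, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, key):
        data = key.to_bytes(8, "little")  # enough for k <= 32
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(data, digest_size=8, salt=bytes([seed])).digest()
            yield int.from_bytes(digest, "little") % self.n_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, key):
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(key))


def read_may_support(read, target_filter, k=K):
    """Cheap pre-screen: True if any k-mer of the read might be a target k-mer."""
    return any(key in target_filter for _, key in rolling_kmer_keys(read, k))
```

In a scanner built along these lines, the filter would be filled once with the k-mers of all target mutations, and only reads that pass `read_may_support` would go on to an exact lookup, so most reads are rejected after a few bit tests.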

Conclusion: MutScan detects and visualizes target mutations by scanning raw FASTQ data directly. Compared with conventional pipelines, it runs about 20 times faster and offers maximal sensitivity, since it can capture mutations supported by even a single read. MutScan visualizes detected mutations as interactive pile-ups generated with web technologies, which can be used to validate target mutations and thus avoid false positives. MutScan can also render all mutation records in a VCF file as HTML pages for cloud-friendly VCF validation. MutScan is an open-source tool available on GitHub: https://github.com/OpenGene/MutScan.

Keywords: Fast detection; MutScan; Mutation scan; Variant visualization.


Conflict of interest statement

Ethics approval and consent to participate

N/A

Consent for publication

N/A

Competing interests

The authors have the following interests: Shifu Chen, Tanxiao Huang, Hong Li and Mingyan Xu are employed by HaploX BioTechnology. There are no patents, products in development or marketed products to declare.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
The overall design of MutScan. Three steps are presented: indexing, matching, and reporting. In the indexing step, a hashmap from k-mers (all possible substrings of length k; k = 16 in MutScan’s implementation) to mutations is computed; in the matching step, reads are associated with mutations by looking up the indexed hashmap; in the reporting step, the detected mutations are validated, and the supporting reads for each mutation are piled up and rendered to an HTML page. The input and output files are highlighted in grey.
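The three steps in this caption can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not MutScan's code: it uses exact string k-mers in a plain dictionary, indexes only the k-mers that contain the mutated base(s), and replaces the HTML report with a sorted list. The function names and the (left flank, alternate allele, right flank) input format are hypothetical.

```python
from collections import defaultdict

K = 16  # k-mer length given in the caption


def spanning_kmers(left, alt, right, k=K):
    """Yield the k-mers of left+alt+right that fully contain the mutated base(s)."""
    seq = left + alt + right
    first = max(0, len(left) + len(alt) - k)
    last = min(len(left), len(seq) - k)
    for i in range(first, last + 1):
        yield seq[i:i + k]


def build_index(mutations, k=K):
    """Indexing: map each mutation-spanning k-mer to the mutations it identifies."""
    index = defaultdict(set)
    for name, (left, alt, right) in mutations.items():
        for kmer in spanning_kmers(left, alt, right, k):
            index[kmer].add(name)
    return index


def match_reads(reads, index, k=K):
    """Matching: assign each read to every mutation whose indexed k-mer it contains."""
    support = defaultdict(list)
    for read in reads:
        hits = set()
        for i in range(len(read) - k + 1):
            hits |= index.get(read[i:i + k], set())
        for name in hits:
            support[name].append(read)
    return support


def report(support):
    """Reporting: mutations ordered by supporting-read count (stand-in for the HTML page)."""
    return sorted(((name, len(reads)) for name, reads in support.items()),
                  key=lambda item: (-item[1], item[0]))
```

Because only k-mers covering the variant position are indexed, a wild-type read that shares the flanking sequence but not the mutated base produces no hit in this sketch. A real scanner would also need to handle the reverse strand and sequencing errors, which are omitted here.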
Fig. 2
Screenshot of a MutScan pile-up result. The demonstrated mutation is EGFR p.T790M (hg19 chr7:55,249,071 C > T), which is an important druggable target for lung cancer. This mutation’s (L, M, R) sequences are provided at the top of this figure, and M is the mutation base (C > T). The color of the bases indicates the quality score (green and blue indicate high quality, red indicates low quality). This screenshot is incomplete; the complete report can be found at http://opengene.org/MutScan/report.html
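A rough idea of how one row of such a pile-up could be rendered is sketched below in Python. Everything specific here is an assumption made for illustration: the Phred thresholds (30 and 20), the CSS class names, the Phred+33 quality encoding, and the restriction to reads that match the (L, M, R) target exactly. The caption only states that green and blue mark high quality and red marks low quality.

```python
from html import escape


def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


def colour_class(q, high=30, mid=20):
    """Map a quality score to a colour class; both thresholds are assumed values."""
    if q >= high:
        return "green"
    if q >= mid:
        return "blue"
    return "red"


def pileup_row(read, quality, left, mid, right):
    """Return one HTML row for a read lying inside L+M+R and covering M, else None."""
    target = left + mid + right
    start = target.find(read)
    covers_mutation = start >= 0 and start <= len(left) and \
        start + len(read) >= len(left) + len(mid)
    if not covers_mutation:
        return None
    cells = []
    for pos, (base, q) in enumerate(zip(read, phred_scores(quality)), start):
        classes = colour_class(q)
        if len(left) <= pos < len(left) + len(mid):
            classes += " mutation"  # mark the M column
        cells.append(f'<span class="{classes}">{escape(base)}</span>')
    return "&nbsp;" * start + "".join(cells)  # pad so columns line up with the target
```

Rendering every supporting read with the same padding rule stacks the mutation base in a single column, which is what makes a pile-up quick to inspect by eye.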
Fig. 3
Comparison of MutScan and a conventional NGS pipeline. The conventional pipeline is a tumor variant calling pipeline built from AfterQC + BWA + Samtools + VarScan2, available at https://github.com/sfchen/tumor-pipeline. Mutations are given in columns and samples in rows. Mutations detected by the tumor pipeline are highlighted in shades of red, and those detected by MutScan in shades of green. The color depth reflects the number of unique supporting reads, which is also shown in the table cells.

