PLoS One. 2017 Jul 24;12(7):e0181304.
doi: 10.1371/journal.pone.0181304. eCollection 2017.

Evaluation of exome variants using the Ion Proton Platform to sequence error-prone regions

Heewon Seo et al. PLoS One.

Abstract

The Ion Proton sequencer from Thermo Fisher accurately determines sequence variants from target regions with a rapid turnaround time at a low cost. However, misleading variant-calling errors can occur. We performed a systematic evaluation and manual curation of read-level alignments for the 675 ultrarare variants that the Ion Proton sequencer reported in 27 whole-exome sequencing datasets but that are present in neither the 1000 Genomes Project nor the Exome Aggregation Consortium. We classified the positive variant calls into 393 highly likely false positives, 126 likely false positives, and 156 likely true positives, which comprised 58.2%, 18.7%, and 23.1% of the variants, respectively. We identified four distinct error patterns of variant calling that may be corrected bioinformatically with different strategies: simplicity region, SNV cluster, peripheral sequence read, and base inversion. Local de novo assembly successfully corrected 201 (38.7%) of the 519 highly likely or likely false positives. We also demonstrate that the two sequencing kits from Thermo Fisher (the Ion PI Sequencing 200 kit V3 and the Ion PI Hi-Q kit) exhibit different error profiles across the error types. A refined calling algorithm with a better polymerase may improve the performance of the Ion Proton sequencing platform.
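
The proportions quoted in the abstract follow directly from the reported counts. The short Python sketch below only re-derives that arithmetic; the variable names are illustrative and are not taken from the paper.

```python
# Counts reported in the abstract (HLFP = highly likely false positive,
# LFP = likely false positive, LTP = likely true positive).
counts = {"HLFP": 393, "LFP": 126, "LTP": 156}

total = sum(counts.values())  # all ultrarare variants evaluated

# Share of each class, as a percentage rounded to one decimal place.
proportions = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Calls rescued by local de novo assembly, relative to all HLFP + LFP calls.
false_positives = counts["HLFP"] + counts["LFP"]
corrected = 201
corrected_pct = round(100 * corrected / false_positives, 1)

print(total, proportions, false_positives, corrected_pct)
```

The three classes sum to the 675 curated variants, and the 519 in the denominator of the correction rate is the sum of the two false-positive classes.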

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Workflow for selecting error-prone variants.
This workflow shows the steps used to select the error-prone variants from 27 VCF (Variant Call Format) files of WES data. The number of loci excluded after each filtering step is indicated. The loci remaining after the filtering steps were classified into four distinct error types, which were considered to be false positives. Note that a locus could be classified into multiple categories if it satisfied multiple error-prone conditions. A call was a highly likely false positive (HLFP) when both experts agreed it was a false positive, a likely false positive (LFP) when the two experts disagreed, and a likely true positive (LTP) when both experts agreed it was a true positive call. Cohen’s kappa coefficient (κ) for the agreement between the two experts was 0.6513.
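
The two-expert consensus rule and the kappa statistic described in this caption can be sketched as follows. This is a minimal illustration, assuming each expert gives a binary verdict (false positive or not) per call; the function names and the toy ratings are hypothetical and do not reproduce the study's data or its reported κ.

```python
from collections import Counter


def consensus_label(expert_a_fp: bool, expert_b_fp: bool) -> str:
    """Map two binary expert verdicts to HLFP / LFP / LTP as in the caption."""
    if expert_a_fp and expert_b_fp:
        return "HLFP"  # both experts say false positive
    if not expert_a_fp and not expert_b_fp:
        return "LTP"   # both experts say true positive
    return "LFP"       # the experts disagree


def cohen_kappa(ratings_a, ratings_b) -> float:
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[k] * freq_b[k] for k in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1:  # both raters used a single identical category
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical verdicts for six calls (True = "false positive").
expert_a = [True, True, False, False, True, False]
expert_b = [True, False, False, False, True, True]

labels = [consensus_label(a, b) for a, b in zip(expert_a, expert_b)]
kappa = cohen_kappa(expert_a, expert_b)
print(labels, round(kappa, 4))
```

Kappa discounts the agreement expected by chance, which is why it is lower than the raw fraction of calls on which the two experts agree.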
Fig 2. Read alignment patterns and variant calls classified as false positives.
We captured the alignment status of reads from BAM files. Each plot shows 10 reads per sample (i.e., not full-depth reads). ‘A,’ ‘C,’ ‘G,’ and ‘T’ indicate mismatch bases against the reference genome. ‘I’ and ‘D’ indicate an insertion and a deletion, respectively, introduced into a read. Left and right angle brackets indicate the direction of aligned reads. (a) A call for rs200623371 is classified as a simplicity region error. (b) An erroneous call for rs201971277 immediately after homopolymeric T-repeats was made with HiQ. (c) Reads with T-deletion and T-insertion were made with S200V3 at the same chromosomal position chr5:132335824 as in panel b. (d) An rs201635586 call is classified as an SNV cluster error. (e) An rs199938722 call is classified as a peripheral sequence read. (f) An example of CG-to-GC inversion for a base inversion error, where REF-ALT bases are inverted (chr7:150556054, rs71516432). (g) A misalignment results in erroneous C-deletion and C-insertion calls around a G allele at the same chromosomal position chr7:150556054 as in panel f. (h) An AGC-to-CAG inversion call is shown as an example of a three-base base inversion.
Fig 3. Classification proportions.
(a) Simplicity region and SNV cluster errors dominated, constituting 219 and 172 of the HLFPs, respectively. Peripheral sequence read and base inversion errors were relatively rare. LFPs and LTPs comprised 126 and 156 of the ultrarare variants, respectively. (b) Of the 393 HLFPs, 354 were distinctly classified into 4 categories, and 39 variants met multiple criteria.
Fig 4. HLFP and LFP variants called by two different algorithms.
The variant frequency distribution plot shows how many variants were found in a given number of samples. Gray and black bars indicate the number of variants called by Torrent Suite Software and the GATK-HC, respectively. Applying the local de novo assembly technique provided by the GATK-HC corrected 201 of the 519 HLFPs and LFPs to negative calls. Numbers in the panel under the plot indicate the number of variants for each category.
Fig 5. Comparison of sequencing kit-specific effects between S200V3 and HiQ among four variant-calling error types.
Box plots show the number of HLFP variants per sample in each kit group, for SNVs and INDELs separately. Lower and upper box hinges show the 25th and 75th percentiles; whiskers extend from the hinges to the minimum and maximum values within 1.5 times the interquartile range (IQR); the median is represented by a horizontal bar in the box. *P < 0.05, ***P < 0.001 by Student’s t-test.
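
The box-plot conventions described in this caption (hinges at the quartiles, whiskers limited to 1.5 × IQR) can be illustrated with a small sketch. The per-sample counts below are hypothetical, and the quartile method (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption; plotting packages differ slightly in how they compute hinges, so values may not match the published figure exactly.

```python
import statistics


def box_plot_stats(values):
    """Return hinges, median, whisker ends, and outliers for a Tukey-style box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations that lie inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical HLFP counts per sample for one kit group.
hlfp_per_sample = [1, 2, 3, 4, 5, 6, 7, 8, 30]
stats = box_plot_stats(hlfp_per_sample)
print(stats)
```

In this toy input the value 30 falls beyond the upper fence, so it would be drawn as an individual point rather than being covered by the whisker.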
