Comparative Study
PLoS One. 2013 Apr 8;8(4):e60585.
doi: 10.1371/journal.pone.0060585. Print 2013.

Revising a personal genome by comparing and combining data from two different sequencing platforms

Deokhoon Kim et al. PLoS One. 2013.

Abstract

For genomic medicine to be practiced robustly, sequencing results must be compatible regardless of the sequencing technologies and algorithms used. At present, genome sequencing is still an imprecise science, complicated by differences in chemistry, coverage, alignment, and variant-calling algorithms.

We identified ~3.33 million single nucleotide variants (SNVs) in the SJK genome using SOLiD data and ~3.62 million using Illumina data. Approximately 3 million SNVs were concordant between the two platforms and 68,532 were discordant; a further 219,616 were SOLiD-specific and 516,080 were Illumina-specific (i.e., platform-specific). Concordant, discordant, and platform-specific SNVs were further analyzed and characterized. Overall, a large portion of the heterozygous SNVs that disagreed with single nucleotide polymorphism (SNP) chip genotyping calls were nonetheless high-confidence calls. Approximately 70% of the platform-specific SNVs were located in regions containing repetitive sequences. Such platform specificity may arise from differences between Illumina and SOLiD in read length (36 bp and 72 bp vs. 50 bp), insert size (~100-300 bp vs. ~1-2 kb), sequencing chemistry (sequencing-by-synthesis using single nucleotides vs. ligation-based sequencing using oligomers), and sequencing quality. When data from the two platforms were merged for variant calling, the callable proportion of the reference genome increased to 99.66%, which is 1.43 percentage points higher than the average callability of the two platforms and corresponds to ~40 million bases.

In this study, we compared sequencing results between two platforms. Approximately 90% of the SNVs were concordant, yet ~10% were either discordant or platform-specific, indicating that each platform has its own strengths and weaknesses. When data from the two platforms were merged, both the overall callability of the reference genome and the overall accuracy of the SNVs improved, suggesting that a re-sequenced genome can be revised using complementary data.
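The split into concordant, discordant, and platform-specific SNVs can be sketched as a set comparison of two call sets. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `classify_snvs`, the `(chromosome, position) -> genotype` dictionary layout, and the toy calls are all hypothetical, and "discordant" is assumed here to mean that both platforms call the site but report different genotypes.

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int]      # (chromosome, position); hypothetical layout
Calls = Dict[Site, str]     # site -> genotype string such as "A/G"


def classify_snvs(solid: Calls, illumina: Calls) -> Dict[str, Set[Site]]:
    """Split two SNV call sets into four categories (assumed definitions)."""
    # Sites called by both platforms.
    shared = solid.keys() & illumina.keys()
    # Shared sites where the two genotypes agree.
    concordant = {site for site in shared if solid[site] == illumina[site]}
    return {
        "concordant": concordant,
        "discordant": shared - concordant,
        "solid_specific": solid.keys() - illumina.keys(),
        "illumina_specific": illumina.keys() - solid.keys(),
    }


# Toy calls invented for illustration; they are not data from the study.
solid_calls = {("chr1", 100): "A/G", ("chr1", 200): "C/T", ("chr2", 50): "G/G"}
illumina_calls = {("chr1", 100): "A/G", ("chr1", 200): "C/C", ("chr3", 7): "T/A"}

result = classify_snvs(solid_calls, illumina_calls)
for category, sites in result.items():
    print(category, sorted(sites))
```

Under these definitions, each platform's total is the sum of its concordant, discordant, and platform-specific calls.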

Conflict of interest statement

Competing Interests: W.Y.K., H.Y., S.Y.S., J.L., Y.H., Y.W., and Y.S.L. are employees of Samsung SDS, a public company that develops and markets bioinformatics services. The Bioinformatics team of Samsung SDS contributed to the bioinformatics analysis of this study, in which no patents or products (either marketed or in development) of Samsung SDS were used. All other authors have declared that no competing interests exist. There are no patents, products in development, or marketed products to declare. This does not alter the authors' adherence to all PLOS ONE policies on sharing data and materials.

Figures

Figure 1. Concordance of SNVs identified by the two different sequencing platforms.
Figure 2. Cumulative frequency plot of sequencing depths in heterozygous calls. The sequencing depths of heterozygous calls in the SOLiD data are plotted. The patterns of concordant SNVs that are either chip-concordant or chip-discordant are closely comparable, which explains why the majority of heterozygous concordant SNVs that are chip-concordant are highly confident calls. In contrast, the median depth of discordant SNVs that are chip-discordant is substantially lower than that of concordant SNVs, which explains why only 25% of them are highly confident calls.
Figure 3. Callable and non-callable regions. (A) Using Illumina and SOLiD data, 98.3% and 98.15% of the reference genome are callable, respectively. Using the merged data, the callability increases to 99.66%, which is 1.43 percentage points higher than the average callability of the two platforms, representing about 40 million bases. (B) The base composition of non-callable regions. In SOLiD data, the proportions of A, T, C, and G were almost even. In Illumina data, the proportions of A and T were higher than those of C and G.
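The callability gain quoted for Figure 3 can be checked with a few lines of arithmetic. The sketch below uses the three percentages from the caption. The genome length is an assumption (an approximate size for the human reference excluding gaps), because the text does not state which denominator was used, so the base count is only indicative.

```python
# Callability percentages as reported in the Figure 3 caption.
illumina_pct = 98.3
solid_pct = 98.15
merged_pct = 99.66

# ASSUMPTION: roughly 2.86 billion reference bases; not stated in the text.
ASSUMED_GENOME_BASES = 2.86e9

# Average callability of the two single-platform data sets.
average_pct = (illumina_pct + solid_pct) / 2
# Gain of the merged data over that average, in percentage points.
gain_points = merged_pct - average_pct
# Convert the gain into an approximate number of reference bases.
gained_bases = gain_points / 100 * ASSUMED_GENOME_BASES

print(f"average callability:  {average_pct:.3f}%")
print(f"gain from merging:    {gain_points:.3f} percentage points")
print(f"approx. bases gained: {gained_bases / 1e6:.0f} million")
```

With these inputs the gain comes out close to the reported 1.43, and the base count lands near the reported figure of about 40 million; the exact value depends on the assumed genome length.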
