Detection of single nucleotide variations in expressed exons of the human genome using RNA-Seq

doi:10.1093/nar/gkp507

Nucleic Acids Res. 2009 Sep;37(16):e106.

doi: 10.1093/nar/gkp507. Epub 2009 Jun 15.

Iouri Chepelev¹, Gang Wei, Qingsong Tang, Keji Zhao

PMID: 19528076
PMCID: PMC2760790
DOI: 10.1093/nar/gkp507

Iouri Chepelev et al. Nucleic Acids Res. 2009 Sep.

Affiliation

¹ Laboratory of Molecular Immunology, National Heart, Lung and Blood Institute, NIH, Bethesda, MD 20892, USA.

Abstract

Whole-genome resequencing is still a costly method to detect genetic mutations that lead to altered forms of proteins and may be associated with disease development. Since the majority of disease-related single nucleotide variations (SNVs) are found in protein-coding regions, we propose to identify SNVs in expressed exons of the human genome using the recently developed RNA-Seq technique. We identify 12,176 and 10,621 SNVs, respectively, in Jurkat T cells and CD4⁺ T cells from a healthy donor. Interestingly, our data show that one copy of the TAL-1 proto-oncogene has a point mutation in its 3' UTR and that only the mutant allele is expressed in Jurkat cells. We provide a comprehensive dataset for further understanding the cancer biology of Jurkat cells. Our results indicate that this is a cost-effective and efficient strategy to systematically identify SNVs in the expressed regions of the human genome.

Figures

**Figure 1.**
The flow chart of single nucleotide variations identification in expressed exons using RNA-Seq.

**Figure 2.**
Redundant Reads Filter and SNV probability calculation examples. (A) There are nine reads that map uniquely to the same genomic location (top box). Nucleotide mismatches with reference sequence are highlighted in red. Filter 1 retains a single copy of each read. Thus, only five reads remain after Filter 1 is applied (middle box). There are two U1 reads, two U2 reads and one U0 read in the middle box. Filter 2 randomly selects one U1, one U2 and one U0 read. This leaves three reads at the same genomic location (bottom box). (B) Example of SNV probability calculation. Colored in red is a candidate SNV site. Seven short reads map uniquely to that site. The reference nucleotide is T. Five reads have nucleotides that differ from the reference nucleotide and two reads have nucleotide T at the candidate SNV site. Let the error rate estimated from the total number of U0, U1 and U2 nonredundant reads be q = 0.02. The binomial (random chance) probability to observe two matches and five mismatches at the same location is proportional to q⁵ (1−q)². The P-value is given by the binomial probability of observing five or more mismatches in a seven-read alignment and it is equal to 6.5 × 10^–8.
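
The P-value in panel (B) can be reproduced with a short calculation. The following is a minimal sketch of the binomial tail described in the caption, not the authors' code; the function name `snv_p_value` and its parameter names are assumptions made for illustration.

```python
from math import comb


def snv_p_value(n_reads: int, n_mismatches: int, error_rate: float) -> float:
    """Probability of seeing n_mismatches or more mismatches among n_reads
    by chance alone, when each base is wrong with probability error_rate."""
    return sum(
        comb(n_reads, k) * error_rate**k * (1 - error_rate) ** (n_reads - k)
        for k in range(n_mismatches, n_reads + 1)
    )


# Values from the Figure 2B example: seven reads, five mismatches, q = 0.02.
p = snv_p_value(n_reads=7, n_mismatches=5, error_rate=0.02)
print(f"{p:.1e}")  # 6.5e-08
```

The dominant term is the one for exactly five mismatches, 21 × q⁵ (1−q)², which is why the caption describes the probability as proportional to q⁵ (1−q)².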

**Figure 3.**
Demonstration that the Redundant Reads Filter is necessary. (A) As described in the ‘Materials and methods’ section, applying the Redundant Reads Filter (Filter 1 + Filter 2) to uniquely mapped reads leaves at most three reads at a given genomic location: one U0, one U1 and one U2 read. By restricting the number of reads that can map to the same genomic location, we reduce the false-positive rate of SNV detection. The evidence for the presence of an SNV comes mainly from overlapping but noncoincident reads, and many such reads can cover a single SNV. In fact, as many as 90 reads of length 30 bp can still cover a single SNV after the filtering step. Thus, the statistical power to detect the SNV is not reduced by the filtering procedure. (B) The number of detected (P-value = 10^–9) known SNVs, i.e. SNPs from the dbSNP database, and unknown (novel) SNVs using reads filtered with four different filters: Filter A is the Redundant Reads Filter; Filter B is Filter 1 followed by randomly selecting two reads each from the U1 and U2 categories; Filter C is Filter 1 followed by randomly selecting three reads each from the U1 and U2 categories; the last filter is an empty filter, i.e. no filtering of unique reads is done. The number of detected known SNVs is not sensitive to the filtering method used, confirming a very low false-positive rate among detected known SNVs. However, the number of detected unknown SNVs is much higher for Filters B, C and No filter than for Filter A, demonstrating the high false-positive rates that result from these alternative filters. Thus, Filter A is the best of the four filters.
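
The two filtering steps can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the `Read` record, the function name `redundant_reads_filter`, the treatment of strand as part of the "genomic location", and the toy sequences are all hypothetical.

```python
import random
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    chrom: str
    start: int
    strand: str
    seq: str
    mismatches: int  # 0, 1 or 2, i.e. category U0, U1 or U2


def redundant_reads_filter(reads, seed=0):
    """Filter 1 + Filter 2 as described in the captions (sketch only)."""
    rng = random.Random(seed)
    # Filter 1: retain a single copy of each identical read.
    unique = sorted(set(reads))
    # Group the remaining reads by location and mismatch category.
    groups = defaultdict(list)
    for r in unique:
        groups[(r.chrom, r.start, r.strand, r.mismatches)].append(r)
    # Filter 2: randomly keep one read per category at each location.
    return [rng.choice(group) for group in groups.values()]


# Toy data mirroring Figure 2A: nine reads at one location, five distinct.
loc = ("chr1", 1000, "+")
reads = (
    [Read(*loc, "ACGTACGT", 0)] * 3    # one distinct U0 read
    + [Read(*loc, "ACGTACGA", 1)] * 2  # first distinct U1 read
    + [Read(*loc, "TCGTACGT", 1)]      # second distinct U1 read
    + [Read(*loc, "ACCTACGA", 2)] * 2  # first distinct U2 read
    + [Read(*loc, "TCGTACCT", 2)]      # second distinct U2 read
)
kept = redundant_reads_filter(reads)
print(len(set(reads)), len(kept))  # 5 3
```

With 30-bp reads, 30 distinct start positions can overlap a given site; allowing up to three reads at each start gives 30 × 3 = 90, which is consistent with the figure of 90 quoted in panel (A).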

**Figure 4.**
Read coverage analysis and cost analysis of SNV detection. (A) Percentage of exonic sequences passing the coverage threshold. The three curves correspond to different numbers of uniquely mapped nonredundant reads: 13 million (Jurkat), 7 million (random subsample of 50% of Jurkat reads) and 26 million (Jurkat + CD4). For example, about 30% of exonic regions are covered at least 5-fold by nonredundant uniquely mapped reads in the Jurkat sample. In the combined Jurkat and CD4 sample, about 40% of exonic regions are covered at least 5-fold. (B) The two curves correspond to estimates of sequencing costs for homozygous (red curve) and heterozygous (blue dotted curve) SNV detection in the CD4⁺ sample. About 80% of all homozygous SNVs in expressed (RPKM ≥ 1) exons can be detected using 67 million 30-bp nonredundant unique reads (∼2000 Mbp). At this sequencing depth, about 55% of all heterozygous SNVs in expressed exons can be detected. (See the ‘Materials and methods’ section for details on the derivation of the cost curves.)
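
The quantities in this caption are simple to compute. The sketch below shows, under assumptions, how the percentage of bases passing a coverage threshold (panel A) and the total sequenced bases (panel B) could be calculated; the function names and the per-base depth values are made up for illustration and are not data from the paper.

```python
def percent_passing(depths, threshold):
    """Percentage of exonic bases whose read depth is at least `threshold`."""
    if not depths:
        return 0.0
    return 100.0 * sum(d >= threshold for d in depths) / len(depths)


def sequenced_mbp(n_reads, read_length):
    """Total sequenced bases in Mbp for n_reads reads of read_length bp."""
    return n_reads * read_length / 1e6


# Hypothetical per-base depths over a tiny exonic region (not real data).
toy_depths = [0, 2, 5, 7, 1, 9, 3, 0, 6, 4]
print(percent_passing(toy_depths, 5))  # 40.0

# 67 million 30-bp reads, as quoted for panel (B).
print(sequenced_mbp(67_000_000, 30))  # 2010.0
```

The second result, 2010 Mbp, matches the approximate value of 2000 Mbp given in the caption.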

**Figure 5.**
Summary of results. (A) Venn diagram of single nucleotide variants (SNVs) detected in Jurkat and CD4 samples. (B) Summary table of SNVs detected in Jurkat and CD4 samples. Shown in the brackets are numbers of SNVs that are novel, i.e. not present in dbSNP Build 126 database.

Cited by

Inconsistency and features of single nucleotide variants detected in whole exome sequencing versus transcriptome sequencing: A case study in lung cancer.
O'Brien TD, Jia P, Xia J, Saxena U, Jin H, Vuong H, Kim P, Wang Q, Aryee MJ, Mino-Kenudson M, Engelman JA, Le LP, Iafrate AJ, Heist RS, Pao W, Zhao Z. Methods. 2015 Jul 15;83:118-27. doi: 10.1016/j.ymeth.2015.04.016. Epub 2015 Apr 23. PMID: 25913717.
Whole transcriptome analyses of six thoroughbred horses before and after exercise using RNA-Seq.
Park KD, Park J, Ko J, Kim BC, Kim HS, Ahn K, Do KT, Choi H, Kim HM, Song S, Lee S, Jho S, Kong HS, Yang YM, Jhun BH, Kim C, Kim TH, Hwang S, Bhak J, Lee HK, Cho BW. BMC Genomics. 2012 Sep 12;13:473. doi: 10.1186/1471-2164-13-473. PMID: 22971240.
Development of Transcriptomic Markers for Population Analysis Using Restriction Site Associated RNA Sequencing (RARseq).
Alabady MS, Rogers WL, Malmberg RL. PLoS One. 2015 Aug 4;10(8):e0134855. doi: 10.1371/journal.pone.0134855. eCollection 2015. PMID: 26241739.
Transcriptome analysis in switchgrass discloses ecotype difference in photosynthetic efficiency.
Serba DD, Uppalapati SR, Krom N, Mukherjee S, Tang Y, Mysore KS, Saha MC. BMC Genomics. 2016 Dec 16;17(1):1040. doi: 10.1186/s12864-016-3377-8. PMID: 27986076.
Statistical design and analysis of RNA sequencing data.
Auer PL, Doerge RW. Genetics. 2010 Jun;185(2):405-16. doi: 10.1534/genetics.110.114983. Epub 2010 May 3. PMID: 20439781.

Grants and funding

Intramural NIH HHS/United States
