Comparative Study
Genome Biol. 2010;11(5):R57.
doi: 10.1186/gb-2010-11-5-r57. Epub 2010 May 28.

Screening the human exome: a comparison of whole genome and whole transcriptome sequencing

Elizabeth T Cirulli et al. Genome Biol. 2010.

Abstract

Background: There is considerable interest in the development of methods to efficiently identify all coding variants present in large sample sets of humans. Three approaches are possible: whole-genome sequencing, whole-exome sequencing using exon capture methods, and RNA-Seq. While whole-genome sequencing is the most complete, it remains sufficiently expensive that cost-effective alternatives are important.

Results: Here we provide a systematic exploration of how well RNA-Seq can identify human coding variants by comparing variants identified through high-coverage whole-genome sequencing to those identified by high-coverage RNA-Seq in the same individual. This comparison allowed us to directly evaluate the sensitivity and specificity of RNA-Seq in identifying coding variants, and to evaluate how key parameters such as the degree of coverage and the expression levels of genes interact to influence performance. We find that although only 40% of exonic variants identified by whole-genome sequencing were captured using RNA-Seq, this number rose to 81% when concentrating on genes known to be well expressed in the source tissue. We also find that a high false positive rate can be problematic when working with RNA-Seq data, especially at higher levels of coverage.
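The comparison described above can be illustrated with a minimal sketch that treats the whole-genome calls as the reference set and scores the RNA-Seq calls against it. This is not the authors' pipeline: the function name `compare_calls`, the `(chrom, pos, ref, alt)` variant tuples, and the toy data are all assumptions for illustration. The abstract does not give the exact formula used for specificity, so the sketch reports the fraction of RNA-Seq calls confirmed by whole-genome sequencing alongside sensitivity.

```python
def compare_calls(wgs_variants, rnaseq_variants):
    """Score RNA-Seq variant calls against whole-genome calls.

    Both arguments are iterables of hashable variant keys, here
    hypothetical (chrom, pos, ref, alt) tuples. The whole-genome
    calls are treated as the reference set (an assumption).
    """
    wgs = set(wgs_variants)
    rna = set(rnaseq_variants)

    true_pos = wgs & rna    # called by both methods
    false_neg = wgs - rna   # in the reference set, missed by RNA-Seq
    false_pos = rna - wgs   # called by RNA-Seq only

    # Sensitivity: share of reference variants recovered by RNA-Seq.
    sensitivity = len(true_pos) / len(wgs) if wgs else float("nan")
    # Share of RNA-Seq calls that the reference set confirms.
    confirmed = len(true_pos) / len(rna) if rna else float("nan")

    return {
        "true_positives": len(true_pos),
        "false_negatives": len(false_neg),
        "false_positives": len(false_pos),
        "sensitivity": sensitivity,
        "confirmed_fraction": confirmed,
    }


# Toy data, invented for illustration only.
wgs_calls = [
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr3", 10, "T", "C"),
    ("chr3", 99, "A", "C"),
]
rnaseq_calls = [
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr5", 7, "G", "T"),  # not seen in the whole-genome data
]

result = compare_calls(wgs_calls, rnaseq_calls)
print(result)
```

Restricting both input sets to variants in well-expressed genes before calling the function would give the per-subset figures of the kind the abstract reports.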

Conclusions: We conclude that as long as a tissue relevant to the trait under study is available and suitable quality control screens are implemented, RNA-Seq is a fast and inexpensive alternative approach for finding coding variants in genes with sufficiently high expression levels.


Figures

Figure 1
Sensitivity and specificity as a function of the amount of sequence data generated. Shown for all exons, core exons, and exons that are well expressed in PBMCs, designated as an expression level of at least 4% of the most highly expressed transcript in PBMCs. There were approximately 35 million sequence reads in each lane.
Figure 2
Sensitivity and specificity by PBMC expression level. The level of PBMC expression was broken up into bins based on a log scale. The expression value is written as the percent of the most highly expressed transcript in the dataset. The measures of sensitivity and specificity are shown for increasing levels of PBMC expression, for sequence data from one lane, four lanes and eight lanes. There were approximately 35 million sequence reads in each lane.
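As a rough illustration of the binning in this caption, expression can be rescaled to a percent of the most highly expressed transcript and then grouped on a log scale. The caption does not state the bin edges, so the decade-wide bins below are an assumption, as are the function names and the toy expression values.

```python
import math


def percent_of_max(expression):
    """Rescale raw expression values to percent of the highest value."""
    top = max(expression.values())
    return {gene: 100.0 * value / top for gene, value in expression.items()}


def log10_bin(percent):
    """Return the decade bin for a percent value.

    Bin k covers [10**k, 10**(k+1)) percent; this decade width is an
    assumed choice. Genes with no expression get no bin.
    """
    if percent <= 0:
        return None
    return math.floor(math.log10(percent))


# Toy expression values, invented for illustration only.
raw = {"geneA": 2000.0, "geneB": 80.0, "geneC": 10.0, "geneD": 0.0}

percents = percent_of_max(raw)
bins = {gene: log10_bin(p) for gene, p in percents.items()}

# The 4% cutoff is the "well expressed" threshold given in the captions.
well_expressed = sorted(g for g, p in percents.items() if p >= 4.0)

print(percents)
print(bins)
print(well_expressed)
```

Sensitivity and specificity can then be computed separately for the variants falling in each bin.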
Figure 3
True positive SNVs identified as a function of the amount of sequence data generated. The number of true positive SNVs identified by RNA-Seq is shown for between one and eight lanes of sequence data, for exonic, core exonic and PBMC-expressed SNVs. PBMC-expressed genes are designated as those with an expression level of at least 4% of the most highly expressed PBMC transcript. There were approximately 35 million sequence reads in each lane.
Figure 4
Distribution of genes by PBMC expression level. The number of genes lying within each PBMC expression level bin is shown in red. The cumulative number of genes expressed above each expression level is listed in blue. The expression value is written as the percent of the most highly expressed transcript in the dataset.
