BMC Genomics. 2011 Dec 2:12:590.
doi: 10.1186/1471-2164-12-590.

Revealing the missing expressed genes beyond the human reference genome by RNA-Seq

Geng Chen et al. BMC Genomics. 2011.

Abstract

Background: A complete and accurate human reference genome is important for functional genomics research. Gaps in the reference genome and individual-specific sequences can therefore significantly affect a wide range of studies.

Results: We used two RNA-Seq datasets, from human brain tissues and from 10 mixed cell lines, to investigate the completeness of the human reference genome. First, we demonstrated that of the previously identified ~5 Mb of Asian and ~5 Mb of African novel sequences that are absent from the human reference genome of NCBI build 36, ~211 kb and ~201 kb, respectively, could be transcribed. Our results suggest that many of those transcribed regions are not specific to Asian and African individuals but are also present in Caucasian individuals. Then, we found that 104 RefSeq genes that cannot be aligned to NCBI build 37 are expressed at more than 0.1 RPKM in brain and cell lines. Fifty-five of them are conserved across human, chimpanzee and macaque, suggesting that a significant number of functional human genes are still absent from the human reference genome. Moreover, we identified hundreds of novel transcript contigs that cannot be aligned to NCBI build 37, RefSeq genes or EST sequences. Some of those novel transcript contigs are also conserved among human, chimpanzee and macaque. By positioning those contigs onto the human genome, we identified several large deletions in the reference genome. Several conserved novel transcript contigs were further validated by RT-PCR.

Conclusion: Our findings demonstrate that a significant number of genes are still absent from the incomplete human reference genome, highlighting the importance of further refining the reference genome and curating those missing genes. Our study also shows the importance of de novo transcriptome assembly. A transcriptome-based comparison between the reference genome and other related human genomes provides an alternative way to refine the human reference genome.

Figures

Figure 1
Overview of identification of the missing expressed genes beyond the human reference genome. Human brain and cell transcriptome sequencing reads were used to validate the transcribed regions in Asian and African novel sequences, quantify the expression of unalignable RefSeq genes and identify novel transcript contigs.
Figure 2
The expression levels of those unalignable RefSeq genes in brain and cell lines. The threshold is 0.1 RPKM (Reads Per Kilobase of the transcript per Million mapped reads).
Figure 3
RT-PCR validation of conserved novel transcript contigs. Six conserved novel transcript contigs were validated as expressed in three different types of normal human cells. Because gene expression usually exhibits temporal and spatial specificity, not all of those novel transcript contigs were validated in every type of normal human cell. MCF10A: normal human breast cell; hFOB: human fetal osteoblast; 293T: human embryonic kidney cell; β-ACTIN: positive control; Luciferase: negative control; Marker: sm0331 DNA Ladder Mix.
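
The 0.1 RPKM expression threshold used in the Results and in the Figure 2 caption can be illustrated with a short calculation. The sketch below is not the authors' pipeline; it only applies the standard RPKM definition given in the caption (reads per kilobase of transcript per million mapped reads). The function names and the example read counts are hypothetical and chosen for illustration only.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = read_count / (length in kb) / (mapped reads in millions)
         = read_count * 1e9 / (length in bp * total mapped reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def is_expressed(rpkm_value, threshold=0.1):
    """Apply the 'higher than 0.1 RPKM' cutoff quoted in the abstract."""
    return rpkm_value > threshold


# Hypothetical numbers: 50 reads on a 2,000 bp transcript
# in a library of 20 million mapped reads.
value = rpkm(50, 2000, 20_000_000)
print(value, is_expressed(value))  # 1.25 True
```

With these made-up inputs, a transcript covered by a single read in the same library would score 0.025 RPKM and fall below the cutoff.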
