Comparative Study
Genome Res. 2004 Jul;14(7):1413-23.
doi: 10.1101/gr.2111304. Epub 2004 Jun 14.

A transcript finishing initiative for closing gaps in the human transcriptome

Mari Cleide Sogayar, Anamaria A Camargo, Fabiana Bettoni, Dirce Maria Carraro, Lilian C Pires, Raphael B Parmigiani, Elisa N Ferreira, Eloísa de Sá Moreira, Maria do Rosário D de O Latorre, Andrew J G Simpson, Luciana Oliveira Cruz, Theri Leica Degaki, Fernanda Festa, Katlin B Massirer, Mari C Sogayar, Fernando Camargo Filho, Luiz Paulo Camargo, Marco A V Cunha, Sandro J De Souza, Milton Faria Jr, Silvana Giuliatti, Leonardo Kopp, Paulo S L de Oliveira, Paulo B Paiva, Anderson A Pereira, Daniel G Pinheiro, Renato D Puga, Jorge Estefano S de Souza, Dulcineia M Albuquerque, Luís E C Andrade, Gilson S Baia, Marcelo R S Briones, Ana M S Cavaleiro-Luna, Janete M Cerutti, Fernando F Costa, Eugenia Costanzi-Strauss, Enilza M Espreafico, Adriana C Ferrasi, Emer S Ferro, Maria A H Z Fortes, Joelma R F Furchi, Daniel Giannella-Neto, Gustavo H Goldman, Maria H S Goldman, Arthur Gruber, Gustavo S Guimarães, Christine Hackel, Flavio Henrique-Silva, Edna T Kimura, Suzana G Leoni, Cláudia Macedo, Bettina Malnic, Carina V Manzini B, Suely K N Marie, Nilce M Martinez-Rossi, Marcelo Menossi, Elisabete C Miracca, Maria A Nagai, Francisco G Nobrega, Marina P Nobrega, Sueli M Oba-Shinjo, Márika K Oliveira, Guilherme M Orabona, Audrey Y Otsuka, Maria L Paço-Larson, Beatriz M C Paixão, Jose R C Pandolfi, Maria I M C Pardini, Maria R Passos Bueno, Geraldo A S Passos, Joao B Pesquero, Juliana G Pessoa, Paula Rahal, Cláudia A Rainho, Caroline P Reis, Tatiana I Ricca, Vanderlei Rodrigues, Silvia R Rogatto, Camila M Romano, Janaína G Romeiro, Antonio Rossi, Renata G Sá, Magaly M Sales, Simone C Sant'Anna, Patrícia L Santarosa, Fernando Segato, Wilson A Silva Jr, Ismael D C G Silva, Neusa P Silva, Andrea Soares-Costa, Maria F Sonati, Bryan E Strauss, Eloiza H Tajara, Sandro R Valentini, Fabiola E Villanova, Laura S Ward, Dalila L Zanette; Ludwig-FAPESP Transcript Finishing Initiative

Mari Cleide Sogayar et al. Genome Res. 2004 Jul.

Abstract

We report the results of a transcript finishing initiative, undertaken for the purpose of identifying and characterizing novel human transcripts, in which RT-PCR was used to bridge gaps between paired EST clusters, mapped against the genomic sequence. Each pair of EST clusters selected for experimental validation was designated a transcript finishing unit (TFU). A total of 489 TFUs were selected for validation, and an overall efficiency of 43.1% was achieved. We generated a total of 59,975 bp of transcribed sequences organized into 432 exons, contributing to the definition of the structure of 211 human transcripts. The structure of several transcripts reported here was confirmed during the course of this project, through the generation of their corresponding full-length cDNA sequences. Nevertheless, for 21% of the validated TFUs, a full-length cDNA sequence is not yet available in public databases, and the structure of 69.2% of these TFUs was not correctly predicted by computer programs. The TF strategy provides a significant contribution to the definition of the complete catalog of human genes and transcripts, because it appears to be particularly useful for identification of low abundance transcripts expressed in a restricted set of tissues as well as for the delineation of gene boundaries and alternatively spliced isoforms.
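As a quick consistency check on the figures quoted in the abstract, the sketch below recomputes the overall efficiency and the mean amount of new sequence per exon. It assumes that the efficiency is the number of validated TFUs divided by the number selected, and that the 211 transcripts correspond one-to-one to validated TFUs; the abstract does not state either point explicitly, and the variable names are illustrative only.

```python
# Figures quoted in the abstract.
tfus_selected = 489
tfus_validated = 211   # ASSUMPTION: one validated TFU per defined transcript
total_bp = 59_975      # transcribed sequence generated
exons = 432            # exons that sequence is organized into

# ASSUMPTION: efficiency = validated TFUs / selected TFUs.
efficiency_pct = 100 * tfus_validated / tfus_selected
mean_bp_per_exon = total_bp / exons

print(f"overall efficiency: {efficiency_pct:.1f}%")
print(f"mean new sequence per exon: {mean_bp_per_exon:.1f} bp")
```

Under these assumptions the recomputed efficiency agrees with the reported 43.1%.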

Figures

Figure 1
TFI graphical interface. The TFI graphical interface displays a region of the human genome sequence as a yellow line, with a scale in base pairs (bp). Expressed sequence tags (ESTs) that align with the genome sequence are shown in different colors, depending on the project of origin: ORESTES from the FAPESP/LICR Human Cancer Genome Project in purple; CGAP in green, MGC in blue, and TFI in yellow, with splicing structures represented as gray lines. The interface shows an experimentally validated TFU (number 171) joining two EST clusters. The TFI interface also provides information on the tissue of origin of the transcript sequences, the percentage of similarity of each exon with the human genome sequence, and the presence of 3′ tags represented as green triangles.
Figure 2
General scheme of the TFI strategy. Schematic outline of the strategy used for computational and experimental validation of TFU sequences. Following the development of bioinformatics tools, the generation of the transcriptome database, and automatic cluster selection, the project tasks were divided between the coordination and the validation laboratories.
Figure 3
Characterization and annotation of validated TFUs. Alignment of four consensus sequences, derived from the validated TFUs, to the July 2003 version of the UCSC human genome sequence assembly, using the BLAT search tool. (A) TFU00023 corresponds to YourSeq (black) completely overlapping with known genes based on SWISS-PROT, TrEMBL, mRNA, and RefSeq (dark blue). (B) TFU01102 represents a 5′ extension of a partial cDNA (FLJ23834). (C) TFU01013 represents a new human transcript structure that was correctly predicted by ab initio gene prediction programs, such as Fgenesh++ (green). (D) TFU00125 represents a new human transcript with no corresponding transcript predicted by gene prediction programs.
Figure 4
Experimental validation of MGC5601 gene alternative splicing isoforms. (A) Gene structure for exons XVI-XIX (boxes) of the MGC5601 gene located on chromosome 12. Introns are represented by lines. Two alternative exons are shown on TFU reads, and a hypothetical combination of these two exons is also shown. Sequence F07R has an extra exon between exons XVII and XVIII. Sequence A01R has an extended exon XIX. Four primers were designed for validation tests, as indicated in the figure (P1–P4), and each pair of primers was assayed against all 22 cDNA preparations without pooling. (B) We detected all four of these alternative splicing isoforms in MGC5601. Numbers 1 through 4 indicate the tissues from which the cDNA was obtained (1, multiform glioblastoma; 2, glioblastoma; 3, prostate carcinoma; and 4, primary kidney cell culture). The sizes of the bands obtained are indicated. L indicates 100-bp ladder.
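The count of four isoforms in the Figure 4 legend is what one would expect if the two alternative events (the extra exon between exons XVII and XVIII, and the extended exon XIX) combine independently. The following sketch enumerates those combinations; treating the events as independent binary choices is an assumption made for illustration, and the labels are not taken from the paper.

```python
from itertools import product

# ASSUMPTION: each alternative event is an independent present/absent choice.
events = {
    "extra exon between XVII and XVIII": (False, True),
    "extended exon XIX": (False, True),
}

# One isoform per combination of event states.
isoforms = [dict(zip(events, states)) for states in product(*events.values())]

for number, isoform in enumerate(isoforms, start=1):
    present = [name for name, on in isoform.items() if on]
    print(number, ", ".join(present) if present else "neither alternative event")
```

Two binary events give 2 × 2 = 4 combinations, matching the four isoforms reported as detected.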
