PLoS One. 2007 May 23;2(5):e460.
doi: 10.1371/journal.pone.0000460.

Virtual Northern analysis of the human genome

Evan H Hurowitz et al. PLoS One. 2007.

Abstract

Background: We applied the Virtual Northern technique to human brain mRNA to systematically measure human mRNA transcript lengths on a genome-wide scale.

Methodology/principal findings: We used separation by gel electrophoresis followed by hybridization to cDNA microarrays to measure 8,774 mRNA transcript lengths representing at least 6,238 genes at high (>90%) confidence. By comparing these transcript lengths to the Refseq and H-Invitational full-length cDNA databases, we found that nearly half of our measurements appeared to represent novel transcript variants. Comparison of length measurements determined by hybridization to different cDNAs derived from the same gene identified clones that potentially correspond to alternative transcript variants. We observed a close linear relationship between ORF and mRNA lengths in human mRNAs, identical in form to the relationship we had previously identified in yeast. Some functional classes of protein are encoded by mRNAs whose untranslated regions (UTRs) tend to be longer or shorter than average; these functional classes were similar in both human and yeast.

Conclusions/significance: Human transcript diversity is extensive and largely unannotated. Our length dataset can be used as a new criterion for judging the completeness of cDNAs and annotating mRNA sequences. Similar relationships between the lengths of the UTRs in human and yeast mRNAs and the functions of the proteins they encode suggest that UTR sequences serve an important regulatory role among eukaryotes.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Virtual Northern scheme.
Figure 2. Example length profile with deconvolution results.
An example length profile is shown in blue. The normalized ratio from each length fraction is plotted against the length fractions in order, where the first length fraction is the one with the highest gel mobility. The rolling baseline is shown in green, and the deconvolution result is shown in red.
Figure 3. Calibrating the relationship between gel mobility and transcript length.
The precise gel mobilities of all peaks from gold standard genes are plotted against the natural log of the sum of their matching Refseq length and an estimated poly(A) tail length of 225 nucleotides. The least-squares linear fit is shown as a black line, y = 0.054731x + 5.997276 (R² = 0.99). Closed circles represent points used to determine the calibration line; points shown as open circles were excluded from the least-squares calculation.
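As a rough illustration of how the calibration line in the Figure 3 caption might be applied, the sketch below converts a peak position into an estimated transcript length. It assumes that y is ln(Refseq length + poly(A) tail) and that x is the peak position in length-fraction units; the caption does not state the axis assignment explicitly, so this reading is an assumption, and the function name is hypothetical.

```python
import math

SLOPE = 0.054731       # slope reported in the Figure 3 caption
INTERCEPT = 5.997276   # intercept reported in the Figure 3 caption
POLY_A_TAIL = 225      # estimated poly(A) tail length in nucleotides (from the caption)


def transcript_length_from_peak(peak_position: float) -> float:
    """Hypothetical helper: estimate transcript length in nucleotides.

    Assumes y = ln(length + poly(A) tail) and x = peak position,
    which is one plausible reading of the caption, not a confirmed one.
    """
    log_total_length = SLOPE * peak_position + INTERCEPT
    # Undo the natural log, then subtract the assumed poly(A) tail.
    return math.exp(log_total_length) - POLY_A_TAIL


if __name__ == "__main__":
    for position in (0, 10, 20, 40):
        print(position, round(transcript_length_from_peak(position)))
```

Under this reading, the estimated length grows exponentially with peak position, which is consistent with the first length fraction having the highest gel mobility and therefore the shortest transcripts.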
Figure 4. Solute carrier family 2 (facilitated glucose transporter), member 1 (SLC2A1) gene.
The SLC2A1 gene is pictured schematically. The transcribed portion of the gene is shown with the filled boxes representing exons. The ORF is represented by the taller boxes. The genomic positions of four cDNA clones that map only to the SLC2A1 gene, and a proposed novel SLC2A1 transcript, are shown relative to the SLC2A1 gene. The transcript length measured for each clone is shown in parentheses.
Figure 5. Relationship between ORF length and transcript length.
Refseq length in nucleotides is plotted against ORF length in nucleotides. The black line is the linear least-squares fit, mRNA = 1.03 × ORF + 1263 (R² = 0.74).
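The fitted line in the Figure 5 caption can be evaluated directly. The following minimal sketch uses the reported slope and intercept; the function names are hypothetical, and treating the difference between predicted mRNA length and ORF length as the combined untranslated length is an interpretation, not a result stated in the caption.

```python
SLOPE = 1.03        # slope reported in the Figure 5 caption
INTERCEPT = 1263.0  # intercept in nucleotides, from the Figure 5 caption


def predicted_mrna_length(orf_length_nt: float) -> float:
    """Hypothetical helper: mRNA length predicted by the fitted line."""
    return SLOPE * orf_length_nt + INTERCEPT


def implied_noncoding_length(orf_length_nt: float) -> float:
    """Predicted mRNA length minus ORF length (assumed to be 5' plus 3' UTR)."""
    return predicted_mrna_length(orf_length_nt) - orf_length_nt


if __name__ == "__main__":
    for orf in (300, 1000, 3000):
        print(orf, predicted_mrna_length(orf), implied_noncoding_length(orf))
```

For a 1,000-nucleotide ORF the line predicts an mRNA of about 2,293 nucleotides. With R² = 0.74, individual transcripts can deviate substantially from this estimate.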
Figure 6. Deconvolution kernel.
The stereotypical peak used as the deconvolution kernel to identify potential peaks is shown.
Figure 7. Calibration between bootstrap value and true positive rate.
The bootstrap value is plotted against the true positive rate for bootstrap values from 0 to 0.6. The least-squares fit to a sigmoidal equation is shown by a solid line, y = −0.07563 + 1.09892 / (1 + e^((0.15928 − x) / 0.08257)).
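To make the sigmoidal calibration in the Figure 7 caption concrete, the sketch below evaluates it for a given bootstrap value. It assumes x is the bootstrap value and y is the true positive rate, which the caption implies but does not state outright. The function name is hypothetical, and clipping the result to the range 0 to 1 is an added convenience, not part of the reported fit.

```python
import math

# Parameters reported in the Figure 7 caption.
OFFSET = -0.07563
AMPLITUDE = 1.09892
MIDPOINT = 0.15928
WIDTH = 0.08257


def true_positive_rate(bootstrap_value: float) -> float:
    """Hypothetical helper: map a bootstrap value to an estimated true positive rate.

    The fit was reported for bootstrap values from 0 to 0.6; inputs outside
    that range are extrapolations.
    """
    raw = OFFSET + AMPLITUDE / (1.0 + math.exp((MIDPOINT - bootstrap_value) / WIDTH))
    # Clipping is an assumption added here: the raw curve slightly exceeds 1 near 0.6.
    return min(1.0, max(0.0, raw))


if __name__ == "__main__":
    for value in (0.0, 0.1, 0.2, 0.3, 0.6):
        print(value, round(true_positive_rate(value), 3))
```

A lookup like this is one way a confidence threshold (such as the >90% level mentioned in the abstract) could be related to a bootstrap cutoff, although the caption does not describe how the authors applied the curve.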
