Genome Res. 2002 Feb;12(2):333-8.
doi: 10.1101/gr.211202.

ViewGene: a graphical tool for polymorphism visualization and characterization

Carl Kashuk et al. Genome Res. 2002 Feb.

Abstract

The Human Genome Project is producing an enormous amount of sequence data, from which single-base changes between individuals can be identified. Unfortunately, computer tools that were adequate for sequence assembly are less than ideal for characterizing polymorphism data [single nucleotide polymorphisms (SNPs) or insertions/deletions (indels)] and other sequence features, and their relationships to each other. We have developed viewGene as a flexible tool that takes input from a number of sequence formats and analysis programs (GenBank, FASTA, RepeatMasker, Cross_match, BLAST, user-defined data) to construct a sequence reference scaffold that can be viewed through a simple graphical interface. Polymorphisms generated from many sources can be added to this scaffold through the same sequence formats, with a variety of options to control what is displayed. Large amounts of polymorphism data can be organized so that patterns and haplotypes can be readily discerned. In our laboratory, viewGene has been used to view annotated GenBank records, find nonrepetitive sequence fragments for polymorphism detection, and visualize similarity search results. Manipulation, cross-referencing, and haplotype viewing of SNP data are essential for quality assessment and identification of variants associated with genetic disease, and viewGene provides all three of these important functions.

Figures

Figure 1
Workflow diagram. Flowchart for a hypothetical SNP discovery project, with two possible uses of viewGene. A genomic reference sequence for a region of interest is processed with several sequence analysis programs (RepeatMasker, Miropeats, BLAST). viewGene is first used to visualize the unique sequence and neighboring genomic features. Target areas are sequenced in a number of DNA samples, and the resulting sequences are aligned to the reference sequence. viewGene is then used again to compare the cross_match results to the BLAST and UCSC data already compiled.
Figure 2
The Features subwindow. Data from GenBank record AL031542, containing 13 exons of the human dystrophin gene, is shown with green boxes representing the exons of “CDS” tags, blue boxes representing “repeat region” tags, and red areas defining “misc feature” tags. The user can click on any box and obtain details about the feature, control which types and classes of features are displayed, and label features. Grey bars above the scale denote areas where sequence GC content is above a threshold value (in this case, 40%).
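The grey GC-content bars described in this caption can be illustrated with a short Python sketch. This is not viewGene's code: the function names, the fixed non-overlapping windows, and the 100-base default window size are assumptions made for illustration. Only the 40% threshold comes from the caption.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def high_gc_windows(seq, window=100, threshold=0.40):
    """Return half-open (start, end) intervals whose GC fraction exceeds threshold.

    The sequence is cut into non-overlapping windows (an assumed scheme),
    and adjacent qualifying windows are merged into one interval, which is
    roughly what a continuous bar above the scale would represent.
    """
    intervals = []
    for start in range(0, len(seq), window):
        end = min(start + window, len(seq))
        if gc_fraction(seq[start:end]) > threshold:
            if intervals and intervals[-1][1] == start:
                # Extend the previous interval instead of starting a new one.
                intervals[-1] = (intervals[-1][0], end)
            else:
                intervals.append((start, end))
    return intervals


# Toy sequence: AT-rich flanks around a GC-rich core.
toy = "AT" * 5 + "GC" * 10 + "AT" * 5
print(high_gc_windows(toy, window=10))
```

A sliding (overlapping) window would give smoother boundaries; the fixed windows here simply keep the sketch short.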
Figure 3
Expanded Features subwindow. The dystrophin region from Fig. 2 with several additions. The Features subwindow has been expanded to separate classes of features, and to present GC content as a line graph. Sequence and exon data are still derived from the GenBank record, but the repeat information (blue boxes) has been replaced by output from RepeatMasker. Clicking on a particular repeat will bring up the corresponding data from the RepeatMasker output. Light blue areas on the Genes line denote internally repeated regions derived from Miropeats. Clicking on one of these areas will bring up the Miropeats output file and, in addition, will highlight the matching area elsewhere on the sequence. A BLAST search of the dystrophin region against the nr database has been loaded into the Matches subwindow. Dark bars denote the matching areas, and green and red tick marks identify base differences between the matched sequence and dystrophin (green for substitutions, red for indels).
Figure 4
The Fragments subwindow. The Features and Matches subwindows have both been condensed. Cross_match data from a set of individual samples that were sequenced in the dystrophin region have been added to the Fragments subwindow. The output has been restricted to only those base changes that appear both in Matches (BLAST data) and Fragments (user data). Haplotypes and other patterns among samples can be picked out from this type of view.
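The restriction described in this caption amounts to intersecting two sets of base changes and then reading each sample across the shared sites. A minimal Python sketch follows, assuming each change is a hypothetical (position, reference base, variant base) tuple and each sample is a mapping from position to observed variant base. Neither representation, nor any of the names or values, is taken from viewGene.

```python
def shared_changes(match_changes, fragment_changes):
    """Keep only base changes present in both inputs, sorted by position."""
    return sorted(set(match_changes) & set(fragment_changes))


def haplotype_rows(samples, sites, ref_char="."):
    """Build one string per sample across the shared sites.

    samples maps a sample name to {position: variant base}; a sample that
    carries the reference base at a site is shown as ref_char.
    """
    rows = {}
    for name, calls in samples.items():
        rows[name] = "".join(calls.get(pos, ref_char) for pos, _ref, _alt in sites)
    return rows


# Hypothetical data: changes seen in similarity-search matches and in user fragments.
blast_changes = [(120, "A", "G"), (250, "C", "T"), (397, "C", "A")]
fragment_changes = [(120, "A", "G"), (300, "G", "T"), (397, "C", "A")]
samples = {
    "sample1": {120: "G", 300: "T"},
    "sample2": {120: "G", 397: "A"},
}

sites = shared_changes(blast_changes, fragment_changes)
for name, row in haplotype_rows(samples, sites).items():
    print(name, row)
```

Lining the rows up under each other gives a crude text version of the per-sample pattern that the graphical view is meant to make visible.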
Figure 5
viewGene assembly file. The file shown was used to generate Fig. 4. The example has been truncated; the actual file contains 96 cmFragment lines corresponding to the DNA samples used. Each type of output file has several options associated with it, including grouping of related information and controlling loading of information based on quality scores. The file DMD.primers.viewGene contains custom data items in a simple format that can be easily generated by the user from most input sources. Details on the different file types and options for viewGene assembly files, as well as several examples and scripts used for formatting, are available from the viewGene web page.
Figure 6
Protein polymorphisms. An area of the exons from the dystrophin region has been translated to demonstrate an amino acid change. The window contains all of the same controls and subwindows as the “parent” viewGene window (Figs. 2, 3, and 4), but it also allows for protein translation over the six possible reading frames. The figure shows that at bases 397–399 in the user data, a base difference changed the amino acid from Asn to Lys.
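Six-frame translation and the effect of a single base change on a codon can be sketched in Python as follows. This is an illustration using the standard genetic code, not viewGene's implementation; the function names are assumptions, and the AAC to AAA codon change is a hypothetical example of an Asn to Lys substitution, since the caption does not state the actual bases involved.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; unknown codons become X."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Translate the three forward and three reverse-strand reading frames."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# Hypothetical reference and variant sequences differing by one base.
reference = "ATGAACTGA"  # Met-Asn-stop
variant = "ATGAAATGA"    # Met-Lys-stop (third base of the Asn codon changed C to A)
print(translate(reference), translate(variant))  # prints: MN* MK*
for frame, protein in six_frames(reference).items():
    print(frame, protein)
```

Comparing the same frame of the reference and variant translations position by position is enough to flag which residue changed.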
