Proc Natl Acad Sci U S A. 1999 Nov 9;96(23):13241-6.
doi: 10.1073/pnas.96.23.13241.

Integrated pararetroviral sequences define a unique class of dispersed repetitive DNA in plants

J Jakowitsch et al. Proc Natl Acad Sci U S A.

Abstract

Although integration of viral DNA into host chromosomes occurs regularly in bacteria and animals, there are few reported cases in plants, and these involve insertion at only one or a few sites. Here, we report that pararetrovirus-like sequences have integrated repeatedly into tobacco chromosomes, attaining a copy number of approximately 10³. Insertion apparently occurred by illegitimate recombination. From the sequences of 22 independent insertions recovered from a healthy plant, an 8-kilobase genome encoding a previously uncharacterized pararetrovirus that does not contain an integrase function could be assembled. Preferred boundaries of the viral inserts may correspond to recombinogenic gaps in open circular viral DNA. An unusual feature of the integrated viral sequences is a variable tandem repeat cluster, which might reflect defective genomes that preferentially recombine into plant DNA. The recurrent invasion of pararetroviral DNA into tobacco chromosomes demonstrates that viral sequences can contribute significantly to plant genome evolution.

Figures

Figure 1
Structure of integrated TPVL sequences. Twenty-two independent clones from two tobacco genomic λ libraries, made with Sau3AI (V) or EcoRI (E), were fully or partially sequenced. White bars represent TPLV sequences; black bars indicate plant DNA. The extent of the putative TPV genome, beginning with nucleotide 1 (tRNA binding site) at the left and ending variably around nucleotide 8,000 at the right, is bounded by two vertical lines. The order of TPV ORFs and the relevant restriction enzyme sites are shown at the top. Repeated regions are shaded, including the block of tandem repeats in the putative TPV leader region at the end of ORF4; the NTS9 tandem repeat in flanking plant DNA in clone V1; and short duplications of TPLV sequences at the right of V3, V6, and E21. In-frame deletions are indicated by paired vertical lines within the white bars, spaced according to deletion size (3, 6, 12, or 15 bp). Frameshifts are denoted by arrowheads and stop codons by diamonds. Narrow lines connecting white bars represent gaps in the sequence; spaces between unconnected white bars represent unsequenced regions. Gray bars above junctions in V1, V5, V7, V8, and V13 indicate sequenced PCR fragments synthesized from tobacco DNA. Asterisks at the left and right of V14 indicate a short triplication of TPLV sequence. The × in E21 and V3 indicates a short region in inverse orientation in the two clones relative to the shaded sequence, which is duplicated in E21. The inverted duplication of TPVL sequences in V4 is represented by white arrowheads. T7 and T3 signify the ends of clones; the extent of other clones was not determined. Because this is a linear projection of the circular map, clone ends can appear to be located internally, depending on their position. Vertical arrows at the top mark the positions of junctions between TPLV sequences and tobacco DNA. Three short ORFs of unknown function lie between 7 and 8 kb. Abbreviations: CP, coat protein; MP, movement protein; POL, polyprotein; TAV, transactivation protein.
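The remark that clone ends can appear internal follows from cutting a circular map open at nucleotide 1. A minimal Python sketch of that projection, assuming 1-based inclusive coordinates read in the map direction, a genome length of 8,000 nt used purely for illustration (the legend says only "around nucleotide 8,000"), and a hypothetical helper name `project_clone`:

```python
from typing import List, Tuple


def project_clone(start: int, end: int, genome_len: int) -> List[Tuple[int, int]]:
    """Project a clone's extent on a circular genome onto the linear map 1..genome_len.

    Coordinates are 1-based and inclusive; the clone is read in the map direction
    from `start` to `end`. Hypothetical helper, for illustration only.
    """
    for pos in (start, end):
        if not 1 <= pos <= genome_len:
            raise ValueError(f"position {pos} lies outside 1..{genome_len}")
    if start <= end:
        # The clone does not cross nucleotide 1: one contiguous segment.
        return [(start, end)]
    # The clone crosses nucleotide 1, so the linear map splits it in two and
    # both of its true ends (start and end) fall in the interior of the map.
    return [(start, genome_len), (1, end)]


# Hypothetical clone running from nt 7,500 around the origin to nt 1,200.
print(project_clone(7500, 1200, 8000))  # [(7500, 8000), (1, 1200)]
```

The sketch ignores clones longer than one genome length and clones carrying duplicated viral segments, several of which occur among the clones described above.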
Figure 2
Comparison of the genomic organization of cassava vein mosaic virus (CsVMV) (20) and the putative tobacco pararetrovirus (TPV). The putative TPV genome, assembled from cloned TPLV sequences, did not contain a recognizable ORF5. Three short TPV ORFs of unknown significance in the putative leader after ORF4 are unlabeled; such short ORFs are present in the leader regions of other pararetroviruses (21). The percent amino acid identity between TPV and CsVMV is shown below the TPV ORFs. Abbreviations: CP, coat protein; RB, RNA binding site; MD, movement domain; MP, movement protein; POL, polyprotein; PR, proteinase; RH, RNase H; IBP, inclusion body protein; TAV, transactivation protein.
Figure 3
Variations in the tandem repeat in the putative TPV leader region beyond ORF4. The TPLV sequence clones indicated at the right contained this tandem repeat block (Fig. 1). The 63-bp monomer contains internal inverted and direct repeats (arrows, top). Length heterogeneity of the 63-bp monomer involves specific sequences (bold) that could form RNA stem-loop structures. Extensions of the 63-bp monomer that create a 76-bp unit involve sequences in the large loop of a possible RNA hairpin (bold, dotted underline). Partial copies in V4 and V9 consist of sequences in the stem and loop of a possible large hairpin (bold; dotted and heavy underline). Internal deletions (boxed regions; V9, V14) involve sequences in a second putative small stem-loop region (bold and boxed). In V11, the third monomer copy is partial because the clone ends within it.
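Candidate stem-loops such as those described in this legend start from inverted repeats: a stretch whose reverse complement occurs a short distance downstream. A naive exact-match scan in Python, offered as an illustration only; the function names, the arm length `k`, the minimum loop size, and the example sequence are all hypothetical and are not taken from the TPLV data:

```python
from typing import List, Tuple

# Maps each DNA base to its Watson-Crick complement (RNA would use U instead of T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def inverted_repeats(seq: str, k: int, min_loop: int = 3) -> List[Tuple[int, int]]:
    """Return 0-based (i, j) pairs where seq[j:j+k] is the reverse complement of
    seq[i:i+k] and at least `min_loop` bases separate the two arms.

    Exact matches only, found by a quadratic scan; predicting real stem-loops
    would need an RNA folding tool that scores base pairing energetically.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - k + 1):
        arm = reverse_complement(seq[i:i + k])
        # Search only downstream, leaving room for the loop between the arms.
        j = seq.find(arm, i + k + min_loop)
        while j != -1:
            hits.append((i, j))
            j = seq.find(arm, j + 1)
    return hits


# Hypothetical 14-nt sequence: a 5-nt arm, a 4-nt loop, and the reverse-complement arm.
print(inverted_repeats("GACTGTTTTCAGTC", k=5))  # [(0, 9)]
```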
Figure 4
Southern blot analysis of different plant DNA preparations using a TPVL sequence probe. Approximately 5 μg of DNA was loaded in each lane. (A) Uncut DNA. (B) DNA digested with XbaI. The positions of the two XbaI sites in the putative TPV genome are shown above the ORFs in Fig. 1. The probe was the XbaI-EcoRI fragment containing part of ORF3 (Fig. 1). The hybridizing fragments in A are >20 kb; the major band in B (arrow) is ≈3.5 kb.
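The number of bands a complete digest yields depends on the number of sites and on whether the molecule is circular or linear: two sites in a circle give two fragments whose lengths sum to the circle's length, whereas the same two sites in a linear molecule give three. A Python sketch using the XbaI recognition sequence (T^CTAGA, a palindrome, so scanning one strand suffices); the helper names and the example sequence are hypothetical and say nothing about where the real TPV XbaI sites lie:

```python
from typing import List

XBAI_SITE = "TCTAGA"  # XbaI recognition sequence (palindromic)
XBAI_CUT = 1          # XbaI cuts after the first base: T^CTAGA


def cut_positions(seq: str, site: str = XBAI_SITE, cut: int = XBAI_CUT) -> List[int]:
    """0-based top-strand cut positions (index of the first base of each downstream
    fragment). Sites spanning the origin of a circular sequence are not detected."""
    seq = seq.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut)
        i = seq.find(site, i + 1)
    return cuts


def fragment_lengths(seq: str, circular: bool) -> List[int]:
    """Sorted fragment lengths after complete digestion of one molecule."""
    n = len(seq)
    cuts = cut_positions(seq)
    if not cuts:
        return [n]  # no site: the molecule stays intact
    if circular:
        # Each fragment runs from one cut to the next, wrapping past the end;
        # a single cut just linearizes the circle (length n).
        return sorted((cuts[(k + 1) % len(cuts)] - cuts[k]) % n or n
                      for k in range(len(cuts)))
    bounds = [0] + cuts + [n]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]))


# Hypothetical 28-nt sequence with two XbaI sites.
demo = "AAAA" + "TCTAGA" + "C" * 10 + "TCTAGA" + "GG"
print(fragment_lengths(demo, circular=True))   # [12, 16]
print(fragment_lengths(demo, circular=False))  # [5, 7, 16]
```

The sketch assumes complete digestion; partial digestion, or viral sequence embedded in flanking plant DNA, would change the band pattern.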
Figure 5
Hypothetical mechanism for increasing the copy number of the tandem repeat during the reverse transcription step of TPV genome replication. Based on the position of the RNA polymerase II promoter/leader after ORF4 in the closely related CsVMV (25), the tandem repeat containing two copies of the 63-bp monomer (two blocks) in the putative TPV leader is assumed to be present in the terminal repeats of the TPV RNA, which is slightly longer than genome length (dotted line). Using tRNA (cloverleaf) as a primer at nucleotide 1 of TPV DNA, reverse transcriptase (RT) synthesizes minus-strand DNA (solid line) and degrades the RNA template until it reaches the terminal redundancy, where the DNA hybridizes to the complementary RNA sequence before RT switches strands. Because the repeat lies in this region, the hybrid could misalign (oppositely pointing arrows), adding one copy to produce three copies of the monomer. Without misalignment, two copies are maintained. (A)n signifies the poly(A) tail on the RNA.
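The slippage proposed in this legend can be phrased as a string operation: if the two copies of the terminal redundancy pair out of register by one monomer, the unpaired monomer is copied in addition to the donor's own copies. A deliberately simplified Python model; the function names, the 8-nt stand-in for the 63-bp monomer, and the flanking sequences are hypothetical, and strand polarity, complementarity, and RNA degradation are all ignored:

```python
def strand_transfer(left: str, monomer: str, copies: int, right: str, shift: int) -> str:
    """Toy model of the RT strand switch, written entirely in sense orientation.

    acceptor: the 3'-terminal redundancy, `left` followed by the tandem array.
    donor:    the minus-strand copy of the 5'-terminal redundancy, taken here from
              the first monomer of the array onward.
    shift:    monomers by which the donor is displaced when it anneals
              (0 = correct register). Hypothetical model, not the authors' method.
    """
    if not 0 <= shift <= copies:
        raise ValueError("shift must lie between 0 and the number of copies")
    acceptor = left + monomer * copies
    donor = monomer * copies + right
    # Copying of the acceptor stops where the donor's first monomer pairs;
    # any acceptor monomers upstream of that point are added to the product.
    anneal_at = len(left) + shift * len(monomer)
    return acceptor[:anneal_at] + donor


def count_monomers(seq: str, monomer: str) -> int:
    """Non-overlapping occurrences of the monomer (adequate for this toy example)."""
    return seq.count(monomer)


M = "ACGTTGCA"  # hypothetical stand-in for the 63-bp monomer
for s in (0, 1):
    product = strand_transfer("GGGG", M, 2, "TTTT", shift=s)
    print(s, count_monomers(product, M))
# 0 2
# 1 3
```

With no shift the two-copy array is preserved; a one-monomer shift yields three copies, matching the two outcomes described in the legend.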

References

    1. Grierson D, Covey S N. Plant Molecular Biology. New York: Chapman & Hall; 1988.
    2. Wright D A, Voytas D F. Genetics. 1998;149:703–715.
    3. Laten H M, Majumdar A, Gaucher E A. Proc Natl Acad Sci USA. 1998;95:6897–6902.
    4. Kumar A. Trends Plant Sci. 1998;3:371–374.
    5. Kiss-László Z, Hohn T. Trends Microbiol. 1996;4:480–485.
