Genome Res. 2003 May;13(5):800-12.
doi: 10.1101/gr.893803.

Retroposed copies of the HMG genes: a window to genome dynamics

Liora Z Strichman-Almashanu et al. Genome Res. 2003 May.

Abstract

Retroposed copies (RPCs) of genes are functional (intronless paralogs) or nonfunctional (processed pseudogenes) copies derived from mRNA through a process of retrotransposition. Previous studies found that gene families involved in mRNA translation or nuclear function were more likely to have large numbers of RPCs. Here we characterize RPCs of the few families coding for the abundant high-mobility-group (HMG) proteins in humans. Using an algorithm we developed, we identified and studied 219 HMG RPCs. For slightly more than 10% of these RPCs, we found evidence indicating expression. Furthermore, eight of these are potentially new members of the HMG families of proteins. For three RPCs, the evidence indicated expression as part of other transcripts; in all of these, we found the presence of alternative splicing or multiple polyadenylation signals. RPC distribution among the HMGs was not even, with 33-65 each for HMGB1, HMGB3, HMGN1, and HMGN2, and 0-6 each for HMGA1, HMGA2, HMGB2, and HMGN3. Analysis of the sequences flanking the RPCs revealed that the junction between the target site duplications and the 5'-flanking sequences exhibited the same TT/AAAA consensus found for the L1 endonuclease, supporting an L1-mediated retrotransposition mechanism. Finally, because our algorithm included aligning RPC flanking sequences with the corresponding HMG genomic sequence, we were able to identify transcribed regions of HMG genes that were not part of the published mRNA sequences.

Figures

Figure 1.
Identification of retroposed copies (RPCs) for HMG genes. HMG mRNA sequences (in gray) were soft-masked and used in BLAST searches against the Human Genome Build #26 chromosome files (in blue). HMG parental genes were first identified (see Methods), and splice junction positions in the mRNA sequences were defined from these alignments (black arrowheads) and were subsequently used as guides for identifying RPCs (in red). RPC endpoints were adjusted by conducting a BLAST2SEQ comparison against the corresponding HMG gene, as well as with less-stringent parameters (see Methods) against the HMG mRNA. Horizontal dashed lines represent gaps in the alignment, and vertical dashed lines represent the splice positions in the mRNA–RPC alignment.
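The role of the splice junction positions as guides can be sketched in a few lines. This is a minimal illustration, not the authors' algorithm: the block representation, the `classify_hit` name, and the `max_gap` threshold are all assumptions made for the example. The idea is that a retroposed copy aligns to the mRNA straight across a splice junction, whereas an intron-containing locus shows a large genomic gap at that position.

```python
from typing import List, Tuple

# One aligned block between the mRNA and a genomic hit, as half-open
# coordinates: (mrna_start, mrna_end, genome_start, genome_end).
# This representation is hypothetical, chosen for the example only.
Block = Tuple[int, int, int, int]


def classify_hit(blocks: List[Block], junctions: List[int], max_gap: int = 20) -> str:
    """Classify a genomic hit by how it crosses mRNA splice junctions.

    junctions: mRNA coordinates of exon-exon boundaries.
    max_gap:   assumed largest genomic gap (bp) still treated as contiguous.
    """
    blocks = sorted(blocks)
    contiguous = 0   # junctions crossed without an intron-sized gap
    interrupted = 0  # junctions where the genomic alignment is split by a gap
    for j in junctions:
        # A single block covering both sides of the junction means no intron.
        if any(ms < j < me for ms, me, _, _ in blocks):
            contiguous += 1
            continue
        # Otherwise look for two consecutive blocks that meet at the junction.
        for a, b in zip(blocks, blocks[1:]):
            if a[1] <= j <= b[0]:
                if b[2] - a[3] <= max_gap:
                    contiguous += 1
                else:
                    interrupted += 1
                break
    if contiguous and not interrupted:
        return "rpc_candidate"
    if interrupted:
        return "intron_containing"
    return "undetermined"  # the hit does not reach any junction
```

A hit that stops short of every junction comes back as `undetermined`, which is why the caption's endpoint adjustment step matters for truncated copies.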
Figure 2.
Genomic distribution of 198 mapped RPCs. The colored tick marks represent individual RPC locations according to the legend; a diamond shape represents the corresponding gene. The chromosome shading illustrates the number of RPCs in 20 megabase bins, according to the horizontal scale below; red regions in the chromosomes are centromeres.
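The 20-megabase binning behind the chromosome shading amounts to integer division of each coordinate by the bin size. The sketch below assumes RPC locations are available as (chromosome, position) pairs; the function name and the coordinates in the usage example are invented for illustration and are not data from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

BIN_SIZE = 20_000_000  # 20 megabases, as stated in the caption


def count_per_bin(locations: Iterable[Tuple[str, int]]) -> Dict[Tuple[str, int], int]:
    """Count loci per (chromosome, bin index), with bins of BIN_SIZE bp."""
    counts: Counter = Counter()
    for chrom, pos in locations:
        counts[(chrom, pos // BIN_SIZE)] += 1
    return dict(counts)


# Made-up coordinates, only to show the call.
example = [("chr1", 5_000_000), ("chr1", 19_999_999),
           ("chr1", 20_000_000), ("chrX", 45_000_000)]
print(count_per_bin(example))
```

Bin index 0 covers positions 0 to 19,999,999, index 1 covers the next 20 Mb, and so on.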
Figure 3.
Genes and pseudogenes flanking HMG RPCs. HMG RPC (shown as red boxes) flanking sequences were masked and used in a BLAST search to find surrounding genes and pseudogenes (depicted as blue boxes). The orientation of all HMG RPCs is left to right. (A) A pseudogene of the Ca-ATPase (M23114, 92% identical) was found in the reverse orientation (denoted by a thin blue arrow) to an HMGN1 RPC (89% identical), at a distance of 40 bp (intervening black line). (B) An HMGN1 RPC (88% identical) in the 5′ flank of the KARP-1 gene contributing two exons. The three hatched boxes depict Alu elements. (C) An HMGN2 RPC (89% identical) in the fourth intron of the IL-1 homolog. (D) An HMGN2 RPC (93% identical) upstream and in the same orientation as an RPC of ribosomal protein L12 (95% identical; see text). The 47 bp between them is a fragment of a THE1 element (underlined in the middle insert), from which the TSDs for both RPCs are derived (colored nucleotides). (Open triangles) TSDs; TSD sequences appear in the inserts, and nucleotides in green represent ambiguity; (AAA) poly(A) tracts; (black arrowheads) splice sites; (hatched boxes) repetitive elements; (thick blue arrows) CDS; (dotted lines) position of introns in CDS; (lines between the boxes) intervening genomic DNA, or introns if colored; (–//–) a break artificially inserted into the long sequence for convenient display. Figure not drawn to scale.
Figure 4.
Potentially expressed RPCs. RPCs were masked and used in a BLAST search against the mRNA and EST databases. (A) HMGA1 RPC lacking exon 2 (vertical line) and with a 33-bp insertion (stippled box) harbors an ORF (red arrow) similar to HMGA1 CDS with an insertion (embedded stippled line) and a premature stop codon (asterisk), as well as one EST (represented by a thin black arrow). (Gray arrow) The CDS of HMGA1. (B) An HMGA1 PS with a 500-bp deletion (vertical line) encodes exons shared with an alternatively spliced mRNA (NM_052844; thin blue arrow) through ESTs (thin black arrows); (dotted lines) introns in transcribed sequence. The HMG insertion ends with poly(A); however, it is internal to the HMGA1 transcript. (C) HMGB1 RPC has an ORF (thick black arrow) in an opposite orientation to EST , and another ORF (thick red arrow) similar to HMG CDS. (D) An HMGN1 RPC has an ORF within EST (thick black arrow) that is not similar to the HMGN1 CDS, and encodes an exon shared with an alternatively spliced mRNA (NM_023071; thin blue arrow) through ESTs. EST is part of UniGene cluster Hs.152982. (E) An HMGN2 RPC has an ORF similar to HMG CDS (thick red arrow) and three ESTs in the same orientation. (F) An HMGN2 RPC in the 3′-UTR of cDNA provides an alternative poly(A) signal (downward-pointing arrow). ESTs illustrate the use of both signals; a + after EST stands for more ESTs at the same position. The gene structure of was derived from an alignment with genomic DNA. 
(Green arrows) Monkey and mouse mRNA sequences similar to that do not include an HMG sequence; the black part of the mouse DNA represents nonaligning sequence; (hatched box) a sequence of repetitive elements; (open triangles) TSDs, TSD sequences appear in the inserts, nucleotides in green represent ambiguity; (AAA) poly(A) tracts; (downward-pointing arrowheads) poly(A) signals; (upward-pointing arrowheads) splice sites; (thick arrows) ORFs or CDSs: The position of the HMG CDS is depicted in gray; ORFs similar to HMG CDS are red; ORFs not similar to HMG CDS are black, and CDSs outside the RPC region are blue. (Dotted lines) Intron positions in transcripts; (lines between the boxes) introns in genomic DNA; 3′ EST orientation is reversed to presumed sense; not drawn to scale.
Figure 5.
Transcripts of the human HMGB1 gene: alternative 3′ end, extended 5′ end. RPCs aligned directly to the HMGB1 genomic DNA (gray boxes) were found to have sequences extending upstream and downstream (black and blue boxes) relative to the HMGB1 RefSeq mRNA NM_002128. The middle structure is a schematic representing all RPCs derived from HMGB1, with gray structures corresponding to NM_002128 sequence, blue structures corresponding to a cDNA downstream from NM_002128 (AL110194), and black boxes are genomic sequences outside these cDNAs. The numbers within the boxes show how many RPCs start (with upward-pointing red arrows) or end (with downward-pointing red arrows) within this region. ESTs shown belong to UniGene cluster Hs.337757, and illustrate the alternative use of poly(A) signals (downward-pointing blue and gray arrowheads), as well as the existence of transcription upstream and downstream to NM_002128, also supported by a cDNA (). (Green arrows) Pig and mouse HMGB1 mRNAs, which also extend downstream of NM_002128. (Open triangles) TSDs; (AAA) poly(A) tracts; (upward-pointing black arrowheads) splice sites. The dotted lines in depict a 70-bp deletion, and 3′ EST orientations are reversed.
Figure 6.
HMGA1 RPCs aligned with mRNA splice isoforms. (Black arrow in the middle) The NM_002131.1 RefSeq mRNA; (thick black arrow) its CDS position. Above are transcript variants in gray, below are RPCs in red; different shades of blue lines represent different alternative exons, and the hatched line in variant 6 represents an Alu element. (Downward-pointing arrowheads) The position of an additional 33 bp between exons 3 and 4 in some of the transcripts and RPCs. (Upward-pointing arrowheads) Splice positions; (dotted lines) deletions or exon skipping.
Figure 7.
TSD composition, length distribution, and inverted/truncated elements. (A) A sequence logo representation of the 5′ flank junction for TSDs ≥10 bp (n = 37) without ambiguities. The cartoon underneath represents the junction position relative to the RPC. (B) TSD length distribution of RPCs (n = 116). (C) An HMGN1 RPC 5′-truncated at bp 152 and inverted at bp 299–316 (gi 16160227, bp 1,018,816–1,019,883). (D) An HMGN2 PS 5′-truncated at bp 788 and inverted at bp 938–959 (gi 16157330, bp 297,023–297,461). This sequence has a TSD and a poly(A) tract, most likely an RPC; see text. Numbers correspond to position in the HMG mRNA; (triangles) TSDs; (thin arrows) direction of corresponding HMG transcript.
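A target site duplication appears as the same short sequence directly upstream of the insertion and directly downstream of its poly(A) tract. One simple way to look for one, assuming exact matches only, is to take the longest substring shared by the two flanks. The sketch below does that and adds a check for the TT/AAAA junction; the function names, the `min_len` cutoff, and the reading of the consensus (flank ending in TT, TSD starting with AAAA, on the strand shown) are assumptions for the example, not the criteria used in the paper.

```python
def find_tsd(upstream: str, downstream: str, min_len: int = 6) -> str:
    """Return the longest exact substring shared by the two flanks.

    upstream:   genomic sequence immediately 5' of the insertion.
    downstream: genomic sequence immediately 3' of the poly(A) tract.
    Returns "" if the best shared substring is shorter than min_len.
    """
    best = ""
    prev = [0] * (len(downstream) + 1)
    # Dynamic programming over match lengths ending at (i, j).
    for i in range(1, len(upstream) + 1):
        cur = [0] * (len(downstream) + 1)
        for j in range(1, len(downstream) + 1):
            if upstream[i - 1] == downstream[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > len(best):
                    best = upstream[i - cur[j]:i]
        prev = cur
    return best if len(best) >= min_len else ""


def matches_tt_aaaa(flank_before_tsd: str, tsd: str) -> bool:
    """Check an assumed reading of the TT/AAAA junction consensus."""
    return flank_before_tsd.upper().endswith("TT") and tsd.upper().startswith("AAAA")
```

Real TSDs can carry mismatches or ambiguous bases, as the figure legends note, so an exact-match search like this one would miss some of them.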
