Comparative Study
Genome Res. 2003 Dec;13(12):2621-36.
doi: 10.1101/gr.1736803.

Analysis of the gene-dense major histocompatibility complex class III region and its comparison to mouse

Tao Xie et al. Genome Res. 2003 Dec.

Abstract

In mammals, the major histocompatibility complex (MHC) class I and class II gene clusters are separated by an approximately 700-kb stretch of sequence called the MHC class III region, which has been associated with susceptibility to numerous diseases. To better understand this medically important and architecturally interesting portion of the genome, we have sequenced and analyzed both the human and mouse class III regions. The cross-species comparison has facilitated the identification of 60 genes in human and 61 in mouse, including a potential RNA gene whose introns are more conserved across species than its exons. Delineation of global organization, gene structure, alternative splice forms, protein similarities, and potential cis-regulatory elements leads to several conclusions: (1) The human MHC class III region is the most gene-dense region of the human genome: >14% of the sequence is coding, approximately 72% of the region is transcribed, and there is an average of 8.5 genes per 100 kb. (2) Gene sizes, numbers of exons, and intergenic distances are for the most part similar in both species, implying that interspersed repeats have had little impact in disrupting the tight organization of this densely packed set of genes. (3) The region contains a heterogeneous mixture of genes, only a few of which have a clearly defined and proven function. Although many of the genes are of ancient origin, some appear to exist only in mammals and fish, implying that they might be specific to vertebrates. (4) Conserved noncoding sequences are found primarily in or near the 5'-UTR or the first intron of genes, and seldom in the intergenic regions. Many of these conserved blocks are likely to be cis-regulatory elements.
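
The coverage figures in conclusion (1) come down to interval arithmetic: overlapping transcripts must be merged before their lengths are summed, or shared bases are counted twice. The sketch below is a minimal illustration, assuming transcripts (or coding segments) are given as 0-based half-open (start, end) intervals inside the region; the helper names are hypothetical and the numbers in the usage note are illustrative, not data from this study.

```python
def merged_length(intervals):
    """Total bases covered by half-open (start, end) intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def fraction_covered(intervals, region_length):
    """Fraction of a region covered by the union of the intervals (e.g. transcribed fraction)."""
    return merged_length(intervals) / region_length


def genes_per_100kb(n_genes, region_length):
    """Average gene density, expressed per 100 kb."""
    return n_genes * 100_000 / region_length
```

For example, `merged_length([(0, 300), (200, 500), (700, 800)])` gives 600 rather than 700, because bases 200-299 are shared. Likewise `genes_per_100kb(60, 700_000)` is about 8.6, close to the average quoted above for a roughly 700-kb region; the exact value depends on where the region boundaries are drawn.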

Figures

Figure 1
Genes were identified from LocusLink entries for reviewed and provisional RefSeqs, and the longest alignable RefSeq was assigned to each gene so that every gene is counted only once. The number of genes per megabase and the GC content were then plotted for the human genome using the April 2003 GoldenPath assembly (http://genome.cse.ucsc.edu). Four sets of nonoverlapping 1-Mb windows were used, beginning at positions 0, 250,000, 500,000, and 750,000 on each chromosome; results for the set beginning at 250,000 are shown. The uppermost point on the graph represents Chromosome 6:31250001-32250000, which includes the entire MHC class III region.
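
The windowing scheme in this caption can be sketched as below, assuming each gene has already been reduced to a single 0-based position (for instance, the start of its assigned RefSeq). With an offset of 250,000, the window starting at 0-based position 31,250,000 corresponds to the 1-based interval 31250001-32250000 quoted above. The function names are hypothetical; this is not the pipeline used to draw the figure.

```python
from collections import Counter

WINDOW = 1_000_000  # 1-Mb windows


def gene_density(gene_positions, chrom_length, offset, window=WINDOW):
    """Count genes in nonoverlapping windows, the first one starting at `offset`.

    Each gene is given as one 0-based position, so it is counted exactly once.
    Positions before `offset` fall outside this window set and are skipped.
    Returns {window_start: gene_count}, including empty windows; the last
    window may be shorter than `window`.
    """
    counts = Counter()
    for pos in gene_positions:
        if offset <= pos < chrom_length:
            window_start = offset + ((pos - offset) // window) * window
            counts[window_start] += 1
    return {start: counts[start] for start in range(offset, chrom_length, window)}


def gc_fraction(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


# The four window sets described in the caption would be:
# for offset in (0, 250_000, 500_000, 750_000):
#     density = gene_density(positions, chromosome_length, offset)
```

Shifting the window origin in 250-kb steps reduces the chance that a dense cluster is split across two windows and so understated.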
Figure 2
DOTTER (version 3.1, default settings) dot-matrix comparison of the extended human and mouse MHC class III regions. The X-axis represents the mouse sequence and the Y-axis the human sequence. Both sequences begin at POU5F1 (in the class I region in both human and mouse) and extend to TSBP. Arrows outside the top X-axis and the left Y-axis indicate the position of the class III region relative to the class I and class II regions; as indicated by the open arrows, only portions of the human and mouse class I and II regions are included. GC content was also determined and is shown above the axes, where arrows delimit the class III region as a GC-rich isochore. In mouse, the locations of the first class III gene, Bat1, and the last, Notch4, are shown as black boxes. A pair of short diagonal lines lying off the main diagonal (see the arrows inside the dot plot) corresponds to the C4-CYP21 module (see text). The background dots represent interspersed repeats.
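
A dot plot marks every pair of positions at which two sequences are locally similar: collinear stretches form the main diagonal, while a duplicated segment such as the C4-CYP21 module produces an extra, shifted diagonal. The sketch below uses the simplest variant, exact matches of length-k words on the forward strand only. It is an illustration under those assumptions and not DOTTER's algorithm, which scores sliding windows and displays the scores in greyscale; the function name is hypothetical.

```python
def dot_plot(x_seq, y_seq, k=8):
    """Return (x, y) pairs where the length-k words starting at x in x_seq and y in y_seq match.

    Forward strand only; a real dot-plot tool would also scan the reverse complement.
    """
    x_seq, y_seq = x_seq.upper(), y_seq.upper()
    # Index every k-mer of the X sequence by its start positions.
    index = {}
    for i in range(len(x_seq) - k + 1):
        index.setdefault(x_seq[i:i + k], []).append(i)
    # Look up each k-mer of the Y sequence and emit one dot per shared occurrence.
    dots = []
    for j in range(len(y_seq) - k + 1):
        for i in index.get(y_seq[j:j + k], []):
            dots.append((i, j))
    return dots
```

For example, `dot_plot("ACGTAACGTA", "ACGTA", k=5)` returns `[(0, 0), (5, 0)]`: the word present twice in the X sequence yields two dots for one Y position, which is how a tandem duplicate shows up as a parallel, offset diagonal.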
Figure 3
VISTA plot of the human and mouse MHC class III regions. Conserved sequences (percent identity >50%) are colored by sequence type: blue for coding regions, turquoise for UTRs, and red for conserved noncoding sequences (CNSs). The two names marked with “*” (TNXA, RP2) denote gene fragments. Gene names are given for the human orthologs. Three gene names (NCR3, C4A, CYP21A2) are shown in red to indicate that their mouse orthologs are pseudogenes, and the names of two human pseudogenes (CYP21A1P and LY6G6E) are shown in green. The two regions in which the sequences do not align result from a mouse-specific gene at 260 kb, G7e (name in blue), which resembles a viral envelope gene (Snoek et al. 1996), and from several transposable elements that have inserted into the mouse genome between 470 and 510 kb. The approximate positions on the April 2003 GoldenPath assemblies are chr6:31550009-32223670 (human) and chr17:33160937-33875007 (mouse).
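
VISTA-style conservation profiles are built by computing percent identity in a window that slides along a pairwise alignment and flagging windows above a cutoff. The caption gives only the 50% threshold; the 100-column window, the step of 1, and working in alignment columns rather than human coordinates are assumptions of this sketch, not parameters taken from the study. Function names are hypothetical.

```python
def windowed_identity(aln_a, aln_b, window=100, step=1):
    """Percent identity in sliding windows over a gapped pairwise alignment.

    Both strings are rows of the same alignment ('-' marks a gap). A column
    counts as identical only when both rows carry the same base.
    Returns a list of (window_start_column, percent_identity).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("alignment rows must have equal length")
    same = [a == b and a != "-" for a, b in zip(aln_a.upper(), aln_b.upper())]
    return [
        (start, 100.0 * sum(same[start:start + window]) / window)
        for start in range(0, len(same) - window + 1, step)
    ]


def conserved_windows(aln_a, aln_b, threshold=50.0, window=100):
    """Start columns of windows whose identity exceeds `threshold` percent."""
    return [start for start, pid in windowed_identity(aln_a, aln_b, window) if pid > threshold]
```

Each flagged window would then be colored according to the annotation it overlaps (coding, UTR, or noncoding).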
Figure 4
Conserved noncoding regions and nonconserved coding regions. (A) BLASTN alignments of human ESTs against the genomic region around C6orf48, drawn at their genomic coordinates. (B) VISTA output for the same region, showing the intron-exon structure of C6orf48. (C,D) VISTA and BLASTN outputs for the corresponding mouse gene, G8. Two conserved noncoding regions (indicated by dotted arrows) were found to encode two snRNAs.
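
The two categories in this caption's title can be separated by simple interval overlap: conserved segments that touch no annotated exon are candidate CNSs, and exons that touch no conserved segment are nonconserved coding regions. The sketch assumes 0-based half-open intervals on a single sequence and treats all exons alike (unlike the VISTA coloring, it does not distinguish coding exons from UTRs); the function names are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b = (start, end) share a base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_conserved(conserved, exons):
    """Label each conserved segment 'coding' if it overlaps any exon, else 'CNS'."""
    return [
        (seg, "coding" if any(overlaps(seg, ex) for ex in exons) else "CNS")
        for seg in conserved
    ]


def unconserved_exons(conserved, exons):
    """Exons that overlap no conserved segment."""
    return [ex for ex in exons if not any(overlaps(ex, seg) for seg in conserved)]
```

An intronic RNA gene of the kind described above would appear here as a 'CNS' segment lying between two exons.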
Figure 5
Distribution of gene sizes (A) and intergenic sequence lengths (B) for the human and mouse MHC class III gene pairs. The X-axis gives the rank order of the human genes. The outlier in B (marked *) results from a mouse-specific insertion between the Lsm2 and Vars2 genes, which also harbors the mouse-specific gene G7e. The distance from mouse Lsm2 to its closest upstream gene, G7e, is 10,829 bp, whereas the human LSM2 gene lies only 1588 bp from its closest upstream gene, VARS2. Supporting data can be found in Supplemental Table 2.
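
The intergenic distances plotted in panel B can be derived from gene coordinates by measuring, for each gene, the gap back to the end of the nearest preceding gene. This sketch orders genes by chromosomal position and ignores strand, so "upstream" here simply means lower coordinate, which is an assumption; the coordinates in the usage note are hypothetical values chosen only to reproduce the two distances quoted above.

```python
def intergenic_distances(genes):
    """Gap from each gene back to the nearest preceding gene in coordinate order.

    `genes` is a list of (name, start, end) in 0-based half-open coordinates.
    The gap is the number of bases between the furthest end reached by any
    earlier gene and the start of the current one (0 if they touch or overlap).
    The first gene in order has no predecessor and is omitted.
    """
    distances = {}
    max_end = None
    for name, start, end in sorted(genes, key=lambda g: g[1]):
        if max_end is not None:
            distances[name] = max(0, start - max_end)
        max_end = end if max_end is None else max(max_end, end)
    return distances
```

For example, `intergenic_distances([("VARS2", 0, 5000), ("LSM2", 6588, 9000)])` returns `{"LSM2": 1588}`; the corresponding mouse gap of 10,829 bp is roughly 6.8 times longer. Tracking the furthest end so far keeps a gene nested inside a larger one from producing a spurious gap.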
Figure 6
Alternatively spliced exons of the mouse Agpat1 gene identified by genome comparison. The upper panel shows the alignments returned by a BLASTN search of the mouse EST database; the query is the 6 kb of sequence upstream of the first coding exon of Agpat1. The lower panel shows the VISTA output for this region. Three alternatively spliced exons can be clearly identified and mapped to the conserved genomic sequence (see arrows). The coding sequence begins at the eleventh base of the second exon (see the ATG sign).
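
A statement such as "the coding sequence begins at the eleventh base of the second exon" can be derived by splicing the exons together, searching the transcript for the start codon, and mapping the hit back to exon coordinates. This sketch simplifies in two ways that are assumptions, not the study's method: it takes the first ATG in the spliced sequence as the start, and it handles plus-strand genes only. The function name and the toy genome in the usage note are invented.

```python
def start_codon_position(genome, exons, codon="ATG"):
    """Locate the first start codon in a spliced transcript.

    `exons` are 0-based half-open (start, end) intervals on `genome`, in
    transcript order (plus strand only). Returns (exon_number, base_in_exon),
    both 1-based, for the first base of the codon, or None if it is absent.
    """
    transcript = "".join(genome[s:e] for s, e in exons).upper()
    idx = transcript.find(codon)
    if idx < 0:
        return None
    # Walk through the exons, subtracting lengths until the hit falls inside one.
    for number, (s, e) in enumerate(exons, start=1):
        length = e - s
        if idx < length:
            return number, idx + 1
        idx -= length
    return None
```

For example, with `genome = "CCCC" + "GTAAGT" + "T" * 10 + "ATGAAA"` and exons `[(0, 4), (10, 26)]`, the call returns `(2, 11)`: the ATG begins at the eleventh base of exon 2. In practice the annotated start is chosen from the open reading frame, which need not be the first ATG.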
Figure 7
Distal cis-element found by CNS analysis. The Z promoter (Wijesuriya et al. 1999) of the human CYP21A2P gene is located in the 35th intron of the gene upstream of it, C4B. This promoter lies exactly within a conserved noncoding sequence, shown as a black peak.
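
Placing a regulatory element within its host gene, as with a promoter in intron 35 of C4B, amounts to finding which intron contains a given coordinate. The sketch below assumes sorted, nonoverlapping exon intervals of a plus-strand gene in 0-based half-open coordinates; for a minus-strand gene the intron numbering would have to be reversed. The function name and the toy exon list are hypothetical.

```python
import bisect


def intron_containing(exons, pos):
    """Return the 1-based number of the intron containing `pos`, or None.

    `exons` are sorted, nonoverlapping 0-based half-open (start, end) intervals
    of one plus-strand gene, so intron n lies between exon n and exon n + 1.
    """
    starts = [s for s, _ in exons]
    i = bisect.bisect_right(starts, pos)  # number of exons starting at or before pos
    if i == 0 or i == len(exons):
        return None  # upstream of exon 1, or at or beyond the start of the last exon
    prev_end = exons[i - 1][1]
    return i if pos >= prev_end else None  # otherwise pos lies inside exon i
```

With exons `[(0, 10), (20, 30), (40, 50)]`, position 15 falls in intron 1 and position 35 in intron 2, while position 25 lies inside exon 2 and returns None. The same overlap logic used for CNSs can then check whether the element also falls within a conserved block.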

References

    1. Abi-Rached, L., Gilles, A., Shiina, T., Pontarotti, P., and Inoko, H. 2002. Evidence of en bloc duplication in vertebrate genomes. Nat. Genet. 31: 100-105.
    1. Aguado, B. and Campbell, R.D. 1998. Characterization of a human lysophosphatidic acid acyltransferase that is encoded by a gene located in the class III region of the human major histocompatibility complex. J. Biol. Chem. 273: 4096-4105.
    1. Albig, W. and Doenecke, D. 1997. The human histone gene cluster at the D6S105 locus. Hum. Genet. 101: 284-294.
    1. Ansari-Lari, M.A., Muzny, D.M., Lu, J., Lu, F., Lilley, C.E., Spanos, S., Malley, T., and Gibbs, R.A. 1996. A gene-rich cluster between the CD4 and triosephosphate isomerase genes at human chromosome 12p13. Genome Res. 6: 314-326.
    1. Aparicio, S., Chapman, J., Stupka, E., Putnam, N., Chia, J.M., Dehal, P., Christoffels, A., Rash, S., Hoon, S., Smit, A., et al. 2002. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297: 1301-1310.

WEB SITE REFERENCES

    1. http://bio.math.berkeley.edu/avid/; AVID program.
    1. http://db.systemsbiology.net/projects/local/mhc/SNP/; comparison of SNPs.
    1. http://db.systemsbiology.net/projects/mhc/acgt/; ACGT (A Comparative Genomics Tool).
    1. http://genome.cse.ucsc.edu; GoldenPath assembly.
    1. http://rast.abajian.com/sputnik/; Sputnik program.