Genome Res. 2009 Dec;19(12):2324-33.
doi: 10.1101/gr.095976.109. Epub 2009 Sep 18.

The completion of the Mammalian Gene Collection (MGC)

MGC Project Team; Gary Temple, Daniela S Gerhard, Rebekah Rasooly, Elise A Feingold, Peter J Good, Cristen Robinson, Allison Mandich, Jeffrey G Derge, Jeanne Lewis, Debonny Shoaf, Francis S Collins, Wonhee Jang, Lukas Wagner, Carolyn M Shenmen, Leonie Misquitta, Carl F Schaefer, Kenneth H Buetow, Tom I Bonner, Linda Yankie, Ming Ward, Lon Phan, Alex Astashyn, Garth Brown, Catherine Farrell, Jennifer Hart, Melissa Landrum, Bonnie L Maidak, Michael Murphy, Terence Murphy, Bhanu Rajput, Lillian Riddick, David Webb, Janet Weber, Wendy Wu, Kim D Pruitt, Donna Maglott, Adam Siepel, Brona Brejova, Mark Diekhans, Rachel Harte, Robert Baertsch, Jim Kent, David Haussler, Michael Brent, Laura Langton, Charles L G Comstock, Michael Stevens, Chaochun Wei, Marijke J van Baren, Kourosh Salehi-Ashtiani, Ryan R Murray, Lila Ghamsari, Elizabeth Mello, Chenwei Lin, Christa Pennacchio, Kirsten Schreiber, Nicole Shapiro, Amber Marsh, Elizabeth Pardes, Troy Moore, Anita Lebeau, Mike Muratet, Blake Simmons, David Kloske, Stephanie Sieja, James Hudson, Praveen Sethupathy, Michael Brownstein, Narayan Bhat, Joseph Lazar, Howard Jacob, Chris E Gruber, Mark R Smith, John McPherson, Angela M Garcia, Preethi H Gunaratne, Jiaqian Wu, Donna Muzny, Richard A Gibbs, Alice C Young, Gerard G Bouffard, Robert W Blakesley, Jim Mullikin, Eric D Green, Mark C Dickson, Alex C Rodriguez, Jane Grimwood, Jeremy Schmutz, Richard M Myers, Martin Hirst, Thomas Zeng, Kane Tse, Michelle Moksa, Merinda Deng, Kevin Ma, Diana Mah, Johnson Pang, Greg Taylor, Eric Chuah, Athena Deng, Keith Fichter, Anne Go, Stephanie Lee, Jing Wang, Malachi Griffith, Ryan Morin, Richard A Moore, Michael Mayo, Sarah Munro, Susan Wagner, Steven J M Jones, Robert A Holt, Marco A Marra, Sun Lu, Shuwei Yang, James Hartigan, Marcus Graf, Ralf Wagner, Stanley Letovksy, Jacqueline C Pulido, Keith Robison, Dominic Esposito, James Hartley, Vanessa E Wall, Ralph F Hopkins, Osamu Ohara, Stefan Wiemann

MGC Project Team et al. Genome Res. 2009 Dec.

Abstract

Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.

Figures

Figure 1.
Cumulated gene counts for MGC, XGC, and ZGC. The progressive addition of clones, measured by genes represented in each collection, is shown for MGC, XGC, and ZGC from the beginning to conclusion of these programs. “Gene Count” is the total final number of RefSeq genes represented by each set of clones. This number includes some noncurated genes (XM accessions) that are not counted in Table 1. “Clone Count” includes all clones, including duplicate transcripts and isoforms. Isoforms constitute 2%–3% of the human, mouse, and rat collections.
Figure 2.
MGC progress represented over time by method. (A) Human; (B) mouse. The absolute contribution (by genes represented) of each cloning method is shown for EST-based cloning, PCR-Rescue, and DNA synthesis, over time.
Figure 3.
PCR rescue success versus target size. (Black bars) The number of assigned targets in each size range; (white bars) the number of assigned targets that were obtained as full-CDS clones, with the number of clones recovered shown above the bars. The triangles and trendline show the percentage recovered for each size group. Excluded from these calculations are RefSeq targets where the assigned CDS later was changed, suppressed, or withdrawn over the course of the PCR rescue program. Among 8764 human and mouse targets with changed annotation, we obtained a full-CDS clone for 3197 (36%), including one 10.8-kb clone (BC150731).
Figure 4.
Synthesis success versus target size. (Black bars) The number of assigned targets in each size range; (white bars) the number of assigned targets that were obtained as full-CDS clones, with the number of clones recovered shown above the bars. The triangles and trendline show the percentage recovered for each size group. RefSeq targets where the assigned CDS later was changed, suppressed, or withdrawn (233 in total) were excluded from these calculations.
Figure 5.
Venn diagram comparing the number of loci containing protein-coding genes from MGC, RefSeq, and Ensembl. (A) Human; (B) mouse. The loci were computed by clustering transcripts from all three gene sets based on the overlap of the genomic location of the CDS portion of the exons. When a transcript is not uniquely mapped to the genome, the clusters for all mappings of that transcript were combined and counted as one locus. For human, this resulted in 17,239 loci containing MGC clones, 18,494 loci with RefSeq mRNAs (Pruitt et al. 2009b), and 20,856 Ensembl gene loci (Hubbard et al. 2002). Mouse had 17,455 loci with MGC clones, 19,064 loci with RefSeq mRNAs, and 23,087 Ensembl gene loci. Genes counted as shared between any two gene sets exclude genes in the third set. BLAT (Kent 2002) alignments of MGC clones and RefSeq mRNAs (NM accessions) obtained from the UCSC Genome Browser database (Karolchik et al. 2008) for human genome assembly 36.1 and mouse assembly 37, and Ensembl Release 52 were used in the analysis. Genomic loci serve as an estimate of the number of genes in these data sets. The counts vary from those seen in Table 1, owing to the different method of computation.
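
The locus computation described in the Figure 5 caption can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes each genome mapping is given as a transcript ID, chromosome, strand, and a list of half-open CDS exon intervals, and it treats two mappings as belonging to one locus when any of their CDS exons overlap on the same chromosome and strand (the strand requirement is an assumption, since the caption does not mention it). Mappings of a transcript that aligns to more than one place are then merged, as the caption describes. The function name `cluster_loci` and all transcript IDs are hypothetical.

```python
from collections import defaultdict


def cluster_loci(mappings):
    """Group transcript mappings into loci by CDS exon overlap.

    mappings: list of (transcript_id, chrom, strand, [(start, end), ...]),
    with half-open intervals. A transcript that maps to several places
    appears once per mapping. Returns a list of sets of transcript IDs.
    """
    parent = list(range(len(mappings)))

    def find(i):
        # Find the cluster representative, with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Step 1: merge mappings whose CDS exons overlap
    # (same chromosome and strand; the strand check is an assumption).
    by_region = defaultdict(list)
    for idx, (_tid, chrom, strand, exons) in enumerate(mappings):
        for start, end in exons:
            by_region[(chrom, strand)].append((start, end, idx))

    for intervals in by_region.values():
        intervals.sort()
        cur_end, cur_idx = -1, None
        for start, end, idx in intervals:
            if cur_idx is not None and start < cur_end:
                # Overlaps the running cluster: join it and extend the end.
                union(cur_idx, idx)
                cur_end = max(cur_end, end)
            else:
                # Gap before this exon: start a new running cluster.
                cur_end, cur_idx = end, idx

    # Step 2: a transcript that is not uniquely mapped contributes
    # all of its mappings to a single locus.
    first_seen = {}
    for idx, (tid, _chrom, _strand, _exons) in enumerate(mappings):
        if tid in first_seen:
            union(first_seen[tid], idx)
        else:
            first_seen[tid] = idx

    # Step 3: collect transcript IDs per cluster representative.
    loci = defaultdict(set)
    for idx, (tid, _chrom, _strand, _exons) in enumerate(mappings):
        loci[find(idx)].add(tid)
    return list(loci.values())


# Hypothetical toy input; the IDs and coordinates are invented for illustration.
example = [
    ("MGC_1", "chr1", "+", [(100, 200), (300, 400)]),
    ("NM_1", "chr1", "+", [(150, 250)]),
    ("ENS_1", "chr1", "+", [(500, 600)]),
    ("NM_2", "chr2", "+", [(10, 50)]),
    ("NM_2", "chr3", "+", [(10, 50)]),   # second mapping of NM_2
    ("ENS_2", "chr3", "+", [(40, 90)]),
]
print(len(cluster_loci(example)))
```

In the toy input, `MGC_1` and `NM_1` share a locus through exon overlap, while `ENS_2` joins the locus of `NM_2` only because of the second mapping of `NM_2`. Counting, for each locus, which gene sets contribute a transcript would then give the regions of a Venn diagram like the one in Figure 5.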

