Genes Dev. 2011 Jan 1;25(1):1-10.
doi: 10.1101/gad.1968411.

Gene inactivation and its implications for annotation in the era of personal genomics

Suganthi Balasubramanian et al. Genes Dev.

Abstract

The first wave of personal genomes documents how no single individual genome contains the full complement of functional genes. Here, we describe the extent of variation in gene and pseudogene numbers between individuals arising from inactivation events such as premature termination or aberrant splicing due to single-nucleotide polymorphisms. This highlights the inadequacy of the current reference sequence and gene set. We present a proposal to define a reference gene set that will remain stable as more individuals are sequenced. In particular, we recommend that the ancestral allele be used to define the reference sequence from which a core human reference gene annotation set can be derived. In addition, we call for the development of an expanded gene set to include human-specific genes that have arisen recently and are absent from the ancestral set.

Figures

Figure 1.
Consequences of nonsense SNPs and SNPs in canonical splice sites. The SNP is indicated by a red line in A and D, and a SNP in either of the canonical splice site positions is indicated by a red line in B and C.
Figure 2.
(A) Frequency distribution of STOP-causing SNPs leading to premature truncations of proteins. Allele frequencies were obtained from HapMap data (Altshuler et al. 2010). The histograms are divided into bins of size 0.05 along the X-axis; each bin, except the first one, is inclusive of the lower bound and exclusive of the upper bound value. In the case of the interval 0–0.05, alleles with 0 frequency are not included. All other bins correspond to similar ranges. (B) Nonreference allele frequency at positions that lead to loss of a STOP codon in the human reference genome. Colors represent the HapMap populations: CEU, European (blue); CHB, Chinese (dark gray); JPT, Japanese (light gray); and YRI, Yoruban (red).
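The binning rule described for Figure 2A can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the names `bin_index` and `histogram` are hypothetical, and placing a frequency of exactly 1.0 in the last bin is an assumption, since the caption does not say how that value is handled.

```python
BIN_WIDTH = 0.05
N_BINS = 20  # number of 0.05-wide bins covering 0 to 1


def bin_index(freq):
    """Return the 0-based bin for an allele frequency, or None if excluded."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    if freq == 0.0:
        return None  # alleles with zero frequency are left out of the first bin
    # Bins are [lower, upper); rounding guards against floating-point noise
    # at bin edges (for example 0.15 / 0.05 evaluating to 2.9999999999999996).
    idx = int(round(freq / BIN_WIDTH, 9))
    # Assumption (not stated in the caption): a frequency of 1.0 joins the last bin.
    return min(idx, N_BINS - 1)


def histogram(freqs):
    """Count allele frequencies per bin, skipping excluded values."""
    counts = [0] * N_BINS
    for f in freqs:
        i = bin_index(f)
        if i is not None:
            counts[i] += 1
    return counts
```

With this rule, a frequency of 0.05 falls in the second bin (0.05 to 0.10) rather than the first, and a frequency of 0 is not counted at all.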
Figure 3.
SNPs resulting in premature STOP codons. (A) A G/A SNP, rs2293766, introduces a premature STOP codon at Trp 1883. The truncated form of ZAN is found predominantly in Asian populations at ∼50% frequency. The homozygous A/A genotype is seen in AK1 (Korean), and the heterozygous G/A genotype is seen in the YH (Chinese) personal genome. (B) A C/T SNP, rs1404453, results in truncation of ZNF117. The truncated form is conserved in other species and is the major allele in humans. The human reference genome sequence contains the minor allele C. The homozygous T/T genotype is seen in 13 (ABT, AK1, YH, Korean, NA07022, NA12156, NA12878, NA18517, NA18555, NA18956, NA20431, P0, and Venter) and the heterozygous C/T genotype is seen in three (NA19129, NA19240, and Yoruban) of the 21 personal genomes, as indicated on the right side of the figure. The SNP is labeled in red in the personal genome sequences.
Figure 4.
A T/C SNP, rs6661174, at the annotated STOP codon of FMO2 leads to loss of the STOP codon (TAG to CAG). This results in a 535-residue protein, 64 amino acids longer than the annotated FMO2 protein. The C allele is present in many mammalian species and is seen predominantly in the African genomes. The homozygous C/C genotype is seen in TK1 (Khoisan African genome), and the heterozygous T/C genotype is seen in four (MD8, NB1, NA19240, and Yoruban) of the 21 personal genomes, as indicated on the right side of the figure. The SNP is labeled in red in the personal genome sequences.
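The effect of a STOP-loss SNP such as TAG to CAG can be illustrated with a short Python sketch. The sequence below is a hypothetical toy example, not the FMO2 coding sequence, and the helper `translate` is an illustrative name; only the standard genetic code is assumed. In the toy sequence, changing the T of the first in-frame TAG to C turns the STOP into a glutamine codon, and translation continues to the next in-frame STOP, in the same way that the FMO2 C allele extends the annotated 471-residue protein (535 minus 64) to 535 residues.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence, stopping at the first in-frame STOP codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy sequence: ATG GCT TAG GGA TTC TAA
reference = "ATGGCTTAGGGATTCTAA"
# T-to-C change at the first base of the annotated STOP codon (TAG -> CAG)
variant = reference[:6] + "C" + reference[7:]

ref_protein = translate(reference)
var_protein = translate(variant)
print(ref_protein, len(ref_protein))
print(var_protein, len(var_protein))
```

The variant protein is longer than the reference protein by the number of codons between the lost STOP and the next in-frame STOP.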
Figure 5.
The SNP A/G, rs2276122, activates a cryptic splice site. A G at this position leads to creation of a new acceptor site and insertion of two amino acids, leucine and glutamine, in isoform 2. The homozygous G/G genotype is seen in three (AK1, NA12156, and NA20431) and the heterozygous A/G genotype is seen in four (YH, NA18956, P0, and Watson) of the 21 personal genomes, as indicated on the right side of the figure. The SNP is labeled in red in the personal genome sequences.

References

    1. Ahn SM, Kim TH, Lee S, Kim D, Ghang H, Kim BC, Kim SY, Kim WY, Kim C, Park D, et al. 2009. The first Korean genome sequence and analysis: Full genome sequencing for a socio-ethnic group. Genome Res 19: 1622–1629
    2. Alkan C, Kidd JM, Marques-Bonet T, Aksay G, Antonacci F, Hormozdiari F, Kitzman JO, Baker C, Malig M, Mutlu O, et al. 2009. Personalized copy number and segmental duplication maps using next-generation sequencing. Nat Genet 41: 1061–1067
    3. Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, Yu F, Bonnen PE, de Bakker PI, Deloukas P, Gabriel SB, et al. 2010. Integrating common and rare genetic variation in diverse human populations. Nature 467: 52–58
    4. Aoshima M, Nunoi H, Shimazu M, Shimizu S, Tatsuzawa O, Kenney RT, Kanegasaki S 1996. Two-exon skipping due to a point mutation in p67-phox–deficient chronic granulomatous disease. Blood 88: 1841–1845
    5. Baralle D, Baralle M 2005. Splicing in action: Assessing disease causing sequence changes. J Med Genet 42: 737–748
