Comparative Study
PLoS Genet. 2008 Aug 15;4(8):e1000160.
doi: 10.1371/journal.pgen.1000160.

Genetic variation in an individual human exome

Pauline C Ng et al. PLoS Genet. 2008.

Abstract

There is much interest in characterizing the variation in a human individual, because this may elucidate what contributes significantly to a person's phenotype, thereby enabling personalized genomics. We focus here on the variants in a person's 'exome,' which is the set of exons in a genome, because the exome is believed to harbor much of the functional variation. We provide an analysis of the approximately 12,500 variants that affect the protein-coding portion of an individual's genome. We identified approximately 10,400 nonsynonymous single nucleotide polymorphisms (nsSNPs) in this individual, of which approximately 15–20% are rare in the human population. We predict that approximately 1,500 nsSNPs affect protein function, and these tend to be heterozygous, rare, or novel. Of the approximately 700 coding indels, approximately half have lengths that are a multiple of three, which causes insertions or deletions of amino acids in the corresponding protein rather than introducing frameshifts. Coding indels also occur frequently at the termini of genes, so even if an indel causes a frameshift, an alternative start or stop site in the gene can still be used to make a functional protein. In summary, we reduced the set of approximately 12,500 nonsilent coding variants by approximately 8-fold to a set of variants that are most likely to have major effects on their proteins' functions. This is our first glimpse of an individual's exome and a snapshot of the current state of personalized genomics. The majority of coding variants in this individual are common and appear to be functionally neutral. Our results also indicate that some variants can be used to improve the current NCBI human reference genome. As more genomes are sequenced, many rare variants and non-SNP variants will be discovered. We present an approach to analyze the coding variation in humans by proposing multiple bioinformatic methods to home in on possible functional variation.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. The allele frequencies of heterozygous and homozygous nsSNPs in HuRef.
For heterozygous SNPs, the minor allele frequency is plotted. For homozygous nsSNPs, the frequency for the observed allele in HuRef is plotted.
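The plotting rule in the Figure 1 caption can be sketched as a small function. This is only an illustration, not the authors' code: the function name, the string labels for zygosity, and the assumption of a biallelic site (so the second allele's frequency is one minus the first) are all hypothetical.

```python
def plotted_frequency(zygosity, observed_allele_frequency):
    """Return the allele frequency that would be plotted for one nsSNP.

    zygosity: "heterozygous" or "homozygous" (hypothetical labels for the
        HuRef genotype).
    observed_allele_frequency: population frequency, in [0, 1], of an allele
        observed in HuRef. A biallelic site is assumed.
    """
    if not 0.0 <= observed_allele_frequency <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    if zygosity == "heterozygous":
        # Both alleles are present, so report the less frequent one
        # (the minor allele frequency).
        return min(observed_allele_frequency, 1.0 - observed_allele_frequency)
    if zygosity == "homozygous":
        # Only one allele is present; report its frequency as is.
        return observed_allele_frequency
    raise ValueError("zygosity must be 'heterozygous' or 'homozygous'")
```

Under these assumptions, a homozygous nsSNP can be plotted at any frequency between 0 and 1, whereas a heterozygous nsSNP is never plotted above 0.5.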
Figure 2. The percentage of nsSNPs predicted to affect protein function, by category.
A higher fraction of heterozygous, novel, and rare nsSNPs are predicted to affect function compared to homozygous and common nsSNPs. Rare nsSNPs have allele frequencies < 0.05; common nsSNPs have allele frequencies ≥ 0.05.
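The rare/common cutoff stated in the Figure 2 caption can be written as a minimal sketch. The threshold of 0.05 comes from the caption; the function and constant names are hypothetical, and novel nsSNPs (which have no known frequency) are deliberately left out.

```python
RARE_THRESHOLD = 0.05  # cutoff taken from the Figure 2 caption


def frequency_class(allele_frequency):
    """Classify an nsSNP as 'rare' or 'common' by its allele frequency."""
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    # Frequencies strictly below the cutoff are rare; the cutoff itself
    # and anything above it count as common.
    return "rare" if allele_frequency < RARE_THRESHOLD else "common"
```

Note that the boundary value 0.05 falls in the common class, matching the caption's definition.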
Figure 3. The size distribution of coding indels.
Coding indels are predominantly the size of 3n, where n is an integer. 3n coding indels do not cause frameshifts, whereas non-3n coding indels do.
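The 3n rule in the Figure 3 caption reduces to a remainder check. The sketch below is an illustration under simple assumptions: the function name is hypothetical, the indel length is given in base pairs, and the indel is taken to lie entirely within coding sequence.

```python
def causes_frameshift(indel_length_bp):
    """Return True if a coding indel of this length shifts the reading frame.

    Codons are three bases long, so inserting or deleting a multiple of
    three bases adds or removes whole amino acids and keeps the frame.
    """
    if indel_length_bp <= 0:
        raise ValueError("indel length must be a positive number of bases")
    return indel_length_bp % 3 != 0
```

For example, a 6 bp deletion removes two amino acids without a frameshift, while a 1 bp insertion shifts the frame of everything downstream.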
Figure 4. Location of coding indels.
On the x-axis is the relative protein location of the coding indel, which is the first amino acid position of the indel divided by the protein length. A relative protein location near zero indicates that the indel is located near the N-terminus of the protein and a relative protein location near one indicates that the indel is located near the C-terminus of the protein. Indels occur frequently at the N- and C-termini of proteins.
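The x-axis quantity defined in the Figure 4 caption can be computed as follows. This is a sketch rather than the authors' implementation: the function name is hypothetical, and 1-based amino acid positions are assumed.

```python
def relative_protein_location(first_aa_position, protein_length):
    """Return the indel's first amino acid position divided by protein length.

    Values near 0 indicate the N-terminus; values near 1 indicate the
    C-terminus. Positions are assumed to be 1-based.
    """
    if protein_length <= 0:
        raise ValueError("protein length must be positive")
    if not 1 <= first_aa_position <= protein_length:
        raise ValueError("position must lie within the protein")
    return first_aa_position / protein_length
```

With these assumptions, an indel starting at residue 50 of a 500-residue protein has a relative location of 0.1, placing it near the N-terminus.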
Figure 5. An example of a homozygous indel located near an exon boundary.
The HuRef assembly has a homozygous insertion of A at chr11:44881936. This insertion lies inside a coding exon of the gene TP53I11, next to an annotated 2 bp intron. With the new base inserted, a single amino acid is added to the protein sequence, which is a more likely scenario than a 2 bp intron.
Figure 6. The Ka/Ks ratios of Commonly-Affected genes and Rarely-Affected genes.
Commonly-Affected genes have a higher Ka/Ks ratio than Rarely-Affected genes, which suggests that Commonly-Affected genes are under weaker selection.
Figure 7. A summary of the nonsilent coding variants and their observed trends.
