Human genome sequencing in health and disease

Claudia Gonzaga-Jauregui¹, James R Lupski, Richard A Gibbs

PMID: 22248320
PMCID: PMC3656720
DOI: 10.1146/annurev-med-051010-162644

Review

Annu Rev Med. 2012:63:35-61. doi: 10.1146/annurev-med-051010-162644.

Affiliation

¹ Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA. gonzagaj@bcm.edu

Abstract

Following the "finished," euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges.

Figures

**Figure 1**
Comparison of single nucleotide polymorphisms (SNPs) in 10 personal genomes. All SNPs in each of the 10 sequenced personal genomes were compared with those in the other 9 genomes. Altogether, the 10 genomes contribute 14,608,404 nonredundant SNPs (first bar). The second bar shows the SNPs that are unique to each personal genome; the third bar shows the SNPs that are both unique to a given personal genome and novel; the fourth bar shows the SNPs shared by individuals of the same ethnic group. Abbreviations: AF1, NA18507(1) Illumina; AF2, NA18507(2) SOLiD; KB1, Khoisan genome; ABT, Archbishop Desmond Tutu; YH, Chinese genome; SJK, Korean genome 1; AK1, Korean genome 2; JCV, J. Craig Venter; JDW, James D. Watson; JRL, James R. Lupski.
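The comparison described in this caption can be illustrated with plain set operations. The following is a minimal sketch, not the authors' pipeline: it assumes each genome is represented as a set of `(chromosome, position, alternate allele)` tuples, and the names `compare_snps`, `novel_unique`, and `known` are hypothetical.

```python
from collections import Counter


def compare_snps(genomes):
    """Compare SNP sets across personal genomes.

    `genomes` maps a genome label to a set of SNP keys, here assumed to be
    (chromosome, position, alternate_allele) tuples.
    """
    # Nonredundant SNPs: the union over all genomes (first bar).
    nonredundant = set().union(*genomes.values())
    # Number of genomes carrying each SNP.
    carriers = Counter(snp for snps in genomes.values() for snp in snps)
    # SNPs seen in exactly one genome (second bar).
    unique = {
        label: {snp for snp in snps if carriers[snp] == 1}
        for label, snps in genomes.items()
    }
    return nonredundant, unique


def novel_unique(unique, known):
    """Unique SNPs that are also absent from a set of known variants (third bar)."""
    return {label: snps - known for label, snps in unique.items()}
```

Here `known` stands in for a variation database such as dbSNP loaded into a set; how that set is built is outside the scope of this sketch.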

**Figure 2**
Size distribution of large indels (100 bp–1 kb) and copy-number variants (CNVs) (>1 kb) in sequenced personal human genomes. Distribution of large indels and CNVs in 8 personal genomes is shown by size. We can observe peaks between 300 and 400 bp, consistent with *Alu* indel polymorphisms, and at ~1–2 kb. Few polymorphic CNVs are larger than 200 kb. Abbreviations: AF1, NA18507(1) Illumina; AF2, NA18507(2) SOLiD; KB1, Khoisan genome; ABT, Archbishop Desmond Tutu; YH, Chinese genome; SJK, Korean genome 1; AK1, Korean genome 2; JCV, J. Craig Venter; JDW, James D. Watson; JRL, James R. Lupski.
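As a rough illustration of the size classes used in this figure, the sketch below labels a variant by its length and tallies lengths into fixed-width bins. The 100 bp and 1 kb boundaries come from the caption; the function names, the `"small variant"` label for anything under 100 bp, and the default 100 bp bin width are assumptions made for the example.

```python
from collections import Counter


def classify_by_size(length_bp):
    """Label a variant by length, using the caption's size classes."""
    if length_bp > 1000:
        return "CNV"            # copy-number variant: > 1 kb
    if length_bp >= 100:
        return "large indel"    # 100 bp to 1 kb
    return "small variant"      # assumed label for anything below 100 bp


def size_histogram(lengths_bp, bin_width=100):
    """Count variants per size bin; keys are the lower edge of each bin in bp."""
    return Counter((length // bin_width) * bin_width for length in lengths_bp)
```

With 100 bp bins, the *Alu*-sized events mentioned in the caption would fall into the bin whose lower edge is 300.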

**Figure 3**
A comparison of the weaknesses and strengths of whole-genome sequencing (WGS) and exome sequencing approaches for disease-gene identification. Abbreviations: CNVs, copy-number variants; SNVs, simple nucleotide variants.

**Figure 4**
Schematic workflow of whole-genome/exome sequencing data analysis. After sequencing, the sequence reads are mapped and aligned against the human reference genome assembly to obtain a list of variants at every position that does not match the reference. Quality filters are applied to obtain high-quality variant calls. Various filtering criteria are then applied to prioritize the candidate variants. Most variants will be excluded because they are known, meaning that they are already in variation databases such as the database of single nucleotide polymorphisms (dbSNP) and the 1000 Genomes Project database. The focus is mainly on novel variants, which can be tiered into functional classes according to their annotation. For coding variants, priority is given to nonsense, frameshifting, splice-site, and then missense mutations. Computational prediction of the functional impact of these variants can also help prioritize candidate mutations. Based on the characteristics of the trait or disease of interest, variants can be examined under a dominant or recessive model. Additional confirmation through other resources can strengthen the hypotheses of the functional significance of identified variants. Genetic and functional confirmation of the candidate disease-causing variants is the final, most important step.
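The filtering steps in this workflow can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: the `Variant` fields, the `known` flag (standing in for a dbSNP or 1000 Genomes lookup), the quality threshold of 30, and the consequence labels are all hypothetical. The recessive model here keeps genes with a homozygous variant or at least two heterozygous variants, which is one common simplification.

```python
from collections import defaultdict
from dataclasses import dataclass

# Priority order for coding variants, following the caption:
# nonsense, frameshifting, splice-site, then missense.
CONSEQUENCE_RANK = {"nonsense": 0, "frameshift": 1, "splice_site": 2, "missense": 3}


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str   # annotation label (hypothetical vocabulary)
    quality: float     # variant-call quality score
    known: bool        # True if already present in a variation database
    genotype: str      # "het" or "hom"


def prioritize(variants, min_quality=30.0, model="recessive"):
    """Return candidate variants, highest priority first."""
    # 1. Keep high-quality calls only.
    passed = [v for v in variants if v.quality >= min_quality]
    # 2. Exclude known variants; focus on novel ones.
    novel = [v for v in passed if not v.known]
    # 3. Keep the prioritized coding classes.
    coding = [v for v in novel if v.consequence in CONSEQUENCE_RANK]
    # 4. Apply the inheritance model.
    if model == "dominant":
        candidates = coding
    elif model == "recessive":
        by_gene = defaultdict(list)
        for v in coding:
            by_gene[v.gene].append(v)
        candidates = []
        for hits in by_gene.values():
            has_hom = any(v.genotype == "hom" for v in hits)
            n_het = sum(v.genotype == "het" for v in hits)
            if has_hom or n_het >= 2:
                candidates.extend(hits)
    else:
        raise ValueError(f"unknown model: {model}")
    # 5. Rank by functional class, then by descending quality.
    return sorted(candidates, key=lambda v: (CONSEQUENCE_RANK[v.consequence], -v.quality))
```

The output of such a filter is only a shortlist; as the caption notes, genetic and functional confirmation remains the decisive step.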
