Comprehensive characterization of human genome variation by high coverage whole-genome sequencing of forty four Caucasians
- PMID: 23577066
- PMCID: PMC3618277
- DOI: 10.1371/journal.pone.0059494
Abstract
Whole-genome sequencing studies are essential for a comprehensive understanding of the vast pattern of human genomic variation. Here we report the results of a high-coverage whole-genome sequencing study of 44 unrelated healthy Caucasian adults, each sequenced to over 50-fold coverage (average 65.8×). We identified approximately 11 million single nucleotide polymorphisms (SNPs), 2.8 million short insertions and deletions (indels), and over 500,000 block substitutions. We showed that, although previous studies, including the 1000 Genomes Project Phase 1 study, have catalogued the vast majority of common SNPs, many low-frequency and rare variants remain undiscovered. For instance, approximately 1.4 million of the SNPs and 1.3 million of the short indels we found were absent from both dbSNP and the 1000 Genomes Project Phase 1 data set, and the majority of these novel variants (∼96%) have a minor allele frequency below 5%. On average, each individual genome carried ∼3.3 million SNPs and ∼492,000 indels/block substitutions, including approximately 179 variants predicted to cause loss of function of the gene products. Moreover, each individual genome carried an average of 44 such loss-of-function variants in a homozygous state, which would completely "knock out" the corresponding genes. Across all 44 genomes, a total of 182 genes were "knocked out" in at least one individual, 46 of which were "knocked out" in over 30% of our samples, suggesting that a number of genes are commonly "knocked out" in the general population. Gene ontology analysis suggested that these commonly "knocked-out" genes are enriched in biological processes related to antigen processing and immune response. Our results contribute toward a comprehensive characterization of human genomic variation, especially of less common and rare variants, and provide an invaluable resource for future genetic studies of human variation and disease.