An expanded sequence context model broadly explains variability in polymorphism levels across the human genome
- PMID: 26878723
- PMCID: PMC4811712
- DOI: 10.1038/ng.3511
Abstract
The rate of single-nucleotide polymorphism varies substantially across the human genome and fundamentally influences evolution and incidence of genetic disease. Previous studies have considered only the immediately flanking nucleotides around a polymorphic site (the site's trinucleotide sequence context) to study polymorphism levels across the genome. Moreover, the impact of larger sequence contexts has not been fully clarified, even though context substantially influences rates of polymorphism. Using a new statistical framework and data from the 1000 Genomes Project, we demonstrate that a heptanucleotide context explains >81% of variability in substitution probabilities, highlighting new mutation-promoting motifs at ApT dinucleotide, CAAT and TACG sequences. Our approach also identifies previously undocumented variability in C-to-T substitutions at CpG sites, which is not immediately explained by differential methylation intensity. Using our model, we present informative substitution intolerance scores for genes and a new intolerance score for amino acids, and we demonstrate clinical use of the model in neuropsychiatric diseases.
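To make the idea of a heptanucleotide context concrete, the following is a minimal sketch of estimating substitution probabilities per 7-mer context as simple frequencies. It is a toy illustration under stated assumptions, not the statistical framework used in the paper: the function names, the dictionary-based SNP input, and the plain ratio estimator are all hypothetical choices made here for clarity.

```python
from collections import Counter


def context_at(seq, pos, flank=3):
    """Return the (2*flank+1)-mer centered on pos, or None near the ends."""
    if pos < flank or pos + flank >= len(seq):
        return None
    return seq[pos - flank: pos + flank + 1]


def substitution_probabilities(seq, snps, flank=3):
    """Estimate P(alt allele | context) as a simple frequency.

    seq  : reference sequence as a string over A/C/G/T (assumed input)
    snps : dict mapping 0-based position -> alternate allele (assumed input)

    Returns a dict keyed by (context, alt) whose value is the number of
    polymorphic sites with that context and alternate allele, divided by
    the number of times the context occurs in seq.
    """
    occurrences = Counter()    # how often each context appears
    substitutions = Counter()  # how often each (context, alt) is polymorphic
    for pos in range(len(seq)):
        ctx = context_at(seq, pos, flank)
        if ctx is None:
            continue  # skip sites without a full flanking window
        occurrences[ctx] += 1
        alt = snps.get(pos)
        if alt is not None and alt != seq[pos]:
            substitutions[(ctx, alt)] += 1
    return {key: n / occurrences[key[0]] for key, n in substitutions.items()}


# Toy usage: one T-to-C polymorphism at position 3 of a short reference.
reference = "ACGTACGTACGTACG"
probs = substitution_probabilities(reference, {3: "C"})
```

With `flank=3` the window is a heptanucleotide; setting `flank=1` gives the trinucleotide context that the abstract contrasts it with. A real analysis would also need to handle strand symmetry and sparse contexts, which this sketch deliberately omits.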
Conflict of interest statement
The authors declare no conflict of interest.
Similar articles
- Sequence context analysis of 8.2 million single nucleotide polymorphisms in the human genome. Gene. 2006 Feb 1;366(2):316-24. doi: 10.1016/j.gene.2005.08.024. Epub 2005 Nov 28. PMID: 16314054
- Sequence context analysis in the mouse genome: single nucleotide polymorphisms and CpG island sequences. Genomics. 2006 Jan;87(1):68-74. doi: 10.1016/j.ygeno.2005.09.012. Epub 2005 Nov 28. PMID: 16316740
- The genome-wide landscape of C:G > T:A polymorphism at the CpG contexts in the human population. BMC Genomics. 2020 Mar 30;21(1):270. doi: 10.1186/s12864-020-6674-1. PMID: 32228436 Free PMC article.
- The influence of neighboring-nucleotide composition on single nucleotide polymorphisms (SNPs) in the mouse genome and its comparison with human SNPs. Genomics. 2004 Nov;84(5):785-95. doi: 10.1016/j.ygeno.2004.06.015. PMID: 15475257
- Variation in the mutation rate across mammalian genomes. Nat Rev Genet. 2011 Oct 4;12(11):756-66. doi: 10.1038/nrg3098. PMID: 21969038 Review.
Cited by
- VarCards2: an integrated genetic and clinical database for ACMG-AMP variant-interpretation guidelines in the human whole genome. Nucleic Acids Res. 2024 Jan 5;52(D1):D1478-D1489. doi: 10.1093/nar/gkad1061. PMID: 37956311 Free PMC article.
- GPCards: An integrated database of genotype-phenotype correlations in human genetic diseases. Comput Struct Biotechnol J. 2021 Mar 22;19:1603-1611. doi: 10.1016/j.csbj.2021.03.011. eCollection 2021. PMID: 33868597 Free PMC article.
- Fast neutron mutagenesis in soybean enriches for small indels and creates frameshift mutations. G3 (Bethesda). 2022 Feb 4;12(2):jkab431. doi: 10.1093/g3journal/jkab431. PMID: 35100358 Free PMC article.
- A map of constrained coding regions in the human genome. Nat Genet. 2019 Jan;51(1):88-95. doi: 10.1038/s41588-018-0294-6. Epub 2018 Dec 10. PMID: 30531870 Free PMC article.
- Recurrent mutation in the ancestry of a rare variant. Genetics. 2023 Jul 6;224(3):iyad049. doi: 10.1093/genetics/iyad049. PMID: 36967220 Free PMC article.