Genome Biol Evol. 2014 Mar;6(3):655-65.
doi: 10.1093/gbe/evu042.

Increased substitution rates surrounding low-complexity regions within primate proteins

Carolyn Lenz et al. Genome Biol Evol. 2014 Mar.

Abstract

Previous studies have found that DNA flanking low-complexity regions (LCRs) has an increased substitution rate. Here, the substitution rate was confirmed to increase in the vicinity of LCRs in several primate species, including humans. This effect was also found among human sequences from the 1000 Genomes Project. Distance from the LCR was strongly correlated both with the average substitution rate per site and with the proportion of genes with gaps in the alignment at each site. Along with substitution rates, dN/dS ratios were determined for each site, and the proportion of sites undergoing negative selection was found to have a negative relationship with distance from the LCR.

Keywords: low-complexity region; mutation; primate; substitution.

Figures

Fig. 1.—
Data workflow: (1) Homologous proteins found for five primate species; (2) LCRs identified using Seg; (3) Maximum length of flanking sequences determined by protein termini and midpoints between two LCRs; (4) Flanking regions filtered for examples with homologous sequences from all five species. Because the second LCR in this example is present in only four species, its flanking regions are not used; (5) The 3′ and 5′ flanking sequences are considered separately so that if all five homologous sequences are not available, the other can still be used; (6, continuing with the 3′ sequence) The upstream and downstream flanking regions are aligned separately; gaps are represented with thin lines; (7) Alignments of individual codons are used to find the number of substitutions at each site for each flanking region with CodeML. Codons with gaps (in this case codons 1, 2, and 7–12) are not usable and are not considered when calculating the average number of substitutions per site. The number of substitutions was found for all usable sites of all genes found to contain LCRs, and the average across all usable sites was found for all positions relative to the LCR (i.e., codon 1, codon 2, etc.).
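Step (3) of the workflow can be illustrated with a minimal Python sketch. It assumes LCRs are given as 0-based, end-exclusive codon intervals on a single protein; the function name `flank_bounds` and the rounding of the midpoint are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, end exclusive


def flank_bounds(protein_len: int, lcrs: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return (upstream, downstream) flank intervals for each LCR.

    A flank extends from the LCR edge to the protein terminus, or to the
    midpoint between this LCR and its neighbour, whichever comes first.
    The integer-division midpoint is an assumption of this sketch.
    """
    lcrs = sorted(lcrs)
    flanks = []
    for i, (start, end) in enumerate(lcrs):
        # Upstream limit: protein start, or midpoint to the previous LCR.
        if i == 0:
            up_lo = 0
        else:
            up_lo = (lcrs[i - 1][1] + start) // 2
        # Downstream limit: protein end, or midpoint to the next LCR.
        if i == len(lcrs) - 1:
            down_hi = protein_len
        else:
            down_hi = (end + lcrs[i + 1][0]) // 2
        flanks.append(((up_lo, start), (end, down_hi)))
    return flanks


if __name__ == "__main__":
    # Hypothetical protein of 100 codons with two LCRs.
    print(flank_bounds(100, [(20, 30), (50, 60)]))
```

Using the same midpoint for the downstream flank of one LCR and the upstream flank of the next keeps the two flanks from overlapping.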
Fig. 2.—
Effect of distance from LCR on average number of (A) total substitutions, (B) nonsynonymous substitutions, and (C) synonymous substitutions at each codon in five primate species. Gray points indicate N, the number of genes that could provide information and were free of gaps at each site. Negative values are upstream of the LCR.
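The per-position averaging behind these panels can be sketched as follows. The sketch assumes each flanking region is represented as a mapping from signed codon offset (negative upstream of the LCR) to an estimated substitution count, with `None` marking a codon that contains a gap; this data layout and the name `average_by_position` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def average_by_position(
    flanks: List[Dict[int, Optional[float]]]
) -> Dict[int, Tuple[float, int]]:
    """Average substitutions per codon position across flanking regions.

    Returns {position: (mean substitutions, N)}, where N is the number of
    flanking regions that were gap-free at that position.
    """
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for flank in flanks:
        for pos, subs in flank.items():
            if subs is None:  # codon contains a gap: not usable
                continue
            totals[pos] += subs
            counts[pos] += 1
    return {pos: (totals[pos] / counts[pos], counts[pos]) for pos in sorted(counts)}


if __name__ == "__main__":
    # Made-up values for illustration only.
    example = [{1: 2.0, 2: None}, {1: 0.0, 2: 1.0}, {-1: 3.0}]
    print(average_by_position(example))
```

Reporting N alongside each mean matters because positions far from the LCR are covered by fewer flanking regions, so their averages rest on less data.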
Fig. 3.—
Effect of distance from LCR on proportion of flanking regions that had evidence for negative selection (ω < 1). Gray points indicate the number of genes that had substitutions and could be used to calculate dN/dS. Negative values are upstream of the LCR.
Fig. 4.—
Effect of distance from LCR on proportion of flanking regions that had evidence for positive selection (ω > 1). Gray points indicate the number of genes that had substitutions and could be used to calculate dN/dS. Negative values are upstream of the LCR.
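The proportions plotted in Figs. 3 and 4 can be illustrated with a short Python sketch that classifies each site by ω = dN/dS. It assumes per-site dN and dS estimates are already available (for example, from CodeML output parsed elsewhere) and skips sites where dS is zero, since ω is undefined there; how the authors handled such sites is not stated in the captions, so that rule and the name `selection_proportions` are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def selection_proportions(
    sites: List[Tuple[int, float, float]]
) -> Dict[int, Tuple[float, float, int]]:
    """Summarise selection per codon position.

    sites: (position, dN, dS) for one codon in one flanking region.
    Returns {position: (proportion with omega < 1,
                        proportion with omega > 1,
                        number of usable sites)}.
    """
    negative: Dict[int, int] = defaultdict(int)
    positive: Dict[int, int] = defaultdict(int)
    usable: Dict[int, int] = defaultdict(int)
    for pos, dn, ds in sites:
        if ds == 0:  # omega undefined; excluded in this sketch
            continue
        omega = dn / ds
        usable[pos] += 1
        if omega < 1:
            negative[pos] += 1
        elif omega > 1:
            positive[pos] += 1
    return {
        pos: (negative[pos] / usable[pos], positive[pos] / usable[pos], usable[pos])
        for pos in sorted(usable)
    }


if __name__ == "__main__":
    # Made-up values for illustration only.
    example = [(1, 0.1, 0.5), (1, 0.6, 0.3), (1, 0.0, 0.0), (2, 0.2, 0.2)]
    print(selection_proportions(example))
```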
Fig. 5.—
Effect of distance from LCR on average number of (A) total substitutions, (B) nonsynonymous substitutions and (C) synonymous substitutions at each codon in humans. Gray points indicate N, the number of genes that could provide information and were free of gaps at each site. Negative values are upstream of the LCR.
Fig. 6.—
Effect of distance from the LCR on proportion of genes that had negative values for Tajima’s D at each codon. Gray points indicate the number of genes with substitutions that could be used to calculate Tajima’s D at each site. Negative values are upstream of the LCR.
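For readers unfamiliar with the statistic, the following is a minimal Python sketch of Tajima's D using the standard formula (Tajima 1989). It assumes equal-length, gap-free aligned sequences and is not the authors' pipeline; how D was computed per codon in the paper is not described in this caption.

```python
from itertools import combinations
from math import sqrt
from typing import List, Optional


def tajimas_d(seqs: List[str]) -> Optional[float]:
    """Tajima's D for aligned sequences; None when the statistic is undefined."""
    n = len(seqs)
    if len(set(len(s) for s in seqs)) > 1:
        raise ValueError("sequences must be aligned to equal length")
    if n < 4:  # the variance term vanishes for n = 2 and n = 3
        return None
    # S: number of segregating (polymorphic) alignment columns.
    seg_sites = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    if seg_sites == 0:
        return None
    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if variance <= 0:
        return None
    return (pi - seg_sites / a1) / sqrt(variance)


if __name__ == "__main__":
    # Made-up alignment: one singleton variant gives a negative D.
    print(tajimas_d(["AAAA", "AAAA", "AAAA", "AAAT"]))
```

A negative D reflects an excess of rare variants relative to neutral expectation, which is why the figure reports the proportion of genes with negative values at each position.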
Fig. 7.—
Effect of distance from the LCR on average proportion of genes with gaps in the alignment at each nucleotide in five primate species. Gray points indicate N, the number of genes that could provide information on each site. Negative values are upstream of the LCR.
