PLoS One. 2014 Jan 22;9(1):e85260.
doi: 10.1371/journal.pone.0085260. eCollection 2014.

Genome-wide analysis of promoters: clustering by alignment and analysis of regular patterns


Lucia Pettinato et al. PLoS One.

Abstract

In this paper we perform a genome-wide analysis of H. sapiens promoters. To this end, we developed and combined two mathematical methods that allow us to (i) classify promoters into groups characterized by specific global structural features, and (ii) recover, in full generality, any regular sequence in the different classes of promoters. One of the main findings of this analysis is that H. sapiens promoters can be classified into three main groups. Two of them are distinguished by the prevalence of weak or strong nucleotides and are characterized by short, compositionally biased sequences, while the most frequent regular sequences in the third group are strongly correlated with transposons. Taking advantage of the generality of these mathematical procedures, we compared the promoter database of H. sapiens with those of other species. We found that the features described above also characterize the evolutionary content of mammalian promoters, in contrast with ancestral species in the phylogenetic tree, which exhibit a clearly lower level of differentiation among promoters.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Distribution of points in the clustering space (see Methods) obtained from the alignment of 2880 human promoters.
Each point represents a promoter in the sample. A. Colors indicate the four clusters. B. Colors indicate the TATA (blue dots) and TATA-less (orange dots) classification.
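The clustering space in Figure 1 is derived from pairwise alignments of the promoters, and its exact construction is given in the paper's Methods. As a generic illustration only, the sketch below performs a standard spectral embedding: it builds the symmetric normalized Laplacian of a similarity matrix and uses its low-lying eigenvectors as coordinates. The input matrix (for example, alignment scores rescaled to non-negative similarities) and the helper name `laplacian_embedding` are assumptions, not the authors' code.

```python
import numpy as np


def laplacian_embedding(similarity, n_dims=3):
    """Spectral embedding from a symmetric, non-negative similarity matrix W.

    Builds the symmetric normalized Laplacian L = I - D^(-1/2) W D^(-1/2),
    where D is the diagonal degree matrix, and returns its eigenvalues in
    ascending order together with per-item coordinates taken from
    eigenvectors 2..(n_dims + 1). Hypothetical helper for illustration.
    """
    w = np.asarray(similarity, dtype=float)
    if w.ndim != 2 or w.shape[0] != w.shape[1] or not np.allclose(w, w.T):
        raise ValueError("similarity matrix must be square and symmetric")
    degrees = w.sum(axis=1)
    if np.any(degrees <= 0):
        raise ValueError("every item needs a positive total similarity")
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    laplacian = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigh returns ascending eigenvalues
    # For a connected graph the first eigenvector is proportional to
    # sqrt(degree) and carries no grouping information, so it is skipped.
    coords = eigvecs[:, 1:1 + n_dims]
    return eigvals, coords


# Toy example: two groups of two items with no similarity between groups.
W = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])
eigvals, coords = laplacian_embedding(W, n_dims=1)
print(np.round(eigvals, 6))  # one (near-)zero eigenvalue per disconnected group
```

A common heuristic for choosing the number of clusters in this kind of embedding is to look for a large gap in the ascending eigenvalue spectrum.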
Figure 2. BCA of human promoters.
BCA of the entire repertoire of human promoters (panel A) and of the TATA and TATA-less promoter sets (panels B and C). We report the frequency of each of the four nucleotides A (black), T (blue), C (red) and G (green) as a function of position along the promoter (0 corresponds to the TSS). Figure from .
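The quantity plotted in these panels, the frequency of each nucleotide as a function of position relative to the TSS, can be computed along the lines of this minimal sketch. It assumes equal-length promoter sequences with the TSS at the same index in each; the function name and the choice to ignore ambiguous symbols such as N are illustrative, not taken from the paper.

```python
from collections import Counter

BASES = "ATCG"


def base_composition_profile(promoters, tss_index):
    """Frequency of A, T, C and G at each position of equal-length sequences.

    Positions are returned relative to the TSS (0 = TSS), assuming every
    sequence has its TSS at index `tss_index`. Symbols other than A/T/C/G
    (e.g. N) are ignored when computing the frequencies.
    """
    if not promoters:
        raise ValueError("need at least one sequence")
    length = len(promoters[0])
    if any(len(seq) != length for seq in promoters):
        raise ValueError("all sequences must have the same length")
    profile = {}
    for i in range(length):
        counts = Counter(seq[i].upper() for seq in promoters)
        total = sum(counts[b] for b in BASES)
        profile[i - tss_index] = {
            b: counts[b] / total if total else 0.0 for b in BASES
        }
    return profile


profile = base_composition_profile(["AC", "AG"], tss_index=1)
print(profile[-1])  # {'A': 1.0, 'T': 0.0, 'C': 0.0, 'G': 0.0}
print(profile[0])   # {'A': 0.0, 'T': 0.0, 'C': 0.5, 'G': 0.5}
```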
Figure 3. BCA of each cluster obtained with the clustering algorithm for H. sapiens.
We report the frequency of each of the four nucleotides A (black), T (blue), C (red) and G (green) as a function of position along the promoter (0 corresponds to the TSS).
Figure 4. Occurrence of regular sequences in the clusters of promoters of different species.
A. Average fraction of the promoter occupied by regular sequences. B. Number of promoters within the clusters.
Figure 5. The most frequent regular sequences found in the clusters of H. sapiens.
We report the percentage of promoters in the cluster in which the sequence appears at least once (left column) and the percentage of its occurrences that lie inside a transposon (right column), calculated by dividing the number of times it appears in a transposon by the total number of times it appears in the cluster.
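Both percentages in Figure 5 follow directly from the definitions in the caption. A minimal sketch, assuming promoters given as plain strings and transposon annotations given as half-open (start, end) intervals per promoter; an occurrence is counted as inside a transposon only when it lies entirely within an annotated interval, a choice the caption does not specify. The function names are hypothetical.

```python
def find_occurrences(seq, motif):
    """Start indices of all (possibly overlapping) occurrences of motif in seq."""
    if not motif:
        raise ValueError("motif must be non-empty")
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def motif_statistics(promoters, motif, transposons):
    """Return (percent of promoters containing motif at least once,
    percent of motif occurrences lying entirely inside a transposon).

    `transposons[k]` lists half-open (start, end) intervals for promoters[k].
    """
    if len(promoters) != len(transposons):
        raise ValueError("need one interval list per promoter")
    if not promoters:
        raise ValueError("need at least one promoter")
    with_motif = total = inside = 0
    for seq, intervals in zip(promoters, transposons):
        hits = find_occurrences(seq, motif)
        with_motif += bool(hits)  # counts the promoter once if any hit
        for h in hits:
            total += 1
            end = h + len(motif)
            if any(s <= h and end <= e for s, e in intervals):
                inside += 1
    pct_promoters = 100.0 * with_motif / len(promoters)
    pct_inside = 100.0 * inside / total if total else 0.0
    return pct_promoters, pct_inside


print(motif_statistics(["ATATGC", "GGGG"], "AT", [[(0, 3)], []]))  # (50.0, 50.0)
```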
Figure 6. Distribution of the different families of transposons in the four clusters of H. sapiens.
We report the total percentage of nucleotides in the cluster covered by transposons (pie chart) and the percentage of nucleotides covered by each family of transposons (histogram). Note the different scales in the histograms.
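The pie-chart and histogram quantities in Figure 6 can be obtained by pooling nucleotide coverage over all promoters of a cluster. A minimal sketch, assuming transposon annotations are available as half-open (start, end, family) intervals; the pooling over the cluster, the interval convention and the family labels in the example are illustrative assumptions.

```python
from collections import defaultdict


def cluster_transposon_coverage(promoters):
    """Percent of a cluster's nucleotides covered by transposons, in total and per family.

    `promoters` is a list of (length, annotations) pairs, where annotations are
    half-open, 0-based (start, end, family) intervals. Overlapping annotations
    are counted once in the total, so per-family values can sum to more than
    the total.
    """
    total_len = 0
    total_cov = 0
    family_cov = defaultdict(int)
    for length, annotations in promoters:
        covered = [False] * length
        family_masks = {}
        for start, end, family in annotations:
            mask = family_masks.setdefault(family, [False] * length)
            for i in range(max(start, 0), min(end, length)):
                mask[i] = True
                covered[i] = True
        total_len += length
        total_cov += sum(covered)
        for family, mask in family_masks.items():
            family_cov[family] += sum(mask)
    if total_len == 0:
        raise ValueError("cluster has no nucleotides")
    per_family = {f: 100.0 * c / total_len for f, c in family_cov.items()}
    return 100.0 * total_cov / total_len, per_family


total, per_family = cluster_transposon_coverage(
    [(10, [(0, 4, "Alu"), (2, 6, "L1")]), (10, [])]
)
print(total, per_family)  # 30.0 {'Alu': 20.0, 'L1': 20.0}
```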
Figure 7. Eigenvalues of the Laplacian matrix.
First … eigenvalues, in ascending order, of the normalized Laplacian matrix obtained from the alignment of 2880 H. sapiens promoters. The alignment method is Needleman–Wunsch with GAPOPEN = … and GAPEXTEND = … for panel A, and GAPEXTEND = … for panel B.
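The spectrum in Figure 7 depends on the gap parameters of the Needleman–Wunsch alignment. A minimal score-only sketch of global alignment with affine gaps (Gotoh's three-matrix recurrences), assuming a gap of length k costs GAPOPEN + (k - 1) * GAPEXTEND; this convention, the match/mismatch scores and the default penalties are placeholders, not the values used in the paper.

```python
NEG_INF = float("-inf")


def needleman_wunsch_affine(a, b, match=1.0, mismatch=-1.0,
                            gap_open=10.0, gap_extend=0.5):
    """Global alignment score of a and b with affine gap penalties.

    A gap of length k costs gap_open + (k - 1) * gap_extend (assumed convention).
    M: alignment ends with a[i-1] paired to b[j-1];
    X: ends with a[i-1] against a gap; Y: ends with b[j-1] against a gap.
    """
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):  # leading gap in b
        X[i][0] = -gap_open - (i - 1) * gap_extend
    for j in range(1, m + 1):  # leading gap in a
        Y[0][j] = -gap_open - (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


print(needleman_wunsch_affine("AAAA", "AA"))  # -8.5: two matches, one gap of length 2
```

With a large opening penalty and a small extension penalty, one long gap is preferred over several short ones, which is why changing these two parameters can reshape the alignment scores and, in turn, any similarity matrix built from them.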
Figure 8. Eigenvectors of the Hessian matrix with different delocalization properties.
The eigenvector in panel A has comparable values of participation number and extension (… and …), while the eigenvector in panel B has a small participation number (…) but a very large extension (…). The insets show an enlargement of the delocalization region. Data refer to the promoter of H. sapiens with Entrez GeneID 9542 (the promoter of the neuregulin-2 gene). Entrez Gene is the gene-specific database at the National Center for Biotechnology Information (NCBI).
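Figure 8 contrasts two measures of delocalization. The paper's exact definitions are given in its Methods; assuming the participation number takes the usual form from localization studies, it can be computed as in this sketch. The last example shows how a vector with a few widely separated large components combines a small participation number with a wide span, the situation described for panel B. The function name is hypothetical.

```python
import numpy as np


def participation_number(v):
    """Participation number (sum v_i^2)^2 / sum v_i^4 of a vector.

    It equals 1 for a vector concentrated on a single site and N for a
    vector spread evenly over N sites; the normalization of v does not matter.
    """
    v = np.asarray(v, dtype=float)
    p4 = np.sum(v ** 4)
    if p4 == 0:
        raise ValueError("vector must be non-zero")
    return float(np.sum(v ** 2) ** 2 / p4)


print(participation_number([0.0, 0.0, 1.0, 0.0]))  # 1.0
print(participation_number([1.0, 1.0, 1.0, 1.0]))  # 4.0
# Two isolated peaks 99 sites apart: small participation number, wide span.
sparse = np.zeros(100)
sparse[[0, 99]] = 1.0
print(participation_number(sparse))  # 2.0
```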
Figure 9. Start site and end site of an eigenvector.
Determination of the effective extension (region between the dashed lines) of a delocalized eigenvector overlying regular sequences. Note the very small components of the eigenvector outside the regular region. A portion of the sequence is shown in both quaternary and binary code. Data refer to the promoter of H. sapiens with Entrez GeneID 54808 (the promoter of the dymeclin gene).
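The caption does not state the criterion used to place the dashed lines. One simple possibility, shown here as a hypothetical sketch, is to take the first and last components whose magnitude exceeds a small fraction of the eigenvector's largest component; the thresholding rule, its default value and the function name are assumptions.

```python
import numpy as np


def effective_extension(v, rel_threshold=1e-3):
    """Start site, end site and length of the region where an eigenvector is non-negligible.

    A component counts as non-negligible when |v_i| > rel_threshold * max|v|
    (a hypothetical rule). Returns (start, end, end - start + 1), 0-based inclusive.
    """
    a = np.abs(np.asarray(v, dtype=float))
    peak = a.max() if a.size else 0.0
    if peak == 0:
        raise ValueError("vector must be non-zero")
    idx = np.flatnonzero(a > rel_threshold * peak)
    start, end = int(idx[0]), int(idx[-1])
    return start, end, end - start + 1


v = [1e-6, 0.0, 0.3, -0.5, 0.4, 1e-7]
print(effective_extension(v))  # (2, 4, 3)
```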
Figure 10. Regular and disordered sequences of a promoter.
The regular sequences (highlighted in black frames) are determined by the delocalized eigenvectors of the Hessian matrix. For clarity, for each of the three examples shown here we report only two of the eigenvectors, out of a total of 10 (green case), 16 (blue case) and 16 (red case). The sequence of the promoter is shown in both quaternary and binary code. The curves refer to eigenvectors no. 988, 577, 567, 998, 946 and 627 (from top to bottom) of the promoter of H. sapiens with Entrez GeneID 9542.
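The captions show promoter sequences in quaternary and binary code but do not give the mappings. A minimal sketch, assuming the digit assignment A=0, C=1, G=2, T=3 for the quaternary code and, following the weak/strong distinction drawn in the abstract, a binary code that maps weak nucleotides (A, T) to 0 and strong nucleotides (C, G) to 1. Both tables and the function name are hypothetical.

```python
QUATERNARY = {"A": 0, "C": 1, "G": 2, "T": 3}   # hypothetical digit assignment
WEAK_STRONG = {"A": 0, "T": 0, "C": 1, "G": 1}  # assumed: weak (A/T) -> 0, strong (C/G) -> 1


def encode(seq, table):
    """Translate a DNA string into a list of digits using `table`."""
    try:
        return [table[base] for base in seq.upper()]
    except KeyError as err:
        raise ValueError(f"unsupported symbol {err.args[0]!r}") from err


print(encode("ACGT", QUATERNARY))   # [0, 1, 2, 3]
print(encode("acgt", WEAK_STRONG))  # [0, 1, 1, 0]
```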

References

1. The ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57–74.
2. King MC, Wilson AC (1975) Evolution at two levels in humans and chimpanzees. Science 188: 107–116.
3. Carroll S (2008) Evo-devo and the expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell 134: 25–36.
4. Shibata Y, Sheffield NC, Fedrigo O, Babbitt CC, Wortham M, et al. (2012) Extensive evolutionary changes in regulatory element activity during human origins are associated with altered gene expression and positive selection. PLoS Genet 8: e1002789.
5. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, et al. (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science 337: 1190–1195.
