PLoS One. 2012;7(4):e35274.
doi: 10.1371/journal.pone.0035274. Epub 2012 Apr 26.

Clusters of ancestrally related genes that show paralogy in whole or in part are a major feature of the genomes of humans and other species

Michael B Walker et al. PLoS One. 2012.

Abstract

Arrangements of genes along chromosomes are a product of evolutionary processes, and we can expect that preferable arrangements will prevail over the span of evolutionary time, often being reflected in the non-random clustering of structurally and/or functionally related genes. Such non-random arrangements can arise by two distinct evolutionary processes: duplications of DNA sequences that give rise to clusters of genes sharing both sequence similarity and common sequence features, and the migration together of genes related by function, but not by common descent. To provide a background for distinguishing between the two, which is important for future efforts to unravel the evolutionary processes involved, we here provide a description of the extent to which ancestrally related genes are found in proximity. Towards this purpose, we combined information from five genomic datasets: InterPro, SCOP, PANTHER, Ensembl protein families, and Ensembl gene paralogs. The results are provided in publicly available datasets (http://cgd.jax.org/datasets/clustering/paraclustering.shtml) describing the extent to which ancestrally related genes are in proximity beyond what is expected by chance (i.e., form paraclusters) in the human and nine other vertebrate genomes, as well as the D. melanogaster, C. elegans, A. thaliana, and S. cerevisiae genomes. With the exception of Saccharomyces, paraclusters are a common feature of the genomes we examined. In the human genome they are estimated to include at least 22% of all protein coding genes. Paraclusters are far more prevalent among some gene families than others, are highly species or clade specific, and can evolve rapidly, sometimes in response to environmental cues. Altogether, they account for a large portion of the functional clustering previously reported in several genomes.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Frequency distribution of distances between structurally related genes.
(a) Frequencies of distances between structurally related genes across the human genome (blue) as compared to the same data generated from randomly permuted genomes (red). (b) Frequencies of distances to the nearest structurally related gene for all human genes (blue) as compared to the same data generated from randomly permuted genomes (red).
Figure 2. Paracluster sizes.
(a) Total gene count in paraclusters as a function of paracluster size measured as the number of paralogs in each paracluster. Gene counts include (blue) and exclude (red) interstitial genes. The parenthetical text provides the number of paraclusters having the corresponding size. (b) Cumulative frequency distribution of distances (spans) between genes sharing the same paracluster.
Figure 3. Metrics based on whole gene sequence similarity versus common domains.
Frequencies of distances between genes within paraclusters as described by the combination of data from the PANTHER, Ensembl family, and Ensembl paralogy datasets (red) and as described by the combination of data from the SCOP and InterPro datasets (blue). The numbers in parentheses are the number of paraclusters with only two genes.
Figure 4. Species comparison of genome wide clustering metrics.
Cumulative frequency distribution of the percentage of genes in paraclusters within the genomes of a selected set of species as a function of paracluster sizes.
Figure 5. Extent of clustering within gene families.
Frequencies of Ensembl gene family counts grouped by the percentage of family members contained within paraclusters (only families with 10 or more family members were counted).
Figure 6. Effect of expectation threshold on total clustering metrics.
The number of genes included within paraclusters as a function of the choice of expectation threshold shown for each dataset along with all datasets merged.
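
The comparison described for Figure 1(b), distances to the nearest structurally related gene in the observed genome versus randomly permuted genomes, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the single-chromosome layout, the family labels, the toy coordinates, the choice to permute by shuffling family labels across fixed gene positions, and the function names are all assumptions made for illustration.

```python
import random


def nearest_related_distances(genes):
    """Distance from each gene to its nearest same-family gene.

    genes: list of (position, family) tuples on one chromosome (assumed layout).
    Genes whose family has no other member are skipped.
    """
    by_family = {}
    for pos, fam in genes:
        by_family.setdefault(fam, []).append(pos)

    distances = []
    for positions in by_family.values():
        if len(positions) < 2:
            continue
        positions.sort()
        for i, p in enumerate(positions):
            left = p - positions[i - 1] if i > 0 else float("inf")
            right = positions[i + 1] - p if i < len(positions) - 1 else float("inf")
            distances.append(min(left, right))
    return distances


def permute_genome(genes, rng):
    """Shuffle family labels across the fixed gene positions.

    Assumption: the source only says "randomly permuted genomes";
    this is one simple way to build such a null model.
    """
    positions = [p for p, _ in genes]
    families = [f for _, f in genes]
    rng.shuffle(families)
    return list(zip(positions, families))


def fraction_within(distances, threshold):
    """Fraction of distances that are at most `threshold`."""
    if not distances:
        return 0.0
    return sum(d <= threshold for d in distances) / len(distances)


if __name__ == "__main__":
    # Hypothetical toy chromosome: (position in bp, family label).
    toy = [(100, "A"), (150, "A"), (900, "B"), (1000, "A"), (5000, "B")]
    rng = random.Random(0)

    observed = fraction_within(nearest_related_distances(toy), 100)
    # Average the same statistic over many permuted genomes.
    null = [
        fraction_within(nearest_related_distances(permute_genome(toy, rng)), 100)
        for _ in range(1000)
    ]
    print("observed:", observed)
    print("mean under permutation:", sum(null) / len(null))
```

An observed fraction well above the permuted mean would indicate that related genes lie closer together than chance predicts, which is the kind of signal the abstract calls paraclustering. A real analysis would handle multiple chromosomes and use family assignments from the datasets named in the abstract.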

