Clusters of ancestrally related genes that show paralogy in whole or in part are a major feature of the genomes of humans and other species

Michael B Walker¹, Benjamin L King, Kenneth Paigen

PMID: 22563380
PMCID: PMC3338513
DOI: 10.1371/journal.pone.0035274

PLoS One. 2012;7(4):e35274. doi: 10.1371/journal.pone.0035274. Epub 2012 Apr 26.

Affiliation

¹ The Jackson Laboratory, Bar Harbor, Maine, United States of America.

Abstract

Arrangements of genes along chromosomes are a product of evolutionary processes, and we can expect that preferable arrangements will prevail over the span of evolutionary time, often being reflected in the non-random clustering of structurally and/or functionally related genes. Such non-random arrangements can arise by two distinct evolutionary processes: duplications of DNA sequences, which give rise to clusters of genes sharing both sequence similarity and common sequence features, and the migration together of genes related by function but not by common descent. To provide a background for distinguishing between the two, which is important for future efforts to unravel the evolutionary processes involved, we here provide a description of the extent to which ancestrally related genes are found in proximity. Toward this purpose, we combined information from five genomic datasets: InterPro, SCOP, PANTHER, Ensembl protein families, and Ensembl gene paralogs. The results are provided in publicly available datasets (http://cgd.jax.org/datasets/clustering/paraclustering.shtml) describing the extent to which ancestrally related genes are in proximity beyond what is expected by chance (i.e. form paraclusters) in the human and nine other vertebrate genomes, as well as the D. melanogaster, C. elegans, A. thaliana, and S. cerevisiae genomes. With the exception of Saccharomyces, paraclusters are a common feature of the genomes we examined. In the human genome they are estimated to include at least 22% of all protein-coding genes. Paraclusters are far more prevalent among some gene families than others, are highly species- or clade-specific, and can evolve rapidly, sometimes in response to environmental cues. Altogether, they account for a large portion of the functional clustering previously reported in several genomes.
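The idea of genes being "in proximity beyond what is expected by chance" can be illustrated with a small permutation test. The sketch below is not the authors' pipeline: the abstract does not specify the distance measure, the permutation scheme, or the expectation threshold, so everything here is an assumption made for illustration. It assumes a single chromosome represented as an ordered list of family labels (`None` for genes with no known relatives), measures distance in gene-order positions rather than base pairs, and shuffles labels across positions to build the null distribution. The function names and the `max_gap` parameter are hypothetical.

```python
import random
from collections import defaultdict


def nearest_related_distance(labels):
    """Gene-order distance from each gene to its nearest same-family gene.

    `labels` is one chromosome's genes in order; each entry is a family
    label, or None for a gene with no assigned family (assumed format).
    Genes whose family has no other member on the chromosome are skipped.
    """
    by_family = defaultdict(list)
    for index, family in enumerate(labels):
        if family is not None:
            by_family[family].append(index)

    distances = []
    for indices in by_family.values():
        if len(indices) < 2:
            continue
        for i, idx in enumerate(indices):
            # indices are already sorted because we enumerated in order
            left = idx - indices[i - 1] if i > 0 else float("inf")
            right = indices[i + 1] - idx if i < len(indices) - 1 else float("inf")
            distances.append(min(left, right))
    return distances


def fraction_within(labels, max_gap):
    """Fraction of related genes whose nearest relative is within max_gap positions."""
    distances = nearest_related_distance(labels)
    if not distances:
        return 0.0
    return sum(d <= max_gap for d in distances) / len(distances)


def permutation_p_value(labels, max_gap, n_permutations=1000, seed=0):
    """Compare the observed fraction with label-shuffled ("permuted") chromosomes.

    Returns (observed_fraction, empirical_p_value). The +1 correction keeps
    the p-value strictly above zero for a finite number of permutations.
    """
    rng = random.Random(seed)
    observed = fraction_within(labels, max_gap)
    shuffled = list(labels)
    at_least_as_clustered = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if fraction_within(shuffled, max_gap) >= observed:
            at_least_as_clustered += 1
    p_value = (at_least_as_clustered + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Toy chromosome: family "A" is tandemly arranged, family "B" is dispersed.
    toy = ["A", "A", "A", None, "B", None, None, None, "B", None, "A", None]
    observed, p_value = permutation_p_value(toy, max_gap=1, n_permutations=500)
    print(f"observed fraction within 1 position: {observed:.2f}, p = {p_value:.3f}")
```

A real analysis would work with genomic coordinates across all chromosomes and with family assignments merged from the five datasets named above; the toy input only shows the shape of the comparison between an observed genome and its permuted counterparts.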

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. Frequency distribution of distances between structurally related genes.**
(a) Frequencies of distances between structurally related genes across the human genome (blue) as compared to the same data generated from randomly permuted genomes (red). (b) Frequencies of distances to the nearest structurally related gene for all human genes (blue) as compared to the same data generated from randomly permuted genomes (red).

**Figure 2. Paracluster sizes.**
(a) Total gene count in paraclusters as a function of paracluster size measured as the number of paralogs in each paracluster. Gene counts include (blue) and exclude (red) interstitial genes. The parenthetical text provides the number of paraclusters having the corresponding size. (b) Cumulative frequency distribution of distances (spans) between genes sharing the same paracluster.

**Figure 3. Metrics based on whole gene sequence similarity versus common domains.**
Frequencies of distances between genes within paraclusters as described by the combination of data from the PANTHER, Ensembl family, and Ensembl paralogy datasets (red) and as described by the combination of data from the SCOP and InterPro datasets (blue). The numbers in parentheses are the number of paraclusters with only two genes.

**Figure 4. Species comparison of genome-wide clustering metrics.**
Cumulative frequency distribution of the percentage of genes in paraclusters within the genomes of a selected set of species as a function of paracluster sizes.

**Figure 5. Extent of clustering within gene families.**
Frequencies of Ensembl gene family counts grouped by the percentage of family members contained within paraclusters (only families with 10 or more family members were counted).

**Figure 6. Effect of expectation threshold on total clustering metrics.**
The number of genes included within paraclusters as a function of the choice of expectation threshold shown for each dataset along with all datasets merged.
