Nucleic Acids Res. 2002 Aug 1;30(15):3378-86.
doi: 10.1093/nar/gkf449.

GenomeHistory: a software tool and its application to fully sequenced genomes

Gavin C Conant et al. Nucleic Acids Res.

Abstract

We present a publicly available software tool (http://www.unm.edu/~compbio/software/GenomeHistory) that identifies all pairs of duplicate genes in a genome and then determines the degree of synonymous and non-synonymous divergence between each duplicate pair. Using this tool, we analyze the relations between (i) gene function and the propensity of a gene to duplicate and (ii) the number of genes in a gene family and the family's rate of sequence evolution. We do so for the complete genomes of four eukaryotes (fission and budding yeast, fruit fly and nematode) and one prokaryote (Escherichia coli). For some classes of genes we observe a strong relationship between gene function and a gene's propensity to undergo duplication. Most notably, ribosomal genes and transcription factors appear less likely to undergo gene duplication than other genes. In both fission and budding yeast, we see a strong positive correlation between the selective constraint on a gene and the size of the gene family of which this gene is a member. In contrast, the corresponding correlation in multicellular eukaryotes is weakly negative.

Figures

Figure 1
Distribution of genes among functional categories for five organisms. Genes were divided into three groups: single copy genes, genes with one duplicate and genes with more than one duplicate. Proportions significantly different from the overall distribution at a Bonferroni significance level of 0.05 are marked with arrows. (A) Saccharomyces cerevisiae (2077 total genes); (B) S. pombe (2298 total genes); (C) D. melanogaster (2181 total genes); (D) C. elegans (3417 total genes); (E) E. coli (2609 total genes).
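
The Bonferroni significance level mentioned in the Figure 1 caption can be illustrated with a minimal sketch. The caption does not say which statistical test produced the per-category p-values, so the sketch takes p-values as given. The function names, category names and p-values below are hypothetical and are not taken from the paper.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_categories(p_values: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Return the categories whose p-value is below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return sorted(name for name, p in p_values.items() if p < threshold)


# Hypothetical p-values, one per functional category (illustration only).
example_p_values = {
    "ribosomal": 0.0004,
    "transcription": 0.009,
    "metabolism": 0.03,
    "transport": 0.2,
}

# With 4 tests the per-test threshold is 0.05 / 4 = 0.0125.
print(significant_categories(example_p_values))  # ['ribosomal', 'transcription']
```

Dividing the overall level by the number of categories tested keeps the chance of any false positive across all categories at or below 0.05, at the cost of a stricter cutoff for each individual category.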
Figure 2
Average Ka/Ks for genes in different functional categories for S. cerevisiae, S. pombe, D. melanogaster and C. elegans. Blanks indicate cases where no duplicates met the selection criteria (Ks < 3, Ka < 0.75, Ka/Ks < 1).
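
The selection criteria in the Figure 2 caption can be read as a simple filter applied before averaging. The following is a minimal sketch under stated assumptions, not the GenomeHistory implementation: the `DuplicatePair` record, its field names, the handling of pairs with Ks of zero, and the example values are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class DuplicatePair(NamedTuple):
    category: str  # functional category label (hypothetical field)
    ka: float      # non-synonymous divergence
    ks: float      # synonymous divergence


def passes_criteria(pair: DuplicatePair) -> bool:
    """Apply the caption's cutoffs: Ks < 3, Ka < 0.75, Ka/Ks < 1."""
    if pair.ks <= 0:
        return False  # ratio undefined; assumption: such pairs are skipped
    return pair.ks < 3 and pair.ka < 0.75 and pair.ka / pair.ks < 1


def mean_ka_ks_by_category(pairs: list[DuplicatePair]) -> dict[str, float]:
    """Average Ka/Ks per category over the pairs that pass the cutoffs."""
    ratios: dict[str, list[float]] = defaultdict(list)
    for pair in pairs:
        if passes_criteria(pair):
            ratios[pair.category].append(pair.ka / pair.ks)
    return {cat: sum(vals) / len(vals) for cat, vals in ratios.items()}


# Hypothetical duplicate pairs (illustration only).
example_pairs = [
    DuplicatePair("metabolism", 0.25, 1.0),  # kept, ratio 0.25
    DuplicatePair("metabolism", 0.5, 1.0),   # kept, ratio 0.5
    DuplicatePair("transport", 0.8, 2.0),    # dropped: Ka >= 0.75
    DuplicatePair("transport", 0.5, 3.5),    # dropped: Ks >= 3
    DuplicatePair("signaling", 0.6, 0.4),    # dropped: Ka/Ks >= 1
]

print(mean_ka_ks_by_category(example_pairs))  # {'metabolism': 0.375}
```

Categories with no surviving pairs are simply absent from the result, which corresponds to the blanks described in the caption.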
Figure 3
Statistical association between the number of members of a gene family and selective constraints on sequence evolution, as indicated by the ratio Ka/Ks averaged over all family members. (A) Saccharomyces cerevisiae. Seripauperin genes are highlighted based on their sequence similarity (BLASTP E < 10^-17) to ORF YJL223C. (B) Schizosaccharomyces pombe. (C) Drosophila melanogaster. (D) Caenorhabditis elegans. Major sperm family proteins highlighted based on similarity to gene MSP-36 (C04G2.4) (BLASTP E < 10^-6). (E) Escherichia coli.
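
The kind of association described in the Figure 3 caption can be sketched as a correlation between family size and the family's mean Ka/Ks. The caption does not name the statistic used, so the Pearson coefficient below is an assumption chosen for illustration. The family names, the per-member Ka/Ks values and the helper functions are hypothetical.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def size_vs_mean_ratio(families: dict[str, list[float]]) -> float:
    """Correlate family size with the family's average Ka/Ks."""
    sizes = [float(len(vals)) for vals in families.values()]
    means = [sum(vals) / len(vals) for vals in families.values()]
    return pearson_r(sizes, means)


# Hypothetical families: one Ka/Ks value per family member (illustration only).
example_families = {
    "famA": [0.1, 0.1],
    "famB": [0.2, 0.2, 0.2],
    "famC": [0.3, 0.3, 0.3, 0.3],
}

r = size_vs_mean_ratio(example_families)
print(f"r = {r:.3f}")
```

In this toy data larger families have higher mean Ka/Ks, so the coefficient is close to +1. A rank-based coefficient could be substituted if the relationship is not expected to be linear.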
