Proc Natl Acad Sci U S A. 2006 Dec 12;103(50):19027-32.

doi: 10.1073/pnas.0608796103. Epub 2006 Dec 4.

Almost all human genes resulted from ancient duplication

Roy J Britten¹

PMID: 17146051
PMCID: PMC1748171
DOI: 10.1073/pnas.0608796103

Roy J Britten. Proc Natl Acad Sci U S A. 2006.

Affiliation

¹ California Institute of Technology, 101 Dahlia Avenue, Corona del Mar, CA 92625, USA. r.britten@comcast.net

Abstract

Protein sequence comparison at an open criterion reveals a very large number of relationships that have, up to now, gone unreported. These relationships suggest many ancient events of gene duplication. It is well known that gene duplication has been a major process in the evolution of genomes. A collection of human genes of known function has been examined for a history of gene duplication, detected by amino acid sequence similarity using BLASTp with an expectation of two or less (the open criterion). Because the collection of genes in build 35 includes sets of transcript variants, all genes of known function were collected and only the longest transcript variant of each was included, yielding a 13,298-member library called KGMV (for known genes maximum variant). When matches of all lengths are accepted, >97% of human genes show significant matches to each other. Many match a large number of other, different proteins, showing that most genes are made up of parts of many others as a result of ancient duplication events. To support the use of the open criterion, all members of the KGMV library were twice replaced with random protein sequences of the same length and average composition, and all were compared with each other using BLASTp at an expectation of two or less. The number of matches averaged 0.35% of that observed for the KGMV set of proteins.
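The random-library control can be illustrated with a minimal sketch. The abstract does not say how the random sequences were generated, so this version assumes that "average composition" means the pooled amino acid frequencies of the whole library, and that each random sequence keeps the length of the protein it replaces. The function name `random_library` and the toy sequences are hypothetical.

```python
import random
from collections import Counter


def random_library(proteins, seed=0):
    """Return one random sequence per input protein.

    Assumption (not stated in the abstract): residues are drawn
    independently from the pooled composition of the whole library,
    and each random sequence has the same length as its original.
    """
    rng = random.Random(seed)
    counts = Counter("".join(proteins))      # pooled amino acid counts
    residues = sorted(counts)
    weights = [counts[r] for r in residues]  # sampling weights
    return [
        "".join(rng.choices(residues, weights=weights, k=len(p)))
        for p in proteins
    ]


# Toy input; a real run would use the 13,298 KGMV sequences.
toy = ["MKTAYIAKQR", "MSDNE", "MAAAKL"]
shuffled = random_library(toy)
```

The resulting sequences would then be compared all-against-all with BLASTp at the same expectation threshold as the real library.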

Conflict of interest statement

The author declares no conflict of interest.

Figures

**Fig. 1.**
Percentage of length of proteins in longest matches at two criteria. The y axis represents the percentage of length in matches. The x axis represents the percentage of the KGMV library. The lower curve is for an expectation of ≤10⁻³. The upper curve is for an expectation of ≤2. The maximum length matched is plotted for each protein, ordered by percentage of length matched, forming a continuous curve because so many thousands of proteins are plotted.

**Fig. 2.**
The percentage of each of the proteins included in all matches at expectation two or less. The y axis represents the percentage of length of each protein. The x axis represents the percentage of the KGMV library. The lower curve (copied from Fig. 1 for comparison) is the percentage of length covered in single matches. The upper curve is the percentage of length covered in all matches. Each curve is a plot of all of the proteins that are matched ordered independently by the percentage of length matched. For the upper curve, an array was made for all the amino acids in the probe, and each amino acid was marked if newly included in a match. The percentage of length matched is the sum of all marked amino acids times 100 divided by the length of the protein.
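The marking procedure described for the upper curve of Fig. 2 can be sketched as follows. This is an illustration under assumptions, not the author's program: matches are taken to be (start, end) positions on the probe, 1-based and inclusive, and the name `percent_length_covered` is hypothetical.

```python
def percent_length_covered(protein_length, matches):
    """Percentage of a probe's length covered by all of its matches.

    matches: iterable of (start, end) probe positions, assumed 1-based
    and inclusive. Overlapping matches mark each residue only once.
    """
    marked = [False] * protein_length        # one flag per amino acid
    for start, end in matches:
        lo = max(start - 1, 0)
        hi = min(end, protein_length)
        for pos in range(lo, hi):
            marked[pos] = True               # no effect if already marked
    # Sum of marked amino acids times 100, divided by the protein length.
    return 100.0 * sum(marked) / protein_length


# Two overlapping matches on a 10-residue probe cover positions 1-6.
coverage = percent_length_covered(10, [(1, 4), (3, 6)])
```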

**Fig. 3.**
Positions of the alignments with the protein EDD1 (NP056986, NM_015902). The individual alignments with this probe were scanned, and any match whose alignment included amino acids not previously matched was plotted, starting at the bottom. There are 34 such matches; in 5 cases, the same matching protein was included more than once because the alignments reported by BLASTp were significantly different. The heavy lines are matches with an expectation of 10⁻³ or less. The medium-weight lines have an expectation of one or less and >10⁻³. The two thin lines at the top have an expectation of two or less and greater than one.
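The selection rule in this caption (keep a match only if it covers amino acids not previously matched) can be sketched as a single pass over the alignments. The scan order is not stated in the caption, so this sketch simply takes the matches in the order given. The function name and the (start, end) representation, 1-based and inclusive, are assumptions.

```python
def matches_adding_new_residues(protein_length, matches):
    """Keep each match that covers at least one not-yet-marked residue.

    matches: list of (start, end) probe positions, assumed 1-based and
    inclusive, scanned in the order given (the order is an assumption).
    """
    marked = [False] * protein_length
    kept = []
    for start, end in matches:
        positions = range(max(start - 1, 0), min(end, protein_length))
        if any(not marked[p] for p in positions):
            kept.append((start, end))
            for p in positions:
                marked[p] = True
    return kept


# The second match lies entirely inside the first, so it is dropped.
kept = matches_adding_new_residues(10, [(1, 5), (2, 4), (4, 8)])
```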

**Fig. 4.**
Precision of match at open criterion. The x axis represents the percentage of the KGMV library. On the y axis, the scale for the upper curves is the percentage of protein length matched, and the scale for the lower curve is the percentage of amino acids matched. The proteins have been grouped in sets of 100 and averaged to reduce scatter. Other than this, the upper curves are identical to those in Fig. 2. The curious shape at the beginning arises because the UNIX sort program ordered all proteins that were matched for 100% of their length by their percent amino acid match. Except for a few proteins with a high percentage of length matched, the average percentage amino acid match is ≈32%.

**Fig. 5.**
The percentage of random proteins included in all matches; control for open criterion. The description of this figure is exactly as for Fig. 2, except that the lower curve is from an all-to-all comparison of a 13,298-member random amino acid library matching the KGMV library in length and composition (on average). In this example, there were 22,340 matches among the random amino acid sequences at expectation two or less, whereas there were 5,200,000 matches for the upper curve.
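As a quick check on the counts quoted in this caption, the ratio of random-library matches to KGMV matches for this single example works out to roughly 0.43%. The abstract's figure of 0.35% is an average over the random replacements, so the two numbers need not agree exactly. The variable names are illustrative, and the KGMV count appears to be rounded in the caption.

```python
random_matches = 22_340       # matches among random sequences (Fig. 5)
kgmv_matches = 5_200_000      # matches in the KGMV library (as quoted)

ratio_percent = 100.0 * random_matches / kgmv_matches
print(f"{ratio_percent:.2f}%")
```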

**Fig. 6.**
Coverage of individual amino acids of probes in the many matches. The horizontal scale is the percentage of the KGMV library. The upper curve is identical to the upper curve of Fig. 2, and for this curve the vertical scale is the percentage of the length covered. The lower heavy curve describes the individual amino acids covered, and the right-hand scale for this curve is the percentage of individual amino acids included in the many matches.

Cited by

Pervasive and persistent redundancy among duplicated genes in yeast.
Dean EJ, Davis JC, Davis RW, Petrov DA. PLoS Genet. 2008 Jul 4;4(7):e1000113. doi: 10.1371/journal.pgen.1000113. PMID: 18604285.
The evolutionary dynamics of the Saccharomyces cerevisiae protein interaction network after duplication.
Presser A, Elowitz MB, Kellis M, Kishony R. Proc Natl Acad Sci U S A. 2008 Jan 22;105(3):950-4. doi: 10.1073/pnas.0707293105. Epub 2008 Jan 16. PMID: 18199840.
The role of duplications in the evolution of genomes highlights the need for evolutionary-based approaches in comparative genomics.
Levasseur A, Pontarotti P. Biol Direct. 2011 Feb 18;6:11. doi: 10.1186/1745-6150-6-11. PMID: 21333002. Review.
Phylogenetic and functional characterization of the hAT transposon superfamily.
Arensburger P, Hice RH, Zhou L, Smith RC, Tom AC, Wright JA, Knapp J, O'Brochta DA, Craig NL, Atkinson PW. Genetics. 2011 May;188(1):45-57. doi: 10.1534/genetics.111.126813. Epub 2011 Mar 2. PMID: 21368277.
An Overview of Duplicated Gene Detection Methods: Why the Duplication Mechanism Has to Be Accounted for in Their Choice.
Lallemand T, Leduc M, Landès C, Rizzon C, Lerat E. Genes (Basel). 2020 Sep 4;11(9):1046. doi: 10.3390/genes11091046. PMID: 32899740. Review.
