Genome Biol. 2003;4(2):401.
doi: 10.1186/gb-2003-4-2-401. Epub 2003 Jan 28.

Myriads of protein families, and still counting

Victor Kunin et al. Genome Biol. 2003.

Abstract

From the historical record of genome sequencing, we show that the rate of discovery of new families has remained constant over time, indicating that our knowledge of sequence space is far from complete.

Figures

Figure 1
The number of unique protein families accumulated from genome projects. Families were obtained by clustering proteins from complete genomes with the TRIBE-MCL algorithm (inflation value 1.1). Species with the largest contributions are indicated. All data and supplementary information are available at [9].
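The accumulation curve described in this caption can be illustrated with a minimal sketch. It assumes the clustering has already been done and that each genome is available as a set of family identifiers with a release date; the function name, the tuple layout, and the toy data are hypothetical and are not taken from the paper.

```python
from datetime import date


def cumulative_family_counts(genomes):
    """Return (name, release_date, new_families, total_families) per genome.

    `genomes` is a list of (name, release_date, set_of_family_ids) tuples.
    This layout is an assumption made for illustration only.
    """
    seen = set()
    curve = []
    # Process genomes in chronological order of release.
    for name, released, families in sorted(genomes, key=lambda g: g[1]):
        new = families - seen          # families not observed in any earlier genome
        seen |= new
        curve.append((name, released, len(new), len(seen)))
    return curve


# Hypothetical toy data: three genomes with overlapping family content.
toy_genomes = [
    ("genome_B", date(1996, 4, 1), {"F1", "F3", "F4"}),
    ("genome_A", date(1995, 7, 1), {"F1", "F2"}),
    ("genome_C", date(1997, 9, 1), {"F2", "F4", "F5"}),
]

curve = cumulative_family_counts(toy_genomes)
for name, released, new, total in curve:
    print(name, released.isoformat(), new, total)
```

Plotting the last column against the release date gives a curve of the kind shown in Figure 1. A roughly constant number of new families per genome would appear as a curve that keeps rising instead of levelling off.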
Figure 2
Size distribution of protein families in relation to the time of their discovery. The x-axis represents the time of discovery of the founding member of a family; the y-axis represents frequency (on a logarithmic scale); each circle represents the number of protein families corresponding to the value on the y-axis; and the area of each circle corresponds to family size. It is notable that some of the largest families were founded early, but large families are still being discovered. Recently discovered small families (upper right) are expected to grow with better sampling of protein space.
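The two quantities plotted in this figure, the discovery time of a family's founding member and the family's size, could be derived from a per-protein table roughly as follows. This is a sketch under stated assumptions: the input is a hypothetical list of (family identifier, genome release date) pairs, one per protein, and none of the names come from the paper.

```python
from collections import defaultdict
from datetime import date


def founding_date_and_size(proteins):
    """Map each family id to (founding_date, size).

    `proteins` is an iterable of (family_id, genome_release_date) pairs,
    one pair per protein. This input format is assumed for illustration.
    """
    founded = {}
    size = defaultdict(int)
    for family, released in proteins:
        size[family] += 1
        # The founding member is the one from the earliest released genome.
        if family not in founded or released < founded[family]:
            founded[family] = released
    return {family: (founded[family], size[family]) for family in founded}


# Hypothetical toy data: four proteins in two families.
toy_proteins = [
    ("F1", date(1996, 4, 1)),
    ("F1", date(1995, 7, 1)),
    ("F2", date(1997, 9, 1)),
    ("F1", date(1997, 9, 1)),
]

summary = founding_date_and_size(toy_proteins)
```

Grouping the resulting pairs by founding date and size gives the counts that a plot like Figure 2 displays.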

References

    1. Chothia C. One thousand families for the molecular biologist. Nature. 1992;357:543–544.
    2. Vitkup D, Melamud E, Moult J, Sander C. Completeness in structural genomics. Nat Struct Biol. 2001;8:559–566.
    3. Fischer D, Eisenberg D. Finding families for genomic ORFans. Bioinformatics. 1999;15:759–762.
    4. Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002;30:1575–1584.
    5. Iliopoulos I, Tsoka S, Andrade MA, Janssen P, Audit B, Tramontano A, Valencia A, Leroy C, Sander C, Ouzounis CA. Genome sequences and great expectations. Genome Biol. 2001;2:interactions0001.1–0001.3.