PLoS One. 2009 Sep 14;4(9):e6978.
doi: 10.1371/journal.pone.0006978.

Distinct gene number-genome size relationships for eukaryotes and non-eukaryotes: gene content estimation for dinoflagellate genomes

Yubo Hou et al. PLoS One. 2009.

Abstract

The ability to predict gene content is highly desirable for characterizing not-yet-sequenced genomes such as those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome size using regression analyses. Distinct relationships between log10-transformed protein-coding gene number (Y') and log10-transformed genome size (X', genome size in kbp) were found for eukaryotes and non-eukaryotes. Eukaryotes best fit a logarithmic model, Y' = ln(-46.200 + 22.678X'), whereas non-eukaryotes best fit a linear model, Y' = 0.045 + 0.977X', both with high significance (p < 0.001, R² > 0.91). In both groups, total gene number follows trends similar to those of the respective protein-coding regressions. The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%-1%), compared to higher and relatively stable percentages in prokaryotes and viruses (97%-47%). The eukaryotic regression models project that the smallest dinoflagellate genome (3×10^6 kbp) contains 38,188 protein-coding (40,086 total) genes and the largest (245×10^6 kbp) contains 87,688 protein-coding (92,013 total) genes, corresponding to gene-coding percentages of 1.8% and 0.05%, respectively. These estimates likely do not represent extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes, as evidenced by the high gene copy numbers documented for various dinoflagellate species.
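As a rough illustration of how the two regression models in the abstract can be evaluated, the sketch below applies them to a genome size given in kbp. The function names are hypothetical, and the coefficients are the rounded values printed in the abstract, so the output need not reproduce the published estimates exactly. Back-transforming Y' with a power of 10 is an assumption that follows from the stated log10 transform.

```python
import math


def eukaryote_log10_genes(x_prime: float) -> float:
    """Logarithmic model from the abstract: Y' = ln(-46.200 + 22.678 X')."""
    argument = -46.200 + 22.678 * x_prime
    if argument <= 0:
        # The natural log is undefined here; the model only makes sense
        # for genomes large enough to keep the argument positive.
        raise ValueError("genome size is below the range of the model")
    return math.log(argument)


def non_eukaryote_log10_genes(x_prime: float) -> float:
    """Linear model from the abstract: Y' = 0.045 + 0.977 X'."""
    return 0.045 + 0.977 * x_prime


def predict_protein_coding_genes(genome_size_kbp: float, model) -> float:
    """Transform genome size to X', apply a model, and back-transform Y'.

    Assumption: Y' is log10(gene number), so the gene number is 10 ** Y'.
    """
    x_prime = math.log10(genome_size_kbp)
    return 10 ** model(x_prime)


# Genome sizes quoted in the abstract for the smallest and largest
# dinoflagellate genomes (in kbp).
smallest = predict_protein_coding_genes(3e6, eukaryote_log10_genes)
largest = predict_protein_coding_genes(245e6, eukaryote_log10_genes)
print(f"smallest dinoflagellate genome: ~{smallest:,.0f} genes")
print(f"largest dinoflagellate genome:  ~{largest:,.0f} genes")
```

Because the eukaryotic model takes a logarithm of a linear function of X', predicted gene number grows much more slowly than genome size, which is the behavior the abstract describes as a decreasing gene-coding percentage.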

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Genome sizes, protein-coding gene numbers, and gene-coding percentages of eukaryotic, bacterial, archaeal, viral, and organellar genomes.
(A) Genome size (shaded boxes) and number of protein-coding genes (open boxes). Total gene number is very close to protein-coding gene number and is not shown here. (B) Genome gene-coding percentage (fraction of DNA that constitutes genes). The lower and upper boundaries of the box indicate the first and third quartiles (or 25th and 75th percentiles) of each dataset, and the middle line in the box indicates the median value. The whiskers above and below the box indicate the 90th and 10th percentiles.
Figure 2. Distinct relationships between genome features in sequenced eukaryotes and non-eukaryotes.
All correlations were highly significant (p < 0.001). (A) Protein-coding gene number vs. genome size regression lines on a log scale. Separate regression lines were obtained for eukaryotes (blue circles) and non-eukaryotes (prokaryotes, viruses, and organelles; other symbols). (B) Gene-coding percentage vs. genome size on a log scale. Note the negative trend for the eukaryotic genomes. The projected gene-coding percentages for the smallest (Symbiodinium sp., 1.80%) and largest (Prorocentrum micans, 0.05%) dinoflagellate genomes, calculated from the reported average eukaryotic gene length (1.346 kbp), are shown for comparison. The trend for the non-eukaryotes is almost horizontal except for outliers from some organelles.
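The projected gene-coding percentages can be checked with simple arithmetic. The sketch below assumes the percentage is gene count times average gene length, divided by genome size. Using the total gene numbers from the abstract (40,086 and 92,013) as the gene count is an assumption; the caption does not say which count was used, but these values reproduce the quoted 1.80% and 0.05%. The function name is hypothetical.

```python
# Average eukaryotic gene length quoted in the Figure 2 caption (kbp).
AVG_EUKARYOTIC_GENE_LENGTH_KBP = 1.346


def gene_coding_percentage(
    gene_count: float,
    genome_size_kbp: float,
    avg_gene_length_kbp: float = AVG_EUKARYOTIC_GENE_LENGTH_KBP,
) -> float:
    """Percentage of the genome occupied by genes.

    Assumption: every gene has the average length, so the genic DNA is
    gene_count * avg_gene_length_kbp.
    """
    return 100.0 * gene_count * avg_gene_length_kbp / genome_size_kbp


# Total gene numbers and genome sizes as quoted in the abstract.
print(f"{gene_coding_percentage(40086, 3e6):.2f}%")    # prints 1.80%
print(f"{gene_coding_percentage(92013, 245e6):.2f}%")  # prints 0.05%
```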
Figure 3. Logarithmic regression model for log10-transformed eukaryotic gene number (y′) versus log10-transformed genome size (x′).
The range of dinoflagellate genome sizes (3×10^6–245×10^6 kbp) is indicated by the shaded area. The predicted gene numbers for the smallest (38,188) and largest (87,688) recognized dinoflagellate genomes correspond to the gene-coding percentages shown in Fig. 2B.
