Plant Cell. 2016 Feb;28(2):326-44.
doi: 10.1105/tpc.15.00877. Epub 2016 Jan 7.

Gene Duplicability of Core Genes Is Highly Consistent across All Angiosperms

Zhen Li et al. Plant Cell. 2016 Feb.

Abstract

Gene duplication is an important mechanism for adding to genomic novelty. Hence, which genes undergo duplication and are preserved following duplication is an important question. It has been observed that gene duplicability, or the ability of genes to be retained following duplication, is a nonrandom process, with certain genes being more amenable to survive duplication events than others. Primarily, gene essentiality and the type of duplication (small-scale versus large-scale) have been shown in different species to influence the (long-term) survival of novel genes. However, an overarching view of "gene duplicability" is lacking, mainly because previous studies usually focused on individual species and did not account for the influence of genomic context and the time of duplication. Here, we present a large-scale study in which we investigated duplicate retention for 9178 gene families shared between 37 flowering plant species, referred to as angiosperm core gene families. For most gene families, we observe a strikingly consistent pattern of gene duplicability across species, with gene families being either primarily single-copy or multicopy in all species. An intermediate class contains gene families that are often retained in duplicate for periods extending to tens of millions of years after whole-genome duplication, but ultimately appear to be largely restored to singleton status, suggesting that these genes may be dosage balance sensitive. The distinction between single-copy and multicopy gene families is reflected in their functional annotation, with single-copy genes being mainly involved in the maintenance of genome stability and organelle function and multicopy genes in signaling, transport, and metabolism. Regulatory genes were overrepresented in the intermediate class, further suggesting that these represent putative dosage-balance-sensitive genes.

Figures

Figure 1.
Angiosperm Species Tree. Phylogenetic tree depicting the relationships among the 37 angiosperm genomes used in this article. The tree topology was inferred from a concatenated alignment based on 107 almost single-copy gene families (see Methods). Numbers on the branches represent bootstrap supports (* for 100%), internode certainty (IC), and internode certainty all (ICA), respectively. WGD events were inferred from literature (Jiao et al., 2014; Vanneste et al., 2014a) and are depicted by stars. Only WGDs more recent than the angiosperm common ancestor were considered.
Figure 2.
Overall Distribution of Single-Copy Percentage for All Angiosperm Core Gene Families. The distribution depicts the degree to which the 9178 core gene families are single copy in the 37 angiosperm species investigated. The x axis represents, for each gene family, the percentage of species with exactly one gene copy with respect to the total number of species in the family. The distribution illustrates a very strong tendency of angiosperm core gene families toward the single-copy state. The mode (87.5%) and the mean (66.8%) of the distribution are indicated by red and green lines, respectively. The observed distribution strongly deviates from the expected distribution under a stochastic duplicate BD model (depicted by dashed lines).
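The single-copy percentage described in the Figure 2 caption can be illustrated with a minimal sketch. The function name, the species labels, and the toy counts below are hypothetical, and treating species with zero copies as absent from the family is an assumption drawn from the caption's wording ("total number of species in the family"), not a confirmed detail of the authors' pipeline.

```python
def single_copy_percentage(copy_counts):
    """Return the percentage of species with exactly one gene copy.

    copy_counts maps a species label to the number of gene copies that
    species has in one gene family. Species with zero copies are treated
    as absent from the family and left out of the denominator
    (an assumption, see the text above).
    """
    present = [n for n in copy_counts.values() if n > 0]
    if not present:
        raise ValueError("gene family has no members in any species")
    single = sum(1 for n in present if n == 1)
    return 100.0 * single / len(present)


# Hypothetical toy family: 8 species carry the family, 7 of them as a single copy.
family = {"sp1": 1, "sp2": 1, "sp3": 1, "sp4": 1,
          "sp5": 1, "sp6": 1, "sp7": 1, "sp8": 2, "sp9": 0}
scp = single_copy_percentage(family)
print(scp)  # 87.5
```

Applying such a function to every core gene family would give the values whose distribution the caption describes.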
Figure 3.
Duplicate Gene Retention as a Function of Time Since WGD. Each dot represents the fraction of core gene families with retained duplicates following a specific WGD (y axis), as a function of WGD age, expressed in Ks units (x axis). The timing of the WGD events and the particular gene families that retained duplicates following a specific WGD event were inferred by fitting Gaussian mixture models to Ks age distributions for all 37 species separately (see Methods). As such, each point represents a species-specific estimate for a WGD, and WGD events shared by multiple descendant species are represented by multiple data points that cannot be regarded as independent. SSD-related peaks and dubious WGD peak calls were omitted. Additional information on all the peaks can be found in Supplemental Table 2 and Supplemental Figure 7. A power-law function was fitted to the data (χ² goodness-of-fit = 0.77, P = 1).
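As a rough illustration of fitting a power-law function to retention fractions versus WGD age, the sketch below fits y = a * x**b by ordinary least squares on log-transformed values. This is one common way to fit a power law and is an assumption here: the caption does not state the fitting procedure or the exact functional form, and the χ² goodness-of-fit test it reports is not reproduced. The function name and the data points are hypothetical.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on log(x), log(y).

    All xs and ys must be strictly positive and xs must not all be equal.
    Returns the pair (a, b).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                 # slope in log-log space = exponent
    a = math.exp(my - b * mx)     # intercept in log-log space = log(a)
    return a, b


# Hypothetical data generated exactly from y = 0.1 * x**-0.5 (not from the paper).
ks_ages = [0.25, 0.5, 1.0, 2.0, 4.0]
retained = [0.1 * k ** -0.5 for k in ks_ages]
a, b = fit_power_law(ks_ages, retained)
print(round(a, 6), round(b, 6))
```

A negative exponent b corresponds to a retained fraction that declines with WGD age.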
Figure 4.
Core Gene Families Partition into Three Groups Based on Clustering of the Copy-Number Profile Data. (A) Heat map of the clustered copy number profile matrix. Rows represent species and columns represent the core gene families. Gene families (columns) are sorted according to the three different groups obtained by k-means clustering. Symbols indicate for each species whether WGD events that might have contributed to duplicates in the species fall into the “recent” (rectangle), “K-Pg boundary” (circle), or “ancient” (triangle) category. (B) SCP distributions for the gene families in each of the three different groups. The distribution of the Full Group shows the SCP distribution of all core gene families together (cf. Figure 2).
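The k-means grouping of copy-number profiles mentioned in the Figure 4 caption can be sketched on toy data as follows. This is a plain Lloyd-style k-means with caller-supplied starting centroids so that the result is deterministic. The function name, the starting centroids, and the six toy profiles are hypothetical, and nothing here reflects the software, distance measure, or preprocessing the authors actually used.

```python
def kmeans(points, centroids, max_iter=100):
    """Cluster points (lists of numbers) with Lloyd's k-means algorithm.

    centroids gives the starting centers; returns (labels, centroids).
    """
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - q) ** 2 for p, q in zip(pt, centroids[k])))
            for pt in points
        ]
        # Update step: each centroid becomes the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = []
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[k])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical copy-number profiles: one row per gene family,
# one column per species (three species only, for brevity).
profiles = [[1, 1, 1], [1, 1, 2],      # mostly single copy
            [2, 1, 2], [2, 2, 1],      # intermediate
            [4, 5, 3], [5, 4, 4]]      # multicopy
labels, centers = kmeans(profiles, centroids=[[1, 1, 1], [2, 2, 2], [4, 4, 4]])
print(labels)  # [0, 0, 1, 1, 2, 2]
```

In practice the starting centroids are usually chosen at random and the run is repeated several times, which this sketch omits for reproducibility.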
Figure 5.
Analyses of Duplication Events of the Three Groups. (A) For each of the clusters in Figure 4, power-law functions were fitted to the corresponding data points representing the fraction of core gene families with retained duplicates following a particular WGD (y axis) as a function of WGD age (x axis), as in Figure 3 (χ² goodness-of-fit single-copy group = 0.52, P = 1; χ² goodness-of-fit intermediate group = 1.38, P = 1; χ² goodness-of-fit multicopy group = 1.83, P = 1). The “full set” curve corresponds to the curve represented in Figure 3. (B) Polar diagram depicting the fraction of duplication events in each gene family group belonging to either “recent,” “K-Pg boundary,” “ancient” WGDs, or “SSD” events. Here, predicted duplication events were inferred based on gene tree–species tree reconciliation. Green and red asterisks denote statistically significant over- and underrepresentation, respectively, of duplicates of a given class in a specific group; in each case, the number of associated duplications in the group was compared with that of the full set (gray bar) using Fisher’s exact test. Similar results were obtained using predicted duplication events inferred using Gaussian mixture modeling of Ks distributions (Supplemental Figure 10).
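The over- and underrepresentation tests in panel (B) rely on Fisher's exact test. The sketch below computes a two-sided P value for a 2x2 table directly from the hypergeometric distribution, summing the probabilities of all tables that are no more probable than the observed one. The function name and the example counts are hypothetical, the two-sided convention is an assumption (the caption does not say which alternative was tested), and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical counts: rows are one gene family group versus the remaining
# families, columns are duplications in one class versus all other classes.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p, 4))  # 0.4857
```

For the counts shown, the observed table and the three equally or less probable tables sum to 34/70, which is where the printed value comes from.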
Figure 6.
Functional Analyses of the Three Different Groups. (A) GOSlim enrichments and underrepresentations calculated for the Arabidopsis genes in each of the three gene family groups in Figure 4. Dot sizes represent the statistical significance of over- (green) or underrepresentation (red). (B) Enrichment analysis of the three gene family groups for knockout mutant phenotype annotations (Lloyd and Meinke, 2012). Bars represent overrepresentation (positive values) or underrepresentation (negative values) of knockout phenotypes belonging to any of four functional categories (bar colors). Asterisks denote significance levels as calculated by Fisher’s exact test (***P < 0.001 and **P < 0.05).
