Review
F1000Res. 2024 Feb 12:12:1400.
doi: 10.12688/f1000research.141786.2. eCollection 2023.

Models for the retention of duplicate genes and their biological underpinnings

Raquel Assis et al. F1000Res.

Abstract

Gene content in genomes changes through several different processes, with gene duplication being an important contributor to such changes. Gene duplication occurs over a range of scales from individual genes to whole genomes, and the dynamics of this process can be context dependent. Still, there are rules by which genes are retained or lost from genomes after duplication, and probabilistic modeling has enabled characterization of these rules, including their context-dependence. Here, we describe the biology and corresponding mathematical models that are used to understand duplicate gene retention and its contribution to the set of biochemical functions encoded in a genome.

Keywords: Markov model; gene duplication; phylogenetic analysis; probabilistic modeling; synteny; theoretical biology.

Conflict of interest statement

No competing interests were disclosed.

Figures

Figure 1. To model the evolution of gene duplicates, Stark et al. (2017) constructed a Markov chain with state space {0, 1, …, z-1, S, P} and generator Q, where z is the number of regulatory regions within the gene, and S and P are the subfunctionalization and pseudogenization (absorbing) states, respectively.
In the above example of transitions with z=4, the regions hit by null mutations are in red, and the regions protected by selective pressure are in yellow. This figure is adapted from Stark et al. (2017), which was published under an open access license.
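
The role of the absorbing states in such a chain can be illustrated with a minimal sketch. The transition structure and rates below (`toy_generator`, `u_protect`, `u_null`) are hypothetical and are not the generator Q of Stark et al. (2017); treating both S and P as absorbing is also an assumption made here. The sketch only shows how, for any generator of this shape, the probability of ending in S rather than P can be obtained by solving the linear system (-Q_TT) B = Q_TA, where T indexes transient states and A indexes absorbing states.

```python
def absorption_probabilities(Q, transient, absorbing):
    """Return B with B[i][a] = P(absorbed in absorbing[a] | start in transient[i]).

    Solves (-Q_TT) B = Q_TA by Gauss-Jordan elimination with partial pivoting.
    """
    n = len(transient)
    # Augmented matrix [ -Q_TT | Q_TA ]
    A = [[-Q[i][j] for j in transient] + [Q[i][a] for a in absorbing]
         for i in transient]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [row[n:] for row in A]


def toy_generator(z, u_protect, u_null):
    """Hypothetical generator on states 0..z-1, S, P (S has index z, P has index z+1).

    Assumption for illustration only: from each transient state the chain either
    advances one step toward S at rate u_protect or jumps to P at rate u_null.
    """
    size = z + 2
    S, P = z, z + 1
    Q = [[0.0] * size for _ in range(size)]
    for k in range(z):
        nxt = k + 1 if k + 1 < z else S
        Q[k][nxt] = u_protect
        Q[k][P] = u_null
        Q[k][k] = -(u_protect + u_null)
    return Q  # rows S and P stay all-zero, i.e. absorbing


z = 4
Q = toy_generator(z, u_protect=1.0, u_null=1.0)
B = absorption_probabilities(Q, transient=list(range(z)), absorbing=[z, z + 1])
p_subfunctionalization, p_pseudogenization = B[0]
```

With equal toy rates and z = 4, the chain must advance four times before a jump to P occurs, so the probability of reaching S from state 0 is 0.5 to the fourth power. Real rate structures would change these numbers but not the method.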
Figure 2. A region of ten ancestral genes duplicated through the teleost-specific genome duplication (TGD).
Shown in the center in gray are the ten genes as they are found in the genome of the spotted gar (L. oculatus), which lacks the TGD. The paralogous regions created by the TGD in the eight genomes possessing it are shown above and below the gar genes. The lines joining pairs of genes indicate that these genes are neighbors in the genome (i.e., they are in synteny). After the TGD, some duplicates survive in all (pink) or some (tan) genomes, while others have been returned to single copy, either from the subgenome with more surviving genes (blue) or from the one with fewer (green). Numbers at the top of each column/pillar are the orthology confidence estimates from POInT. In other words, this figure gives the confidence for placing the genes in this orthology state relative to the other $2^8 - 1 = 255$ orthology configurations. Genes are shown with their Ensembl identifiers for reference. This figure is an original figure produced by the authors for this review article.
Figure 3. Modeling duplicate gene loss after polyploidy.
A) Following Lewis (2001), a discrete state model M allows an ancestral position to be duplicated ($D$), single copy ($S_1$ or $S_2$) or a fixed duplicate ($D_f$). Transitions between these states occur at rates proportional to model parameters α, ɛ, and γ. Losses occur along an assumed phylogenetic tree t with branch lengths $l_1, \ldots, l_t$. The extant genomes are phased into a series of homologous columns or pillars: each genome may have one or two homologs present at a pillar (a state for complete homolog absence will be added to future versions of POInT). Different parental subgenomes within an extant genome can be distinguished (orange versus tan), but subgenome identities between the genomes are unknown.
B) For N = 2 polyploid genomes, there are $2^N$ possible orthology relations. At each pillar i, we can compute the likelihood of the observed gene presence and absence data for a given orthology pattern XX using the model M and the tree t: $L_i^{XX} \mid M,t$.
C) Using the synteny relationships, the values $L_i^{00} \mid M,t, \ldots, L_i^{11} \mid M,t$ can be conditioned on $L_{i-1}^{00} \mid M,t, \ldots, L_{i-1}^{11} \mid M,t$ with a transition probability matrix Θ. The elements of Θ depend on $\Theta_{i,g}$, where i is the pillar number and g is the genome. If synteny is maintained between pillars i and i+1 for genome g, $\Theta_{i,g} = \Theta_M$, a global constant estimated by maximum likelihood ($0 \le \Theta_M \le 1$). Otherwise $\Theta_{i,g} = 0.5$, meaning the orthology pattern at i is independent of that at i-1. This equation can be applied recursively to compute the likelihood of the entire dataset with standard hidden Markov model approaches: the ⨀ operator represents an element-wise vector product. The tree branch lengths and model parameters are estimated from the data by maximum likelihood using standard numerical techniques. This figure is an original figure produced by the authors for this review article.
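
The recursion described in panel C can be sketched as a standard hidden Markov model forward pass. This is a minimal illustration and not POInT's implementation: the function names, the uniform prior over orthology patterns at the first pillar, and the construction of Θ as a product of independent per-genome keep/switch probabilities are all assumptions made for this example. The per-pillar likelihoods are taken as given inputs rather than computed from the model M and tree t.

```python
from itertools import product


def transition_matrix(thetas):
    """Build a 2^N x 2^N transition matrix over orthology patterns.

    thetas[g] is the (assumed) probability that genome g keeps the same
    subgenome assignment between adjacent pillars; 0.5 means independence.
    """
    patterns = list(product((0, 1), repeat=len(thetas)))
    T = []
    for a in patterns:
        row = []
        for b in patterns:
            p = 1.0
            for g, th in enumerate(thetas):
                p *= th if a[g] == b[g] else 1.0 - th
            row.append(p)
        T.append(row)
    return T


def forward_likelihood(site_likelihoods, thetas_per_pillar):
    """Likelihood of all pillars, summed over orthology patterns.

    site_likelihoods[i][x] stands for L_i^x | M,t for pattern x at pillar i.
    thetas_per_pillar[i] lists theta_{i,g} linking pillar i-1 to pillar i
    (entry 0 is unused).
    """
    n_states = len(site_likelihoods[0])
    # Assumption: uniform prior over patterns at the first pillar.
    f = [lik / n_states for lik in site_likelihoods[0]]
    for i in range(1, len(site_likelihoods)):
        T = transition_matrix(thetas_per_pillar[i])
        propagated = [sum(f[a] * T[a][b] for a in range(n_states))
                      for b in range(n_states)]
        # Element-wise product with the likelihoods at pillar i.
        f = [p * lik for p, lik in zip(propagated, site_likelihoods[i])]
    return sum(f)


# Toy inputs for N = 2 genomes (4 patterns: 00, 01, 10, 11) and 2 pillars.
liks = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
lik_independent = forward_likelihood(liks, [None, [0.5, 0.5]])
lik_linked = forward_likelihood(liks, [None, [1.0, 1.0]])
```

With every theta set to 0.5 the pillars decouple and the total likelihood is the product of the per-pillar averages; with theta set to 1 the same pattern is forced at both pillars. A practical implementation would also rescale or work in log space to avoid numerical underflow over many pillars.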

