[Preprint]. 2023 Sep 17:2023.09.17.558119.
doi: 10.1101/2023.09.17.558119.

The diverse evolutionary histories of domesticated metaviral capsid genes in mammals

William S Henriques et al. bioRxiv. 2023.

Abstract

Selfish genetic elements and their remnants comprise at least half of the human genome. Active transposons duplicate by inserting copies at new sites in a host genome. Following insertion, transposons can acquire mutations that render them inactive; the accrual of additional mutations can render them unrecognizable over time. However, in rare instances, segments of transposons become useful for the host, in a process called gene domestication. Using the first complete human genome assembly and 25 additional vertebrate genomes, we analyzed the evolutionary trajectories and functional potential of genes domesticated from the capsid genes of Metaviridae, a retroviral-like retrotransposon family. Our analysis reveals four families of domesticated capsid genes in placental mammals with varied evolutionary outcomes, ranging from universal retention to lineage-specific duplications or losses and from purifying selection to lineage-specific rapid evolution. The four families of domesticated capsid genes have divergent amino-terminal domains, inherited from four distinct ancestral metaviruses. Structural predictions reveal that many domesticated genes encode a previously unrecognized RNA-binding domain retained in multiple paralogs in mammalian genomes, both adjacent to and independent from the capsid domain. Collectively, our study reveals diverse outcomes of domestication of diverse metaviruses, which led to structurally and evolutionarily diverse genes that encode important but still largely unknown functions in placental mammals.

Keywords: LTR retrotransposon; PNMA; RNA-binding; SIRH; capsid; exaptation; gene conservation; positive selection.

Figures

Figure 1. Full-length capsid-like ORFs in the human genome.
The human genome contains metaviral capsid ORFs that are divergent and present in low copy number. We generated a maximum-likelihood phylogenetic tree of 212 full-length capsid-like ORFs in the human genome. Hidden Markov Model profile searches identified capsids from three classes of endogenous retroviruses (grey branches) and from metavirus retrotransposons (red branches). Maximum likelihood-based support values at selected nodes were calculated in FastTree2.
Figure 2. Metaviral-derived genes show distinct evolutionary trajectories across placental mammals.
Some full-length metaviral-derived capsid genes are universally retained across placental mammals, whereas others have experienced lineage-specific loss, pseudogenization or duplication events. Filled squares represent intact genes and squares containing a cross represent sequences with obvious inactivating mutations (frameshifts and/or premature stops). Gray boxes represent sequences that are truncated due to genome assembly gaps, and ‘-’ symbols represent cases where we could find no matching sequence at all. A known species tree is shown on the left and was obtained by pruning whole-genome trees available via the UCSC genome browser. Open boxes represent marsupial sequences identified in our vertebrate genome scan; each is aligned beneath its top BLAST hit, which is not necessarily an ortholog. Marsupial domesticated genes have been comprehensively addressed elsewhere (Ono et al. 2011; Iwasaki et al. 2013).
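The four symbol categories in this legend can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `classify_orf`, the use of `N` characters to mark assembly gaps, and the "length not divisible by three" test for frameshifts are all assumptions made for illustration. A real analysis would call frameshifts from an alignment against an intact ortholog.

```python
# Hypothetical helper, not taken from the paper: assigns a Figure 2-style
# label to a candidate coding sequence.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def classify_orf(seq):
    """Return 'absent', 'truncated', 'inactivated', or 'intact'.

    seq is None (or empty) when no matching sequence was found.
    Assumption: runs of 'N' mark genome assembly gaps.
    """
    if not seq:
        return "absent"            # shown as '-' in the figure
    seq = seq.upper()
    if "N" in seq:
        return "truncated"         # gray box: assembly gap
    if len(seq) % 3 != 0:
        return "inactivated"       # crude proxy for a frameshift
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # A stop codon anywhere before the final codon is premature.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        return "inactivated"       # crossed square
    return "intact"                # filled square
```

For example, `classify_orf("ATGTAAAAATAA")` reports an inactivated sequence because the second codon is a stop.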
Figure 3. Four independent metavirus domestication events include structurally distinct N-terminal domains.
A. A maximum likelihood phylogenetic tree of 781 capsid sequences from 24 vertebrate species. The tree includes 119 domesticated metaviral capsid genes found in diverse mammals (dark purple, highlighted) and 662 metaviral capsid-like ORFs (light pink) from selected non-mammalian vertebrate species: chicken (n=1), alligator (n=13), painted turtle (n=32), anole lizard (n=310), African clawed frog (n=291), and coelacanth (n=15). Maximum likelihood-based support values at selected nodes were calculated in FastTree2. B-E. Domain architecture (not to scale) of human Metaviridae-derived capsid genes and the closest available consensus metaviral sequence from Repbase, organized according to major clades in the tree shown in panel A. Colored boxes indicate domains within each open reading frame identified by HMM profile searches, structural prediction, and structural homology searches.
Figure 4. A conserved RNA-binding domain in the PNMA family and related metaviruses.
A. A metaviral-derived RNA-binding domain is present N-terminal to the capsid domain in nine domesticated genes and in isolation in four additional genes (human gene architectures shown). B. AlphaFold structural prediction of PNMA1 (blue), shown alone (left) as well as superimposed on an experimentally determined structure (gray, right) of the telomerase p65 protein’s RNA-binding domain (PDB: 7LMA). The RMSD is 1.2 Å between 41 Cα atoms and 6.4 Å across all 73 equivalently positioned Cα atoms. C. Retention of RBD-only genes in placental mammals. Filled squares represent intact genes and squares containing a cross represent sequences with obvious inactivating mutations (frameshifts and/or premature stops). Gray boxes represent sequences that are truncated due to genome assembly gaps, and ‘-’ symbols represent cases where we could find no matching sequence at all. A known species tree is shown on the left and was obtained by pruning whole-genome trees available via the UCSC genome browser.
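The RMSD values quoted for panel B are root-mean-square deviations between paired Cα positions. The sketch below shows only that final calculation and assumes the two structures have already been superimposed and their residues paired; it does not perform the superposition, and the function name `rmsd` and the coordinate format are illustrative rather than drawn from the paper.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates.

    Assumes both structures are already superimposed and that
    coords_a[i] and coords_b[i] are equivalent C-alpha atoms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

Because the deviation is averaged over the chosen atom pairs, restricting the comparison to a well-aligned core subset gives a lower value than including every equivalently positioned atom, which is the pattern the two figures in the caption show.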

References

    1. Abascal F, Zardoya R, Posada D. 2005. ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21:2104–2105.
    2. Abed M, Verschueren E, Budayeva H, Liu P, Kirkpatrick DS, Reja R, Kummerfeld SK, Webster JD, Gierke S, Reichelt M, et al. 2019. The Gag protein PEG10 binds to RNA and regulates trophoblast stem cell lineage specification. PLOS ONE 14:e0214110.
    3. Acton O, Grant T, Nicastro G, Ball NJ, Goldstone DC, Robertson LE, Sader K, Nans A, Ramos A, Stoye JP, et al. 2019. Structural basis for Fullerene geometry in a human endogenous retrovirus capsid. Nature Communications 10:5822.
    4. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402.
    5. Anisimova M, Nielsen R, Yang Z. 2003. Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics 164:1229–1236.
