Evolution of Virus-like Features and Intrinsically Disordered Regions in Retrotransposon-derived Mammalian Genes

Affiliations

¹ Scientific Institute IRCCS E. MEDEA, Computational Biology Unit, Bosisio Parini 23842, Italy.
² Shmunis School of Biomedicine and Cancer Research, George S Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel.
³ Department of Biotechnology and Biosciences, University of Milan-Bicocca, Milan 20126, Italy.

PMID: 39101471
PMCID: PMC11299033
DOI: 10.1093/molbev/msae154

Rachele Cagliani et al. Mol Biol Evol. 2024 Aug 2;41(8):msae154.

Abstract

Several mammalian genes have originated from the domestication of retrotransposons, selfish mobile elements related to retroviruses. Some of the proteins encoded by these genes have retained virus-like features, including self-processing, capsid formation, and the generation of different isoforms through −1 programmed ribosomal frameshifting (−1 PRF). Using quantitative approaches in molecular evolution and biophysical analyses, we studied 28 retrotransposon-derived genes, with a focus on the evolution of virus-like features. By analyzing the rate of synonymous substitutions, we show that the −1 PRF mechanism in three of these genes (PEG10, PNMA3, and PNMA5) is conserved across mammals and generates alternative proteins. These genes were targets of positive selection in primates, and one of the positively selected sites affects a B-cell epitope on the spike domain of the PNMA5 capsid, a finding reminiscent of observations in infectious viruses. More generally, we found that retrotransposon-derived proteins vary in their content of intrinsically disordered regions (IDRs), and that this content is directly associated with their evolutionary rates. Most positively selected sites in these proteins are located in IDRs, and some of them affect protein posttranslational modifications, such as autocleavage and phosphorylation. Detailed analyses of the biophysical properties of IDRs showed that positive selection preferentially targeted regions with lower conformational entropy. Furthermore, positive selection introduces variation in binary sequence patterns across orthologues, as well as in chain compaction. Our results shed light on the evolutionary trajectories of a unique class of mammalian genes and suggest a novel approach to studying how the biophysical characteristics of IDRs are affected by evolution.

Keywords: conformational features; domesticated gene; intrinsically disordered regions; positive selection; retrotransposon.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1.**
Domain structures of retrotransposon-derived proteins. The domain structure of each of the 28 proteins we analyzed is shown schematically. Shaded areas represent IDRs, as indicated in the legend. For proteins produced through −1 PRF, the 2 regions corresponding to different reading frames (RF1 and RF2) are shown. Red arrows denote positively selected sites identified by the positive selection analysis.

**Fig. 2.**
Analysis of −1 PRF. a) GWIPS-viz visualizations of aggregated ribosome footprint data are shown for *PEG10*, *PNMA3*, *PNMA5*, and *Rtl3*. The region downstream of the putative frameshift site (marked by an asterisk) up to the next −1 frame stop codon is denoted as RF2. For 3 of these genes, the enlargements show the distribution of synonymous site variation as obtained from Synplot. The brown line indicates relative dS variability, calculated as the ratio of observed to expected dS values in a sliding window of 25 codons. The red line shows the corresponding P-value, and the dashed line represents the P-value cutoff. The position of the slippery sequence is marked with an asterisk. b) Sequence conservation of the slippery sequence and its flanking bases. Letter size represents the normalized frequency of each base, calculated for the *PEG10*, *PNMA3*, *PNMA5*, and *Rtl3* sequences.
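The relative dS trace in panel a can be illustrated with a windowed ratio of observed to expected synonymous variability. The sketch below assumes that per-codon observed and expected dS values are already available; Synplot derives its expected values and P-values from its own model, which is not reproduced here, and the function name `relative_ds` is hypothetical.

```python
from typing import List, Optional


def relative_ds(observed: List[float], expected: List[float],
                window: int = 25) -> List[Optional[float]]:
    """Ratio of summed observed to summed expected dS in sliding codon windows.

    Returns one value per window start position. Windows whose expected
    total is zero yield None instead of dividing by zero.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if window <= 0 or window > len(observed):
        raise ValueError("window must be between 1 and the number of codons")
    ratios: List[Optional[float]] = []
    for start in range(len(observed) - window + 1):
        obs = sum(observed[start:start + window])
        exp = sum(expected[start:start + window])
        ratios.append(obs / exp if exp > 0 else None)
    return ratios
```

Where RF1 overlaps RF2, a synonymous change in RF1 can alter the RF2 protein, so windows with ratios well below 1 are the signal expected if the frameshift product is conserved.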

**Fig. 3.**
Evolutionary rates in structured regions and IDRs. a) Correlation between average dN/dS and disorder fraction. b) Codon-wise dN/dS for codons in structured regions and for disordered codons. Statistical significance was assessed by the Wilcoxon rank-sum test. c) Codon-wise dN/dS for structured and disordered codons in different domains. Codons in disordered regions, whether or not they overlap GAG or other domains, were assigned to the disordered fraction. Statistical significance was assessed by Kruskal–Wallis rank-sum tests followed by pairwise Nemenyi post hoc tests. *P-value < 0.05; **P-value < 0.01.
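Panel b compares codon-wise dN/dS between structured and disordered codons with a Wilcoxon rank-sum test. As an illustration of that test, the sketch below computes the rank-sum statistic (average ranks for ties) and a two-sided P-value from the large-sample normal approximation, without tie or continuity correction, using only the standard library. The function names are hypothetical, and the exact settings used in the study are not stated here; a dedicated statistics package would normally be preferred for real analyses.

```python
import math
from typing import List, Tuple


def average_ranks(values: List[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Wilcoxon rank-sum statistic W (sum of ranks of x) and two-sided P.

    Uses the normal approximation without tie or continuity correction.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, p
```

For example, `rank_sum_test(structured_dnds, disordered_dnds)` would return W for the structured codons and the P-value; the Kruskal–Wallis and Nemenyi tests of panel c are not covered by this sketch.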

**Fig. 4.**
Functional effects of positively selected sites. a) Molecular model of the virus-like capsid assembly of human PNMA5. The structural model of a PNMA5 monomer from the AlphaFold protein structure database (Q96PV4, aa 1-328) was superimposed onto the virus-like capsid assembly and color-coded as in Fig. 1. This monomer is also shown enlarged, as a ribbon representation, on the side. Positively selected sites are in red. b) Prediction score plots of linear and discontinuous epitopes determined for the GAG-N spike domain of PNMA5. Positive predictions are in green; the positively selected site is shown with a black line. c) An isolated 5-fold capsomere is highlighted on the PNMA5 virus-like capsid model. PNMA5 structural domains are color-coded as in Fig. 1. The 5-fold capsomere is also presented as a ribbon in both front and side views. Positively selected sites of each monomer are shown as red sticks. The enlargement shows the CA_GAG-N:CA_GAG-N” interface of 2 different PNMA5 molecules, with D177 marked in red. d) Ribbon representation of the ASPRV1 model from AlphaFold (Q53RT3 in the AlphaFold protein structure database, aa 84-341). Domains are color-coded as in Fig. 1, and numbering refers to the reference protein sequence (NP_690005). The autocleaved mature protein is highlighted, whereas the propeptide regions are shown in transparency. The catalytic site is in green, positively selected sites are in red, and residues that cause ichthyosis or alter protease activity when mutated are in magenta; a mutation found in a dog with ichthyosis is in blue. e) Hydrophobicity profiles of the PEG10 autocleavage site in representative primates. Phylogenetic relationships and taxonomic classification are shown on the left. For comparison, the cleavage site of the Ty3/Gypsy retrotransposon (GenBank: CAA97115.1) is also shown. Positively selected sites are marked in red and their positions are shaded.
f) A table listing the four occurrences of phosphorylation sites that include positively selected sites whose substitutions (in bold) alter the motif and can disrupt phosphorylation. The phosphorylated residue itself is underlined. The motifs match known regular expression patterns from the ELM database (motif IDs ELME000442, ELME000008, and ELME000063).
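Panel f flags substitutions that break a regular-expression motif around a phosphorylated residue. The check can be sketched as matching the original and the substituted sequence against the same pattern and asking whether a match still places the phosphoacceptor at the same position. The pattern below, a generic casein kinase 2-like S/T-x-x-D/E consensus, is an illustrative assumption and is not one of the ELM entries listed in the caption; the function names and example peptides are hypothetical.

```python
import re

# Illustrative CK2-like consensus (S/T-x-x-D/E), NOT an ELM entry from the
# caption. Group 1 marks the phosphoacceptor; the lookahead lets overlapping
# matches all be reported by finditer.
CK2_LIKE = re.compile(r"(?=([ST])..[DE])")


def motif_covers_site(seq: str, site: int, motif: re.Pattern = CK2_LIKE) -> bool:
    """Return True if some motif match places its phosphoacceptor at seq[site]."""
    return any(m.start(1) == site for m in motif.finditer(seq))


def substitution_disrupts(seq: str, site: int, pos: int, new_aa: str,
                          motif: re.Pattern = CK2_LIKE) -> bool:
    """True if replacing seq[pos] with new_aa abolishes the motif at site."""
    mutant = seq[:pos] + new_aa + seq[pos + 1:]
    return (motif_covers_site(seq, site, motif)
            and not motif_covers_site(mutant, site, motif))
```

With the hypothetical peptide `"AASAAEAA"`, replacing the acidic residue at index 5 with K removes the match at the serine (index 2), whereas a substitution at index 3 leaves it intact.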

**Fig. 5.**
Comparison of binary patterns and ensemble features across orthologous IDRs. a) Ensemble conformational properties were calculated for IDRs that are (red) or are not (gray) targeted by positive selection. Boxplots show the mean values (among orthologs) of the Flory scaling exponent (ν) and of the conformational entropy per residue (S_conf/N). Statistical significance was assessed using Wilcoxon rank-sum tests. n, number of IDRs. b) As in (a), but using 30-amino-acid regions that contain (red) or do not contain (gray) one or more positively selected sites. n, number of IDRs. c) NARDINI analysis of 2 representative IDRs. The z-score matrices are shown for all available orthologs. Negative z-scores imply that the original sequence is better mixed with respect to the residue groups than the scrambled sequences. Positive z-scores indicate nonrandom segregation between 2 types of residues or a blocky distribution of 1 type of residue. z-scores close to 0 indicate random patterning. A pattern is considered nonrandom if the associated z-score is lower than −1.5 or higher than 1.5. Residue types are categorized as follows: polar (μ), hydrophobic (h), positively charged (+), negatively charged (−), aromatic (π), alanine (A), proline (P), and glycine (G).
Species abbreviations are as follows: *Aotus nancymaae* (aotNan), *Callithrix jacchus* (calJac), *Carlito syrichta* (carSyr), *Cebus imitator* (cebImi), *Cercocebus atys* (cerAty), *Chlorocebus sabaeus* (chlSab), *Colobus angolensis* (colAng), *Gorilla gorilla* (gorGor), *Homo sapiens* (homSap), *Hylobates moloch* (hylMol), *Lemur catta* (lemCat), *Macaca fascicularis* (macFas), *Macaca mulatta* (macMul), *Macaca nemestrina* (macNem), *Macaca thibetana* (macThi), *Mandrillus leucophaeus* (manLeu), *Microcebus murinus* (micMur), *Nomascus leucogenys* (nomLeu), *Nycticebus coucang* (nycCou), *Otolemur garnettii* (otoGar), *Pan paniscus* (panPan), *Pan troglodytes* (panTro), *Papio anubis* (papAnu), *Piliocolobus tephrosceles* (pilTep), *Pongo abelii* (ponAbe), *Pongo pygmaeus* (ponPyg), *Propithecus coquereli* (proCoq), *Rhinopithecus bieti* (rhiBie), *Rhinopithecus roxellana* (rhiRox), *Saimiri boliviensis* (saiBol), *Sapajus apella* (sapApe), *Symphalangus syndactylus* (symSyn), *Theropithecus gelada* (theGel), *Trachypithecus francoisi* (traFra). d) Comparison of variation in binary patterns between positively selected and nonpositively selected IDRs. The boxplot shows the average standard deviation of z-scores from NARDINI analysis. Statistical significance was assessed using the Wilcoxon rank-sum test. n, number of IDRs. e) Correlation between variation in binary patterns (average standard deviation of z-scores) and ensemble features (standard deviations of ν and S_conf/N). Red dots correspond to positively selected IDRs, gray dots to IDRs that are not positively selected.
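The NARDINI z-scores in panel c compare a patterning statistic of the native sequence with the same statistic in scrambled sequences of identical composition, and panel d summarizes how much these z-scores vary across orthologs. The sketch below mimics both steps under simplifying assumptions: a plain count of adjacent same-group residues stands in for NARDINI's blob-based patterning parameters, residue groups are passed as strings of one-letter codes, and all function names are hypothetical. It keeps the caption's sign convention, so blocky sequences give positive z-scores and well-mixed ones give negative z-scores.

```python
import random
import statistics
from typing import Dict


def adjacent_pairs(seq: str, group: str) -> int:
    """Count adjacent residue pairs in which both residues belong to group."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a in group and b in group)


def patterning_z(seq: str, group: str, n_scrambles: int = 500,
                 seed: int = 0) -> float:
    """z-score of the observed adjacency count against shuffled sequences.

    Simplified stand-in for a NARDINI patterning z-score; |z| > 1.5 would be
    read as nonrandom patterning, following the caption's threshold.
    """
    rng = random.Random(seed)
    observed = adjacent_pairs(seq, group)
    residues = list(seq)
    null = []
    for _ in range(n_scrambles):
        rng.shuffle(residues)  # same composition, random order
        null.append(adjacent_pairs("".join(residues), group))
    sd = statistics.pstdev(null)
    if sd == 0:
        return 0.0  # composition leaves no room for patterning
    return (observed - statistics.mean(null)) / sd


def mean_z_sd(z_by_ortholog: Dict[str, Dict[str, float]]) -> float:
    """Average, over residue groups, of the SD of z-scores across orthologs."""
    groups = next(iter(z_by_ortholog.values())).keys()
    sds = [statistics.stdev([z[g] for z in z_by_ortholog.values()])
           for g in groups]
    return statistics.mean(sds)
```

`mean_z_sd` needs z-scores from at least two orthologs, because the sample standard deviation is undefined for a single value.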
