BMC Evol Biol. 2019 Jul 30;19(1):158.

doi: 10.1186/s12862-019-1464-6.

Reduced alphabet of prebiotic amino acids optimally encodes the conformational space of diverse extant protein folds

Armando D Solis¹

Affiliation

¹ Biological Sciences Department, New York City College of Technology (City Tech), The City University of New York (CUNY), 285 Jay Street, Brooklyn, NY, 11201, USA. asolis@citytech.cuny.edu.

PMID: 31362700
PMCID: PMC6668081
DOI: 10.1186/s12862-019-1464-6

Armando D Solis. BMC Evol Biol. 2019.

Abstract

Background: There is wide agreement that only a subset of the twenty standard amino acids existed prebiotically in sufficient concentrations to form functional polypeptides. We ask how this subset, postulated as {A,D,E,G,I,L,P,S,T,V}, could have formed structures stable enough to found metabolic pathways. Inspired by alphabet reduction experiments, we undertook a computational analysis to measure the structural coding behavior of sequences simplified by reduced alphabets. We sought to discern characteristics of the prebiotic set that would endow it with unique properties relevant to structure, stability, and folding.

Results: Drawing on a large dataset of single-domain proteins, we employed an information-theoretic measure to assess how well the prebiotic amino acid set preserves fold information against all other possible ten-amino acid sets. An extensive virtual mutagenesis procedure revealed that the prebiotic set excellently preserves sequence-dependent information regarding both backbone conformation and tertiary contact matrix of proteins. We observed that information retention is fold-class dependent: the prebiotic set sufficiently encodes the structure space of α/β and α + β folds, and to a lesser extent, of all-α and all-β folds. The prebiotic set appeared insufficient to encode the small proteins. Assessing how well the prebiotic set discriminates native vs. incorrect sequence-structure matches, we found that α/β and α + β folds exhibit more pronounced energy gaps with the prebiotic set than with nearly all alternatives.

Conclusions: The prebiotic set optimally encodes local backbone structures that appear in the folded environment and near-optimally encodes the tertiary contact matrix of extant proteins. The fold-class-specific patterns observed from our structural analysis confirm the postulated timeline of fold appearance in proteogenesis derived from proteomic sequence analyses. Polypeptides arising in a prebiotic environment will likely form α/β and α + β-like folds if any at all. We infer that the progressive expansion of the alphabet allowed the increased conformational stability and functional specificity of later folds, including all-α, all-β, and small proteins. Our results suggest that prebiotic sequences are amenable to mutations that significantly lower native conformational energies and increase discrimination amidst incorrect folds. This property may have assisted the genesis of functional proto-enzymes prior to the expansion of the full amino acid alphabet.
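The information-theoretic measure referred to in the Results is mutual information between sequence and structure. As a minimal sketch only, the following shows a plain plug-in estimate of mutual information from paired observations. The structural state labels and the toy data are hypothetical, and the published analysis may use a different estimator or bias corrections.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


# Toy data (hypothetical): residue letter paired with a structural state label.
perfect = [("A", "helix"), ("A", "helix"), ("V", "strand"), ("V", "strand")]
independent = [("A", "helix"), ("A", "strand"), ("V", "helix"), ("V", "strand")]

print(mutual_information(perfect))      # residue fully determines the state
print(mutual_information(independent))  # residue carries no information
```

In this toy case the first data set yields 1 bit and the second yields 0 bits, which are the two extremes for a binary structural label.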

Keywords: Information theory; Mutual information; Prebiotic amino acids; Protein backbone conformation; Protein evolution; Protein structure; Proteogenesis; Reduced amino acid alphabets; Residue contacts.

Conflict of interest statement

The author declares that he has no competing interests.

Figures

**Fig. 1**
Workflow of the virtual mutagenesis procedure and mutual information optimization. A large data set of protein sequences (whose structures are known) is rewritten using a given reduced alphabet $R_{i}^{10}$, a 10-member subset of the 20 genetically coded amino acids, and $S_{j}$, the substitution rule that dictates how the remaining amino acids are to be mutated virtually. For every combination of $R_{i}^{10}$ and $S_{j}$, mutual information can be computed to assess their effectiveness in preserving structural information in the data set of more than 2000 single-domain proteins. Because there are more than $10^{15}$ different combinations of $R_{i}^{10}$ and $S_{j}$ for which a mutual information can be computed, a Monte Carlo procedure is implemented to search across the different $S_{j}$ efficiently given each of the 184,756 ways to configure $R_{i}^{10}$. In the end, the percentile rank of the prebiotic set $R_{prebiotic}^{10}$ is computed from the spectrum of mutual information values given by all other alternative 10-letter alphabets.
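The counts quoted in the Fig. 1 caption can be checked with a short sketch. It assumes that a substitution rule maps each of the 10 excluded amino acids onto one of the 10 retained ones; that reading is an assumption, although it is consistent with the stated total of more than 10¹⁵ combinations. The rule `EXAMPLE_RULE` and the helpers `rewrite` and `percentile_rank` are hypothetical illustrations, not the paper's optimized rule or code.

```python
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PREBIOTIC = "ADEGILPSTV"  # the postulated prebiotic set from the abstract

# Number of ways to choose a 10-letter alphabet from 20 amino acids.
n_alphabets = comb(20, 10)  # 184756

# Assumption: each of the 10 excluded letters maps to one of the 10 retained letters.
n_rules = 10 ** 10
n_combinations = n_alphabets * n_rules  # about 1.8e15

# Hypothetical substitution rule, chosen only for illustration.
EXAMPLE_RULE = {
    "C": "S", "F": "L", "H": "E", "K": "E", "M": "L",
    "N": "D", "Q": "E", "R": "E", "W": "L", "Y": "L",
}


def rewrite(sequence, alphabet, rule):
    """Rewrite a sequence so that it only uses letters of the reduced alphabet."""
    return "".join(aa if aa in alphabet else rule[aa] for aa in sequence)


def percentile_rank(value, others):
    """Percentage of alternative scores that are less than or equal to value."""
    return 100.0 * sum(o <= value for o in others) / len(others)


print(n_alphabets, n_combinations)
print(rewrite("MKFAC", PREBIOTIC, EXAMPLE_RULE))
```

Under this assumption, exhaustive enumeration of the rules for every alphabet is impractical, which is in line with the caption's use of a Monte Carlo search over $S_{j}$.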

**Fig. 2**
Workflow of the structural descriptor optimization used to parameterize mutual information. The backbone structure is characterized by a pair of virtual alpha carbon dihedral angles, whose two-dimensional space can be discretized by the Voronoi partition into $k$ states or seeds. The number of seeds $k$ and their locations in the dihedral angle space can be optimized by a Monte Carlo search using mutual information as objective function, as illustrated on the left side of the figure. The tertiary contact structure is characterized by the contact distances between pairs of non-adjacent residues, with the parameters $d_{max}$, which describes the maximum distance of interaction, and $m$, which dictates the number of discrete bins into which the length $d_{max}$ is partitioned. An exhaustive search is made across various $d_{max}$ and $m$ with mutual information as objective function. The two sets of information-optimized descriptors (for the backbone and for tertiary contacts) are used to compute the mutual information $I_{bb}$ and $I_{total}$ used in the virtual mutagenesis procedure (see Fig. 1), and are also used to parameterize the energy function $\Delta U$ employed in the threading experiment.
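The two discretizations described in the Fig. 2 caption can be sketched as follows. This is a minimal illustration that assumes angles in degrees, a Euclidean metric with periodic wrap-around for the Voronoi assignment, and equal-width distance bins. The seed coordinates are hypothetical placeholders (they are not the optimized seeds of Table 1), and the function names are invented here. The defaults for `d_max` and `m` follow the optimum reported in the Fig. 4 caption.

```python
def angular_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def voronoi_state(angles, seeds):
    """Index of the seed nearest to a pair of dihedral angles."""
    def dist2(seed):
        return (angular_diff(angles[0], seed[0]) ** 2
                + angular_diff(angles[1], seed[1]) ** 2)
    return min(range(len(seeds)), key=lambda i: dist2(seeds[i]))


def contact_bin(distance, d_max=10.0, m=50):
    """Bin index in 0..m-1 for a contact distance, or None beyond d_max."""
    if distance < 0 or distance >= d_max:
        return None
    return int(distance * m / d_max)


# Hypothetical seeds in the two-angle space (placeholders only).
seeds = [(50.0, 90.0), (-170.0, 120.0), (-60.0, -40.0)]

print(voronoi_state((175.0, 118.0), seeds))  # wraps around to the seed near -170
print(contact_bin(3.1))                      # 0.2 Å wide bins when d_max=10, m=50
print(contact_bin(12.0))                     # beyond the interaction cutoff
```

The wrap-around in `angular_diff` matters because a point at 175° lies only 15° away from a seed at -170°, so treating the angle space as a flat plane would assign it to the wrong cell.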

**Fig. 3**
The optimal Voronoi tessellation of the virtual alpha carbon dihedral angle pair. The optimal number of polyhedra was found to be 16, and the seeds for each of these are specified in Table 1. For illustration, only 1000 random data points are shown for each of the 16 polyhedra; the structural data set used to optimize this space contains significantly more data points.

**Fig. 4**
Mutual information measurements for different cut-off distances (Å) and different numbers of bins used to characterize the distance-dependent contact interaction. The optimum was found to be $d_{max}$ = 10.00 Å, with the number of bins $m$ = 50.

Cited by

Protein three-dimensional structures at the origin of life.
Milner-White EJ. Interface Focus. 2019 Dec 6;9(6):20190057. doi: 10.1098/rsfs.2019.0057. Epub 2019 Oct 18. PMID: 31641431. Free PMC article. Review.
Reconstruction and Characterization of Thermally Stable and Catalytically Active Proteins Comprising an Alphabet of ~ 13 Amino Acids.
Kimura M, Akanuma S. J Mol Evol. 2020 May;88(4):372-381. doi: 10.1007/s00239-020-09938-0. Epub 2020 Mar 23. PMID: 32201904.
Determination of the Amino Acid Recruitment Order in Early Life by Genome-Wide Analysis of Amino Acid Usage Bias.
Zhao M, Ding R, Liu Y, Ji Z, Zhao Y. Biomolecules. 2022 Jan 21;12(2):171. doi: 10.3390/biom12020171. PMID: 35204672. Free PMC article.
Probing the Role of Cysteine Thiyl Radicals in Biology: Eminently Dangerous, Difficult to Scavenge.
Moosmann B, Hajieva P. Antioxidants (Basel). 2022 Apr 29;11(5):885. doi: 10.3390/antiox11050885. PMID: 35624747. Free PMC article. Review.
Early Selection of the Amino Acid Alphabet Was Adaptively Shaped by Biophysical Constraints of Foldability.
Makarov M, Sanchez Rocha AC, Krystufek R, Cherepashuk I, Dzmitruk V, Charnavets T, Faustino AM, Lebl M, Fujishima K, Fried SD, Hlouchova K. J Am Chem Soc. 2023 Mar 8;145(9):5320-5329. doi: 10.1021/jacs.2c12987. Epub 2023 Feb 24. PMID: 36826345. Free PMC article.
