BMC Evol Biol. 2019 Jul 30;19(1):158.
doi: 10.1186/s12862-019-1464-6.

Reduced alphabet of prebiotic amino acids optimally encodes the conformational space of diverse extant protein folds

Armando D Solis. BMC Evol Biol.

Abstract

Background: There is wide agreement that only a subset of the twenty standard amino acids existed prebiotically in sufficient concentrations to form functional polypeptides. We ask how this subset, postulated as {A,D,E,G,I,L,P,S,T,V}, could have formed structures stable enough to found metabolic pathways. Inspired by alphabet reduction experiments, we undertook a computational analysis to measure the structural coding behavior of sequences simplified by reduced alphabets. We sought to discern characteristics of the prebiotic set that would endow it with unique properties relevant to structure, stability, and folding.
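To make "sequences simplified by reduced alphabets" concrete, here is a minimal sketch that rewrites a sequence so that every residue outside a 10-letter alphabet is replaced according to a substitution rule. The prebiotic set is the one given above; the function name `rewrite` and the pairings in `EXAMPLE_RULE` are hypothetical and do not reproduce the substitution rules used in the study.

```python
# The postulated prebiotic set, as listed in the abstract.
PREBIOTIC = frozenset("ADEGILPSTV")

# Hypothetical substitution rule: each of the ten non-prebiotic amino acids
# is mapped to one prebiotic residue. These pairings are illustrative only.
EXAMPLE_RULE = {
    "C": "S", "F": "L", "H": "D", "K": "E", "M": "L",
    "N": "D", "Q": "E", "R": "E", "W": "L", "Y": "L",
}


def rewrite(sequence: str, alphabet: frozenset, rule: dict) -> str:
    """Rewrite a sequence so that it uses only letters from `alphabet`."""
    out = []
    for aa in sequence:
        if aa in alphabet:
            out.append(aa)        # residue already in the reduced alphabet
        else:
            out.append(rule[aa])  # residue is "virtually mutated"
    return "".join(out)


if __name__ == "__main__":
    print(rewrite("MKTAYIAKQR", PREBIOTIC, EXAMPLE_RULE))  # LETALIAEEE
```

Keeping the alphabet and the rule as separate arguments mirrors the idea that a reduced alphabet can be paired with many different substitution rules.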

Results: Drawing on a large dataset of single-domain proteins, we employed an information-theoretic measure to assess how well the prebiotic amino acid set preserves fold information compared with all other possible ten-amino acid sets. An extensive virtual mutagenesis procedure revealed that the prebiotic set preserves sequence-dependent information about both the backbone conformation and the tertiary contact matrix of proteins remarkably well. We observed that information retention is fold-class dependent: the prebiotic set sufficiently encodes the structure space of α/β and α + β folds and, to a lesser extent, of all-α and all-β folds. The prebiotic set appeared insufficient to encode small proteins. Assessing how well the prebiotic set discriminates native from incorrect sequence-structure matches, we found that α/β and α + β folds exhibit more pronounced energy gaps with the prebiotic set than with nearly all alternatives.
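The abstract does not spell out the estimator, so the following is only a sketch of the standard plug-in mutual information $I(X;Y) = \sum_{x,y} p(x,y)\log_2 \frac{p(x,y)}{p(x)\,p(y)}$ between a residue letter $X$ and a discrete structural state $Y$. Pairing one residue with one state, and the state labels used in the demo, are simplifying assumptions.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) observations."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)                 # counts of (x, y)
    px = Counter(x for x, _ in pairs)      # marginal counts of x
    py = Counter(y for _, y in pairs)      # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


if __name__ == "__main__":
    # Toy data: residue letter paired with a hypothetical structural state label.
    coupled = [("A", "helix"), ("G", "coil"), ("A", "helix"), ("G", "coil")]
    independent = [("A", "helix"), ("A", "coil"), ("G", "helix"), ("G", "coil")]
    print(mutual_information(coupled))      # 1.0
    print(mutual_information(independent))  # 0.0
```

Comparing this quantity for the original sequences and for their reduced-alphabet rewrites gives a measure of how much structural information the reduction retains. The plug-in estimator is biased upward on small samples, which is one reason a large structural data set matters.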

Conclusions: The prebiotic set optimally encodes local backbone structures that appear in the folded environment and near-optimally encodes the tertiary contact matrix of extant proteins. The fold-class-specific patterns observed from our structural analysis confirm the postulated timeline of fold appearance in proteogenesis derived from proteomic sequence analyses. Polypeptides arising in a prebiotic environment will likely form α/β and α + β-like folds, if they fold at all. We infer that the progressive expansion of the alphabet enabled the greater conformational stability and functional specificity of later folds, including all-α, all-β, and small proteins. Our results suggest that prebiotic sequences are amenable to mutations that significantly lower native conformational energies and increase discrimination amidst incorrect folds. This property may have assisted the genesis of functional proto-enzymes prior to the expansion of the full amino acid alphabet.

Keywords: Information theory; Mutual information; Prebiotic amino acids; Protein backbone conformation; Protein evolution; Protein structure; Proteogenesis; Reduced amino acid alphabets; Residue contacts.


Conflict of interest statement

The author declares that he has no competing interests.

Figures

Fig. 1
Work flow of the virtual mutagenesis procedure and mutual information optimization. A large data set of protein sequences (whose structures are known) is rewritten using a given reduced alphabet $R_i^{10}$, a 10-member subset of the 20 genetically coded amino acids, and $S_j$, the substitution rule that dictates how the remaining amino acids are to be mutated virtually. For every combination of $R_i^{10}$ and $S_j$, mutual information can be computed to assess how effectively they preserve structural information in the data set of more than 2000 single-domain proteins. Because there are more than $10^{15}$ different combinations of $R_i^{10}$ and $S_j$ for which a mutual information can be computed, a Monte Carlo procedure is implemented to search efficiently across the different $S_j$ for each of the 184,756 ways to configure $R_i^{10}$. In the end, the percentile rank of the prebiotic set $R_{\text{prebiotic}}^{10}$ is computed from the spectrum of mutual information values given by all other alternative 10-letter alphabets.
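The counts in the caption can be checked directly: $\binom{20}{10} = 184{,}756$ subsets, and if we assume each of the 10 excluded amino acids may map to any of the 10 retained ones, each subset admits $10^{10}$ rules, for roughly $1.8 \times 10^{15}$ combinations in total. The sketch below also shows a generic Metropolis search over substitution rules and one convention for the percentile rank. The acceptance criterion, temperature, step count, the names `search_rule` and `percentile_rank`, and the toy score are all assumptions, not the study's settings.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Number of ways to choose a 10-letter subset R_i^10 of the 20 amino acids.
N_SUBSETS = math.comb(20, 10)

# Assumption: each excluded amino acid may map to any retained one,
# giving 10**10 candidate substitution rules S_j per subset.
N_RULES_PER_SUBSET = 10 ** 10


def percentile_rank(value, alternatives):
    """Percentage of alternative scores strictly below `value` (one common convention)."""
    below = sum(1 for a in alternatives if a < value)
    return 100.0 * below / len(alternatives)


def search_rule(alphabet, score, steps=2000, temperature=0.01, seed=0):
    """Metropolis search over substitution rules for one reduced alphabet.

    `score` is any callable mapping a rule (dict) to a number to maximize,
    for example the mutual information of the rewritten data set.
    """
    rng = random.Random(seed)
    retained = sorted(alphabet)
    excluded = [aa for aa in AMINO_ACIDS if aa not in alphabet]
    rule = {aa: rng.choice(retained) for aa in excluded}
    current = score(rule)
    best_rule, best = dict(rule), current
    for _ in range(steps):
        aa = rng.choice(excluded)        # pick one excluded residue
        old = rule[aa]
        rule[aa] = rng.choice(retained)  # propose a new target for it
        new = score(rule)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new >= current or rng.random() < math.exp((new - current) / temperature):
            current = new
            if new > best:
                best_rule, best = dict(rule), new
        else:
            rule[aa] = old               # reject: restore the previous mapping
    return best_rule, best


if __name__ == "__main__":
    print(N_SUBSETS)                                   # 184756
    print(N_SUBSETS * N_RULES_PER_SUBSET > 10 ** 15)   # True
    # Toy score (hypothetical): number of excluded residues mapped to "A".
    toy = lambda r: sum(1 for v in r.values() if v == "A")
    print(search_rule("ADEGILPSTV", toy))
```

In a full analysis, `search_rule` would be run once per subset with a mutual-information score, and `percentile_rank` would place the prebiotic subset's best score among the best scores of all other subsets.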
Fig. 2
Work flow of the structural descriptor optimization used to parameterize mutual information. The backbone structure is characterized by a pair of virtual alpha carbon dihedral angles, whose two-dimensional space can be discretized by a Voronoi partition into $k$ states or seeds. The number of seeds $k$ and their locations in the dihedral angle space can be optimized by a Monte Carlo search using mutual information as the objective function, as illustrated on the left side of the figure. The tertiary contact structure is characterized by the contact distances between pairs of non-adjacent residues, with two parameters: $d_{max}$, the maximum distance of interaction, and $m$, the number of discrete bins into which the length $d_{max}$ is partitioned. An exhaustive search is made across various $d_{max}$ and $m$ with mutual information as the objective function. The two sets of information-optimized descriptors (for the backbone and for tertiary contacts) are used to compute the mutual information $I_{bb}$ and $I_{total}$ used in the virtual mutagenesis procedure (see Fig. 1), and also to parameterize the energy function $\Delta U$ employed in the threading experiment.
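A virtual Cα dihedral angle is the torsion defined by four consecutive Cα positions. The sketch below computes it with the standard projection formula (IUPAC sign convention) and then forms pairs of angles. Taking the "pair" to be two consecutive virtual dihedrals $(\tau_i, \tau_{i+1})$ is an assumption; the caption does not define the pair precisely, and `dihedral_pairs` is a hypothetical helper.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (between -180 and 180) for four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def dihedral_pairs(ca):
    """Pairs (tau_i, tau_{i+1}) of consecutive virtual Calpha dihedrals.

    Assumption: the descriptor pair is two consecutive virtual dihedrals.
    """
    taus = [dihedral(*ca[i:i + 4]) for i in range(len(ca) - 3)]
    return list(zip(taus, taus[1:]))
```

Each resulting pair is a point in the two-dimensional (periodic) angle space that the Voronoi partition then discretizes.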
Fig. 3
The optimal Voronoi tessellation of the virtual alpha carbon dihedral angle pair. The optimal number of Voronoi cells was found to be 16; the seed for each cell is specified in Table 1. For illustration, the figure shows only 1000 random data points for each of the 16 cells (the structural data set used to optimize this space contains significantly more data points).
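Assigning a dihedral pair to a Voronoi state amounts to finding the nearest seed. Because dihedral angles are periodic, this sketch measures each coordinate's difference modulo 360° before taking the Euclidean distance; that choice of metric is an assumption. The seeds below are hypothetical placeholders, since the 16 optimized seeds of Table 1 are not reproduced here.

```python
import math


def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def assign_state(point, seeds):
    """Index of the nearest seed to `point` on the periodic angle torus."""
    best_index, best_dist = None, math.inf
    for i, (s1, s2) in enumerate(seeds):
        dist = math.hypot(angular_diff(point[0], s1), angular_diff(point[1], s2))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


if __name__ == "__main__":
    # Hypothetical seeds, NOT the values in Table 1.
    seeds = [(50.0, 50.0), (-170.0, -170.0), (-100.0, 60.0)]
    print(assign_state((55.0, 45.0), seeds))    # 0
    print(assign_state((175.0, 175.0), seeds))  # 1 (wraps around 180 degrees)
```

The wrap-around matters: the point (175°, 175°) lies only about 21° from the seed at (-170°, -170°), which a naive Euclidean distance would miss.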
Fig. 4
Mutual information as a function of the cut-off distance $d_{max}$ (Å) and the number of bins $m$ used to characterize the distance-dependent contact interaction. The optimum was found at $d_{max}$ = 10.00 Å and $m$ = 50.
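With the reported optimum ($d_{max}$ = 10.00 Å, $m$ = 50), each distance bin is 10/50 = 0.2 Å wide. This sketch bins the distances of all non-adjacent residue pairs that fall within $d_{max}$. Using Cα coordinates, and interpreting "non-adjacent" as a sequence separation of at least 2, are assumptions; the helper names are hypothetical.

```python
import math

D_MAX = 10.0  # Angstrom; optimum reported in Fig. 4
M_BINS = 50   # number of bins; optimum reported in Fig. 4


def contact_bin(d, d_max=D_MAX, m=M_BINS):
    """Index of the distance bin for distance d, or None if d >= d_max."""
    if d >= d_max:
        return None
    return min(int(d * m / d_max), m - 1)


def contact_bins(ca, d_max=D_MAX, m=M_BINS, min_separation=2):
    """Map each non-adjacent residue pair (i, j) within d_max to its bin index.

    Assumption: 'non-adjacent' means sequence separation j - i >= min_separation.
    """
    bins = {}
    for i in range(len(ca)):
        for j in range(i + min_separation, len(ca)):
            b = contact_bin(math.dist(ca[i], ca[j]), d_max, m)
            if b is not None:
                bins[(i, j)] = b
    return bins


if __name__ == "__main__":
    toy_ca = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (9, 0, 0)]
    print(contact_bins(toy_ca))  # {(0, 2): 30, (0, 3): 45, (1, 3): 30}
```

Pairs beyond $d_{max}$ are simply omitted, so a structure's tertiary contact description is a sparse map from residue pairs to one of $m$ distance classes.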
