Comparative Study

J Mol Evol. 2013 Oct;77(4):159-69.

doi: 10.1007/s00239-013-9565-0. Epub 2013 Jun 7.

Unearthing the root of amino acid similarity

James D Stephenson¹, Stephen J Freeland

Affiliation

¹ NASA Astrobiology Institute, University of Hawaii, Honolulu, HI, 96822, USA, jds@ifa.hawaii.edu.

PMID: 23743923
PMCID: PMC6763418
DOI: 10.1007/s00239-013-9565-0

Abstract

Similarities and differences between amino acids define the rates at which they substitute for one another within protein sequences and the patterns by which these sequences form protein structures. However, there exist many ways to measure similarity, whether one considers the molecular attributes of individual amino acids, the roles that they play within proteins, or some nuanced contribution of each. One popular approach to representing these relationships is to divide the 20 amino acids of the standard genetic code into groups, thereby forming a simplified amino acid alphabet. Here, we develop a method to compare or combine different simplified alphabets, and apply it to 34 simplified alphabets from the scientific literature. We use this method to show that while different suggestions vary and agree in non-intuitive ways, they combine to reveal a consensus view of amino acid similarity that is clearly rooted in physico-chemistry.
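One way to picture how several simplified alphabets might be combined into a consensus view is to count how often each pair of amino acids lands in the same group. The sketch below is only an illustration under stated assumptions: the three groupings are invented toy examples over a few amino acid letters (they are not taken from the 34 published alphabets), the function name `co_grouping_frequency` is hypothetical, and the abstract does not specify the exact combination rule the authors used.

```python
from collections import Counter
from itertools import combinations


def co_grouping_frequency(alphabets):
    """Fraction of alphabets in which each pair of letters shares a group.

    Each alphabet is a comma-delimited string of groups, e.g. "ILV,KR".
    Pairs that are never grouped together are simply absent from the result.
    """
    counts = Counter()
    for spec in alphabets:
        for group in spec.split(","):
            # Sort so that each pair is always stored as (smaller, larger).
            for a, b in combinations(sorted(group.strip()), 2):
                counts[(a, b)] += 1
    total = len(alphabets)
    return {pair: n / total for pair, n in counts.items()}


# Invented toy groupings (hypothetical, for illustration only).
toy_alphabets = [
    "ILV,FWY,KR,DE",
    "ILVFWY,KRDE",
    "IL,V,FY,W,KR,D,E",
]

freq = co_grouping_frequency(toy_alphabets)
for pair in [("I", "L"), ("D", "E"), ("D", "K")]:
    print(pair, round(freq[pair], 3))
```

Pairs with a frequency near 1 are grouped together by almost every scheme, while pairs that never co-occur do not appear in the dictionary at all. A matrix of such frequencies is the kind of input from which a consensus dendrogram could then be built.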

Figures

**Fig. 1**
Simplified amino acid alphabets colored according to the method by which they were derived. Dendrogram derived by least squares from the relative similarities of 34 published simplified amino acid alphabets, labeled by Stephenson. Longer branch lengths indicate lower similarity between two alphabets; *colors* represent the method by which each simplified alphabet was derived, as described in Table 1.

**Fig. 2**
Principal components 1 and 2 of the 34 × 34 simplified alphabet similarity matrix colored by derivation method. (a) Simplified alphabets are shown as *spheres* and labeled according to the alphabet ID numbering in Table 1. (b) Variance contribution of the first five principal components of this analysis.

**Fig. 3**
Consensus amino acid similarity dendrogram from 34 alphabets. Dendrogram constructed by least squares using the similarity data from all 34 simplified amino acid alphabets. Long branches indicate that an amino acid is rarely grouped with any other as part of a simplification scheme. Short path lengths between amino acids suggest high similarity between them

**Fig. 4**
Amino acid similarity relationships defined by analysis of proteins closely resemble those derived from analysis of individual amino acid chemistry. Dendrograms constructed by least squares using the similarity data from (a) 29 studies that considered amino acid residues within protein sequences and structures, versus (c) 5 simplified alphabets that were derived from individual amino acid physico-chemistry. Long branches indicate that an amino acid is rarely grouped with any other as part of a simplification scheme. Short path lengths between amino acids suggest high similarity between them. Comparing both dendrograms with (b), a redrawn version of a commonly used chemical property Venn diagram adapted from Livingstone and Barton (1993), uncovers the physico-chemical basis for many of the dendrogram features. The hydrophobic (*blue*), polar (*red*), and both hydrophobic and polar (*purple*) amino acids are colored to highlight this principal basis of organization within each of the dendrograms.

**Fig. 5**
Distance between matrices when considering amino acids within proteins and when considering their individual amino acid physico-chemical properties against a background of randomized matrices. Frequency distribution of inter-matrix distances between the “individual chemistry” matrix calculated in this study and 1,000,000 random matrices (randomizing rows only) generated from real matrix seeds. The distance between the two matrices (Table 2a, b) was 0.1339.

**Fig. 6**
Illustration of the method used to compare simplified amino acid alphabets, using a fictional 6-letter alphabet for clarity of example. The groupings described by three simplifications, named studies 1–3, for a fictional 6-letter alphabet are initially described as comma-delimited text (shown above each of the *green* matrices, *left*). The contents of the *green* matrices thus represent each simplified alphabet: within each matrix, a value of 1 indicates that two amino acids are grouped as “similar”; a value of 0 indicates otherwise. The *blue* matrices are constructed by comparing the corresponding elements of each pair of *green* matrices. This time, a match between the corresponding cells of two *green* matrices results in a 1 within the *blue* matrix (0 represents a mismatch). Summing the matched values from the *blue* matrices forms an overall similarity value, as shown in the final rows of the “line total” column. These similarity values can be assembled in a similarity matrix, shown in *red*, which records all pairwise inter-alphabet similarities. In this example, the alphabets from studies 1 and 3 are the most similar and those from studies 2 and 3 are the least similar.
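The comparison described in the Fig. 6 caption can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' code: the three groupings are invented for this example (they are not the ones drawn in the figure), the function names `grouping_matrix` and `similarity` are hypothetical, and counting each unordered pair of letters once (the upper triangle of the matrix) is an assumption, since the caption does not say whether the diagonal or both triangles are summed.

```python
from itertools import combinations

LETTERS = "ABCDEF"  # fictional 6-letter alphabet, as in the caption


def grouping_matrix(spec, letters=LETTERS):
    """Binary matrix for one simplified alphabet (the "green" matrix).

    spec is comma-delimited text such as "AB,CD,EF"; cell [i][j] is 1 when
    letters i and j fall in the same group and 0 otherwise.
    """
    group_of = {}
    for idx, group in enumerate(spec.split(",")):
        for letter in group.strip():
            group_of[letter] = idx
    return [[1 if group_of[a] == group_of[b] else 0 for b in letters]
            for a in letters]


def similarity(m1, m2):
    """Count matching cells between two matrices (the "blue" matrix total).

    Assumption: only unordered letter pairs (upper triangle) are counted.
    """
    n = len(m1)
    return sum(1 for i, j in combinations(range(n), 2) if m1[i][j] == m2[i][j])


# Invented groupings, standing in for "studies 1-3" (hypothetical data).
studies = {"study1": "AB,CD,EF", "study2": "ABC,DEF", "study3": "AB,CD,E,F"}
matrices = {name: grouping_matrix(spec) for name, spec in studies.items()}

# Pairwise inter-alphabet similarities (the "red" similarity matrix).
similarity_matrix = {
    (a, b): similarity(matrices[a], matrices[b])
    for a, b in combinations(studies, 2)
}
for pair, score in similarity_matrix.items():
    print(pair, score)
```

With these invented groupings, studies 1 and 3 agree on 14 of the 15 letter pairs and studies 2 and 3 on only 9, so they are the most and least similar pairs respectively.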

Cited by

Candida albicans' inorganic phosphate transport and evolutionary adaptation to phosphate scarcity.
Acosta-Zaldívar M, Qi W, Mishra A, Roy U, King WR, Li Y, Patton-Vogt J, Anderson MZ, Köhler JR. PLoS Genet. 2024 Aug 13;20(8):e1011156. doi: 10.1371/journal.pgen.1011156. eCollection 2024 Aug. PMID: 39137212
Adaptive Properties of the Genetically Encoded Amino Acid Alphabet Are Inherited from Its Subsets.
Ilardo M, Bose R, Meringer M, Rasulev B, Grefenstette N, Stephenson J, Freeland S, Gillams RJ, Butch CJ, Cleaves HJ 2nd. Sci Rep. 2019 Aug 28;9(1):12468. doi: 10.1038/s41598-019-47574-x. PMID: 31462646
Interpretable Machine Learning of Amino Acid Patterns in Proteins: A Statistical Ensemble Approach.
Braghetto A, Orlandini E, Baiesi M. J Chem Theory Comput. 2023 Sep 12;19(17):6011-6022. doi: 10.1021/acs.jctc.3c00383. Epub 2023 Aug 8. PMID: 37552831
Experimental solutions to problems defining the origin of codon-directed protein synthesis.
Carter CW Jr, Wills PR. Biosystems. 2019 Sep;183:103979. doi: 10.1016/j.biosystems.2019.103979. Epub 2019 Jun 6. PMID: 31176803 Review.
Evolution as a Guide to Designing xeno Amino Acid Alphabets.
Mayer-Bacon C, Agboha N, Muscalli M, Freeland S. Int J Mol Sci. 2021 Mar 10;22(6):2787. doi: 10.3390/ijms22062787. PMID: 33801827 Review.
