Review
Open Biol. 2025 Mar;15(3):240302.
doi: 10.1098/rsob.240302. Epub 2025 Mar 19.

Limitations of sequence dissimilarity as a predictor of prokaryotic lineage


Alvar A Lavin et al. Open Biol. 2025 Mar.

Abstract

The molecular clock rests upon the assumption that the observed changes among sequences capture the differentiation of lineages, or kinship, as dissimilarity increases with time. Although it has been questioned over the years, this paradigmatic principle continues to underlie the idea that the polymorphic space of a gene is so vast that it is unattainable in evolutionary time. Thus, the molecular clock has been used to obtain taxonomic annotations, proving to be very effective at delivering testable results. In this article, however, we ask how often this assumption leads to inaccuracies when inferring the lineage of prokaryotic genes. Thus, we open an interesting discussion by simulating, in realistic scenarios, the critical times in which specific 5S rRNA sequences of two distant lineages are exhausting the polymorphic space. We contend that certain genes in one lineage will become increasingly similar to those in another over time, as the space for new variants is finite, mimicking phylogenetic features by convergence or by chance, without implying true kinship.

Keywords: gene polymorphism; molecular clock; molecular evolution; phylogenetic time; prokaryotic evolution; sequence dissimilarity.

Conflict of interest statement

We declare we have no competing interests.

Figures

Figure 1.
(a) Representation of the molecular clock assumption, in which lineages (blue, red and yellow) are generated by accumulation of variation at a constant rate, μ, so t (time) is the same dimension as, or is deduced from, d (dissimilarity between sequences). Every point represents a specific gene sequence at a given time. We usually work with this assumption to infer phylogenetic time from current sequence variants, making taxonomic categories in the process. (b) t ≠ d. In this article, we argue that the probability of finding a sequence identical to another from a distant lineage must be non-zero; in fact, it should increase beyond a certain critical time. Therefore, inferring time from divergence data would be prone to error (question mark at bottom right). In this work, we explore how to calculate the value of this probability. In addition, horizontal transfer (yellow) may dilute the lineage signal.
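As a rough illustration of why this probability is non-zero, the sketch below makes a deliberately crude assumption that is not taken from the article: once a lineage has saturated its accessible sequence space, each of the N = base^free_sites accessible variants is equally likely, so two independent lineages carry the same variant with probability 1/N. The function names, the effective base of 1.5 and the 40 free sites are hypothetical values chosen only for illustration; 120 is the approximate length of a 5S rRNA gene.

```python
import math

def log10_space_size(base: float, free_sites: int) -> float:
    """log10 of the number of variants, N = base ** free_sites."""
    return free_sites * math.log10(base)

def log10_match_probability(base: float, free_sites: int) -> float:
    """log10 of 1/N: the chance that two independent, saturated lineages
    hold the same variant, ASSUMING all N variants are equally likely."""
    return -log10_space_size(base, free_sites)

if __name__ == "__main__":
    # Naive space: 4 bases at each of ~120 sites.
    print(f"naive log10(N)       = {log10_space_size(4, 120):.2f}")
    # Constrained space: hypothetical effective base and free-site count.
    print(f"constrained log10(N) = {log10_space_size(1.5, 40):.2f}")
    print(f"constrained log10(p) = {log10_match_probability(1.5, 40):.2f}")
```

Working in log10 keeps the naive space (about 10^72 variants) comparable with the constrained one without relying on very large or very small floating-point numbers.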
Figure 2.
(a) Representation of the polymorphic space for a gene of length L bp, which is far smaller (orange) than it appears at first glance (light blue). This is due, first, to purifying selection, which constrains the variability of some sites contingent on the variability of others, sometimes quite strictly, and so drastically lowers the exponent to which 4 is raised; and second, to mutational biases, where the overwhelming difference between the per-site probabilities of transitions and transversions makes the actual base of the sequence space closer to 1 than to 4. (b) Time does not correlate linearly with divergence; rather, after a certain period, average sequence dissimilarity remains constant. tcrit50 is reached within evolutionary time. In light blue (molecular clock), the polymorphic space is assumed to be so big that it is considered infinite or, alternatively, the time it takes to exit the linear range (tcrit50) is assumed to be far longer than any realistic evolutionary time. In orange, we argue that the space is not that big and that it could be exhausted within prokaryotic evolutionary time.
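The plateau described in panel (b) can be illustrated with a standard textbook model. The sketch below uses the Jukes-Cantor (JC69) substitution model, which is a stand-in chosen here and not the simulation used in the article. Under JC69, two lineages that split t time units ago, each substituting at rate mu per site, are expected to differ at a fraction 0.75 * (1 - exp(-8 * mu * t / 3)) of sites, which levels off at 0.75. The helper `time_to_fraction` is a hypothetical analogue of a critical time; the article's own definition of tcrit50 is not given in this excerpt and may differ.

```python
import math

def jc_divergence(mu: float, t: float) -> float:
    """Expected fraction of differing sites between two lineages under JC69.

    mu: substitution rate per site per unit time, per lineage (assumed constant).
    t:  time since the two lineages split.
    """
    return 0.75 * (1.0 - math.exp(-8.0 * mu * t / 3.0))

def time_to_fraction(mu: float, f: float) -> float:
    """Time at which divergence reaches a fraction f of the 0.75 plateau
    (hypothetical helper; not the article's definition of tcrit50)."""
    if not 0.0 <= f < 1.0:
        raise ValueError("f must be in [0, 1)")
    return -3.0 * math.log(1.0 - f) / (8.0 * mu)

if __name__ == "__main__":
    mu = 1e-9  # illustrative rate only, not taken from the article
    for f in (0.5, 0.9):
        t = time_to_fraction(mu, f)
        print(f"f={f}: t={t:.3e}, divergence={jc_divergence(mu, t):.4f}")
```

Once divergence is close to the plateau, very different values of t map to nearly the same d, which is the sense in which time can no longer be read off from dissimilarity.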
Figure 3.
Steady-state distribution of divergences after tcrit90. Past tcrit90, the distribution practically does not change with time, thus providing ample room for distortions in phylogenetic reconstructions that assume a molecular clock. The divergence distribution shown corresponds to the simulated evolution of Pseudomonas stutzeri 5S rRNA (>tcrit90). Real simulated curves are shown in the electronic supplementary material (figure S1).