Microb Genom. 2019 Jan;5(1):e000245.
doi: 10.1099/mgen.0.000245. Epub 2019 Jan 21.

HomoplasyFinder: a simple tool to identify homoplasies on a phylogeny

Joseph Crispell et al. Microb Genom. 2019 Jan.

Abstract

A homoplasy is a nucleotide identity resulting from a process other than inheritance from a common ancestor. Importantly, by distorting the ancestral relationships between nucleotide sequences, homoplasies can change the structure of the phylogeny. Homoplasies can emerge naturally, especially under high selection pressures and/or high mutation rates, or be created during the generation and processing of sequencing data. Identification of homoplasies is critical, both to understand their influence on the analyses of phylogenetic data and to allow an investigation into how they arose. Here we present HomoplasyFinder, a Java application that can be used as a standalone tool or within the statistical programming environment R. Within R and Java, HomoplasyFinder is shown to identify, automatically and quickly, any homoplasies present in simulated and real phylogenetic data. HomoplasyFinder can easily be incorporated into existing analysis pipelines, either within or outside of R, allowing the user to quickly identify homoplasies to inform downstream analyses and interpretation.

Keywords: HomoplasyFinder; Java; R package; convergence; homoplasy; phylogenetic.

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Fig. 1.
Diagrams demonstrating how the tree length of one site in a nucleotide alignment is calculated. Step 1 shows how the nucleotides at one site in each sequence are assigned to the tips of a phylogeny. Step 2 shows how the nucleotide set at each internal node is defined, as either the intersection or the union of the nucleotide sets of its descendant nodes, and how the tree length is calculated.
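The two steps in Fig. 1 can be sketched in a few lines of Python. This is a minimal illustration of the union/intersection rule on a binary tree, not HomoplasyFinder's own Java code. The nested-tuple tree encoding, the function names, and the example tips are all hypothetical. The flagging rule in `is_homoplasious` (tree length greater than the number of distinct nucleotides minus one) is an assumption that is not stated on this page, and missing data are not handled.

```python
def fitch_tree_length(tree, tip_states):
    """Count the changes one site needs on a binary tree.

    tree: nested 2-tuples of tip names, e.g. (("a", "b"), ("c", "d")).
    tip_states: dict mapping each tip name to its nucleotide (Step 1).
    """
    length = 0

    def visit(node):
        nonlocal length
        if isinstance(node, str):
            # Tip: its set holds only the observed nucleotide.
            return {tip_states[node]}
        left, right = (visit(child) for child in node)
        shared = left & right
        if shared:
            # Descendants agree on something: keep the intersection.
            return shared
        # No overlap: take the union and count one change (Step 2).
        length += 1
        return left | right

    visit(tree)
    return length


def is_homoplasious(tree, tip_states):
    """Assumed rule: more changes than the minimum the site could need."""
    min_changes = len(set(tip_states.values())) - 1
    return fitch_tree_length(tree, tip_states) > min_changes


# Hypothetical four-tip example.
tree = (("a", "b"), ("c", "d"))
consistent = {"a": "A", "b": "A", "c": "T", "d": "T"}
conflicting = {"a": "A", "b": "T", "c": "A", "d": "T"}

print(fitch_tree_length(tree, consistent), is_homoplasious(tree, consistent))
print(fitch_tree_length(tree, conflicting), is_homoplasious(tree, conflicting))
```

In the first example the two nucleotides each sit in their own clade, so a single change explains the site. In the second, the same two nucleotides are split across both clades, so two changes are needed and the site is flagged.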
Fig. 2.
Identifying homoplasies in simulated data using HomoplasyFinder. (a) The proportion of 1000 simulated phylogenetic datasets with X inserted homoplasies (where the number of inserted homoplasies ranged from 0 to 100 in steps of 1) not identified using HomoplasyFinder - i.e., false negatives. (b) The proportion of 1000 simulated phylogenetic datasets with X non-inserted homoplasies identified by HomoplasyFinder - i.e., false positives. Each point is coloured according to X, which represents either the number of inserted homoplasies not found, or the number of non-inserted homoplasies found.
Fig. 3.
Creating non-identifiable homoplasies. An example of how the process of creating simulated, and naturally evolving, homoplasies can result in homoplasies that can’t be detected.
Fig. 4.
The proportion of 100 homoplasies inserted into simulated nucleotide sequences that were identified by HomoplasyFinder before (red) and after (blue) recombination events had been applied to the sequences. The simulated sequences had either 0, 1, 10, 100, 1000 or 10 000 recombination events applied to them. The vertical lines represent the range between the 2.5th and 97.5th percentiles of the proportions calculated across 1000 replicates.
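The vertical-line summary described for Fig. 4 can be illustrated with a small percentile calculation. This is a sketch under stated assumptions: the `percentile` helper uses linear interpolation between ranks, which may differ from the method the authors used, and the `proportions` values are placeholders, not data from the study.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


# Placeholder proportions, one per replicate (NOT results from the paper).
proportions = [i / 100 for i in range(101)]

low = percentile(proportions, 2.5)
high = percentile(proportions, 97.5)
print(f"95% range: {low:.3f} to {high:.3f}")
```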
Fig. 5.
Identifying homoplasies in M. bovis data using HomoplasyFinder. A phylogenetic tree reconstructed using 298 published M. bovis whole genome sequences [25]. HomoplasyFinder identified homoplasies at six different positions (0.2 % of the 3852 polymorphic positions identified). The nucleotides associated with these positions in each sequence are plotted and coloured according to their type (Adenine=red, Cytosine=blue, Guanine=cyan and Thymine=orange). Where no information for the nucleotide at a particular site in a sequence was available, it is coloured white. The positions, on the M. bovis reference genome [33], associated with the identified homoplasies are reported in the top right and annotated on the internal nodes where a change was necessary. To avoid overlap, one of the labels was moved slightly, and a red line points to the node it annotates.
Fig. 6.
Comparing HomoplasyFinder to phangorn and treetime. The time taken to identify 10 homoplasies present in simulated phylogenetic datasets by HomoplasyFinder accessed from R and from the command line, by phangorn in R, and by treetime on the command line. A total of 190 different datasets were tested, ranging from 100 to 1000 sequences in steps of 50, with 10 replicates of each. The number of positions in these sequences ranged from 4000 to 20 000. The points and vertical lines plotted represent the mean and range, respectively, of the 10 replicates.
