Mol Biol Evol. 2025 Apr 30;42(5):msaf089.
doi: 10.1093/molbev/msaf089.

ERCnet: Phylogenomic Prediction of Interaction Networks in the Presence of Gene Duplication

Evan S Forsythe et al. Mol Biol Evol. 2025.

Abstract

Assigning gene function from genome sequences is a rate-limiting step in molecular biology research. A protein's position within an interaction network can potentially provide insights into its molecular mechanisms. Phylogenetic analysis of evolutionary rate covariation (ERC) in protein sequence has been shown to be effective for large-scale prediction of functional relationships and interactions. However, gene duplication, gene loss, and other sources of phylogenetic incongruence are barriers to analyzing ERC on a genome-wide basis. Here, we developed ERCnet, a bioinformatic program designed to overcome these challenges, facilitating efficient all-versus-all ERC analyses for large protein sequence datasets. We simulated proteome datasets and found that ERCnet achieves combined false positive and false negative error rates well below 10% and that our novel "branch-by-branch" length measurements outperform "root-to-tip" approaches in most cases, offering a valuable new strategy for performing ERC. We also compiled a sample set of 35 angiosperm genomes to test the performance of ERCnet on empirical data, including its sensitivity to user-defined analysis parameters such as input dataset size and branch-length measurement strategy. We investigated the overlap between ERCnet runs with different species samples to understand how species number and composition affect predicted interactions and to identify the protein sets that consistently exhibit ERC across angiosperms. Our systematic exploration of the performance of ERCnet provides a roadmap for the design of future ERC analyses to predict functional interactions in a wide array of genomic datasets. ERCnet code is freely available at https://github.com/EvanForsythe/ERCnet.
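To make the two branch-length strategies concrete, here is a minimal sketch, not ERCnet's code, assuming two protein trees that share an identical topology. The tree encoding (node name mapped to parent and branch length) and the function names `branch_by_branch`, `root_to_tip`, and `erc` are hypothetical. ERC is computed here as a plain Pearson correlation over the data points the two trees share.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical encoding: node -> (parent, length of the branch above the node).
# The root's parent is None and its branch length is ignored.
Tree = Dict[str, Tuple[Optional[str], float]]


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def branch_by_branch(tree: Tree) -> Dict[str, float]:
    """One data point per non-root branch, keyed by the node below it."""
    return {node: length for node, (parent, length) in tree.items() if parent is not None}


def root_to_tip(tree: Tree) -> Dict[str, float]:
    """One data point per tip: summed branch lengths on its path to the root."""
    internal = {parent for parent, _ in tree.values() if parent is not None}
    distances = {}
    for tip in (n for n in tree if n not in internal):
        total, node = 0.0, tip
        while tree[node][0] is not None:  # climb until the root is reached
            total += tree[node][1]
            node = tree[node][0]
        distances[tip] = total
    return distances


def erc(tree_a: Tree, tree_b: Tree, measure: Callable[[Tree], Dict[str, float]]) -> float:
    """Correlate two proteins' lengths over the branches (or tips) they share."""
    a, b = measure(tree_a), measure(tree_b)
    shared = sorted(set(a) & set(b))
    return pearson_r([a[k] for k in shared], [b[k] for k in shared])


# Toy topology ((A,B)AB,(C,D)CD)root; protein2 evolves exactly twice as fast
# as protein1 on every branch, so both strategies should give r = 1.
protein1 = {"root": (None, 0.0), "AB": ("root", 0.3), "CD": ("root", 0.2),
            "A": ("AB", 0.1), "B": ("AB", 0.2), "C": ("CD", 0.1), "D": ("CD", 0.4)}
protein2 = {node: (parent, 2 * length) for node, (parent, length) in protein1.items()}

print(round(erc(protein1, protein2, branch_by_branch), 6))  # 1.0
print(round(erc(protein1, protein2, root_to_tip), 6))       # 1.0
```

On this four-taxon tree the branch-by-branch measure yields six data points while root-to-tip yields four, and the root-to-tip paths of A and B both include the AB branch, so those two points are not independent of each other.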

Keywords: coevolution; evolutionary rate covariation; interaction networks; interactome; protein interactions.

Figures

Fig. 1.
ERCnet workflow and algorithm development. a) Four-step analytical workflow used to analyze OrthoFinder results (input) and generate an ERC-based interaction network and accompanying summary statistics (output). Stars indicate steps that employ parallel computing. b) The major analytical steps of the novel “BLR” procedure used to calculate branch lengths on a branch-by-branch basis, using species tree (ST) and gene tree (GT) information. Rounded arrows indicate iterative processes.
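The idea of "common branches" between two gene trees can be illustrated with a much simpler device than BLR. This sketch is an assumption-laden stand-in, not the BLR procedure: it ignores the species tree, assumes each species occurs at most once per gene tree (no paralogs, which is exactly the case BLR is built to handle), and keys every branch by the set of species below it. The nested-tuple encoding and the names `clade_branch_lengths` and `common_branches` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical encoding: a node is (leaf gene name OR list of child nodes, branch length).
Node = Tuple[Union[str, list], float]
Clade = FrozenSet[str]


def clade_branch_lengths(root: Node, gene_to_species: Dict[str, str]) -> Dict[Clade, float]:
    """Key every non-root branch of a gene tree by the species set below it.

    Assumes no species appears twice in the tree; with duplications, two
    branches could share a key and one would silently overwrite the other.
    """
    lengths: Dict[Clade, float] = {}

    def walk(node: Node) -> Clade:
        content, length = node
        if isinstance(content, str):  # leaf: map the gene to its species
            clade = frozenset([gene_to_species[content]])
        else:  # internal node: union of the children's species sets
            clade = frozenset().union(*(walk(child) for child in content))
        lengths[clade] = length
        return clade

    lengths.pop(walk(root))  # the root has no branch above it
    return lengths


def common_branches(a: Dict[Clade, float], b: Dict[Clade, float]) -> List[Clade]:
    """Species clades (branches) present in both gene trees."""
    return [clade for clade in a if clade in b]


species = {"a1": "A", "b1": "B", "c1": "C", "d1": "D",
           "a2": "A", "b2": "B", "c2": "C", "d2": "D"}
# gene1 groups (A,B) and (C,D); gene2 has an incongruent topology, (A,C) and (B,D).
gene1 = ([([("a1", 0.1), ("b1", 0.2)], 0.3), ([("c1", 0.1), ("d1", 0.4)], 0.2)], 0.0)
gene2 = ([([("a2", 0.2), ("c2", 0.1)], 0.5), ([("b2", 0.3), ("d2", 0.2)], 0.1)], 0.0)

shared = common_branches(clade_branch_lengths(gene1, species),
                         clade_branch_lengths(gene2, species))
print(sorted("".join(sorted(c)) for c in shared))  # ['A', 'B', 'C', 'D']
```

Because the two toy gene trees disagree on their internal groupings, only the four terminal branches survive as shared data points; incongruent internal branches simply drop out of the comparison.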
Fig. 2.
Simulated protein sequences used to assess ERCnet error rates. a) The random phylogenetic tree used to simulate background rates of protein evolution. b) A tree with the same topology in which 5 randomly selected branch lengths (bold branches) were multiplied by 10 to simulate coacceleration in 100 of the 1,000 protein families. c–f) False positive and false negative error rates of ERCnet runs using different branch-length methods (root-to-tip [R2T] vs. branch-by-branch [BXB]), different correlation methods (Pearson vs. Spearman vs. Kendall), and different P-value and R² cutoffs as significance thresholds. The legend shows the branch-length and correlation methods; “Pearson, Spearman, Kendall” means that an ERC hit was deemed significant only if it passed the filters under all 3 methods. False positive (c) and false negative (d) rates across several P-value cutoffs, with the R² cutoff held constant at ≥0.20. False positive (e) and false negative (f) rates across several R² cutoffs, with the P-value cutoff held constant at ≤0.05.
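The combined significance filter and the two error rates plotted in panels c–f can be sketched as follows. This is an illustration under stated assumptions, not ERCnet's scoring code: per-method correlation coefficients and P-values are taken as already computed, "R²" is assumed to be the squared coefficient for every method (so the sign of the correlation is ignored), and the rates are assumed to be FPR = FP/(FP + TN) and FNR = FN/(FN + TP). The names `is_hit` and `error_rates` are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical input: method name -> (correlation coefficient, P-value) for one protein pair.
PairStats = Dict[str, Tuple[float, float]]


def is_hit(stats: PairStats, p_cutoff: float = 0.05, r2_cutoff: float = 0.20,
           methods: Tuple[str, ...] = ("pearson", "spearman", "kendall")) -> bool:
    """A pair is an ERC hit only if every listed method passes both cutoffs."""
    return all(stats[m][1] <= p_cutoff and stats[m][0] ** 2 >= r2_cutoff for m in methods)


def error_rates(predicted: Set[int], truth: Set[int], n_pairs: int) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) over n_pairs tested pairs."""
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    tn = n_pairs - tp - fp - fn
    return fp / (fp + tn), fn / (fn + tp)


strong = {"pearson": (0.9, 0.001), "spearman": (0.8, 0.01), "kendall": (0.6, 0.03)}
weak_kendall = {"pearson": (0.9, 0.001), "spearman": (0.8, 0.01), "kendall": (0.4, 0.03)}

print(is_hit(strong))                                   # True
print(is_hit(weak_kendall))                             # False (0.4 ** 2 < 0.20)
print(is_hit(weak_kendall, methods=("pearson",)))       # True
print(error_rates({2, 3, 4}, {1, 2, 3}, n_pairs=10))    # FPR = 1/7, FNR = 1/3
```

If, as a further assumption, a true positive is any pair drawn from the 100 coaccelerated families, the simulated dataset contains 100 × 99 / 2 = 4,950 true pairs among 1,000 × 999 / 2 = 499,500 tested pairs, which is why the false positive rate's denominator is dominated by true negatives.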
Fig. 3.
Angiosperm taxon-sampling datasets used to assess ERCnet performance. (Left) Phylogenetic tree of the full pool of species included for random subsampling. (Right) Presence-absence plot indicating the species included in random datasets of each size. Five replicates were performed for each dataset size. A. trichopoda was included as an outgroup for all replicates and A. thaliana was included as a common ingroup representative for all replicates.
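The subsampling design (random species sets of a given size, five replicates each, with a fixed outgroup and a fixed ingroup representative) can be sketched like this. It is a hypothetical illustration: the caption does not say whether replicates were drawn independently or nested, and the sketch assumes independent draws, assumes the dataset size counts the two fixed species, and uses placeholder names (`species_01` and so on) for the rest of the 35-species pool. `subsample_datasets` is a hypothetical helper.

```python
import random
from typing import List

OUTGROUP = "Amborella trichopoda"        # A. trichopoda, always included
INGROUP_ANCHOR = "Arabidopsis thaliana"  # A. thaliana, always included


def subsample_datasets(pool: List[str], size: int, replicates: int = 5,
                       seed: int = 1) -> List[List[str]]:
    """Draw `replicates` random species sets of `size`, each containing both fixed species.

    Assumes `size` counts the two fixed species and that `pool` contains them.
    """
    rng = random.Random(seed)  # seeded so the replicate datasets are reproducible
    others = [sp for sp in pool if sp not in (OUTGROUP, INGROUP_ANCHOR)]
    # sample() draws without replacement, so no species repeats within a dataset
    return [[OUTGROUP, INGROUP_ANCHOR] + rng.sample(others, size - 2)
            for _ in range(replicates)]


# Placeholder pool of 35 species: the two fixed taxa plus 33 hypothetical names.
pool = [OUTGROUP, INGROUP_ANCHOR] + [f"species_{i:02d}" for i in range(1, 34)]
datasets = subsample_datasets(pool, size=10)
print(len(datasets), [len(d) for d in datasets])  # 5 [10, 10, 10, 10, 10]
```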
Fig. 4.
Proteome coverage and overlapping branches after ERCnet filtering. a) The number of proteins retained for phylogenomic analysis after several phases of quality-control filtering in the first steps of ERCnet. These numbers represent the proteins that are tested for interaction in the all-versus-all ERC analyses at later steps of ERCnet. b, c) The number of overlapping branches (i.e. points on correlation plots) between pairs of proteins in the all-versus-all ERC analyses. For the root-to-tip method (b), “branches” refers to paths from the root of the tree to each tip. For the branch-by-branch method (c), “branches” refers to the common branches determined by our “BLR” method.
Fig. 5.
Composition and functional clustering of ERCnet interaction networks. a, b) The number of nodes (points) and edges (x's) in networks obtained using the root-to-tip (a) and branch-by-branch (b) methods. Note the log scale in panel (a). c, d) The assortativity coefficient estimated from. Filled points indicate that the assortativity coefficient is significantly greater than the randomized null distribution (z-score from a randomized permutation test). Significant positive assortativity indicates that a trait is clustered across the network. The trait measured here was the predicted targeting (plastid, mitochondrial, other) of the proteins in the network.
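Assortativity for a categorical trait such as predicted targeting, together with a label-shuffling null, can be sketched as below. The sketch assumes Newman's coefficient for categorical (nominal) traits, r = (Σᵢ eᵢᵢ − Σᵢ aᵢ²)/(1 − Σᵢ aᵢ²), on an undirected graph, and builds the null by permuting trait labels among nodes while keeping the edges fixed. The caption does not specify ERCnet's exact formula or permutation scheme, so both are assumptions; `categorical_assortativity` and `permutation_z` are hypothetical names.

```python
import random
import statistics
from collections import Counter
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def categorical_assortativity(edges: List[Edge], trait: Dict[Hashable, str]) -> float:
    """Newman's assortativity coefficient for a categorical node trait.

    Raises ZeroDivisionError if every edge end carries the same trait value.
    """
    counts: Counter = Counter()
    for u, v in edges:  # count each undirected edge in both directions
        counts[(trait[u], trait[v])] += 1
        counts[(trait[v], trait[u])] += 1
    total = sum(counts.values())
    types = set(trait.values())
    e_ii = sum(counts[(t, t)] for t in types) / total  # fraction of same-trait edge ends
    a = {t: sum(c for (i, _), c in counts.items() if i == t) / total for t in types}
    expected = sum(a[t] ** 2 for t in types)  # same-trait fraction expected at random
    return (e_ii - expected) / (1 - expected)


def permutation_z(edges: List[Edge], trait: Dict[Hashable, str],
                  n_perm: int = 1000, seed: int = 0) -> float:
    """z-score of the observed coefficient against label-shuffled networks."""
    rng = random.Random(seed)
    nodes = list(trait)
    labels = [trait[n] for n in nodes]
    observed = categorical_assortativity(edges, trait)
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # same edges, trait labels reassigned among nodes
        null.append(categorical_assortativity(edges, dict(zip(nodes, labels))))
    return (observed - statistics.mean(null)) / statistics.stdev(null)


targeting = {1: "plastid", 2: "plastid", 3: "mitochondrial", 4: "mitochondrial"}
print(categorical_assortativity([(1, 2), (3, 4)], targeting))  # 1.0 (edges join like with like)
print(categorical_assortativity([(1, 3), (2, 4)], targeting))  # -1.0 (edges join unlike)
```

A large positive z from `permutation_z` corresponds to the filled points in panels c and d: the observed network joins proteins with the same predicted targeting more often than shuffled labels do.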
Fig. 6.
Overlap in ERCnet hits between runs. a, b) Bar plots showing the number of hits by the number of ERCnet replicate runs in which they appeared. Blue bars (left) show observed values, and gray bars (right) show averages from 10 randomized replicates. Lines indicate the standard error across randomized replicates.
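The observed bars count how many hits appear in exactly k runs, and the gray bars compare that with a randomized expectation. The sketch below computes both, assuming a simple null in which each run keeps its number of hits but draws them uniformly at random from a pool of tested pairs; the caption does not describe the actual randomization, so that null, the placeholder pair names, and the helper names are hypothetical.

```python
import random
import statistics
from collections import Counter
from typing import Dict, Hashable, List, Set, Tuple


def runs_histogram(runs: List[Set[Hashable]]) -> Counter:
    """Map k -> number of distinct hits that appear in exactly k runs."""
    per_hit = Counter(hit for run in runs for hit in run)
    return Counter(per_hit.values())


def randomized_histogram(runs: List[Set[Hashable]], pool: List[Hashable],
                         n_rand: int = 10, seed: int = 0) -> Dict[int, Tuple[float, float]]:
    """Per bin k: (mean, standard error) over n_rand randomized replicates.

    Assumed null: each run keeps its hit count, with hits redrawn from `pool`.
    """
    rng = random.Random(seed)
    reps = [runs_histogram([set(rng.sample(pool, len(run))) for run in runs])
            for _ in range(n_rand)]
    result = {}
    for k in range(1, len(runs) + 1):
        values = [rep[k] for rep in reps]  # Counter returns 0 for empty bins
        result[k] = (statistics.mean(values), statistics.stdev(values) / n_rand ** 0.5)
    return result


runs = [{"a", "b"}, {"a", "c"}, {"a"}]
print(runs_histogram(runs))  # Counter({1: 2, 3: 1}): b and c in one run, a in all three

pool = [f"pair{i}" for i in range(50)]  # placeholder pool of tested pairs
print(randomized_histogram(runs, pool))
```

Under this null, hits shared by many runs should be rare, so an excess of observed hits in the higher bins is what signals consistent ERC across species samples.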
Fig. 7.
Performance of ERCnet using parallel processing. a) Runtime of the analytical steps of ERCnet parallelized with 48 threads. The reconciliation and network analysis steps are omitted because they are very fast relative to the phylogenomics and pairwise ERC steps and contribute negligibly to the overall runtime. b) Runtime of the phylogenomics steps of ERCnet on either 4 threads or 48 threads. The highly parallelized 48-thread runs were ∼4-fold faster than the 4-thread runs.
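Because every protein pair is scored independently, the all-versus-all ERC step maps naturally onto a worker pool. The sketch below is a generic illustration, not how ERCnet parallelizes its steps: it uses `concurrent.futures.ThreadPoolExecutor` so the example runs anywhere, but pure-Python CPU-bound work is limited by the GIL under threads, and `ProcessPoolExecutor` (with the worker function at module top level) is the usual swap for real speedups. The names `pair_erc` and `all_vs_all` and the toy branch data are hypothetical. Note also that the caption's 12-fold increase in threads (4 to 48) gave roughly a 4-fold speedup, so scaling was well below linear.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import Dict, List, Tuple

Lengths = Dict[str, float]  # branch key -> branch length for one protein


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation (raises ZeroDivisionError if either sample is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pair_erc(pair: Tuple[Tuple[str, Lengths], Tuple[str, Lengths]]) -> Tuple[Tuple[str, str], float]:
    """Score one protein pair over its shared branches (NaN if fewer than 3)."""
    (name_a, a), (name_b, b) = pair
    shared = sorted(set(a) & set(b))
    if len(shared) < 3:
        return (name_a, name_b), math.nan
    return (name_a, name_b), pearson_r([a[k] for k in shared], [b[k] for k in shared])


def all_vs_all(proteins: Dict[str, Lengths], workers: int = 4) -> Dict[Tuple[str, str], float]:
    """Score all n(n - 1)/2 protein pairs, distributing pairs across a worker pool."""
    pairs = combinations(sorted(proteins.items()), 2)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(pair_erc, pairs))


proteins = {"P1": {"b1": 0.1, "b2": 0.2, "b3": 0.3},
            "P2": {"b1": 0.2, "b2": 0.4, "b3": 0.6},
            "P3": {"b1": 0.3, "b2": 0.2, "b3": 0.1}}
for pair, r in all_vs_all(proteins).items():
    print(pair, round(r, 6))  # P1-P2: 1.0, P1-P3: -1.0, P2-P3: -1.0
```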
