Mol Biol Evol. 2025 Apr 30;42(5):msaf089.

doi: 10.1093/molbev/msaf089.

ERCnet: Phylogenomic Prediction of Interaction Networks in the Presence of Gene Duplication

Affiliations

¹ Department of Integrative Biology, Oregon State University, Corvallis, OR, USA.
² Biology Program, Oregon State University-Cascades, Bend, OR, USA.
³ Biochemistry and Molecular Biology Program, Oregon State University-Cascades, Bend, OR, USA.
⁴ Department of Biology, Colorado State University, Fort Collins, CO, USA.

PMID: 40247660
PMCID: PMC12062884
DOI: 10.1093/molbev/msaf089

Evan S Forsythe et al. Mol Biol Evol. 2025.

Abstract

Assigning gene function from genome sequences is a rate-limiting step in molecular biology research. A protein's position within an interaction network can potentially provide insights into its molecular mechanisms. Phylogenetic analysis of evolutionary rate covariation (ERC) in protein sequence has been shown to be effective for large-scale prediction of functional relationships and interactions. However, gene duplication, gene loss, and other sources of phylogenetic incongruence are barriers for analyzing ERC on a genome-wide basis. Here, we developed ERCnet, a bioinformatic program designed to overcome these challenges, facilitating efficient all-versus-all ERC analyses for large protein sequence datasets. We simulated proteome datasets and found that ERCnet achieves combined false positive and negative error rates well below 10% and that our novel "branch-by-branch" length measurements outperform "root-to-tip" approaches in most cases, offering a valuable new strategy for performing ERC. We also compiled a sample set of 35 angiosperm genomes to test the performance of ERCnet on empirical data, including its sensitivity to user-defined analysis parameters such as input dataset size and branch-length measurement strategy. We investigated the overlap between ERCnet runs with different species samples to understand how species number and composition affect predicted interactions and to identify the protein sets that consistently exhibit ERC across angiosperms. Our systematic exploration of the performance of ERCnet provides a roadmap for the design of future ERC analyses to predict functional interactions in a wide array of genomic datasets. ERCnet code is freely available at https://github.com/EvanForsythe/ERCnet.
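
The core ERC calculation described in the abstract can be illustrated with a minimal sketch: given branch lengths for two protein families measured on a shared set of branches, correlate them. The function names, branch labels, and toy numbers below are hypothetical and are not taken from the ERCnet code; the sketch only shows a plain Pearson and Spearman correlation on the branches the two families have in common.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def shared_branch_lengths(lengths_a, lengths_b):
    """Pair up branch lengths on the branches present in both families."""
    shared = sorted(set(lengths_a) & set(lengths_b))
    return [lengths_a[b] for b in shared], [lengths_b[b] for b in shared]


# Hypothetical branch lengths (substitutions per site) keyed by branch label.
family_a = {"b1": 0.10, "b2": 0.40, "b3": 0.05, "b4": 0.30, "b5": 0.20}
family_b = {"b1": 0.12, "b2": 0.50, "b3": 0.04, "b4": 0.33, "b6": 0.90}

x, y = shared_branch_lengths(family_a, family_b)
r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"shared branches: {len(x)}, Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Only branches present in both families contribute points to the correlation, which is why gene duplication and loss reduce the number of usable points for a given protein pair.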

Keywords: coevolution; evolutionary rate covariation; interaction networks; interactome; protein interactions.

Figures

**Fig. 1.**
*ERCnet* workflow and algorithm development. a) Four-step analytical workflow used to analyze *Orthofinder* results (input) and generate an ERC-based interaction network and accompanying summary statistics (output). Stars indicate steps that employ parallel computing. b) The major analytical steps of the novel “BLR” procedure used to calculate branch lengths on a branch-by-branch basis, using species tree (ST) and gene tree (GT) information. Rounded arrows indicate iterative processes.

**Fig. 2.**
Simulated protein sequences to assess *ERCnet* error rates. a) The random phylogenetic tree used to simulate the background rates of protein evolution. b) Tree with the same topology but in which 5 randomly selected branch lengths (bold branches) were multiplied by 10 to simulate coacceleration for 100 of the 1,000 protein families. c–f) False positive/negative error rates of *ERCnet* runs using different branch length methods (R2T vs. BXB) as well as different correlation calculation methods (Pearson vs. Spearman vs. Kendall) and P-value and R² cutoffs as significance thresholds. Legend shows the branch length method and correlation methods. “Pearson, Spearman, Kendall” means that ERC hits were only deemed significant if passing filters according to all 3 methods. False positive (c) and negative (d) rates assessed across several P-value cutoff values. R² cutoff was held constant at ≥0.20. False positive (e) and negative (f) rates assessed across several R² cutoff values. P-value cutoff was held constant at ≤0.05.
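
The filtering and scoring logic described in this caption can be sketched as follows. This is an illustrative assumption rather than ERCnet's implementation: a pair counts as a hit only if every correlation method passes both the P-value and R² cutoffs, and the error rates use the conventional definitions FPR = FP / (FP + TN) and FNR = FN / (FN + TP), which the caption does not spell out. All names and toy values are hypothetical.

```python
from itertools import combinations


def passes_all_methods(stats, p_cutoff=0.05, r2_cutoff=0.20):
    """stats maps method name -> (p_value, r_squared).

    Returns True only if every method meets both cutoffs.
    """
    return all(p <= p_cutoff and r2 >= r2_cutoff for p, r2 in stats.values())


def error_rates(predicted_hits, true_pairs, all_pairs):
    """Return (false_positive_rate, false_negative_rate).

    Assumed definitions: FPR = FP / (FP + TN), FNR = FN / (FN + TP).
    """
    predicted_hits = set(predicted_hits)
    true_pairs = set(true_pairs)
    negatives = set(all_pairs) - true_pairs
    fp = len(predicted_hits & negatives)
    fn = len(true_pairs - predicted_hits)
    fpr = fp / len(negatives) if negatives else 0.0
    fnr = fn / len(true_pairs) if true_pairs else 0.0
    return fpr, fnr


# Toy example: P1-P3 play the role of coaccelerated families, P4-P5 are background.
proteins = ["P1", "P2", "P3", "P4", "P5"]
all_pairs = list(combinations(proteins, 2))
true_pairs = list(combinations(["P1", "P2", "P3"], 2))

# Hypothetical (p_value, r_squared) results per correlation method.
results = {
    ("P1", "P2"): {"pearson": (0.001, 0.60), "spearman": (0.004, 0.45), "kendall": (0.010, 0.30)},
    ("P1", "P3"): {"pearson": (0.020, 0.35), "spearman": (0.030, 0.25), "kendall": (0.080, 0.15)},
    ("P2", "P3"): {"pearson": (0.002, 0.55), "spearman": (0.003, 0.50), "kendall": (0.020, 0.28)},
    ("P4", "P5"): {"pearson": (0.010, 0.40), "spearman": (0.020, 0.30), "kendall": (0.040, 0.22)},
}

# Pairs without an entry in `results` are treated as non-hits.
predicted = {pair for pair, stats in results.items() if passes_all_methods(stats)}
fpr, fnr = error_rates(predicted, true_pairs, all_pairs)
print(f"hits: {sorted(predicted)}, FPR = {fpr:.3f}, FNR = {fnr:.3f}")
```

Requiring all three methods to agree is the strictest of the combinations shown in the legend; tightening either cutoff trades false positives for false negatives.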

**Fig. 3.**
Angiosperm taxon-sampling datasets used to assess *ERCnet* performance. (Left) Phylogenetic tree of the full pool of species included for random subsampling. (Right) Presence-absence plot indicating the species included in random datasets of each size. Five replicates were performed for each dataset size. *A. trichopoda* was included as an outgroup for all replicates and *A. thaliana* was included as a common ingroup representative for all replicates.

**Fig. 4.**
Proteome coverage and overlapping branches after *ERCnet* filtering. a) The number of proteins retained for phylogenomic analysis after several phases of quality-control filtering that occur during the first steps of *ERCnet*. These numbers represent the proteins that are tested for interaction during “all-versus-all” ERC analyses at later steps of *ERCnet*. b, c) The number of overlapping branches (i.e. points on correlation plots) among pairs of proteins during all-versus-all ERC analyses. For the root-to-tip method (b), “branches” refers to paths from the root of the tree to each tip. For the branch-by-branch method (c), “branches” refers to the common branches determined by our “BLR” method.

**Fig. 5.**
Composition and functional clustering of *ERCnet* interaction networks. a, b) The number of nodes (points) and edges (x's) in networks obtained using the root-to-tip (a) and branch-by-branch (b) methods. Note the log scale for panel (a). c, d) The assortativity coefficient estimated from. Filled points indicate that the assortativity coefficient is significantly greater than the randomized null distribution (z-score from randomized permutation test). Significant positive assortativity indicates clustering of traits across a network. The trait measured here was the predicted targeting (plastid, mitochondrial, other) of the proteins in the network.
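
The assortativity test mentioned in this caption can be sketched with Newman's assortativity coefficient for a categorical node trait, compared against a null distribution built by shuffling trait labels among nodes. This is a minimal illustration under stated assumptions: the toy network, the function names, and the label-shuffling null are hypothetical, and the caption does not specify how ERCnet builds its randomized distribution.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def assortativity(edges, trait):
    """Newman's assortativity coefficient for a categorical trait (undirected graph)."""
    if not edges:
        raise ValueError("graph has no edges")
    ends = Counter()  # number of edge ends attached to each category
    same = 0          # number of edges joining two nodes of the same category
    for u, v in edges:
        ends[trait[u]] += 1
        ends[trait[v]] += 1
        if trait[u] == trait[v]:
            same += 1
    m = len(edges)
    trace = same / m                                        # sum of e_ii
    expected = sum((c / (2 * m)) ** 2 for c in ends.values())  # sum of a_i^2
    if expected == 1:
        raise ValueError("assortativity is undefined when all nodes share one category")
    return (trace - expected) / (1 - expected)


def permutation_z_score(edges, trait, n_perm=1000, seed=0):
    """Return (observed coefficient, z-score) against a label-shuffling null."""
    rng = random.Random(seed)
    observed = assortativity(edges, trait)
    nodes = sorted({n for edge in edges for n in edge})  # only nodes with edges
    labels = [trait[n] for n in nodes]
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        null.append(assortativity(edges, dict(zip(nodes, labels))))
    sd = pstdev(null)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return observed, (observed - mean(null)) / sd


# Toy network: a plastid triangle and a mitochondrial triangle joined by one edge.
trait = {"p1": "plastid", "p2": "plastid", "p3": "plastid",
         "m1": "mitochondrial", "m2": "mitochondrial", "m3": "mitochondrial"}
edges = [("p1", "p2"), ("p2", "p3"), ("p1", "p3"),
         ("m1", "m2"), ("m2", "m3"), ("m1", "m3"),
         ("p3", "m1")]

coef, z = permutation_z_score(edges, trait)
print(f"assortativity = {coef:.3f}, z = {z:.2f}")
```

A coefficient near 1 means edges mostly connect proteins with the same predicted targeting, and a large positive z-score means that level of clustering is unlikely under random labeling.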

**Fig. 6.**
Overlap in *ERCnet* hits between runs. a, b) Bar plots describing the number of hits plotted by the number of *ERCnet* replicate runs they appeared in. Blue bars (left) show observed values, and gray bars (right) show averages from 10 randomized replicates. Lines indicate standard error from randomized replicates.
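
The tally underlying these bar plots can be sketched as a count of how many runs each hit appears in. The function name and toy hits below are hypothetical; the sketch assumes a hit is an unordered pair of protein identifiers, and it does not reproduce the randomization used for the gray bars.

```python
from collections import Counter


def hits_by_run_count(runs):
    """runs: list of iterables of (protein_a, protein_b) hits.

    Returns {k: number of distinct hits found in exactly k runs}.
    """
    per_hit = Counter()
    for run in runs:
        # frozenset makes each pair unordered; the set comprehension
        # prevents a pair from being counted twice within one run.
        for pair in {frozenset(hit) for hit in run}:
            per_hit[pair] += 1
    return dict(sorted(Counter(per_hit.values()).items()))


# Three hypothetical replicate runs.
runs = [
    [("A", "B"), ("C", "D")],
    [("B", "A"), ("E", "F")],
    [("A", "B"), ("C", "D"), ("G", "H")],
]

overlap = hits_by_run_count(runs)
print(overlap)
```

Hits that recur in many runs despite different species samples are the candidates for interactions that consistently exhibit ERC.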

**Fig. 7.**
Performance of *ERCnet* using parallel processing. a) Runtime of the analytical steps of *ERCnet* parallelized with 48 threads. The reconciliation and network analysis steps are omitted because they are very fast relative to the phylogenomics and pairwise ERC steps and contribute negligibly to the overall runtime. b) Runtime of the *Phylogenomics* steps of *ERCnet* on either 4 threads or 48 threads. The highly parallelized 48-thread runs were ∼4-fold faster than 4-thread runs.

Cited by

From Trees to Traits: A Review of Advances in PhyloG2P Methods and Future Directions.
Macdonald AR, James ME, Mitchell JD, Holland BR. Genome Biol Evol. 2025 Sep 2;17(9):evaf150. doi: 10.1093/gbe/evaf150. PMID: 40907979. Review.

