Proc Natl Acad Sci U S A. 2025 Jun 17;122(24):e2416646122.
doi: 10.1073/pnas.2416646122. Epub 2025 Jun 10.

Bias-aware training and evaluation of link prediction algorithms in network biology


Serhan Yılmaz et al. Proc Natl Acad Sci U S A. 2025.

Abstract

For biomedical applications, new link prediction algorithms are continuously being developed. These algorithms are typically evaluated computationally, using test sets generated by sampling the edges uniformly at random. However, as we demonstrate, this evaluation approach introduces a bias toward "rich nodes," i.e., those with higher degrees in the network. More concerningly, this bias persists even when different network snapshots are used for evaluation, as recommended in the machine learning community. This creates a cycle in research where newly developed algorithms generate more knowledge on well-studied biological entities while understudied entities are commonly overlooked. To overcome this issue, we propose a weighted validation setting specifically focusing on low-degree nodes and present AWARE strategies to facilitate bias-aware training and evaluation of link prediction algorithms. These strategies can help researchers gain better insights from computational evaluations and promote the development of new algorithms focusing on novel findings and understudied proteins.
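The degree bias described above follows from a simple count. When test edges are drawn uniformly at random from a simple graph with |E| edges, a node v is an endpoint of any single sampled edge with probability deg(v)/|E|, so high-degree nodes dominate the test set in proportion to their degree. The sketch below illustrates this on a made-up toy network (the names `CORE`, `PERIPHERY`, `EDGES` and all helper functions are hypothetical, not from the paper). For simplicity it treats the full edge list as the held-out set, which mirrors the expected composition of a uniform sample. It also contrasts ordinary edge-averaged recall with node-averaged recall, one hypothetical way to give low-degree nodes more weight. This is an assumption made for illustration only; it is not the AWARE strategies or the weighted validation setting proposed in the paper, whose exact definitions are not given in this abstract.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy network (not data from the paper): a dense core of six
# well-studied nodes (a 6-clique, 15 edges) and five isolated pairs of
# understudied nodes (5 edges, every peripheral node has degree 1).
CORE = list(combinations(range(6), 2))
PERIPHERY = [(6, 7), (8, 9), (10, 11), (12, 13), (14, 15)]
EDGES = CORE + PERIPHERY


def degrees(edges):
    """Count how many edges touch each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def endpoint_probability(edges, node):
    """Chance that one uniformly sampled edge has `node` as an endpoint.

    For a simple graph (no self-loops or duplicate edges) this is
    deg(node) / |E|, so high-degree nodes are over-represented in a
    uniformly sampled test set.
    """
    return degrees(edges)[node] / len(edges)


def edge_averaged_recall(test_edges, recovered):
    """Share of held-out edges the predictor recovered; every edge counts once."""
    return sum(e in recovered for e in test_edges) / len(test_edges)


def node_averaged_recall(test_edges, recovered):
    """Mean of per-node recall; every node with a held-out edge counts once.

    Hypothetical illustration of degree-aware weighting, not the AWARE method.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for e in test_edges:
        for node in e:
            totals[node] += 1
            hits[node] += int(e in recovered)
    return sum(hits[n] / totals[n] for n in totals) / len(totals)


if __name__ == "__main__":
    # Hypothetical predictor that only recovers edges among the core nodes.
    recovered = set(CORE)
    print(endpoint_probability(EDGES, 0))          # 0.25 (core node, degree 5)
    print(endpoint_probability(EDGES, 6))          # 0.05 (peripheral node, degree 1)
    print(edge_averaged_recall(EDGES, recovered))  # 0.75
    print(node_averaged_recall(EDGES, recovered))  # 0.375
```

On this toy graph the core-only predictor scores 0.75 under edge-averaged recall but 0.375 when each node counts once, because the ten peripheral nodes, none of whose edges it recovers, now make up 10 of the 16 weighted units.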

Keywords: bias; graph machine learning; network biology; protein–protein interaction; validation.


Conflict of interest statement

Competing interests statement: The authors declare no competing interest.

