NCBI Orthologs: Public Resource and Scalable Method for Computing High-Precision Orthologs Across Eukaryotic Genomes
- PMID: 40996513
- DOI: 10.1007/s00239-025-10268-2
Abstract
Orthologs are fundamental for enabling comparative genomics analyses that further our understanding of eukaryotic biology. The unprecedented increase in the availability of high-quality eukaryotic genomes necessitates scalable and accurate methods for orthology inference. The National Center for Biotechnology Information (NCBI) developed "NCBI Orthologs", a resource and a computational pipeline designed to meet this challenge within the NCBI RefSeq framework. This system integrates protein similarity, nucleotide alignment, and microsynteny to achieve high-precision ortholog assignments across diverse eukaryotes. The pipeline leverages high-quality RefSeq annotations and processes genomes individually, ensuring scalability. The resulting ortholog data, organized into gene-level anchored sets, enable propagation of functional annotation information and facilitate comparative genomics. Critically, these data are integrated into the NCBI Gene resource, providing users with access from various entry points. The NCBI Datasets resource provides an intuitive interface to explore orthologous relationships on the web and allows bulk data download via the web, command-line tools, and an API. We detail the methodology, including anchor species selection and the decision tree used to arrive at high-confidence one-to-one orthology relationships. NCBI Orthologs is a valuable resource for facilitating functional annotation efforts and enhancing our understanding of eukaryotic gene evolution.
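To make the idea of combining three evidence types into a one-to-one call more concrete, the following is a minimal illustrative sketch, not the NCBI pipeline itself. The abstract does not give the actual decision tree, so the `Candidate` fields, the threshold values, and the rule "protein similarity plus either microsynteny or nucleotide alignment" are all assumptions made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Evidence for one (query gene, anchor gene) pair. All fields are hypothetical."""
    query_gene: str
    anchor_gene: str
    protein_similarity: float   # fraction in [0, 1]
    nucleotide_coverage: float  # fraction of the locus aligned, in [0, 1]
    syntenic_neighbors: int     # flanking genes that also match near the anchor gene


# Hypothetical thresholds; the real pipeline's criteria are not stated in the abstract.
MIN_PROTEIN_SIMILARITY = 0.5
MIN_NUCLEOTIDE_COVERAGE = 0.7
MIN_SYNTENIC_NEIGHBORS = 2


def is_supported(c: Candidate) -> bool:
    """Require protein similarity, plus microsynteny or nucleotide alignment support."""
    if c.protein_similarity < MIN_PROTEIN_SIMILARITY:
        return False
    return (c.syntenic_neighbors >= MIN_SYNTENIC_NEIGHBORS
            or c.nucleotide_coverage >= MIN_NUCLEOTIDE_COVERAGE)


def call_one_to_one(candidates: list[Candidate]) -> dict[str, str]:
    """Keep only pairs where both genes have exactly one supported partner."""
    supported = [c for c in candidates if is_supported(c)]
    per_query = Counter(c.query_gene for c in supported)
    per_anchor = Counter(c.anchor_gene for c in supported)
    return {
        c.query_gene: c.anchor_gene
        for c in supported
        if per_query[c.query_gene] == 1 and per_anchor[c.anchor_gene] == 1
    }


example = [
    Candidate("q1", "a1", 0.9, 0.2, 3),  # supported by synteny
    Candidate("q2", "a2", 0.8, 0.9, 0),  # supported by nucleotide alignment
    Candidate("q3", "a3", 0.3, 0.9, 5),  # rejected: low protein similarity
    Candidate("q4", "a4", 0.9, 0.8, 0),  # ambiguous: q4 has two supported partners
    Candidate("q4", "a5", 0.9, 0.8, 0),
]
orthologs = call_one_to_one(example)
```

In this toy version, ambiguous genes are dropped rather than resolved, which favors precision over recall. That mirrors the stated goal of high-confidence one-to-one relationships, but the specific rule is an assumption.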
Keywords: Comparative genomics resource; EGAP; NCBI; Orthologs; Orthology; RefSeq; Synteny.
© 2025. This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply.
Conflict of interest statement
Declarations. Conflict of interest: The authors declare that there are no conflicts of interest.