Genome Biol Evol. 2020 Dec 6;12(12):2258-2266.
doi: 10.1093/gbe/evaa211.

Benchmarking Orthogroup Inference Accuracy: Revisiting Orthobench

David M Emms et al. Genome Biol Evol.

Abstract

Orthobench is the standard benchmark to assess the accuracy of orthogroup inference methods. It contains 70 expert-curated reference orthogroups (RefOGs) that span the Bilateria and cover a range of different challenges for orthogroup inference. Here, we leveraged improvements in tree inference algorithms and computational resources to reinterrogate these RefOGs and carry out an extensive phylogenetic delineation of their composition. This phylogenetic revision altered the membership of 31 of the 70 RefOGs, with 24 subject to extensive revision and 7 that required minor changes. We further used these revised and updated RefOGs to provide an assessment of the orthogroup inference accuracy of widely used orthogroup inference methods. Finally, we provide an open-source benchmarking suite to support the future development and use of the Orthobench benchmark.

Keywords: benchmark; orthogroup; orthology.

Figures

Fig. 1.
Evaluation and revision of RefOGs from Orthobench. (A) Summary of the corrections made to the RefOG data set. (B) Reasons for major corrections to RefOGs from the previous study. (C) The species tree. Green shaded area shows the 12 Bilaterian species for which the Bilateria-level orthogroups (RefOGs) were defined. One outgroup species, which appears in the gene trees in the figure, is also shown. (D) Example of a major improvement for which clades had been missing from the original RefOG tree: RefOG 63 gene tree as determined in the original study. (E) Gene tree from this study showing the corrected RefOG 63 orthogroup shaded green. Phylogenetic analysis revealed that the original RefOG 32 comprises two separate orthogroups that diverged at a gene duplication event preceding the divergence of the vertebrates. (F) Example of a major improvement for which extra clades of genes had been included in the original RefOG: RefOG 32 gene tree as determined in the original study. (G) Gene tree from this study showing the corrected RefOG 32 orthogroup. Phylogenetic analysis revealed that these genes diverged from the remaining genes in the tree at a gene duplication event predating the divergence of the Deuterostomes and Protostomes. Gene trees show the previously identified orthogroup containing the newly delimited orthogroup from this study (green shaded clade). Genes/species are colored according to species. Corresponding genes identified as members of the orthogroup in both studies are underlined (including when identifiers have been updated). Red dot = 100% bootstrap support.
Fig. 2.
The benchmark results for the methods tested. (A) Precision, recall, and F-score. (B) Number of orthogroups predicted exactly, with no extra or missing genes.
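As a rough illustration of the quantities reported in Fig. 2, the sketch below scores one predicted orthogroup against one reference orthogroup, treating each as a set of gene identifiers. The function name `score_orthogroup`, the gene identifiers, and the per-orthogroup set-based scoring are assumptions made for illustration; the benchmarking suite's own procedure may differ in its details.

```python
def score_orthogroup(predicted, reference):
    """Return (precision, recall, f_score, exact) for one predicted orthogroup.

    Both arguments are iterables of gene identifiers. This is an
    illustrative set-based definition, not the benchmark's own code.
    """
    predicted, reference = set(predicted), set(reference)
    shared = len(predicted & reference)  # genes placed correctly
    precision = shared / len(predicted) if predicted else 0.0
    recall = shared / len(reference) if reference else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    # "Exact" means no extra and no missing genes, as in panel (B).
    exact = predicted == reference
    return precision, recall, f_score, exact


# Hypothetical gene identifiers, for illustration only.
reference = {"geneA", "geneB", "geneC", "geneD"}
predicted = {"geneA", "geneB", "geneC", "geneX"}
precision, recall, f_score, exact = score_orthogroup(predicted, reference)
print(precision, recall, f_score, exact)
```

Here one extra gene and one missing gene lower both precision and recall, and the orthogroup does not count as predicted exactly.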
Fig. 3.
Breakdown of the precision (P), recall (R), and F-score (F) of the methods under different levels of technical challenges to orthogroup inference. (A–C) RefOG size, N. (A) Low, N < 15; (B) medium, 15 ≤ N < 31; and (C) high, N ≥ 31. (D–F) Evolutionary rate measured by mean sequence identity, I. (D) Low evolutionary rate, I > 73.8%; (E) medium, 62.4% < I ≤ 73.8%; and (F) high, I ≤ 62.4%. (G–I) Alignment quality, Q = norMD. (G) Low, Q < 0.88; (H) medium, 0.88 ≤ Q ≤ 1.15; and (I) high, Q > 1.15. (J–L) Number of domains, D. (J) Low, D = 1; (K) medium, 2 ≤ D ≤ 3; and (L) high, D > 3.
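The thresholds in the Fig. 3 caption can be written out as simple binning rules. The sketch below is a minimal transcription of those cut-offs; the function names are hypothetical, and the caption does not say how a RefOG with fewer than one domain would be treated, so that case raises an error here.

```python
def size_bin(n):
    """RefOG size N: low N < 15, medium 15 <= N < 31, high N >= 31."""
    if n < 15:
        return "low"
    return "medium" if n < 31 else "high"


def rate_bin(identity_pct):
    """Evolutionary rate from mean sequence identity I (percent).

    Higher identity means a lower evolutionary rate.
    """
    if identity_pct > 73.8:
        return "low"
    return "medium" if identity_pct > 62.4 else "high"


def alignment_bin(q):
    """Alignment quality Q (norMD): low Q < 0.88, high Q > 1.15."""
    if q < 0.88:
        return "low"
    return "medium" if q <= 1.15 else "high"


def domain_bin(d):
    """Number of domains D: low D = 1, medium 2 <= D <= 3, high D > 3."""
    if d < 1:
        # Assumption: the caption defines no bin for D < 1.
        raise ValueError("no bin defined for fewer than one domain")
    if d == 1:
        return "low"
    return "medium" if d <= 3 else "high"


print(size_bin(20), rate_bin(70.0), alignment_bin(1.0), domain_bin(2))
```

Note that the boundary values fall in different bins for each measure (for example, N = 15 is medium while I = 62.4% is high), which is why each rule is written out separately.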

Publication types