Genome Biol. 2022 Mar 28;23(1):85.
doi: 10.1186/s13059-022-02652-8.

SHOOT: phylogenetic gene search and ortholog inference

David Mark Emms et al. Genome Biol.

Abstract

Determining the evolutionary relationships between genes is fundamental to comparative biological research. Here, we present SHOOT. SHOOT searches a user query sequence against a database of phylogenetic trees and returns a tree with the query sequence correctly placed within it. We show that SHOOT performs this analysis with comparable speed to a BLAST search. We demonstrate that SHOOT phylogenetic placements are as accurate as conventional tree inference, and it can identify orthologs with high accuracy. In summary, SHOOT is a fast and accurate tool for phylogenetic analyses of novel query sequences. It is available online at www.shoot.bio.

Keywords: Orthology inference; Phylogenetic tree inference; Sequence similarity search.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
The workflow for the two separate stages of SHOOT. A The database preparation stage. B The sequence search stage. MSA, multiple sequence alignment; HG, homologous group. Individual shapes represent individual protein sequences
Fig. 2
Runtime and closest homolog identification accuracy for SHOOT, BLAST, and DIAMOND. A Mean runtime and 95% confidence interval for 1000 searches of randomly sampled sequences against the same database of 984,137 protein sequences from 78 species. B Accuracy at identifying the closest related database gene to a randomly selected query sequence. C Mean average precision at k (MAP@k)
Fig. 3
F-score, precision, and recall at identifying orthologs in Homo sapiens for 100 query genes in each of Mus musculus, Gallus gallus, Danio rerio, Ciona intestinalis, Drosophila melanogaster, and Saccharomyces cerevisiae for BLAST best hit (BH), BLAST reciprocal best hit (RBH), and SHOOT
Fig. 4
Runtime and accuracy for gene placement using the Sub-trees and Unsplit tree methods for the 22 largest gene trees from the UniProt database. A Runtime. B Normalized Robinson-Foulds (RF) distance for SHOOT placement vs original placement using Sub-trees. C The same comparison using Unsplit trees
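
The precision, recall, and F-score reported in Fig. 3 are standard set-based measures. As a minimal sketch only (not the authors' evaluation code), the following shows how they could be computed for one query gene from a predicted ortholog set and a reference ortholog set. The function name and the gene identifiers are hypothetical and chosen purely for illustration.

```python
def precision_recall_f(predicted, reference):
    """Return (precision, recall, F-score) for two collections of gene IDs.

    predicted: orthologs reported by a method (hypothetical input)
    reference: orthologs taken as ground truth (hypothetical input)
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)

    # Precision: fraction of predicted orthologs that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of reference orthologs that were recovered.
    recall = true_positives / len(reference) if reference else 0.0
    # F-score: harmonic mean of precision and recall.
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Illustrative identifiers only; these are not real results.
p, r, f = precision_recall_f({"geneA", "geneB", "geneC"},
                             {"geneB", "geneC", "geneD", "geneE"})
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Here two of the three predictions are correct and two of the four reference orthologs are recovered, so precision is 2/3, recall is 1/2, and the F-score is 4/7.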
