Trends Parasitol. 2018 Mar;34(3):179-183.
doi: 10.1016/j.pt.2017.11.007. Epub 2017 Dec 14.

Tackling Hypotheticals in Helminth Genomes

International Molecular Helminthology Annotation Network (IMHAN) et al. Trends Parasitol. 2018 Mar.

Abstract

Advancements in genome sequencing have led to the rapid accumulation of uncharacterized 'hypothetical proteins' in the public databases. Here we provide a community perspective and some best-practice approaches for the accurate functional annotation of uncharacterized genomic sequences.

Keywords: CRISPR; RNAi; annotation; genomes; helminth; hypothetical genes.

Figures

Figure 1. Approaches for Functional Annotation of Uncharacterized Genes.
The most efficient way to investigate genes in helminth genomes that carry the ‘hypothetical’ function annotation is to first search the currently available sequence databases (typically the NCBI nonredundant database, https://www.ncbi.nlm.nih.gov) for sequence similarity using BLAST. This should be followed by searches of structural and specialized databases, for example protein databases (such as UniProt), enzyme databases (such as BRENDA), and metabolic databases [such as KEGG and Gene Ontology (GO)] for metabolic pathway reconstruction [2]. Several Linux-based tools can be used to predict enzyme function more precisely, such as DETECT, PRIAM, EFICAz2, and InterProScan. Another in silico method used to improve functional annotation is phylogenomics [3], in which hypothetical proteins from phylogenetically related species are compared. Once a putative function is determined, cloning and sequencing of full-length cDNAs, proteomics (such as mass spectrometry), and RNA-Seq data can be used to validate annotations experimentally. Additional techniques, such as gene transformation and CRISPR/Cas-9 gene silencing, can also be applied [5–10]. The above-mentioned tools and techniques should be used in concert with extensive literature mining to manually curate genomic content. The resulting gene and protein sequences should be deposited in public databases such as COMBREX and WormBase. As the research community accumulates experimentally verified and published genes and proteins, along with species and strain identifications, a ‘Gold Standard’ database can emerge.
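As an illustration of the first step in the caption (a BLAST similarity search followed by triage), the following is a minimal Python sketch rather than a method from the article. It assumes BLAST+ was run with `-outfmt "6 std stitle"`, which yields the 12 standard tabular columns plus the subject title. The E-value cutoff of 1e-5, the list of uninformative title keywords, the function names, and the toy records are all hypothetical choices made for this example.

```python
import csv
import io

# Assumption: subject titles containing these words carry no functional information.
UNINFORMATIVE = ("hypothetical", "uncharacterized", "unnamed protein", "predicted protein")


def best_informative_hits(blast_tsv, max_evalue=1e-5):
    """Return {query_id: (subject_title, evalue, bitscore)} for the highest-scoring
    hit per query that passes the E-value cutoff and has an informative title."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if len(row) < 13:  # expect 12 standard columns plus stitle
            continue
        qseqid, stitle = row[0], row[12]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # similarity too weak to transfer an annotation
        if any(word in stitle.lower() for word in UNINFORMATIVE):
            continue  # hit is itself unannotated, so it adds nothing
        if qseqid not in best or bitscore > best[qseqid][2]:
            best[qseqid] = (stitle, evalue, bitscore)
    return best


def triage(query_ids, blast_tsv, max_evalue=1e-5):
    """Map each query to a candidate annotation, or None if it stays hypothetical."""
    hits = best_informative_hits(blast_tsv, max_evalue)
    return {q: (hits[q][0] if q in hits else None) for q in query_ids}


# Invented toy records, for illustration only (not real BLAST output).
TOY_BLAST = "\n".join(
    "\t".join(fields)
    for fields in [
        ("gene_A", "subj1", "62.5", "200", "75", "0", "1", "200", "1", "200",
         "1e-80", "250.0", "cathepsin B-like peptidase [Example species]"),
        ("gene_A", "subj2", "70.0", "200", "60", "0", "1", "200", "1", "200",
         "1e-90", "300.0", "hypothetical protein [Example species]"),
        ("gene_B", "subj3", "30.0", "50", "35", "0", "1", "50", "1", "50",
         "0.5", "30.0", "protein kinase [Example species]"),
    ]
)

if __name__ == "__main__":
    for gene, annotation in triage(["gene_A", "gene_B", "gene_C"], TOY_BLAST).items():
        print(gene, "->", annotation or "still hypothetical")
```

Queries that come back as `None` are the ones that would move on to the later steps in the caption: domain and enzyme-function prediction, phylogenomic comparison, and experimental validation. A candidate annotation from this step is only a lead for manual curation, not a verified function.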

References

    1. Schnoes AM, et al. Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLoS Comput. Biol., 5 (2009), Article e1000605
    2. Leale G, et al. Inferring unknown biological functions by integration of GO annotations and gene expression data. IEEE/ACM Trans. Comput. Biol. Bioinform., 99 (2016), pp. 1–19, arXiv:1608.03672
    3. Silva LL, et al. The Schistosoma mansoni phylome: using evolutionary genomics to gain insight into a parasite’s biology. BMC Genomics, 13 (2012), p. 617
    4. Green ML, Karp PD. A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases. BMC Bioinformatics, 5 (2004), p. 76
    5. Štefanić S, et al. RNA interference in Schistosoma mansoni schistosomula: selectivity, sensitivity and operation for larger-scale screening. PLoS Negl. Trop. Dis., 4 (2010), Article e850