Philos Trans R Soc Lond B Biol Sci. 2024 Jan 15;379(1894):20220443.
doi: 10.1098/rstb.2022.0443. Epub 2023 Nov 27.

A standard workflow for community-driven manual curation of Strongyloides genome annotations

Astra S Bryant et al. Philos Trans R Soc Lond B Biol Sci.

Abstract

Advances in the functional genomics and bioinformatics toolkits for Strongyloides species have positioned these species as genetically tractable model systems for gastrointestinal parasitic nematodes. As community interest in mechanistic studies of Strongyloides species continues to grow, publicly accessible reference genomes and associated genome annotations are critical resources for researchers. Genome annotations for multiple Strongyloides species are broadly available via the WormBase and WormBase ParaSite online repositories. However, a recent phylogenetic analysis of the receptor-type guanylate cyclase (rGC) gene family in two Strongyloides species highlights the potential for errors in a large percentage of current Strongyloides gene models. Here, we present three examples of gene annotation updates within the Strongyloides rGC gene family; each example illustrates a type of error that may occur frequently within the annotation data for Strongyloides genomes. We also extend our analysis to 405 previously curated Strongyloides genes to confirm that gene model errors are found at high rates across gene families. Finally, we introduce a standard manual curation workflow for assessing gene annotation quality and generating corrections, and we discuss how it may be used to facilitate community-driven curation of parasitic nematode biodata. This article is part of the Theo Murphy meeting issue 'Strongyloides: omics to worm-free populations'.

Keywords: Strongyloides; WormBase; community curation; comparative genomics; genome annotation; nematodes.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1.
Updated annotations to separate incorrectly fused genes can reveal hidden protein homologues. (a) WormBase BLASTP results for Ce-DAF-11 searched against the S. ratti proteome as well as the S. ratti SRAE_1000175800 protein sequence searched against the C. elegans proteome. Protein sequences are from WormBase release WS286. Although Ce-DAF-11 appears most similar to SRAE_1000175800, the reciprocal search identifies matches to Ce-CHD-3 and Ce-LET-418, not Ce-DAF-11. (b) Intron–exon diagram and protein motifs of the SRAE_1000175800 gene annotation from WormBase release WS286 (corresponding to WormBase ParaSite version 16). Protein motifs common to receptor-type guanylate cyclases are present in the 5′ end of the gene model (features 1–3); the 3′ end of the gene includes additional protein motifs (features 4–8). Scale bar is 500 bp. (c) Intron–exon diagrams and protein motifs of Ce-daf-11, Ce-chd-3, and Ce-let-418. Scale bar is 500 bp. (d) Intron–exon diagrams and protein motifs of updated SRAE_1000175800 and SRAE_1000175850. Scale bar is 500 bp. (e) WormBase BLASTP results for updated S. ratti protein sequences searched against the C. elegans proteome. The updated SRAE_1000175800 protein sequence is most similar to Ce-DAF-11; the new SRAE_1000175850 protein sequence retains the original match to Ce-CHD-3 and Ce-LET-418. Asterisks indicate updated gene models.
Figure 2.
Curating single gene models can reveal missing protein domains and novel start codons. (a) Intron–exon diagram and protein motifs of the original SRAE_2000430600 (upper, from WormBase version WS286) and the updated SRAE_2000430600 (lower, identified as Sr-gcy-23.1). In the original SRAE_2000430600 gene annotation, three protein motifs common to receptor-type guanylate cyclases are present: (i) a ligand-binding region; (ii) a protein kinase domain; and (iii) a guanylate cyclase domain. The updated version also contains an additional domain: (iv) transmembrane domain, which appears in the site formerly annotated as a third intron. Scale bars are 500 bp. (b) Sequencing of SRAE_2000430600 cDNA confirms the proposed update to the gene model. Nucleotide sequences show the original annotation (top), the updated annotation (middle), and the cDNA sequencing results (bottom). Asterisks indicate agreement between all three sequence sources. Blue italic text indicates the erroneous third intron included in the original gene annotation. Pink bold text indicates nucleotides that are predicted to encode a transmembrane domain. (c) Intron–exon diagrams of the SSTP_0000846800 gene model (upper, identified as Ss-gcy-23.3) and the one-to-one S. ratti homologue, SRAE_X00020900 (lower). For the SSTP_0000846800 gene model, the black region indicates the original gene model (from WormBase ParaSite version 16); the black arrow indicates location of the original start codon. The purple region shows a 5′ extension that shifts the ATG start codon upstream by 258 bp; the purple arrow indicates the new start codon site. The updated SSTP_0000846800 gene model was released in WormBase ParaSite version 17. Scale bars are 500 bp. (d) RNA-Sequencing (RNA-Seq) tracks aligned relative to the SSTP_0000846800 exon 1 gene model, showing abundant RNA-Seq reads aligning to the 5′ extension. RNA-Seq data show transcript abundance from three replicate samples of S. stercoralis third-stage infective larvae (iL3); genome-aligned tracks were exported from the WormBase ParaSite Region in Detail view in the Location widget [14,20]. Note that SSTP_0000846800 is located on the reverse strand, thus the orientation of the gene is flipped relative to panel (c).
Figure 3.
Standard workflow for manual curation of Strongyloides genome annotations. This workflow consists of four partially iterative steps: assembling sequences of interest, assessing annotation quality, generating annotation updates and submitting changes to reference databases.
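
The reciprocal BLASTP comparison described in Figure 1(a) and 1(e) can be illustrated with a minimal sketch. This is not the authors' code or a WormBase API: the function names, the hit-table layout, the sequence labels and the bit scores are all hypothetical placeholders chosen only to reproduce the pattern in the caption, where a fused gene model passes the forward search but fails the reverse search, and passes both once the model is split.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical hit table: query ID -> list of (subject ID, bit score).
Hits = Dict[str, List[Tuple[str, float]]]


def best_hit(hits: Hits, query: str) -> Optional[str]:
    """Return the subject with the highest bit score for a query, or None."""
    candidates = hits.get(query, [])
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])[0]


def is_reciprocal_best_hit(forward: Hits, reverse: Hits, query: str) -> bool:
    """True if the query's best forward hit has the query as its best reverse hit."""
    subject = best_hit(forward, query)
    if subject is None:
        return False
    return best_hit(reverse, subject) == query


# Placeholder labels and scores (assumptions, not values from the article).
# "cel_A" stands in for the C. elegans query protein, "sra_fused" for a
# parasite gene model that wrongly merges two neighbouring genes.
forward = {"cel_A": [("sra_fused", 300.0), ("sra_other", 120.0)]}

# Before curation: the extra fused domains pull the reverse search
# towards unrelated proteins ("cel_B", "cel_C").
reverse_before = {
    "sra_fused": [("cel_B", 900.0), ("cel_C", 850.0), ("cel_A", 280.0)],
}

# After curation: the model is split, so each half finds its own match.
reverse_after = {
    "sra_fused": [("cel_A", 400.0)],
    "sra_split": [("cel_B", 900.0), ("cel_C", 850.0)],
}

print(is_reciprocal_best_hit(forward, reverse_before, "cel_A"))
print(is_reciprocal_best_hit(forward, reverse_after, "cel_A"))
```

A failed reciprocal check of this kind does not by itself prove that a gene model is fused; in the workflow above it would only flag the model for closer inspection of its intron–exon structure and protein motifs.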

References

    1. Beknazarova M, Whiley H, Ross K. 2016. Strongyloidiasis: a disease of socioeconomic disadvantage. Int. J. Environ. Res. Public Health 13, 517. (10.3390/ijerph13050517)
    2. Hotez PJ, Brindley PJ, Bethony JM, King CH, Pearce EJ, Jacobson J. 2008. Helminth infections: the great neglected tropical diseases. J. Clin. Invest. 118, 1311-1321. (10.1172/JCI34261)
    3. Bisoffi Z, et al. 2013. Strongyloides stercoralis: a plea for action. PLoS Negl. Trop. Dis. 7, 7-10. (10.1371/journal.pntd.0002214)
    4. Tamarozzi F, Martello E, Giorli G, Fittipaldo A, Staffolani S, Montresor A, Bisoffi Z, Buonfrate D. 2019. Morbidity associated with chronic Strongyloides stercoralis infection: a systematic review and meta-analysis. Am. J. Trop. Med. Hyg. 100, 1305-1311. (10.4269/ajtmh.18-0895)
    5. Gang SS, Castelletto ML, Bryant AS, Yang E, Mancuso N, Lopez JB, Pellegrini M, Hallem EA. 2017. Targeted mutagenesis in a human-parasitic nematode. PLoS Pathog. 13, e1006675. (10.1371/journal.ppat.1006675)