BMC Genomics. 2020 Oct 12;21(1):708.
doi: 10.1186/s12864-020-07100-0.

Comparative genomics and community curation further improve gene annotations in the nematode Pristionchus pacificus


Marina Athanasouli et al. BMC Genomics. 2020.

Abstract

Background: Nematode model organisms such as Caenorhabditis elegans and Pristionchus pacificus are powerful systems for studying the evolution of gene function at a mechanistic level. However, the identification of P. pacificus orthologs of candidate genes known from C. elegans is complicated by the discrepancy in the quality of gene annotations, a common problem in nematode and invertebrate genomics.

Results: Here, we combine comparative genomic screens for suspicious gene models with community-based curation to further improve the quality of gene annotations in P. pacificus. We extend previous curations of one-to-one orthologs to larger gene families and to orphan genes. Cross-species comparisons of protein lengths, screens for atypical domain combinations, and screens for species-specific orphan genes resulted in 4311 candidate genes that were subjected to community-based curation. Corrections for 2946 gene models were implemented in a new version of the P. pacificus gene annotations. The new set of gene annotations contains 28,896 genes and has a single-copy ortholog completeness level of 97.6%.

Conclusions: Our work demonstrates the effectiveness of comparative genomic screens to identify suspicious gene models and the scalability of community-based approaches to improve the quality of thousands of gene models. Similar community-based approaches can help to improve the quality of gene annotations in other invertebrate species, including parasitic nematodes.

Keywords: Caenorhabditis elegans; Evolution; Genome; Orphan genes; Parasitic nematodes.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Comparison of protein lengths between one-to-one orthologs. a One-to-one orthologous genes between C. elegans and P. pacificus have highly similar protein lengths (Pearson’s r = 0.83). b Size distributions of one-to-one orthologs show a peak at around 300 amino acids. c P. pacificus genes with more than two-fold length difference were considered for manual curation. d The P. pacificus one-to-one ortholog (PPA0494) of C. elegans lev-8 is more than twice as long as LEV-8. BLAST analysis showed that the N-terminal region has similarity to another C. elegans gene (Y37B6BL.37), suggesting that it represents an artificial gene fusion. e Manual inspection of PPA0494 in the genome browser shows two assembled RNA-seq transcripts (red) that cover most of the original gene model, further supporting that PPA0494 is an artificially fused gene model.
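The two-fold length criterion from panel c can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `flag_length_outliers`, the input layout, and the gene identifiers and lengths in the usage example are all hypothetical.

```python
def flag_length_outliers(pairs, max_ratio=2.0):
    """Flag one-to-one ortholog pairs whose protein lengths differ
    by more than `max_ratio`-fold in either direction.

    `pairs` is an iterable of (ppa_id, ppa_len, cel_id, cel_len) tuples;
    this layout is an assumption made for the sketch.
    """
    flagged = []
    for ppa_id, ppa_len, cel_id, cel_len in pairs:
        # Ratio of the longer to the shorter protein, so it is always >= 1.
        ratio = max(ppa_len, cel_len) / min(ppa_len, cel_len)
        if ratio > max_ratio:
            flagged.append((ppa_id, cel_id, round(ratio, 2)))
    return flagged


# Hypothetical identifiers and lengths (amino acids), for illustration only.
example_pairs = [
    ("ppa_gene_1", 1000, "cel_gene_1", 450),  # more than two-fold longer
    ("ppa_gene_2", 300, "cel_gene_2", 310),   # similar lengths
    ("ppa_gene_3", 200, "cel_gene_3", 100),   # exactly two-fold, not flagged
]
candidates = flag_length_outliers(example_pairs)
```

A symmetric ratio is used so that both suspiciously long models (possible artificial fusions) and suspiciously short ones (possible truncations or splits) are picked up by the same threshold.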
Fig. 2
Identification of candidates for manual curation. a The boxplots show the length distributions of members of 25 highly abundant gene families. The lower 10% and the upper 20% of each gene family were selected for manual inspection. b Individual screens for suspicious gene models reveal between 336 and 1077 screen-specific candidates, indicating that the screens are highly complementary. c Manual classification of P. pacificus SSOGs shows numerous genes that overlap gene models on the opposite strand. The category “Others” denotes genes that were not systematically classified as they were part of previous curations.
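The selection in panel a, the shortest 10% and the longest 20% of each gene family, could be sketched as follows. The caption does not say how the cutoffs were computed, so the simple rank-based rule here is an assumption, and the function name `select_length_extremes` and the example family are hypothetical.

```python
def select_length_extremes(family_lengths, lower_frac=0.10, upper_frac=0.20):
    """Return (shortest, longest) members of one gene family.

    `family_lengths` maps gene ID -> protein length. Members are ranked
    by length; the bottom `lower_frac` and the top `upper_frac` are
    returned. Rounding down the counts is an assumption of this sketch.
    """
    ranked = sorted(family_lengths.items(), key=lambda item: item[1])
    n = len(ranked)
    n_low = int(n * lower_frac)
    n_high = int(n * upper_frac)
    shortest = [gene for gene, _ in ranked[:n_low]]
    longest = [gene for gene, _ in ranked[n - n_high:]] if n_high else []
    return shortest, longest


# Hypothetical family of ten genes with lengths 100, 200, ..., 1000.
family = {f"g{i}": 100 * (i + 1) for i in range(10)}
shortest, longest = select_length_extremes(family)
```

Applying this per family rather than across the whole proteome keeps the cutoffs relative to each family's own length distribution, which matches the per-family boxplots described in the caption.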
Fig. 3
Examples of unsupported SSOGs. a The P. pacificus SSOG PPA46345 overlaps exons of two other gene models that are well supported by transcriptome assemblies from strand-specific RNA-seq and Iso-seq data. b The P. pacificus SSOG PPA4618 overlaps the UTR of a well-supported gene model. The absence of strand-specific transcriptomic support indicates that P. pacificus SSOGs PPA46345 and PPA4618 are likely gene prediction artifacts.
