Mol Biol Evol. 2022 Jul 2;39(7):msac153.
doi: 10.1093/molbev/msac153.

Pseudofinder: Detection of Pseudogenes in Prokaryotic Genomes

Mitchell J Syberg-Olsen et al. Mol Biol Evol.

Abstract

Prokaryotic genomes are usually densely packed with intact and functional genes. However, in certain contexts, such as after recent ecological shifts or extreme population bottlenecks, broken and nonfunctional gene fragments can quickly accumulate and form a substantial fraction of the genome. Identification of these broken genes, called pseudogenes, is a critical step for understanding the evolutionary forces acting upon, and the functional potential encoded within, prokaryotic genomes. Here, we present Pseudofinder, an open-source software package dedicated to pseudogene identification and analysis in bacterial and archaeal genomes. We demonstrate that Pseudofinder's multi-pronged, reference-based approach can detect a wide variety of pseudogenes, including those that are highly degraded and typically missed by gene-calling pipelines, as well as newly formed pseudogenes containing only one or a few inactivating mutations. Additionally, Pseudofinder can detect genes that lack inactivating substitutions but are experiencing relaxed selection. Implementation of Pseudofinder in annotation pipelines will allow more precise estimations of the functional potential of sequenced microbes, while also generating new hypotheses related to the evolutionary dynamics of bacterial and archaeal genomes.

Keywords: dN/dS; annotation; archaea; bacteria; genome; prediction; pseudogene.

Figures

Fig. 1.
Summary of benchmarking results, comparing pseudogene predictions by Pseudofinder to those of two other software tools: PGAP and DFAST (run with two different gene-callers). (A and B) ‘Upset’ plots (Conway et al. 2017), showing the overlap and differences between the three pipelines in pseudogenes predicted from Shewanella (A) and Sodalis (B). Each bar in the barplot represents the total number of pseudogenes that overlap between the pipelines denoted with dots below. (C) Barplots showing the types of pseudogenes that were predicted only by Pseudofinder in Shewanella and Sodalis (i.e., Pseudofinder-specific pseudogenes). Italicized numbers at the bottom of each bar indicate the number of Pseudofinder-specific pseudogenes predicted in each genome.
Fig. 2.
Pseudofinder workflow: the main Annotate branch is shown in the top part of the workflow, where predicted coding and intergenic regions are compared against proteins from a reference database, allowing the software to identify truncated and run-on ORFs, fragmented genes, and highly degraded gene remnants that lack identifiable gene features. The Sleuth branch is shown in the bottom part of the workflow, where genes from a closely related reference genome are compared against the genome-of-interest to identify gene inactivations at a finer scale; these inactivations, or gene breakages, can include significant frameshift-inducing indels (i.e., indels that result in substantial changes to the protein sequence), nonsense substitutions, loss of start and stop codons, and relaxed selection (elevated dN/dS, measured using PAML; Yang 2007). Information obtained from these two branches is then consolidated and provided to the user in the form of GFF and FASTA files for downstream processing. Pseudofinder also provides multiple ways for users to visualize the results, including a PDF-formatted genome diagram/map, as well as HTML-formatted files for interactive exploration of pseudogene predictions.
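To make the kinds of gene breakage named in the Sleuth branch concrete, the following is a minimal Python sketch that flags a lost start codon, a lost stop codon, a frameshift-sized length change, and a premature in-frame stop in a query coding sequence relative to a reference gene. It is an illustration only and is not Pseudofinder's implementation: the function name `find_breakages`, the flag labels, and the set of accepted start codons are assumptions introduced here, the length comparison is a crude stand-in for a real alignment, and relaxed selection (dN/dS) is not covered.

```python
# Illustrative sketch only; not taken from Pseudofinder's source code.
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumption: the most common prokaryotic start codons.
START_CODONS = {"ATG", "GTG", "TTG"}


def find_breakages(ref_cds: str, query_cds: str) -> list[str]:
    """Return labels for candidate inactivating changes in query_cds.

    Both arguments are nucleotide coding sequences (5'->3', sense strand).
    ref_cds is the intact gene from a closely related reference genome.
    """
    ref = ref_cds.upper()
    query = query_cds.upper()
    flags = []

    # Loss of the start codon.
    if query[:3] not in START_CODONS:
        flags.append("lost_start")

    # Loss of the terminal stop codon.
    if query[-3:] not in STOP_CODONS:
        flags.append("lost_stop")

    # A net length change that is not a multiple of three suggests a
    # frameshift-inducing indel (a real tool would inspect an alignment).
    if (len(query) - len(ref)) % 3 != 0:
        flags.append("frameshift_indel")

    # A stop codon in frame before the final codon (nonsense substitution).
    internal_codons = [query[i:i + 3] for i in range(0, len(query) - 3, 3)]
    if any(codon in STOP_CODONS for codon in internal_codons):
        flags.append("premature_stop")

    return flags


if __name__ == "__main__":
    reference = "ATGAAACCCGGGTAA"
    print(find_breakages(reference, "ATGTAACCCGGGTAA"))  # nonsense substitution
    print(find_breakages(reference, "ATGAACCCGGGTAA"))   # single-base deletion
```

A sequence identical to the reference returns an empty list. In practice each flag would only be a candidate, since a length difference alone cannot separate a frameshift from a compensated pair of indels.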

References

    1. Alves LQ, Ruivo R, Fonseca MM, Lopes-Marques M, Ribeiro P, Castro LFC. 2020. PseudoChecker: an integrated online platform for gene inactivation inference. Nucleic Acids Res. 48(W1):W321–W331. doi:10.1093/nar/gkaa408
    2. Buchfink B, Xie C, Huson DH. 2015. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 12:59–60. doi:10.1038/nmeth.3176
    3. Burke GR, Moran NA. 2011. Massive genomic decay in Serratia symbiotica, a recently evolved symbiont of aphids. Genome Biol Evol. 3:195–208. doi:10.1093/gbe/evr002
    4. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. 2009. BLAST+: architecture and applications. BMC Bioinform. 10:421. doi:10.1186/1471-2105-10-421
    5. Campbell MS, Holt C, Moore B, Yandell M. 2014. Genome annotation and curation using MAKER and MAKER-P. Curr Protoc Bioinform. 48:4.11.1-39. doi:10.1002/0471250953.bi0411s48