Bioinformatics. 2024 Mar 29;40(4):btae175.
doi: 10.1093/bioinformatics/btae175.

POCP-nf: an automatic Nextflow pipeline for calculating the percentage of conserved proteins in bacterial taxonomy

Martin Hölzer. Bioinformatics.

Abstract

Summary: Advances in sequencing technology have led to an exponential increase in the number of available bacterial genomes, necessitating robust taxonomic classification methods. The Percentage Of Conserved Proteins (POCP), initially proposed by Qin et al. (2014), is a valuable metric for assessing prokaryote genus boundaries. Here, I introduce a computational pipeline for automated POCP calculation, aiming to enhance reproducibility and ease of use in taxonomic studies.
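
For orientation, the POCP between two genomes is commonly written as 100 × (C1 + C2) / (T1 + T2), where C1 and C2 are the numbers of conserved proteins found when each genome is searched against the other, and T1 and T2 are the total protein counts. The following is a minimal Python sketch of that arithmetic, not the POCP-nf implementation. The `Hit` record, the function names, and the default thresholds (E-value below 1e-5, identity above 40%, aligned region above 50% of the query length) are assumptions made for illustration and should be checked against Qin et al. (2014) and the pipeline documentation.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One protein alignment hit (hypothetical record, not a DIAMOND/BLASTP type)."""
    query_id: str
    evalue: float
    percent_identity: float
    alignment_length: int
    query_length: int


def count_conserved(hits, max_evalue=1e-5, min_identity=40.0, min_aligned_fraction=0.5):
    """Count distinct query proteins with at least one hit passing all thresholds.

    The default thresholds are assumptions for this sketch.
    """
    conserved = set()
    for hit in hits:
        if (
            hit.evalue < max_evalue
            and hit.percent_identity > min_identity
            and hit.alignment_length > min_aligned_fraction * hit.query_length
        ):
            conserved.add(hit.query_id)
    return len(conserved)


def pocp(c1, c2, t1, t2):
    """Return POCP in percent: 100 * (C1 + C2) / (T1 + T2)."""
    if t1 + t2 == 0:
        raise ValueError("total protein count must be greater than zero")
    return 100.0 * (c1 + c2) / (t1 + t2)
```

As a usage example, `pocp(800, 700, 1000, 1000)` describes two genomes of 1000 proteins each, with 800 and 700 conserved proteins in the two search directions.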

Availability and implementation: The POCP-nf pipeline uses DIAMOND for faster protein alignments, achieving sensitivity similar to that of BLASTP. The pipeline is implemented in Nextflow with Conda and Docker support and is freely available on GitHub at https://github.com/hoelzer/pocp. The open-source code can be easily adapted for various prokaryotic genome and protein datasets. Detailed documentation and usage instructions are provided in the repository.

Conflict of interest statement

None declared.

Figures

Figure 1.
Pairwise POCP values of Chlamydia strains and outgroups from the original study of Pannekoek et al. (2016) (upper triangle) and recalculated with POCP-nf (lower triangle). The average difference between the original and recalculated POCP values is 1.72%. Abbreviations: Chlamydia (C.) abortus (cab), C. avium (cav), C. caviae (cca), C. felis (cfe), C. gallinacea (cga), C. ibidis (cib), C. muridarum (cmu), C. pecorum (cpe), C. pneumoniae (cpn), C. psittaci (cps), C. trachomatis (ctr), Parachlamydia acanthamoebae (pac), Simkania negevensis (sne), Waddlia chondrophila (wch), Candidatus Rubidus massiliensis (cru).

References

    1. Afgan E, Nekrutenko A, Grüning BA et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update. Nucleic Acids Res 2022;50:W345–51.
    1. Aliyu H, Lebre P, Blom J et al. Phylogenomic re-assessment of the thermophilic genus Geobacillus. Syst Appl Microbiol 2016;39:527–33.
    1. Altschul SF, Madden TL, Schäffer AA et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;25:3389–402.
    1. Amulyasai B, Anusha R, Sasikala C et al. Phylogenomic analysis of a metagenome-assembled genome indicates a new taxon of an anoxygenic phototroph bacterium in the family Chromatiaceae and the proposal of “Candidatus thioaporhodococcus” gen. nov. Arch Microbiol 2022;204:688.
    1. Azpiazu-Muniozguren M, García M, Laorden L et al. Anianabacter salinae gen. nov., sp. nov. ASV31T, a facultative alkaliphilic and extremely halotolerant bacterium isolated from brine of a millennial continental saltern. Diversity 2022;14:1009.