Nucleic Acids Res. 2021 Nov 8;49(19):10868-10878.
doi: 10.1093/nar/gkab883.

Identification and classification of antiviral defence systems in bacteria and archaea with PADLOC reveals new system types

Leighton J Payne et al. Nucleic Acids Res.

Abstract

To provide protection against viral infection and limit the uptake of mobile genetic elements, bacteria and archaea have evolved many diverse defence systems. The discovery and application of CRISPR-Cas adaptive immune systems has spurred recent interest in the identification and classification of new types of defence systems. Many new defence systems have recently been reported but there is a lack of accessible tools available to identify homologues of these systems in different genomes. Here, we report the Prokaryotic Antiviral Defence LOCator (PADLOC), a flexible and scalable open-source tool for defence system identification. With PADLOC, defence system genes are identified using HMM-based homologue searches, followed by validation of system completeness using gene presence/absence and synteny criteria specified by customisable system classifications. We show that PADLOC identifies defence systems with high accuracy and sensitivity. Our modular approach to organising the HMMs and system classifications allows additional defence systems to be easily integrated into the PADLOC database. To demonstrate application of PADLOC to biological questions, we used PADLOC to identify six new subtypes of known defence systems and a putative novel defence system comprising a helicase, methylase and ATPase. PADLOC is available as a standalone package (https://github.com/padlocbio/padloc) and as a webserver (https://padloc.otago.ac.nz).

Figures

Figure 1.
Workflow of data preparation and PADLOC functioning. (A) Preparation of data for PADLOC. For each type of defence system protein, sequences were retrieved and clustered into homologue groups. An HMM was built from each group of proteins, and the names of the HMMs (e.g. GajA_1) and their corresponding protein families (e.g. GajA) were recorded in a reference table (hmm_meta.txt), which allows a single family of defence system proteins to be represented by multiple HMMs. A simple classification file ([system].yaml) was written to represent each defence system, describing the typical genetic architecture of the system. (B) Automated functional workflow of PADLOC. HMMER is used to identify genes encoding defence protein homologues in the input genome. Each system classification is then analysed individually, filtering the HMM hits for genes relevant to the current type of system being searched. HMM hits are grouped into gene clusters based on the synteny requirements specified in the system classification. Each cluster is then checked against the system classification to determine whether the system requirements are fulfilled. Yellow genes represent Gabija; green, red or blue genes represent genes from other defence systems; genes with two colours (i.e. yellow/blue) represent genes matched by HMMs from two different defence systems.
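The filtering, clustering and checking steps in panel B can be illustrated with a minimal sketch. It is not PADLOC's implementation: the `Hit` and `SystemRule` types, the field names `max_separation` and `min_core`, and the example HMM names other than GajA_1 are hypothetical stand-ins for the information held in hmm_meta.txt and a [system].yaml file, and the HMMER search is assumed to have already produced the hits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    gene_index: int   # position of the gene along the contig
    hmm_name: str     # name of the HMM that matched, e.g. "GajA_1"


@dataclass
class SystemRule:
    # Hypothetical stand-in for one system classification file.
    name: str
    core: frozenset                  # protein families that define the system
    optional: frozenset = frozenset()
    max_separation: int = 3          # max unrelated genes allowed between hits
    min_core: int = 2                # distinct core families required


def find_systems(hits, rule, hmm_to_family):
    """Return clusters of (gene_index, family) that satisfy the rule."""
    # 1. Map HMM names to families and keep only families relevant to this system.
    relevant = sorted(
        (h.gene_index, hmm_to_family.get(h.hmm_name))
        for h in hits
        if hmm_to_family.get(h.hmm_name) in (rule.core | rule.optional)
    )
    # 2. Group hits into clusters using the synteny (separation) requirement.
    clusters, current = [], []
    for index, family in relevant:
        if current and index - current[-1][0] - 1 > rule.max_separation:
            clusters.append(current)
            current = []
        current.append((index, family))
    if current:
        clusters.append(current)
    # 3. Keep clusters that contain enough distinct core families.
    return [
        c for c in clusters
        if len({family for _, family in c} & rule.core) >= rule.min_core
    ]


# Several HMMs may represent one family, as in the reference table.
hmm_to_family = {"GajA_1": "GajA", "GajA_2": "GajA", "GajB_1": "GajB", "ZorA_1": "ZorA"}
hits = [Hit(10, "GajA_1"), Hit(11, "GajB_1"), Hit(12, "ZorA_1"), Hit(40, "GajA_2")]
gabija = SystemRule("Gabija", core=frozenset({"GajA", "GajB"}))
print(find_systems(hits, gabija, hmm_to_family))
```

In this toy input the lone GajA hit at gene 40 forms its own cluster and is rejected because it lacks GajB, while the adjacent GajA/GajB pair is reported.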
Figure 2.
Analysis of proteins associated with Doron systems reveals new system types. (A) Workflow of defence system variant identification. PADLOC was used to identify loci encoding Doron system proteins. The proteins encoded by up to three genes either side of each Doron system locus were clustered into families. The frequency of association of each protein family to each Doron system protein was analysed. Loci that showed frequent associations and conserved locus architecture, and that were found in diverse genetic contexts, were considered as new subtypes. (B) Network of defence gene associations after filtering for abundance greater than 50 distinct loci, association frequency greater than 0.5, conservation of genetic architecture and context variability. Arrow direction represents association frequency of protein ‘A’ (start of arrow) with protein ‘B’ (end of arrow). (C) Descriptions and schematic diagrams of the new Doron system types and their most similar canonical Doron system types. Domains: RT, reverse transcriptase; DUF, domain of unknown function; TIR, Toll-interleukin receptor; TM, transmembrane; Hyp, hypothetical protein. Proposed system names: Hma, helicase, methylase and ATPase. Descriptions of proteins that differ between canonical Doron systems and the new types are shown in bold. Refer to Supplementary Table S3 for details of loci examples.
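The abundance and frequency thresholds in panel B can be sketched as a directed co-occurrence count. The caption does not define the statistic precisely, so this sketch assumes that the association frequency of A with B is the fraction of loci containing A that also contain B, and that abundance is the number of distinct loci in which both occur. The function name and input layout are hypothetical, and the architecture and context filters are not modelled.

```python
from collections import defaultdict
from itertools import permutations


def association_edges(loci, min_loci=50, min_freq=0.5):
    """Return {(A, B): frequency} for directed associations passing both thresholds.

    loci: iterable of collections of protein family names, one per locus.
    """
    family_count = defaultdict(int)   # loci containing each family
    pair_count = defaultdict(int)     # loci containing both A and B
    for families in loci:
        present = set(families)
        for family in present:
            family_count[family] += 1
        for a, b in permutations(present, 2):
            pair_count[(a, b)] += 1

    edges = {}
    for (a, b), n_both in pair_count.items():
        freq = n_both / family_count[a]   # assumed definition: P(B present | A present)
        if n_both > min_loci and freq > min_freq:
            edges[(a, b)] = freq
    return edges


# Toy input: "X" flanks every ZorA locus, but mostly occurs elsewhere.
toy_loci = [{"ZorA", "X"}] * 60 + [{"X"}] * 100
print(association_edges(toy_loci))
```

Because the frequency is conditioned on the protein at the start of the arrow, the toy input yields an edge from ZorA to X (frequency 1.0) but not from X to ZorA (frequency 0.375).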
Figure 3.
The new Doron system subtypes provide protection against phage infection. (A) The efficiency of plaquing (EOP) for E. coli BL21-AI possessing representative Zorya type III, Hachiman type II or Lamassu type II systems from Stenotrophomonas nitritireducens DSM 12575, Sphingopyxis witflariensis DSM 14551 and Janthinobacterium agaricidamnosum DSM, respectively, relative to the empty vector control. Graphs show the mean of three biological replicates with individual data points overlaid. Two-sided t-test; * P < 0.05, ** P < 0.01, *** P < 0.001. (B) Liquid culture infection time courses for BL21-AI strains possessing the Doron defence system variants, infected with phage PVP-SE1. Growth curves represent the mean of three biological replicates; the shaded area corresponds to the standard error of the mean.
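As a worked illustration of the quantities in this caption, the sketch below computes EOP as the phage titre on the defence strain divided by the titre on the empty vector control, then the mean and standard error of the mean over replicates. The function names are hypothetical and the titres are placeholder numbers chosen for the example, not measurements from the study.

```python
from math import sqrt
from statistics import mean, stdev


def eop(titre_defence, titre_control):
    """Efficiency of plaquing relative to the empty vector control."""
    return titre_defence / titre_control


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


# Placeholder titres (PFU/mL) for three biological replicates.
control = [2.0e9, 1.5e9, 2.5e9]
defence = [2.0e6, 3.0e6, 2.5e6]

eops = [eop(d, c) for d, c in zip(defence, control)]
avg, sem = mean_and_sem(eops)
print(f"EOP mean = {avg:.2e}, SEM = {sem:.2e}")
```

An EOP well below 1 indicates that the strain carrying the system supports fewer plaques than the control.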
Figure 4.
Abundance of defence systems identified with PADLOC in bacteria and archaea. All genomes from RefSeq v201 Archaea and Bacteria were searched with PADLOC. The values in the boxes represent, for each phylum, the average percentage of genomes in each species encoding a system, grouped using GTDB taxonomy (44); system prevalence is weighted in this way to limit biases in phyla that contain many closely related genomes of the same species. The colouring in each box provides a visual representation of these values. Shown are phyla with more than five genomes and at least one type of system. A species-level comparison is provided in Supplementary Figure S6 and the full data are provided in Supplementary Table S5.
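The species-weighted prevalence described in this caption can be sketched as a two-level average: first the percentage of genomes encoding the system within each species, then the mean of those percentages within each phylum. The function name and the `(phylum, species, has_system)` input layout are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def phylum_prevalence(genomes):
    """Average, per phylum, of the per-species percentage of genomes with a system.

    genomes: iterable of (phylum, species, has_system) tuples, one per genome.
    """
    per_species = defaultdict(list)
    for phylum, species, has_system in genomes:
        per_species[(phylum, species)].append(bool(has_system))

    per_phylum = defaultdict(list)
    for (phylum, _species), flags in per_species.items():
        per_phylum[phylum].append(100.0 * sum(flags) / len(flags))

    return {phylum: mean(percentages) for phylum, percentages in per_phylum.items()}


# One heavily sequenced species (4 genomes, all positive) and one singleton (negative).
toy = [("P", "sp_A", True)] * 4 + [("P", "sp_B", False)]
print(phylum_prevalence(toy))
```

In the toy input, a naive per-genome count would report 80% for the phylum, whereas the species-weighted value is 50%, which shows how the weighting limits the influence of a heavily sequenced species.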
