PLoS Biol. 2021 Oct 20;19(10):e3001124.
doi: 10.1371/journal.pbio.3001124. eCollection 2021 Oct.

RefPlantNLR is a comprehensive collection of experimentally validated plant disease resistance proteins from the NLR family

Jiorgos Kourelis et al. PLoS Biol. 2021.

Abstract

Reference datasets are critical in computational biology. They help define canonical biological features and are essential for benchmarking studies. Here, we describe a comprehensive reference dataset of experimentally validated plant nucleotide-binding leucine-rich repeat (NLR) immune receptors. RefPlantNLR consists of 481 NLRs from 31 genera belonging to 11 orders of flowering plants. This reference dataset has several applications. We used RefPlantNLR to determine the canonical features of functionally validated plant NLRs and to benchmark 5 NLR annotation tools. This revealed that although NLR annotation tools tend to retrieve the majority of NLRs, they frequently produce domain architectures that are inconsistent with the RefPlantNLR annotation. Guided by this analysis, we developed a new pipeline, NLRtracker, which extracts and annotates NLRs from protein or transcript files based on the core features found in the RefPlantNLR dataset. The RefPlantNLR dataset should also prove useful for guiding comparative analyses of NLRs across the wide spectrum of plant diversity and identifying understudied taxa. We hope that the RefPlantNLR resource will contribute to moving the field beyond a uniform view of NLR structure and function.

Conflict of interest statement

I have read the journal’s policy and the authors of this manuscript have the following competing interests: The authors receive funding from industry on NLR biology.

Figures

Fig 1. Number of experimentally validated RefPlantNLR sequences per plant genus.
(A) Domain architecture of typical plant NLRs. The structural features and conserved motifs of the NB-ARC are indicated. (B) The number of experimentally validated NLRs per plant genus (N = 481), and (C) the per genus reduced redundancy set at a 90% sequence similarity threshold (N = 303), are plotted as stacked bar graphs. (D) The class of pathogen to which NLRs in the RefPlantNLR dataset confer a response, and (E) the per genus reduced redundancy set at a 90% sequence similarity threshold, are plotted as stacked bar graphs. Some NLRs may be involved in the response against multiple classes of pathogens, while others have a helper role or are involved in allelic variation in autoimmune/hybrid necrosis responses. The number of experimentally validated NLRs belonging to the monophyletic TIR-NLR, CC-NLR, CCR-NLR, or CCG10-NLR subclades is indicated. Underlying data and R code to reproduce the figures are in S5 Data. CC, coiled-coil; HD, helical domain of apoptotic protease-activating factors; LRR, leucine-rich repeat; NB, P-loop containing NTPase domain; NLR, nucleotide-binding leucine-rich repeat; TIR, Toll/interleukin-1 receptor; WD, winged helix domain.
Fig 2. Length distribution of RefPlantNLR amino acid sequences and extracted NB-ARC domains.
(A) Histogram of RefPlantNLR amino acid sequence length (binwidth 50 aa, N = 481). (B) Histogram of the amino acid sequence length of the unique NB-ARC domains extracted from RefPlantNLR (SUPERFAMILY signature SSF52540; binwidth 5 aa, N = 406). (C) Histogram of amino acid sequence length of the reduced redundancy RefPlantNLR set at a 90% amino acid similarity threshold (binwidth 50 aa, N = 303). (D) Histogram of the length of the NB-ARC domains extracted from the reduced redundancy RefPlantNLR set (binwidth 5 aa, N = 296). Color coding is according to NLR subfamily. Underlying data and R code to reproduce the figures are in S5 Appendix. CC, coiled-coil; NB-ARC, nucleotide-binding adaptor shared by APAF-1, certain R gene products, and CED-4; NLR, nucleotide-binding leucine-rich repeat; TIR, Toll/interleukin-1 receptor.
Fig 3. Domain architecture of the RefPlantNLRs.
Bar chart of the domain architecture of (A) RefPlantNLRs (N = 481), or (B) the per genus reduced redundancy RefPlantNLR set at an overall 90% amino acid similarity per genus (N = 303). (C) Schematic representation of domain architecture. The InterPro signatures used for each of the domains are highlighted in the Material and methods. There is currently no InterProScan signature or motif for the CCG10 N-terminal domain. Underlying data and R code to reproduce the figures are in S5 Appendix. CC, coiled-coil; LRR, leucine-rich repeat; NB-ARC, nucleotide-binding adaptor shared by APAF-1, certain R gene products, and CED-4; NLR, nucleotide-binding leucine-rich repeat; TIR, Toll/interleukin-1 receptor.
Fig 4. Phylogenetic diversity of RefPlantNLR sequences.
The tree, based on the NB-ARC domain, was inferred using the Maximum Likelihood method based on the JTT model [44]. The tree with the highest log likelihood is shown. NLRs with identical NB-ARC domains are collapsed, while for those with multiple NB-ARC domains, the NB-ARC domains are numbered according to their order in the protein. The tree was rooted on the non-plant NLR outgroup. The TIR-NLR, CC-NLR, CCR-NLR, and CCG10-NLR subclades are indicated. Domain architecture is shown as in Fig 3. CC, coiled-coil; C-JID, C-terminal jelly roll/Ig-like domain; JTT, Jones–Taylor–Thornton; LRR, leucine-rich repeat; NB-ARC, nucleotide-binding adaptor shared by APAF-1, certain R gene products, and CED-4; NLR, nucleotide-binding leucine-rich repeat; TIR, Toll/interleukin-1 receptor.
Fig 5. Benchmarking NLR annotation tools using RefPlantNLR.
Benchmarking of NLR annotation tools using the RefPlantNLR entries for which a CDS entry was available (N = 457). (A) UpSet plot showing the intersection of RefPlantNLR entries retrieved by each annotation tool. (B) Domain architecture analysis produced by each NLR annotation tool per NLR subclass. "Correct" denotes a domain architecture consistent with the RefPlantNLR annotation, "incorrect" one that is inconsistent with the RefPlantNLR annotation, and "other" an entry retrieved by the NLR annotation tool but not reliably classified as an NLR. Underlying data and R code to reproduce the figures are in S1 Appendix. CDS, coding sequence; NLR, nucleotide-binding leucine-rich repeat.
Fig 6. NLRtracker is the most sensitive and accurate NLR extraction tool on the Arabidopsis, tomato, and rice RefSeq genomes.
Benchmarking of NLR annotation tools using the Arabidopsis, rice, and tomato RefSeq genomes. (A) NLRtracker pipeline. InterProScan and predefined NLR motifs are used to group sequences into different categories. (B) Number of NLRs retrieved in each NLR subclass per species. Underlying data and R code to reproduce the figures are in S3 Appendix. CC, coiled-coil; LRR, leucine-rich repeat; NB-ARC, nucleotide-binding adaptor shared by APAF-1, certain R gene products, and CED-4; NLR, nucleotide-binding leucine-rich repeat; TIR, Toll/interleukin-1 receptor.
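
The Fig 5 caption sorts each reference entry by how an annotation tool handled it. The Python sketch below illustrates that bookkeeping only; it is not the authors' R code. The function names, the architecture strings, the toy data, and the extra "missed" category (for entries a tool does not retrieve at all) are all assumptions made for illustration.

```python
from collections import Counter


def classify_entry(reference_arch, tool_arch):
    """Return a Fig 5-style category for one reference NLR.

    reference_arch: domain architecture string from the reference set.
    tool_arch: architecture reported by the tool; "" if the tool retrieved
               the entry without classifying it as an NLR; None if the tool
               did not retrieve the entry at all.
    """
    if tool_arch is None:
        return "missed"  # assumption: this category is not named in the caption
    if tool_arch == "":
        return "other"
    return "correct" if tool_arch == reference_arch else "incorrect"


def benchmark(reference, tool_output):
    """Count categories and compute the fraction of reference entries retrieved."""
    counts = Counter(
        classify_entry(arch, tool_output.get(name))
        for name, arch in reference.items()
    )
    retrieved = sum(counts[c] for c in ("correct", "incorrect", "other"))
    return counts, retrieved / len(reference)


# Hypothetical toy data: entry names and architectures are made up.
reference = {
    "nlrA": "CC-NBARC-LRR",
    "nlrB": "TIR-NBARC-LRR",
    "nlrC": "CC-NBARC-LRR",
    "nlrD": "TIR-NBARC-LRR",
}
tool_output = {
    "nlrA": "CC-NBARC-LRR",  # architecture matches the reference
    "nlrB": "NBARC-LRR",     # retrieved, but the TIR domain is not reported
    "nlrC": "",              # retrieved, not classified as an NLR
}

counts, fraction_retrieved = benchmark(reference, tool_output)
print(dict(counts), fraction_retrieved)
```

Keeping retrieval and architecture agreement as separate measures mirrors the abstract's observation that tools can retrieve most NLRs while still reporting inconsistent domain architectures.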

References

    1. Weber LM, Saelens W, Cannoodt R, Soneson C, Hapfelmeier A, Gardner PP, et al. Essential guidelines for computational method benchmarking. Genome Biol. 2019;20:125. doi: 10.1186/s13059-019-1738-8
    2. Schaafsma GCP, Vihinen M. Representativeness of variation benchmark datasets. BMC Bioinformatics. 2018;19:461. doi: 10.1186/s12859-018-2478-6
    3. Jones JDG, Vance RE, Dangl JL. Intracellular innate immune surveillance devices in plants and animals. Science. 2016;354. doi: 10.1126/science.aaf6395
    4. Kourelis J, van der Hoorn RAL. Defended to the nines: 25 years of resistance gene cloning identifies nine mechanisms for R protein function. Plant Cell. 2018;30:285–99. doi: 10.1105/tpc.17.00579
    5. de Araújo AC, Fonseca FCDA, Cotta MG, Alves GSC, Miller RNG. Plant NLR receptor proteins and their potential in the development of durable genetic resistance to biotic stresses. Biotechnol Res Innov. 2020. [cited 2020 Jun 15]. doi: 10.1016/j.biori.2020.01.002
