Int J Mol Sci. 2024 Aug 7;25(16):8614.
doi: 10.3390/ijms25168614.

Community Resource: Large-Scale Proteogenomics to Refine Wheat Genome Annotations

Delphine Vincent et al. Int J Mol Sci. 2024.

Abstract

Triticum aestivum is an important crop whose reference genome (International Wheat Genome Sequencing Consortium (IWGSC) RefSeq v2.1) offers a valuable resource for understanding wheat genetic structure, improving agronomic traits, and developing new cultivars. A key aspect of gene model annotation is protein-level evidence of gene expression, obtained from proteomics studies and followed by proteogenomics to physically map proteins to the genome. In this research, we retrieved the largest recent publicly available wheat proteomics datasets and applied the tBLASTn program of the Basic Local Alignment Search Tool (BLAST) to map the 861,759 identified unique peptides against IWGSC RefSeq v2.1. Of the 92,719 hits, 83,015 unique peptides aligned along 33,612 High Confidence (HC) genes, thus validating 31.4% of all wheat HC gene models. Furthermore, 6685 unique peptides mapped to 3702 Low Confidence (LC) gene models, and we argue that these gene models should be considered for HC status. The remaining 2934 orphan peptides can be used for novel gene discovery, as exemplified here on chromosome 4D. We also demonstrated that tBLASTn could not map peptides spanning a mid-sequence frameshift. We supply all our proteogenomics results, Galaxy workflow, Python code, and Browser Extensible Data (BED) files as a resource for the wheat community via Apollo JBrowse and GitHub repositories. Our workflow could be applied to other proteomics datasets to expand this resource with proteins and peptides from biotically and abiotically stressed samples, which would help tease out wheat gene expression under various environmental conditions, both spatially and temporally.
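The abstract sorts mapped peptides into three classes: those aligning along HC gene models, those aligning to LC gene models, and orphans with no gene model. A minimal Python sketch of that bookkeeping follows. It assumes that peptide hits and gene models have already been reduced to 0-based, half-open genomic intervals, and that a peptide overlapping any HC gene is labelled HC before LC is considered; this precedence rule, the `Interval` type, the function names, and all IDs and coordinates are illustrative assumptions, not the authors' published code.

```python
from collections import Counter
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_peptides(peptide_hits, gene_models):
    """Label each peptide 'HC', 'LC' or 'orphan'.

    peptide_hits: dict mapping peptide ID -> list of Interval hits.
    gene_models:  list of (gene_id, confidence, Interval), confidence being 'HC' or 'LC'.
    Assumption (hypothetical rule): any HC overlap takes precedence over LC overlaps.
    """
    labels = {}
    for peptide, hits in peptide_hits.items():
        classes = {conf for _, conf, gene in gene_models
                   for hit in hits if overlaps(hit, gene)}
        if "HC" in classes:
            labels[peptide] = "HC"
        elif "LC" in classes:
            labels[peptide] = "LC"
        else:
            labels[peptide] = "orphan"
    return labels


# Toy, made-up gene models and hits for illustration only.
genes = [
    ("geneA", "HC", Interval("Chr1A", 100, 500)),
    ("geneB_LC", "LC", Interval("Chr1A", 800, 1000)),
]
hits = {
    "pep1": [Interval("Chr1A", 150, 186)],
    "pep2": [Interval("Chr1A", 850, 880)],
    "pep3": [Interval("Chr1A", 2000, 2030)],
}
labels = classify_peptides(hits, genes)
print(labels)  # {'pep1': 'HC', 'pep2': 'LC', 'pep3': 'orphan'}
print(Counter(labels.values()))
```

The nested loop is quadratic in peptides times genes, which is fine for a toy example; at genome scale an interval index or a tool such as `bedtools intersect` would be the practical choice.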

Keywords: Triticum aestivum; bottom-up proteogenomics; gene models; genome annotation; multiple sequence alignment; proteomics.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Experimental design of the wheat bottom-up (BU) proteogenomics analysis (figure partially created in BioRender). This research was based on organs from plants grown, sampled, and stored under optimal conditions [14,15,16].
Figure 2
Comparison of tBLASTn and manual alignments along the HC gene TraesCS4D03G0026600.1, viewed in Apollo JBrowse: (A) Alignment along the full gene; boxed areas are enlarged in panels (B,C). tBLASTn hits are purple and the manual alignment is pink. (B) Zoom-in of the genomic region spanning the intron between the 2nd and 3rd exons. (C) Zoom-in of the genomic region spanning the intron between the 3rd and 4th exons. In (B) and (C), the amino acid (AA) sequence is highlighted where the frameshift occurs.
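Figure 2 illustrates why tBLASTn misses some peptides: it aligns the query protein against each translated reading frame of the genome separately, so a peptide whose coding sequence changes frame partway through has no single frame in which it appears contiguously. The toy Python sketch below makes this concrete with a six-frame translation of a made-up locus in which a one-nucleotide insertion (standing in for any cause of a frame change) separates the codons for `MAK` from those for `WH`. The locus, the peptide, and the function names are hypothetical; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; a trailing partial codon is dropped, unknown codons give 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> list[str]:
    """Three frames on the given strand, then three frames on its reverse complement."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    return [translate(strand[offset:]) for strand in (dna, rc) for offset in range(3)]


# Hypothetical locus: codons for M-A-K, a 1-nt insertion ("C"), then codons for W-H.
locus = "ATGGCTAAA" + "C" + "TGGCAT"
frames = six_frames(locus)
print(frames)  # ['MAKLA', 'WLNWH', 'G*TG', 'MPV*P', 'CQFSH', 'ASLA']
print(any("MAKWH" in frame for frame in frames))  # False
print([i for i, f in enumerate(frames) if "MAK" in f],
      [i for i, f in enumerate(frames) if "WH" in f])  # [0] [1]
```

The full peptide is found in no frame, while each half is found in a different one; recovering such a peptide means stitching the two frames together, roughly what a manual alignment has to do.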
Figure 3
Circos plot of peptides aligned along wheat chromosomes for each tissue (figure partially created in Galaxy Australia): (A) Full mapping along all 21 T. aestivum chromosomes. (B) Zoomed-in view of chromosome 2D, highlighting the low density of aligned peptides around the centromeric region.
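Density rings such as those in Figure 3 can be produced by counting alignments in fixed-size genomic windows. The sketch below shows one way to do that in Python; it is not the Galaxy Circos configuration used for the figure, and it assumes hits are given as (chromosome, 0-based start) pairs with a hypothetical 1 Mb window. Windows with no hits are omitted from the output, which is how sparse regions such as the chromosome 2D centromere would appear as gaps.

```python
from collections import Counter


def density_track(hits, window=1_000_000):
    """Count hits per fixed-size genomic window.

    hits:   iterable of (chrom, start) pairs, start being a 0-based position.
    window: window size in bp (1 Mb is an arbitrary, illustrative choice).
    Returns (chrom, window_start, window_end, count) rows sorted by position;
    windows without hits are omitted, and the last window is not clipped to
    the chromosome length.
    """
    counts = Counter((chrom, start // window) for chrom, start in hits)
    return [(chrom, i * window, (i + 1) * window, n)
            for (chrom, i), n in sorted(counts.items())]


# Made-up positions for illustration.
toy_hits = [("Chr2D", 10), ("Chr2D", 999_999), ("Chr2D", 1_000_000), ("Chr4D", 5)]
for row in density_track(toy_hits):
    print(*row, sep="\t")
```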
Figure 4
Examples of peptide alignments used to refine wheat genome annotation in Apollo JBrowse Australia: (A) Validation of HC gene TraesCS5A03G451700; the inset shows the track legend for all panels. (B) Promotion of LC gene TraesCS7D03G1260000LC to HC status. (C) Promotion of LC gene TraesCS3B03G1041100LC to HC status and amendment of the underlying HC gene TraesCS3B03G1041200; the boxed area is enlarged in Panel (D). (D) Zoom-in of Panel (C) on the intron. (E) Novel gene discovery exemplified at genomic position Chr4D:505997969..506000955.
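The peptide alignments behind tracks like these are distributed as BED files for Apollo JBrowse. The Python sketch below converts one tBLASTn hit into a BED6 line, assuming BLAST+ default tabular output (`-outfmt 6`, 12 columns) with chromosomes as subject sequences; the function name, peptide IDs, and coordinates are made up. tBLASTn reports subject coordinates 1-based and inclusive, with `sstart > send` for hits on the reverse strand, whereas BED uses 0-based, half-open coordinates and a score between 0 and 1000.

```python
# Column order of BLAST+ default tabular output (-outfmt 6).
OUTFMT6_FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def hit_to_bed(line: str) -> str:
    """Convert one tBLASTn -outfmt 6 line into a tab-separated BED6 line."""
    rec = dict(zip(OUTFMT6_FIELDS, line.rstrip("\n").split("\t")))
    sstart, send = int(rec["sstart"]), int(rec["send"])
    # Reverse-strand (minus-frame) hits are reported with sstart > send.
    strand = "+" if sstart <= send else "-"
    # 1-based inclusive -> 0-based half-open.
    start, end = min(sstart, send) - 1, max(sstart, send)
    # BED scores range from 0 to 1000, so the rounded bit score is capped.
    score = min(1000, round(float(rec["bitscore"])))
    return "\t".join([rec["sseqid"], str(start), str(end), rec["qseqid"], str(score), strand])


# Made-up hits: a 12-residue peptide on the forward strand and one on the reverse strand.
forward = "pep_demo1\tChr1A\t100.000\t12\t0\t0\t1\t12\t1001\t1036\t1.0e-03\t35.4"
reverse = "pep_demo2\tChr1A\t91.667\t12\t1\t0\t1\t12\t2036\t2001\t2.0e-02\t30.8"
print(hit_to_bed(forward))  # Chr1A 1000 1036 pep_demo1 35 + (tab-separated)
print(hit_to_bed(reverse))  # Chr1A 2000 2036 pep_demo2 31 - (tab-separated)
```

Using the bit score as the BED score is a display choice; any field could be mapped there, or the score left at 0.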
Figure 5
Data visualization of tBLASTn outputs in JupyterLab using the Python 3 Seaborn and Matplotlib libraries: (A) Correlation matrix. (B) Boxplot of mismatch vs. pident. (C) Cumulative histograms of gaps and gapopen. (D) Scatter plot of length vs. pident per peptide type and gapopen size. (E) Linear regression plot (lmplot) of length vs. score per peptide type. (F) Violin plot of gapopen vs. pident per peptide type. (G) Violin plot of tissue vs. score. (H) Density plot of sstart vs. score per peptide type on chromosome 4D.

References

1. Shewry P.R. Wheat. J. Exp. Bot. 2009;60:1537–1553. doi: 10.1093/jxb/erp058.
2. El Baidouri M., Murat F., Veyssiere M., Molinier M., Flores R., Burlot L., Alaux M., Quesneville H., Pont C., Salse J. Reconciling the Evolutionary Origin of Bread Wheat (Triticum aestivum). New Phytol. 2017;213:1477–1486. doi: 10.1111/nph.14113.
3. Venske E., Dos Santos R.S., Busanello C., Gustafson P., Costa De Oliveira A. Bread Wheat: A Role Model for Plant Domestication and Breeding. Hereditas. 2019;156:16. doi: 10.1186/s41065-019-0093-9.
4. Bentley A.R., Donovan J., Sonder K., Baudron F., Lewis J.M., Voss R., Rutsaert P., Poole N., Kamoun S., Saunders D.G.O., et al. Near- to Long-Term Measures to Stabilize Global Wheat Supplies and Food Security. Nat. Food. 2022;3:483–486. doi: 10.1038/s43016-022-00559-y.
5. The International Wheat Genome Sequencing Consortium (IWGSC), Appels R., Eversole K., Stein N., Feuillet C., Keller B., Rogers J., Pozniak C.J., Choulet F., Distelfeld A., et al. Shifting the Limits in Wheat Research and Breeding Using a Fully Annotated Reference Genome. Science. 2018;361:eaar7191. doi: 10.1126/science.aar7191.
