Int J Biol Sci. 2019 Jul 3;15(9):1787-1801.
doi: 10.7150/ijbs.32142. eCollection 2019.

SWPepNovo: An Efficient De Novo Peptide Sequencing Tool for Large-scale MS/MS Spectra Analysis

Chuang Li et al. Int J Biol Sci.

Abstract

Tandem mass spectrometry (MS/MS)-based de novo peptide sequencing is a powerful method for high-throughput protein analysis. However, the explosive growth of MS/MS spectra datasets inevitably and exponentially raises the computational demand of existing de novo peptide sequencing methods, an issue that urgently needs to be solved in computational biology. This paper introduces SWPepNovo, an efficient tool based on the SW26010 many-core processor that processes large-scale peptide MS/MS spectra using a parallel peptide-spectrum matches (PSMs) algorithm. Our design employs a two-level parallelization mechanism: (1) task-level parallelism between MPEs using MPI, based on a data transformation method and a dynamic feedback task scheduling algorithm, and (2) thread-level parallelism across CPEs using asynchronous task transfer and multithreading. Moreover, three optimization strategies (vectorization, double buffering and memory access optimizations) have been employed to overcome both the compute-bound and the memory-bound bottlenecks in the parallel PSMs algorithm. Experiments conducted on multiple spectra datasets demonstrate the performance of SWPepNovo against three state-of-the-art tools for peptide sequencing: PepNovo+, PEAKS and DeepNovo-DIA. SWPepNovo also shows high scalability in experiments on extremely large datasets of up to 11.22 GB. The software and the parameter settings are available at https://github.com/ChuangLi99/SWPepNovo.
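
As a rough illustration of the dynamic feedback task scheduling idea mentioned in the abstract, the following Python sketch splits a batch of spectra across workers in proportion to their most recently measured throughput. It is not the SWPepNovo implementation, which distributes work between MPEs using MPI; the function names `allocate` and `update_throughput`, the proportional allocation rule, and the smoothing factor `alpha` are all assumptions made for illustration.

```python
def allocate(num_spectra, throughputs):
    """Split num_spectra across workers in proportion to measured throughput.

    Hypothetical helper: the proportional rule is an assumption, not the
    scheduling rule used by SWPepNovo.
    """
    total = sum(throughputs)
    if total <= 0:
        raise ValueError("at least one worker must have positive throughput")
    shares = [num_spectra * t / total for t in throughputs]
    counts = [int(s) for s in shares]
    # Give the spectra lost to rounding to the largest fractional shares.
    remainder = num_spectra - sum(counts)
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - counts[i],
                   reverse=True)
    for i in order[:remainder]:
        counts[i] += 1
    return counts


def update_throughput(old, spectra_done, seconds, alpha=0.5):
    """Feedback step: blend the previous estimate with the newest measurement.

    alpha is an assumed smoothing factor (exponential moving average).
    """
    measured = spectra_done / seconds
    return (1 - alpha) * old + alpha * measured
```

In use, each scheduling round would call `allocate` with the current estimates, run the workers, and then call `update_throughput` per worker so that slower workers receive fewer spectra in the next round.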

Keywords: Large-scale MS/MS spectra analysis; SW26010; de novo peptide sequencing; high performance computing.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interest exists.

Figures

Figure 1. Workflow of the de novo peptide sequencing.
Figure 2. An example of an MS/MS spectrum.
Figure 3. The types of fragmentation ions.
Figure 4. The architecture of the SW26010 many-core processor.
Figure 5. The algorithm framework of our implementation.
Figure 6. The MS/MS data transformation.
Figure 7. A flowchart of the task distribution framework.
Figure 8. Dynamic task scheduling process based on a feedback and adjustment mechanism.
Figure 9. Asynchronous data transfer strategy. While a chunk in one buffer is being scored, the subsequent chunk is fetched into the other buffer using DMA-fetching intrinsics.
Figure 10. The dynamic parallel programming model.
Figure 11. Asynchronous task-loading strategy.
Figure 12. De novo sequencing speeds (spectra/second) of SWPepNovo, PepNovo+ and PEAKS.
Figure 13. Performance of SWPepNovo.
Figure 14. Performance of SWPepNovo on multiple nodes.
Figure 15. Performance of SWPepNovo on datasets sized 0.51-11.22 GB.
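
The double buffering described in the Figure 9 caption can be sketched in Python as follows: while one chunk is being scored, the next chunk is fetched in the background. This is a minimal sketch under stated assumptions, not the authors' code. A single background thread stands in for the DMA transfer on the SW26010, and `process_double_buffered`, `fetch` and `score` are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def process_double_buffered(chunk_ids, fetch, score):
    """Score chunks in order while prefetching the next chunk.

    fetch(chunk_id) loads a chunk; score(chunk) returns its result.
    Both are caller-supplied, hypothetical callables.
    """
    results = []
    if not chunk_ids:
        return results
    # One background worker plays the role of the asynchronous transfer.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, chunk_ids[0])
        for next_id in chunk_ids[1:]:
            current = pending.result()            # wait for the filled buffer
            pending = pool.submit(fetch, next_id) # start filling the other one
            results.append(score(current))        # score overlaps with the fetch
        results.append(score(pending.result()))   # last chunk has no successor
    return results
```

For example, `process_double_buffered([1, 2, 3], load_chunk, score_chunk)` would score chunk 1 while chunk 2 is loading, assuming `load_chunk` and `score_chunk` are supplied by the caller.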
