Nucleic Acids Res. 2022 Jan 7;50(D1):D1522-D1527.
doi: 10.1093/nar/gkab1081.

iProX in 2021: connecting proteomics data sharing with big data

Tao Chen et al. Nucleic Acids Res.

Abstract

The rapid development of proteomics studies has resulted in large volumes of experimental data. The emergence of big data platforms provides an opportunity to handle these large amounts of data. The integrated proteome resource, iProX (https://www.iprox.cn), which was initiated in 2017, has been greatly improved with an up-to-date big data platform implemented in 2021. Here, we describe the main iProX developments since its first publication in Nucleic Acids Research in 2019. First, a hyper-converged architecture with high scalability supports the submission process. A Hadoop cluster can store large volumes of proteomics datasets, and a distributed, RESTful Elasticsearch engine can query millions of records within one second. In addition, several new features have been added to iProX for better open data sharing, including the Universal Spectrum Identifier (USI) mechanism proposed by ProteomeXchange, a RESTful Web Service API, and a high-efficiency reanalysis pipeline. By the end of August 2021, 1526 datasets had been submitted to iProX, reaching a total data volume of 92.42 TB. With the implementation of the big data platform, iProX can support petabyte-scale data storage, hundreds of billions of spectra records, and service latency on the order of seconds, meeting the requirements of the fast-growing field of proteomics.
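As an illustration of the USI mechanism mentioned above, the following is a minimal sketch of how a Universal Spectrum Identifier could be split into its components. It assumes the general layout `mzspec:<collection>:<msRun>:<indexType>:<index>[:<interpretation>]` and that the run name contains no colons; the helper `parse_usi` and the `USI` tuple are hypothetical names introduced here, not part of iProX or its API, and the example identifier is a commonly cited ProteomeXchange example rather than an iProX dataset.

```python
from typing import NamedTuple, Optional


class USI(NamedTuple):
    """Components of a Universal Spectrum Identifier (hypothetical container)."""
    collection: str                 # dataset accession, e.g. a PXD identifier
    ms_run: str                     # MS run (file) name without extension
    index_type: str                 # "scan", "index", or "nativeId"
    index: str                      # index value within the run
    interpretation: Optional[str]   # optional peptide/charge interpretation


def parse_usi(usi: str) -> USI:
    """Split a USI string into its parts.

    Simplifying assumption: the run name contains no ':' characters.
    The split is limited to 5 so that colons inside the optional
    interpretation (e.g. modification tags) stay intact.
    """
    parts = usi.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a valid USI: {usi!r}")
    _, collection, ms_run, index_type, index = parts[:5]
    if index_type not in {"scan", "index", "nativeId"}:
        raise ValueError(f"unknown index type: {index_type!r}")
    interpretation = parts[5] if len(parts) == 6 else None
    return USI(collection, ms_run, index_type, index, interpretation)


example = parse_usi(
    "mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2"
)
print(example.collection, example.index_type, example.index)
```

A production parser would need to handle run names that contain colons and to validate the interpretation string, both of which this sketch deliberately leaves out.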

Figures

Figure 1. Summary of the datasets publicly released in iProX (as of the end of August 2021). (A) Cumulative data size and number of submitted datasets per month (from November 2017 to August 2021). (B) Top 10 released datasets by size. (C) Cumulative number of submitted datasets per year. Some datasets in iProX were generated from samples of multiple species; thus, the sum of the counts for individual species is slightly higher than the total number of public datasets. (D) Distribution of species among the datasets publicly available in iProX.
Figure 2. Hadoop-based big data architecture and infrastructure of iProX.
Figure 3. New features implemented in iProX 2021.
