J Proteome Res. 2019 Jun 7;18(6):2686-2692.
doi: 10.1021/acs.jproteome.9b00064. Epub 2019 May 23.

Proteomics Standards Initiative Extended FASTA Format

Pierre-Alain Binz et al. J Proteome Res.

Abstract

Mass-spectrometry-based proteomics enables the high-throughput identification and quantification of proteins, including sequence variants and post-translational modifications (PTMs) in biological samples. However, most workflows require that such variations be included in the search space used to analyze the data, and doing so remains challenging with most analysis tools. In order to facilitate the search for known sequence variants and PTMs, the Proteomics Standards Initiative (PSI) has designed and implemented the PSI extended FASTA format (PEFF). PEFF is based on the very popular FASTA format but adds a uniform mechanism for encoding substantially more metadata about the sequence collection as well as individual entries, including support for encoding known sequence variants, PTMs, and proteoforms. The format is very nearly backward compatible, and as such, existing FASTA parsers will require little or no change to be able to read PEFF files as FASTA files, although without supporting any of the extra capabilities of PEFF. PEFF is defined by a full specification document, controlled vocabulary terms, a set of example files, software libraries, and a file validator. Popular software and resources are starting to support PEFF, including the sequence search engine Comet and the knowledge bases neXtProt and UniProtKB. Widespread implementation of PEFF is expected to further enable proteogenomics and top-down proteomics applications by providing a standardized mechanism for encoding protein sequences and their known variations. All related documentation, including the detailed file format specification and example files, is available at http://www.psidev.info/peff.
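
The near-backward compatibility described above can be illustrated with a minimal sketch. It assumes, based on the PEFF specification rather than on this abstract, that file-header lines begin with `#` and that per-entry metadata is written as `\Key=Value` pairs on the `>` description line. The sample text, the database name, the entry identifiers, and the function name `read_entries` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical sample text; identifiers and values are illustrative only.
SAMPLE = """\
# PEFF 1.0
# DbName=ExampleDb
# //
>db:ENTRY1 \\PName=Example protein \\Length=10
MKTAYIAKQR
>db:ENTRY2 \\PName=Second example
GAVLI
KPFW
"""


def read_entries(text):
    """Yield (identifier, metadata dict, sequence) for each entry."""
    ident, meta, seq = None, {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' file-header lines
        if line.startswith(">"):
            if ident is not None:
                yield ident, meta, "".join(seq)
            # Split the description line at each whitespace + backslash.
            head, *pairs = re.split(r"\s+\\", line[1:])
            ident = head.strip()
            meta = dict(p.split("=", 1) for p in pairs if "=" in p)
            seq = []
        else:
            seq.append(line)  # sequence may span several lines
    if ident is not None:
        yield ident, meta, "".join(seq)


entries = list(read_entries(SAMPLE))
for ident, meta, sequence in entries:
    print(ident, meta, sequence)
```

Under these assumptions, a plain FASTA reader needs only the `#`-line skip to read such a file as FASTA; the `\Key=Value` parsing is the optional extra step that exposes the added metadata.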

Keywords: FASTA; PEFF; PSI; Proteomics Standards Initiative; file formats; mass spectrometry; proteogenomics; proteomics; standards.

Conflict of interest statement

Author Information

The authors declare no competing financial interest.

Figures

Figure 1
Overview of the PEFF schema. The file header section encodes metadata about the file itself and about the one or more sequence databases contained in the file. The individual sequence entries section encodes each of the individual sequences and the metadata associated with each entry.
Figure 2
Simplified depiction of how annotation identifiers can be referenced by other annotations to link them, such as for disulfide bonds and for proteoform definitions. Each annotation has a non-negative integer identifier, and other annotations may link to them. This example (somewhat simplified for clarity of presentation) for human insulin encodes: A) PTMs and disulfide bonds that link two PTMs; and B) a final proteoform that includes two separate processed chains that are linked together via disulfide bonds.
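
The linking scheme in Figure 2 can be sketched as a small data structure. This is not PEFF syntax: the class name, field names, annotation kinds, and positions below are hypothetical, chosen only to show annotations that carry integer identifiers which other annotations reference.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    ident: int          # non-negative integer identifier
    kind: str           # hypothetical label, e.g. "ptm" or "chain"
    detail: str = ""
    links: list = field(default_factory=list)  # identifiers of linked annotations


def resolve(annotations, ident):
    """Return the annotations that the given annotation links to."""
    by_id = {a.ident: a for a in annotations}
    return [by_id[i] for i in by_id[ident].links]


# Illustrative values only, not taken from a real PEFF entry.
annotations = [
    Annotation(1, "ptm", "modified residue at position 5"),
    Annotation(2, "ptm", "modified residue at position 20"),
    Annotation(3, "disulfide", links=[1, 2]),       # bond linking two PTMs
    Annotation(4, "chain", "residues 1-10"),
    Annotation(5, "chain", "residues 15-30"),
    Annotation(6, "proteoform", links=[4, 5, 3]),   # chains joined by the bond
]

linked = [a.kind for a in resolve(annotations, 6)]
print(linked)
```

Keeping links as identifiers rather than nested objects mirrors the caption's description: one annotation can be referenced by several others without being duplicated.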
