Nucleic Acids Res. 2017 Jul 3;45(W1):W458-W463.
doi: 10.1093/nar/gkx248.

GibbsCluster: unsupervised clustering and alignment of peptide sequences

Massimo Andreatta et al. Nucleic Acids Res. 2017.

Abstract

Receptor interactions with short linear peptide fragments (ligands) are at the base of many biological signaling processes. Conserved and information-rich amino acid patterns, commonly called sequence motifs, shape and regulate these interactions. Because of the properties of a receptor-ligand system or of the assay used to interrogate it, experimental data often contain multiple sequence motifs. GibbsCluster is a powerful tool for unsupervised motif discovery because it can simultaneously cluster and align peptide data. The GibbsCluster 2.0 server presented here is an improved version that incorporates insertions and deletions to account for variations in motif length in the peptide input. In basic terms, the program takes a set of peptide sequences as input and clusters them into meaningful groups. It returns the optimal number of clusters it identified, together with the sequence alignment and sequence motif characterizing each cluster. Several parameters are available to customize the cluster analysis, including adjustable penalties for small clusters and overlapping groups, and a trash cluster to remove outliers. As an example application, we used the server to deconvolute multiple specificities in large-scale peptidome data generated by mass spectrometry. The server is available at http://www.cbs.dtu.dk/services/GibbsCluster-2.0.
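The abstract describes each cluster as characterized by a sequence motif, and Figure 1 ranks clustering solutions by KLD (Kullback-Leibler distance). As a rough illustration of that idea only, the sketch below scores one group of already aligned, equal-length peptides by the Kullback-Leibler divergence between per-position amino acid frequencies and a background. The function name `cluster_kld`, the uniform background, and the flat pseudocount are assumptions made for this example; it ignores insertions, deletions, sequence weighting, and the penalties mentioned above, so it should not be read as the GibbsCluster scoring function.

```python
import math
from collections import Counter

# The 20 standard amino acids; the sketch assumes peptides use only these letters.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cluster_kld(peptides, pseudocount=1.0):
    """Sum over positions of the KL divergence (in bits) between the observed
    amino acid frequencies and a uniform background (an assumption).

    Hypothetical helper for illustration: peptides must be pre-aligned and of
    equal length.
    """
    if not peptides:
        return 0.0
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    # Denominator includes one pseudocount per amino acid so frequencies sum to 1.
    denom = len(peptides) + pseudocount * len(AMINO_ACIDS)

    total = 0.0
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / denom
            total += freq * math.log2(freq / background)
    return total


if __name__ == "__main__":
    conserved = ["YLDPAQRGV"] * 4
    mixed = ["ACDEFGHIK", "LMNPQRSTV", "WYACDEFGH", "IKLMNPQRS"]
    # A group with a strongly conserved pattern should score higher.
    print(cluster_kld(conserved) > cluster_kld(mixed))
```

Under these assumptions, a group whose columns are dominated by a few residues scores higher than a group with no shared pattern, which is the intuition behind preferring the solution with the highest KLD.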

Figures

Figure 1.
Clustering results for the Fibroblast dataset. The solution with highest KLD consists of four clusters, and the corresponding sequence motifs are shown as sequence logos.
Figure 2.
Comparison of unsupervised clustering to HLA restrictions assigned by NetMHCpan on Fibroblast data. Left: distribution of percentile rank scores predicted by NetMHCpan for the allele dominating each cluster; for the trash cluster, the best predicted rank score to any of the six alleles was used. Right: sequence logos from literature (made with MHCcluster (16)) of the alleles found in each cluster; group 2 is composed mostly of ligands predicted by NetMHCpan to be restricted to three different alleles with similar binding motifs.
Figure 3.
Length profile of peptides in the optimal Fibroblast clustering solution. Solid lines represent, for each group, the percentage of peptides with a given length over the total number of peptides in the group. The stacked bar plot in the background is the corresponding length frequency (number of ligands of a given length in a given group divided by the total number of peptides of that length in all groups).
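
The two quantities in the Figure 3 caption can be computed directly from a clustering result. The sketch below is a minimal illustration, assuming the result is available as a dictionary that maps each group name to its list of peptides; the function name `length_profiles` and this input layout are hypothetical and do not correspond to the server's output format.

```python
from collections import Counter


def length_profiles(clusters):
    """Compute the two length statistics described in the Figure 3 caption.

    clusters: dict mapping a group name to a list of peptide strings
              (an assumed layout for this example).
    Returns (within, across):
      within[g][n] = percentage of peptides in group g that have length n
      across[g][n] = peptides of length n in group g divided by all
                     peptides of length n in all groups
    """
    counts = {g: Counter(len(p) for p in peps) for g, peps in clusters.items()}

    # Total number of peptides of each length over all groups.
    totals_by_length = Counter()
    for c in counts.values():
        totals_by_length.update(c)

    within, across = {}, {}
    for g, c in counts.items():
        group_size = sum(c.values())
        within[g] = {n: 100.0 * k / group_size for n, k in c.items()}
        across[g] = {n: k / totals_by_length[n] for n, k in c.items()}
    return within, across


if __name__ == "__main__":
    example = {
        "g1": ["AAAAAAAA", "AAAAAAAAA", "AAAAAAAAA"],
        "g2": ["CCCCCCCCC"],
    }
    within, across = length_profiles(example)
    print(within)
    print(across)
```

In this reading, `within` corresponds to the solid lines (each group's values sum to 100%), and `across` corresponds to the stacked bars (for each length, the values over all groups sum to 1).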
