J Mol Biol. 2021 Jul 23;433(15):167071.
doi: 10.1016/j.jmb.2021.167071. Epub 2021 May 28.

Motifier: An IgOme Profiler Based on Peptide Motifs Using Machine Learning

Haim Ashkenazy et al. J Mol Biol.

Abstract

Antibodies provide a comprehensive record of the encounters with threats and insults to the immune system. The ability to examine the repertoire of antibodies in serum and discover those that best represent "discriminating features" characteristic of various clinical situations is potentially very useful. Recently, phage display technologies combined with Next-Generation Sequencing (NGS) produced a powerful experimental methodology, coined "Deep-Panning", in which the spectrum of serum antibodies is probed. Extracting meaningful biological insights from the tens of millions of affinity-selected peptides generated by Deep-Panning requires advanced bioinformatics algorithms. In this study, we describe Motifier, a computational pipeline comprising a set of algorithms that systematically generates discriminatory peptide motifs based on the affinity-selected peptides identified by Deep-Panning. These motifs are shown to effectively characterize antibody binding activities and, through the implementation of machine-learning protocols, to accurately classify complex antibody mixtures representing various biological conditions.

Keywords: deep-panning; next-generation phage display; phage display; random peptide libraries.

Conflict of interest statement

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Figure 1. A schematic depiction of the combined experimental and computational platform for IgOme profiling and classification.
The experimental part (Steps 1–3) entails the screening of the samples representing two (or more) biological conditions. In this case, sera from infected (+) vs. non-infected (−) individuals are used to screen a combinatorial phage display peptide library (Step 1). Sample-index barcodes are introduced by PCR (Step 2, pink and green “barcodes”). Then, the affinity-selected phage-displayed peptides are sequenced by NGS (Step 3). This is followed by computational analysis using the “Motifier” pipeline. Motifier consists of three main modules (Steps 4–6). First, reads undergo quality filtering, demultiplexing, and in silico translation (Step 4), yielding a curated set of affinity-selected peptides for each sample. Then (Step 5), peptide motifs (position-specific scoring matrices) are inferred for each biological condition using a clustering algorithm, followed by the unification of similar motifs from repeats or multiple samples representing the same biological condition. The third module implements machine-learning modeling and classification. Each motif defines a machine-learning feature whose value measures the congruence between the set of peptides in a sample and that motif. Discriminatory motifs are those for which the level of congruence differs between biological conditions. A random-forest classifier is then trained to classify unlabeled sera based on their peptides (Step 6). The output of the platform is: (I) a set of discriminatory motifs that can be used for further experimental analysis; and (II) a random-forest model that is able to classify new unseen samples of affinity-selected peptides. For further details see Methods and Results.
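To make the notion of a position-specific scoring matrix (Step 5) more concrete, the following is a minimal sketch, not Motifier's actual implementation. It assumes ungapped peptides of equal length, a uniform amino-acid background, and a pseudocount of 1; the function names `build_pssm` and `score_peptide` and the toy peptides are hypothetical and invented for illustration only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(peptides, pseudocount=1.0):
    """Build a log-odds PSSM (natural log) from equal-length, ungapped peptides.

    Assumption: uniform background frequency of 1/20 per amino acid.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        column = {
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm

def score_peptide(pssm, peptide):
    """Return the best ungapped window score of a peptide against the PSSM."""
    width = len(pssm)
    if len(peptide) < width:
        return float("-inf")
    return max(
        sum(pssm[i][peptide[start + i]] for i in range(width))
        for start in range(len(peptide) - width + 1)
    )

# Toy cluster of similar peptides (invented, not taken from the paper's data).
cluster = ["ACDKLWHY", "ACDRLWHY", "SCDKLWHF", "ACEKLWHY"]
pssm = build_pssm(cluster)
member_score = score_peptide(pssm, "ACDKLWHY")
unrelated_score = score_peptide(pssm, "GGGGGGGG")
```

A peptide resembling the cluster scores higher than an unrelated one. How per-peptide scores are aggregated into a per-sample feature value (the paper reports p-values) is described in the Methods and is not reproduced here.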
Figure 2. Comparison of unique peptides and the motifs they support among different samples.
All the unique peptides for each replicate were listed. For each peptide, we counted how many replicates share it, and recorded the percentage of peptides found in 1, 2, 3, or 4 replicates (Panel A). The overlap between the replicates is very weak: the percentage of peptides shared among 2, 3, or 4 different replicates (Y axis) is less than 5% for mAbs b12 and 17b, and no more than 10% for mAbs 21c and Herceptin. The vast majority of (unique) peptides were found in only one out of the four replicates. We also computed the percentage of motifs that are highly similar among different samples. To this end, we clustered similar motifs into a united motif (see Methods). A united motif is considered to be supported by a sample if it includes motifs from that sample. Shown in Panel B is the distribution of (united) motifs supported by i different samples (i = 1, 2, 3, 4). In contrast to Panel A, there is a strong motif overlap among the different sample replicates (Panel B).
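The peptide-sharing count behind Panel A can be sketched as follows. This is an illustrative reading of the caption, assuming the denominator is the number of distinct peptides pooled across all replicates; the function name `sharing_distribution` and the toy peptide strings are hypothetical.

```python
from collections import Counter

def sharing_distribution(replicates):
    """Percentage of distinct peptides found in exactly k replicates, for k = 1..n."""
    unique_sets = [set(r) for r in replicates]  # duplicates within a replicate count once
    occurrences = Counter(p for s in unique_sets for p in s)
    n_unique = len(occurrences)
    by_count = Counter(occurrences.values())
    return {
        k: 100.0 * by_count.get(k, 0) / n_unique
        for k in range(1, len(replicates) + 1)
    }

# Four toy replicates (invented strings, not real data).
replicates = [
    ["AAA", "BBB", "CCC"],
    ["AAA", "DDD"],
    ["EEE", "AAA"],
    ["FFF", "BBB"],
]
dist = sharing_distribution(replicates)
```

In this toy input, four of the six distinct peptides occur in a single replicate, one occurs in two, one in three, and none in all four.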
Figure 3. Motif inference.
Shown is an example of the motif inference process, starting from a set of peptide clusters in different mAb 21c replicates. A motif is generated from clustered peptides in each sample. A final united motif is inferred from the sample-derived motifs through the process of motif unification as described in the Methods.
Figure 4. Four mAbs experiment: motif significance represented as heat-maps, before and after machine learning.
Four mAbs were used to affinity-select peptides; motifs were then inferred, and a p-value was calculated for each motif in each sample. Each column corresponds to a motif, represented by its consensus, each row corresponds to a given sample, and the i,j entry is a p-value quantifying the congruence of sample i with motif j. (A) The 531 statistically significant motifs that were used as input to the machine learning; (B) single-feature analysis yielded 107 motifs, each of which classifies the samples with 100% accuracy in 4-fold cross-validation. Selected consensus sequences of the motifs are shown.
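The idea of single-feature analysis, keeping only motifs that classify the samples perfectly on their own, can be illustrated with a small sketch. Note the substitutions: the paper uses a random-forest classifier with 4-fold cross-validation, whereas this dependency-free example swaps in a nearest-class-mean classifier with leave-one-out cross-validation. The function names, motif names, and numeric values are hypothetical, and the entries are assumed to be some per-sample motif score (for example, a transformed p-value).

```python
def loo_accuracy(values, labels):
    """Leave-one-out accuracy of a nearest-class-mean classifier on one feature."""
    correct = 0
    for held_out in range(len(values)):
        sums, counts = {}, {}
        for i, (v, lab) in enumerate(zip(values, labels)):
            if i == held_out:
                continue  # exclude the held-out sample from training
            sums[lab] = sums.get(lab, 0.0) + v
            counts[lab] = counts.get(lab, 0) + 1
        means = {lab: sums[lab] / counts[lab] for lab in sums}
        predicted = min(means, key=lambda lab: abs(values[held_out] - means[lab]))
        correct += predicted == labels[held_out]
    return correct / len(values)

def perfect_single_features(matrix, labels, motif_names):
    """Return the motifs whose single column classifies every held-out sample correctly."""
    keep = []
    for j, name in enumerate(motif_names):
        column = [row[j] for row in matrix]
        if loo_accuracy(column, labels) == 1.0:
            keep.append(name)
    return keep

# Rows are samples, columns are motifs (toy numbers, not real data).
labels = ["b12", "b12", "b12", "17b", "17b", "17b"]
motif_names = ["motif_A", "motif_B"]
matrix = [
    [5.0, 1.0],
    [4.5, 3.0],
    [5.5, 0.5],
    [0.2, 2.9],
    [0.1, 0.8],
    [0.3, 1.1],
]
selected = perfect_single_features(matrix, labels, motif_names)
```

Here `motif_A` separates the two toy classes cleanly and is kept, while `motif_B` overlaps between classes and is dropped.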
Figure 5. mAb binding to phage-displayed selected peptides.
mAbs b12, 17b, and 21c bind different epitopes on HIV-1 gp120 that partially overlap (Fig. S1). In order to confirm that member-peptides of the clustered motifs for each mAb are actually recognized and bind their corresponding antibodies, three peptides from each mAb-motif were cloned and expressed as Protein VIII fusions on filamentous phages using the fth1 vector. The phage-displayed peptides were then used in ELISA tests (see Methods). The motifs and peptide sequences along with their corresponding copy numbers (per million) are shown. The O.D. values for the nine peptides for the three gp120 specific mAbs are given. Note that except for cross reactivity of the MIYDDLFK peptide of the mAb 21c, all other peptides proved highly specific for their corresponding mAbs. A tenth peptide derived from a Herceptin motif (YASTIVVDLDHT) as well as the fth1 vector alone served as negative controls and bound less than 0.1 O.D. The HIV-1 envelope protein, gp120, served as the positive control for the three mAbs being studied and produced signals greater than 2.5 O.D. for each mAb.
Figure 6. HIV-1 positive vs. negative sera: motif significance represented as heat-maps, before and after machine learning.
HIV-1 positive (S1–S5) and negative (S6–S10) serum samples were used to affinity-select peptides. Motifs were inferred for the HIV-1 positive samples (S5 was not used for motif inference and model training) and p-values were calculated for all samples. Heat-maps were generated in which each column corresponds to a motif, each row corresponds to a given sample, and the i,j entry is a p-value quantifying the congruence of sample i with motif j. (A) The 383 statistically significant motifs that were used as input to the machine learning; (B) single-feature analysis yielded nine motifs, each of which classifies the samples with at least 90% accuracy in 4-fold cross validation. Consensus sequences of the motifs are shown.

