Nat Commun. 2023 Sep 9;14(1):5570.
doi: 10.1038/s41467-023-41215-8.

Transfer Learning with Kernel Methods


Adityanarayanan Radhakrishnan et al.

Abstract

Transfer learning refers to the process of adapting a model trained on a source task to a target task. While kernel methods are conceptually and computationally simple models that are competitive on a variety of tasks, it has been unclear how to develop scalable kernel-based transfer learning methods across general source and target tasks with possibly differing label dimensions. In this work, we propose a transfer learning framework for kernel methods by projecting and translating the source model to the target task. We demonstrate the effectiveness of our framework in applications to image classification and virtual drug screening. For both applications, we identify simple scaling laws that characterize the performance of transfer-learned kernels as a function of the number of target examples. We explain this phenomenon in a simplified linear setting, where we are able to derive the exact scaling laws.
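To make the projection and translation operations concrete, below is a minimal sketch (not the authors' implementation) using kernel ridge regression with an RBF kernel. The data shapes, hyperparameters, and function names are illustrative assumptions only.

```python
# Minimal sketch of kernel transfer learning via projection and translation,
# assuming kernel ridge regression with an RBF kernel. Illustrative only.
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Pairwise RBF kernel matrix between rows of X and rows of Z."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def fit_krr(X, Y, reg=1e-3, gamma=1.0):
    """Kernel ridge regression: returns a predictor X_new -> Y_hat."""
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + reg * np.eye(len(X)), Y)
    return lambda X_new: rbf_kernel(X_new, X, gamma) @ alpha

rng = np.random.default_rng(0)
# Source task: abundant labeled data (10-dimensional labels)
Xs, Ys = rng.normal(size=(500, 20)), rng.normal(size=(500, 10))
# Target task: scarce labeled data (3-dimensional labels)
Xt, Yt = rng.normal(size=(50, 20)), rng.normal(size=(50, 3))

f_source = fit_krr(Xs, Ys)

# Projection: train a second kernel method on the source model's predictions
# for the target inputs; this handles differing label dimensions.
f_proj = fit_krr(f_source(Xt), Yt)
predict_projected = lambda X_new: f_proj(f_source(X_new))

# Translation: add a learned correction term to the source model; this assumes
# source and target labels share the same dimension.
Yt_same_dim = rng.normal(size=(50, 10))
f_corr = fit_krr(Xt, Yt_same_dim - f_source(Xt))
predict_translated = lambda X_new: f_source(X_new) + f_corr(X_new)
```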


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Our framework for transfer learning with kernel methods for supervised learning tasks.
After training a kernel method on a source task, we transfer the source model to the target task via a combination of projection and translation operations. a Projection involves training a second kernel method on the predictions of the source model on the target data, as shown for image classification between natural images and house numbers. b Projection is effective when the predictions of the source model on target examples provide useful information about the target labels; e.g., a model trained to classify natural images may be able to distinguish images of zeros from images of ones by using the similarity of zeros to balls and ones to poles. c Translation involves adding a correction term to the source model, as shown for predicting the effect of a drug on a cell line. d Translation is effective when the predictions of the source model can be additively corrected to match the labels in the target data; e.g., the predictions of a model trained to predict the effect of drugs on one cell line may be additively adjustable to predict their effect on new cell lines.
Fig. 2
Fig. 2. Analysis of transfer learning with kernels trained on ImageNet32 to CIFAR10, Oxford 102 Flowers, DTD, and a subset of SVHN.
All curves in (b, c) are averaged over 3 random seeds. a Comparison of the test accuracy of the transferred kernel predictor (green) with that of the baseline kernel predictors trained directly on the target tasks (red). In all cases, the transferred kernel predictors outperform the baseline predictors, with a difference in performance of up to 10%. b Test accuracy of the transferred and baseline predictors as a function of the number of target examples. These curves, which quantify the benefit of collecting more target examples, follow simple logarithmic trends (R² > 0.95). c Performance of the transferred kernel methods decreases as the number of source classes increases while the total number of source examples is held fixed. Corresponding plots for DTD and SVHN are in SI Fig. S2.
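The logarithmic trend described in (b) can be reproduced schematically as follows; the sample sizes and accuracies below are hypothetical placeholders, not values from the paper.

```python
# Sketch of fitting the scaling law  acc(n) ≈ a + b * log(n)  and reporting R².
# The data points here are made up for illustration.
import numpy as np

n_target = np.array([500, 1000, 2000, 4000, 8000, 16000])   # hypothetical target-set sizes
accuracy = np.array([0.52, 0.57, 0.61, 0.66, 0.70, 0.75])   # hypothetical test accuracies

# Least-squares fit of accuracy against log(n); polyfit returns [slope, intercept].
b, a = np.polyfit(np.log(n_target), accuracy, deg=1)
pred = a + b * np.log(n_target)

# Coefficient of determination R² for the logarithmic fit.
ss_res = np.sum((accuracy - pred) ** 2)
ss_tot = np.sum((accuracy - accuracy.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"acc ≈ {a:.3f} + {b:.3f} * log(n),  R² = {r2:.3f}")
```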
Fig. 3
Fig. 3. Transferring kernel methods from CIFAR10 to adapt to 19 different corruptions in CIFAR10-C.
a Test accuracy of the baseline kernel method (red), the source predictor obtained by directly applying the kernel trained on CIFAR10 to CIFAR10-C (gray), and the transferred kernel method (green). The transferred kernel method outperforms the other models on all 19 corruptions and even improves on the baseline kernel method when the source predictor exhibits a decrease in performance. Additional results are presented in SI Fig. S6. b Performance of the transferred and baseline kernel predictors as a function of the number of target examples. The transferred kernel method can outperform both the source and baseline predictors even when transferred using as few as 200 target examples.
Fig. 4
Fig. 4. Transferring the NTK trained to predict gene expression for given drug and cell line combinations in CMAP to new drug and cell line combinations.
a, b The transfer-learned NTK (green) outperforms imputation by the per-cell-line mean (gray) and the baseline NTK predictors across the R², cosine similarity, and Pearson r metrics. All results are averaged over the performance on 5 cell lines and are stratified by whether or not the target data contain drugs that are present in the source data. Error bars indicate standard deviation. c, d The performance of the transferred kernel method follows a logarithmic trend (R² > 0.9) as a function of the number of target examples and exhibits a better scaling coefficient than the baselines. The results are averaged over 5 cell lines. e, f Visualization of the performance of the transferred NTK in relation to the top two principal components (denoted PC1 and PC2) of gene expression for target drug and cell line combinations. The performance of the NTK is generally lower for drug and cell line combinations that are farther from the control gene expression for a given cell line. Visualizations for the remaining 3 cell lines are presented in SI Fig. S8.
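For reference, a minimal sketch of how the three evaluation metrics in (a, b) can be computed for predicted versus measured gene-expression vectors. The data are synthetic placeholders, and the 978-gene dimension is an assumption for illustration, not a value stated on this page.

```python
# Sketch of per-sample R², cosine similarity, and Pearson r, averaged over samples.
# Synthetic data only; illustrative, not the authors' evaluation code.
import numpy as np
from scipy.stats import pearsonr

def evaluate(y_true, y_pred):
    """y_true, y_pred: arrays of shape (n_samples, n_genes)."""
    r2s, cosines, pearsons = [], [], []
    for t, p in zip(y_true, y_pred):
        ss_res = np.sum((t - p) ** 2)
        ss_tot = np.sum((t - t.mean()) ** 2)
        r2s.append(1 - ss_res / ss_tot)
        cosines.append(p @ t / (np.linalg.norm(p) * np.linalg.norm(t)))
        pearsons.append(pearsonr(t, p)[0])
    return np.mean(r2s), np.mean(cosines), np.mean(pearsons)

rng = np.random.default_rng(0)
y_true = rng.normal(size=(100, 978))                  # 100 drug/cell-line combinations
y_pred = y_true + 0.5 * rng.normal(size=(100, 978))   # noisy predictions
print(evaluate(y_true, y_pred))
```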
