DTL: Parameter- and Memory-Efficient Disentangled Vision Learning
- PMID: 41032539
- DOI: 10.1109/TPAMI.2025.3616318
Abstract
The cost of fine-tuning a pretrained model on downstream tasks steadily increases as models grow larger. Parameter-efficient transfer learning (PETL) reduces this cost by updating only a tiny subset of trainable parameters. However, PETL does not effectively reduce the GPU memory footprint during training, because the trainable parameters of these methods are tightly entangled with the backbone, so many intermediate states must be stored for backpropagation. To alleviate this issue, we introduce Disentangled Transfer Learning (DTL), which disentangles the trainable parameters from the backbone using a lightweight Compact Side Network (CSN). By progressively extracting task-specific information with a few low-rank linear mappings and adding this information back to the backbone at appropriate points, the CSN effectively realizes knowledge transfer in various downstream recognition tasks. We further extend DTL to more challenging tasks such as object detection and semantic segmentation by employing a sparser architectural design. Extensive experiments validate the effectiveness of DTL, which not only reduces GPU memory usage and the number of trainable parameters substantially, but also outperforms existing PETL methods in accuracy by a significant margin.
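To make the disentanglement idea concrete, the following is a minimal PyTorch sketch of a low-rank side branch kept outside a frozen backbone, in the spirit of the CSN described above. All names (CompactSideNetwork, the rank, and where the side output is added back) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: a compact side network trained alongside a frozen backbone,
# so that backbone activations need not be cached for backpropagation.
import torch
import torch.nn as nn


class CompactSideNetwork(nn.Module):
    """Low-rank side branch kept outside the frozen backbone (illustrative)."""

    def __init__(self, dim: int, num_stages: int, rank: int = 8):
        super().__init__()
        # One rank-r down-projection per backbone stage, shared up-projection.
        self.down = nn.ModuleList(nn.Linear(dim, rank) for _ in range(num_stages))
        self.up = nn.Linear(rank, dim)

    def forward(self, intermediate_feats):
        # intermediate_feats: list of detached backbone features, one per stage.
        h = 0.0
        for proj, feat in zip(self.down, intermediate_feats):
            h = h + proj(feat)   # progressively extract task-specific information
        return self.up(h)        # map back to the backbone dimension


# Usage sketch: run the frozen backbone once, collect detached per-block features,
# then add the side output to the final representation before the task head.
if __name__ == "__main__":
    dim, num_blocks, tokens = 768, 12, 197
    backbone_blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_blocks))
    for p in backbone_blocks.parameters():
        p.requires_grad_(False)              # backbone stays frozen

    csn = CompactSideNetwork(dim, num_blocks)
    x = torch.randn(2, tokens, dim)
    feats = []
    with torch.no_grad():                    # no backbone activations stored for backprop
        for blk in backbone_blocks:
            x = blk(x)
            feats.append(x)
    out = x + csn([f.detach() for f in feats])   # add task-specific information back
    out.sum().backward()                     # gradients reach only the side network
```

Because gradients flow only through the side branch, both the trainable parameter count and the activation memory scale with the small rank rather than with the backbone, which is the memory-saving behavior the abstract attributes to DTL.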