This repository contains a PyTorch implementation of Redundancy Suppression Distillation (RSD) introduced in the paper Cross-Architecture Distillation Made Simple with Redundancy Suppression (ICCV 2025).
RSD is a simple method for cross-architecture knowledge distillation, where the knowledge transfer is cast into a redundant information suppression formulation. Existing methods introduce sophisticated modules, architecture-tailored designs, and excessive parameters, which impair their efficiency and applicability. We propose to extract the architecture-agnostic knowledge in heterogeneous representations by reducing the redundant architecture-exclusive information. To this end, we present a simple RSD loss, which comprises cross-architecture invariance maximisation and feature decorrelation objectives. To prevent the student from entirely losing its architecture-specific capabilities, we further design a lightweight module that decouples the RSD objective from the student's internal representations.
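For intuition, the sketch below shows one way an invariance-plus-decorrelation objective between student and teacher features can be written in PyTorch, in the style of a cross-correlation loss. It is only an illustration of the two terms described above; the function name, the projector, and the weighting are assumptions, and the actual RSD formulation is given in the paper and implemented in the `./distillers` folder.

```python
# Illustrative sketch of an invariance + decorrelation objective between student
# and teacher features. NOT the repository's exact RSD implementation; see the
# paper and ./distillers for the actual formulation.
import torch
import torch.nn as nn

def invariance_decorrelation_loss(f_s, f_t, off_diag_weight=0.005):
    """f_s, f_t: (batch, dim) student/teacher features projected to a common dim."""
    n, d = f_s.shape
    # Standardise each feature dimension across the batch before correlating.
    f_s = (f_s - f_s.mean(0)) / (f_s.std(0) + 1e-6)
    f_t = (f_t - f_t.mean(0)) / (f_t.std(0) + 1e-6)
    c = (f_s.T @ f_t) / n                                # (dim, dim) cross-correlation

    invariance = (torch.diagonal(c) - 1).pow(2).sum()    # pull matched dimensions together
    off_diag = c.flatten()[:-1].view(d - 1, d + 1)[:, 1:].flatten()
    decorrelation = off_diag.pow(2).sum()                # suppress redundant correlations
    return invariance + off_diag_weight * decorrelation

# A lightweight projector can decouple this objective from the student's internal
# representation, so the student keeps its architecture-specific features.
projector = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 256))

# Example usage with random tensors standing in for real activations:
f_s = projector(torch.randn(128, 512))   # student features mapped to the common space
f_t = torch.randn(128, 256)              # teacher features (already 256-d here)
loss = invariance_decorrelation_loss(f_s, f_t)
```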
- Clone the repository to your local workspace:

  ```
  git clone https://github.com/VISION-SJTU/RSD.git
  ```

- Configure the environment:

  ```
  conda create --name rsd python=3.8
  conda activate rsd
  pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
  pip install -r requirements.txt
  ```

  Note that other torch versions may also work.

- Prepare the dataset:

  The CIFAR-100 dataset will be automatically downloaded to `./data/cifar100/`.

- Prepare the pretrained teacher models:

  Download the pretrained models to `./pretrained/`.

  | Teacher | Acc. (%) | Pretrained Models |
  | --- | --- | --- |
  | Swin-T | 89.26 | swin_tiny_patch4_window7_224_cifar100.pth |
  | ViT-S | 92.44 | vit_small_patch16_224_cifar100.pth |
  | Mixer-B/16 | 87.62 | mixer_b16_224_cifar100.pth |
  | ConvNeXt-T | 88.42 | convnext_tiny_cifar100.pth |
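Before launching a run, you can optionally sanity-check the data and a teacher checkpoint. The snippet below is a hedged example that assumes the standard torchvision CIFAR-100 layout and an ordinary PyTorch checkpoint file; the exact checkpoint key structure may differ.

```python
# Optional sanity check (not part of the official scripts): verify the CIFAR-100
# download and inspect a teacher checkpoint. Paths mirror the directories above.
import torch
from torchvision.datasets import CIFAR100

# Downloads CIFAR-100 to the expected location if it is not already there.
train_set = CIFAR100(root="./data/cifar100", train=True, download=True)
print(f"CIFAR-100 training images: {len(train_set)}")

# Teacher checkpoints are ordinary torch files; the key layout below is an assumption.
ckpt = torch.load("./pretrained/swin_tiny_patch4_window7_224_cifar100.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print("First parameter keys:", list(state.keys())[:5])
```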
We provide the scripts and models for CIFAR-100 experiments. To train a ResNet18 student with a Swin-T teacher on CIFAR-100 on a single node with 2 GPUs, run:

```
python -m torch.distributed.launch --nproc_per_node=2 train.py /path/to/cifar100 --config configs/cifar/cnn.yaml --model resnet18 --teacher swin_tiny_patch4_window7_224 --teacher-pretrained /path/to/teacher_checkpoint --num-classes 100 --distiller ofa --ofa-eps 1.0
```
You may also train with the bash command:
```
bash train.sh 2
```
The distilled student model will be automatically evaluated on the validation set during training. Manual evaluation is also supported. For example, to evaluate the pretrained Swin-T model, run:
```
python validate.py data/cifar100 --dataset cifar100 --num-classes 100 --model swin_tiny_patch4_window7_224 --checkpoint pretrained/swin_tiny_patch4_window7_224_cifar100.pth
```
You may easily customise the code for your own method and experiments.
- Method: to implement your own knowledge distillation method, follow the examples in the `./distillers` folder (a hypothetical skeleton is sketched after this list).
- Architecture: to support arbitrary model architectures, follow the examples in the `./custom_model` folder. If intermediate features of the new model are required for KD, rewrite its `forward()` method following the examples in the `./custom_forward` folder.
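As a starting point, here is a hypothetical skeleton of a custom distiller that wraps a frozen teacher and a trainable student. The class name, constructor arguments, and the plain KL term are illustrative assumptions, not the interface actually used in `./distillers`, which you should follow instead.

```python
# Hypothetical distiller skeleton; follow the real examples in ./distillers for
# the interface expected by the training scripts. All names here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyDistiller(nn.Module):
    def __init__(self, student, teacher, kd_weight=1.0, temperature=4.0):
        super().__init__()
        self.student = student
        self.teacher = teacher          # kept frozen during distillation
        self.kd_weight = kd_weight
        self.temperature = temperature
        for p in self.teacher.parameters():
            p.requires_grad = False

    def forward(self, images, targets):
        logits_s = self.student(images)
        with torch.no_grad():
            logits_t = self.teacher(images)

        # Task loss on ground-truth labels plus a distillation term; a plain KL
        # divergence between softened logits stands in for your own objective.
        loss_ce = F.cross_entropy(logits_s, targets)
        loss_kd = F.kl_div(
            F.log_softmax(logits_s / self.temperature, dim=1),
            F.softmax(logits_t / self.temperature, dim=1),
            reduction="batchmean",
        ) * (self.temperature ** 2)
        return logits_s, loss_ce + self.kd_weight * loss_kd
```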
This project is developed using the timm and mdistiller libraries, and is based on OFA-KD (NeurIPS 2023).
If you find this project useful, please consider citing it:
```
@inproceedings{zhang2025rsd,
  author    = {Weijia Zhang and Yuehao Liu and Wu Ran and Chao Ma},
  title     = {Cross-Architecture Distillation Made Simple with Redundancy Suppression},
  booktitle = {ICCV},
  year      = {2025}
}
```
