Question about contrastive distillation loss #17

@SkrighYZ


Hi,

I have a few questions about the SimCLR distillation code.

  1. logits = torch.einsum("if, jf -> ij", p, z) / temperature

    It seems that the predicted features (p) are not included among the negatives, which differs from what is suggested in the paper (Appendix B). I understand that you swap p and z here (for a symmetric loss?):

    distill_loss = (
        simclr_distill_loss_func(p1, p2, frozen_z1, frozen_z2, self.distill_temperature)
        + simclr_distill_loss_func(frozen_z1, frozen_z2, p1, p2, self.distill_temperature)
    ) / 2

    but there are still no comparisons between different samples within p (see the first sketch after these questions for what I mean).

  2. In the paper, the distillation loss is applied to the two views independently. Based on the code above, does this mean we should use the two views jointly to reproduce the results?

  3. logit_mask = torch.ones_like(pos_mask, device=device)
    logit_mask.fill_diagonal_(True)
    logit_mask[:, b:].fill_diagonal_(True)
    logit_mask[b:, :].fill_diagonal_(True)

    These four lines seem to make logit_mask an all-ones matrix. In my understanding, the diagonals should be filled with False (see the second sketch below). Am I missing something?
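
To make question 1 concrete, here is a rough sketch of my reading of Appendix B, with the other samples of p also appearing as negatives. This is only an illustration of my interpretation, not the repository's implementation; the function name is made up and the temperature value is arbitrary.

    import torch
    import torch.nn.functional as F

    def distill_loss_with_p_negatives(p, z, temperature=0.2):
        """Hypothetical variant where each p_i is also contrasted against the
        other predicted features p_j (j != i), in addition to the frozen
        features z. Only a sketch of how I read Appendix B."""
        p = F.normalize(p, dim=-1)  # (N, D) predicted features
        z = F.normalize(z, dim=-1)  # (N, D) frozen features

        n = p.size(0)
        # Negatives come from both p and z; the positive for p_i is z_i.
        gallery = torch.cat([p, z], dim=0)                                # (2N, D)
        logits = torch.einsum("if, jf -> ij", p, gallery) / temperature   # (N, 2N)

        # Exclude the trivial comparison of p_i with itself (first N columns).
        self_mask = torch.zeros_like(logits, dtype=torch.bool)
        self_mask[:, :n].fill_diagonal_(True)
        logits = logits.masked_fill(self_mask, float("-inf"))

        # The positive for row i is z_i, which sits at column n + i.
        targets = torch.arange(n, device=p.device) + n
        return F.cross_entropy(logits, targets)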
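
And for question 3, a small self-contained check of what I mean: the four lines from the repo leave logit_mask all True, whereas I had expected the diagonals to be filled with False. The batch size b here is arbitrary and only for illustration; I may of course be misreading the intent.

    import torch

    b = 4  # arbitrary batch size, just for illustration
    pos_mask = torch.zeros(2 * b, 2 * b, dtype=torch.bool)

    # The four lines from the repo: starting from all ones and writing True
    # onto the diagonals again leaves the mask all True.
    logit_mask = torch.ones_like(pos_mask)
    logit_mask.fill_diagonal_(True)
    logit_mask[:, b:].fill_diagonal_(True)
    logit_mask[b:, :].fill_diagonal_(True)
    assert logit_mask.all()

    # What I had in mind when asking: the same calls with False, so those
    # entries are excluded. (Just to illustrate the question; I may be wrong.)
    expected_mask = torch.ones(2 * b, 2 * b, dtype=torch.bool)
    expected_mask.fill_diagonal_(False)
    expected_mask[:, b:].fill_diagonal_(False)
    expected_mask[b:, :].fill_diagonal_(False)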

TIA
