Dpsgd celebA#312

Open
henrikfo wants to merge 20 commits into main from dpsgd_celebA

Conversation

@henrikfo
Collaborator

@henrikfo henrikfo commented Apr 15, 2025

Description

Summary of changes

  • Updated the celebA example to work with new changes, schemas, etc.
  • Added a DP-SGD-friendly handler for the celebA example
  • Fixed a typo in cifar_handler_dpsgd.py
  • Added a way to replace inplace internal functions during model building (while Opacus is broken)

Resolved Issues

Related Pull Requests


@johanos1 johanos1 changed the title Dpsgd celebA Dpsgd celebA [WiP] Apr 16, 2025
@henrikfo henrikfo changed the title Dpsgd celebA [WiP] Dpsgd celebA Apr 28, 2025
@henrikfo henrikfo requested review from fazelehh and johanos1 April 28, 2025 20:39
@fazelehh fazelehh requested review from TheColdIce and removed request for johanos1 February 12, 2026 22:15
Collaborator

@TheColdIce TheColdIce left a comment


Nice job! It's great that the code is very similar to the cifar DP-SGD example; this will make it easy to build a potential general module for DP-SGD. However, I have some concerns:

1. yaml files
These need to be cleaned up a bit!

2. virtual_batch_size
Why is virtual_batch_size needed? Why not only use batch_size?

3. learning
When using DP-SGD the model seems to not learn anything. These are the results from one run:
[screenshot of training results omitted]
I changed the params a bit, but I was not successful in getting good results. Maybe I just ran it with bad privacy params?

4. opacus
Why have you restricted opacus to 1.5.3?
I get a warning when running the code with this version, and when I use the latest version (1.5.4) I get an error (see warning and error below).
It seems the problem is that nn.GroupNorm returns a view and/or the ReLU activations are in-place operations, which could lead to gradients being miscalculated. The bug appears to be in Opacus and/or torch/torchvision. A fix/test would be to implement ResNet from scratch and make sure GroupNorm and ReLU behave as needed, or just wait for Opacus to fix the error. I ran one of their examples with the latest version and got the same error. This warning/error could also be connected to the learning problem. Let me know what you think!

warning:

```
FutureWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior.
  self._maybe_warn_non_full_backward_hook(args, result, grad_fn)
```

error:

```
RuntimeError: Output 0 of BackwardHookFunctionBackward is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can fix this by cloning the output of the custom Function.
```

Comment threads on examples/mia/celebA_HQ/celebA_hq_dpsgd_handler.py (three outdated, one open)
@henrikfo
Collaborator Author

henrikfo commented Mar 4, 2026

@TheColdIce

> 1. yaml files
> These need to be cleaned up a bit!

Yes, I agree. Maybe the DP examples should be in their own folder?

> 2. virtual_batch_size
> Why is virtual_batch_size needed? Why not only use batch_size?

The batch_size is used in the backprop step. The virtual batch size is the micro-batch size used to accumulate gradients up to a "normal" batch, avoiding huge RAM usage caused by Opacus's memory-hungry per-sample gradient mechanics. This is best practice and, in my opinion, should be shown so users learn how to set up their own DP example with some hand-holding.
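The accumulation idea can be sketched without Opacus at all. A minimal, framework-free illustration (function names are mine, not from the example): split the logical batch into micro-batches of `virtual_batch_size`, sum per-micro-batch gradients, and average once, which gives the same average gradient as processing the whole batch at once:

```python
def virtual_batches(samples, virtual_batch_size):
    """Split one logical batch into micro-batches of at most virtual_batch_size."""
    for i in range(0, len(samples), virtual_batch_size):
        yield samples[i:i + virtual_batch_size]

def accumulated_gradient(samples, virtual_batch_size, grad_fn):
    """Accumulate gradient sums per micro-batch, then average once.

    Peak memory scales with virtual_batch_size, but the resulting average
    gradient is identical to processing the full logical batch in one go.
    """
    total, seen = 0.0, 0
    for micro in virtual_batches(samples, virtual_batch_size):
        total += sum(grad_fn(x) for x in micro)
        seen += len(micro)
    return total / seen
```

In the real handler, Opacus's per-sample gradients make each micro-batch far more memory-hungry than this toy suggests, which is exactly why the split is needed.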

> 3. learning
> When using DP-SGD the model seems to not learn anything. These are the results from one run:

For me, I had to change the params to learning_rate: 0.1, weight_decay: 0.0 to get observable training results, for both the DP and non-DP runs.

> 4. opacus
> Why have you restricted opacus to 1.5.3?

Now it makes sense why it was restricted to 1.5.3: instead of just throwing a warning, 1.5.4 completely blocks the example and breaks it, even though Opacus themselves use that exact model construction for a ResNet-18 in one of their own examples.
I updated the model construction and replaced the inplace functionality in the latest commit.
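For readers hitting the same RuntimeError, the workaround can be as small as walking the model and turning off the `inplace` flag on activations. A hedged sketch (the helper name is mine; the PR's actual implementation may differ):

```python
import torch.nn as nn

def disable_inplace(model: nn.Module) -> nn.Module:
    """Turn off in-place behaviour on all activations (e.g. nn.ReLU).

    Opacus's backward hooks forbid in-place modification of views, which
    nn.GroupNorm can produce, so inplace=True activations can trigger the
    view+inplace RuntimeError under Opacus >= 1.5.4.
    """
    for module in model.modules():
        if hasattr(module, "inplace"):
            module.inplace = False
    return model
```

Calling this once before wrapping the model with Opacus's PrivacyEngine leaves the architecture unchanged; only the in-place flag flips, at the cost of a few extra activation tensors.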

@henrikfo henrikfo mentioned this pull request Mar 4, 2026
7 tasks
This was linked to issues Mar 5, 2026
@henrikfo henrikfo mentioned this pull request Mar 6, 2026
```python
criterion: nn.Module = None,
optimizer: optim.Optimizer = None,
epochs: int = None,
dpsgd_path = "./target_GRUD_dpsgd/dpsgd_dic.pkl"
```
Collaborator


Missing type annotation on `dpsgd_path`.

@TheColdIce
Collaborator

@henrikfo I looked through your latest commits (ad15a50 to aba9663); nice solve of the Opacus problem.

Regarding the hardcoded path problem: I noticed that cifar and celeb seem to have a similar problem. When I change the data paths in the train_config and the audit file, I get an error that the target can't be found; leakpro is still looking for target_dpsgd/. I think leakpro doesn't assign the correct dpsgd_metadata_path to CifarInputHandlerDPsgd and CelebAInputHandlerDPsgd, so they use the default: ./target_dpsgd/dpsgd_dic.pkl. I wasn't able to test LOS; the code needs to be merged with main before the LOS example can run.
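The symptom described here is the classic default-argument trap: the handler silently falls back to `./target_dpsgd/dpsgd_dic.pkl` whenever the configured value is not passed through. A minimal sketch of the intended wiring (class and key names are illustrative, not LeakPro's actual API):

```python
class DPsgdHandlerSketch:
    """Illustrative handler that reads the metadata path from config."""

    DEFAULT_PATH = "./target_dpsgd/dpsgd_dic.pkl"

    def __init__(self, config: dict):
        # Prefer the path from the train/audit config; only fall back to
        # the old hardcoded default when nothing is configured.
        self.dpsgd_metadata_path = config.get(
            "dpsgd_metadata_path", self.DEFAULT_PATH
        )
```

The key point is that the framework, not the handler subclass, should own the fallback, so that a path set in the config files always wins.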

Minor fix: there are some empty cells in celebA_hq_dpsgd.ipynb that can be removed.

Regarding the config files, I do not think the dpsgd example should be in a separate folder. I would actually prefer to see one notebook, one train_config file, and one audit file: first the user would train the model, then run the attacks, then implement a PET on the model and run the attacks again. While this change wouldn't be too difficult, I don't think it needs to be done in this PR. What do you think?

@henrikfo henrikfo requested a review from TheColdIce March 17, 2026 09:44
@henrikfo
Collaborator Author

> @henrikfo I looked through your latest commits (ad15a50 to aba9663); nice solve of the Opacus problem.

Thanks! Hopefully it will not break when they make even newer releases!

> Regarding the hardcoded path problem: I noticed that cifar and celeb seem to have a similar problem. When I change the data paths in the train_config and the audit file, I get an error that the target can't be found; leakpro is still looking for target_dpsgd/. I think leakpro doesn't assign the correct dpsgd_metadata_path to CifarInputHandlerDPsgd and CelebAInputHandlerDPsgd, so they use the default: ./target_dpsgd/dpsgd_dic.pkl.

I updated the static path. What do you think of it now?

> Minor fix: there are some empty cells in celebA_hq_dpsgd.ipynb that can be removed.

Removed!

> Regarding the config files, I do not think the dpsgd example should be in a separate folder. I would actually prefer to see one notebook, one train_config file, and one audit file: first the user would train the model, then run the attacks, then implement a PET on the model and run the attacks again. While this change wouldn't be too difficult, I don't think it needs to be done in this PR. What do you think?

Yeah, maybe you're right! I just feel that it's awkward to share one train and one audit .yaml between these two cases, since e.g. virtual_batch_size is not used in the regular example. Maybe there is some clean way of doing it?
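One possible middle ground, sketched below with made-up keys (not the example's actual schema): a single shared train_config with an optional `dpsgd` section that the regular run simply ignores:

```yaml
# Hypothetical shared train_config; the dpsgd block is optional and only
# read by the DP-SGD handler.
batch_size: 128
learning_rate: 0.1
weight_decay: 0.0
dpsgd:
  virtual_batch_size: 32
  epsilon: 8.0
  delta: 1e-5
```

This keeps one file per concern while making it explicit which parameters are DP-only.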

@TheColdIce
Collaborator

> I updated the static path. What do you think of it now?

Nice! Looks good.

> Yeah, maybe you're right! I just feel that it's awkward to share one train and one audit .yaml between these two cases, since e.g. virtual_batch_size is not used in the regular example. Maybe there is some clean way of doing it?

Okay, if the train and audit .yaml files are very different, maybe it's better to keep them separated... I'll think about it, but I think we can merge this PR in the meantime.

TheColdIce previously approved these changes May 5, 2026


Development

Successfully merging this pull request may close these issues.

  • Opacus 1.5.4 breaks inplace activation support
  • Opacus package
  • celebA handler outdated
  • hard coded dpsgd path
  • DP-Optimizer method not supported

2 participants