Adversarial Attacking MNIST - Intriguing Properties of Neural Networks Implementation

This project is a PyTorch implementation of the adversarial example generation technique described in the paper "Intriguing Properties of Neural Networks" by Szegedy et al., with a few modifications. It generates adversarial examples for MNIST digit classification using both supervised (targeted) and unsupervised (untargeted) attacks.

The difference is in the optimization method and objective function used.

The original paper minimizes the objective below, approximated with box-constrained L-BFGS, where $l$ is the incorrect label we want the model to predict for $x+r$.

$$ \mathcal{L}(r) = c\Vert r\Vert + \text{loss}_f(x+r, l)\quad\text{Subject to } x+r\in[0, 1]^m $$

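Here, I enforce the box constraint by passing $x+r$ through a sigmoid function and optimize with Adam instead. I also implement an unsupervised variant whose only objective is to make the model wrong, which it does by maximizing the loss with respect to the model's original prediction $f(x)$.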

$$ \mathcal{L}(r) = c\Vert r\Vert -\text{loss}_f(\text{Sigmoid}(x+r), f(x)) $$
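As a rough illustration of how the two objectives are evaluated, the plain-Python sketch below computes both for a toy linear softmax classifier. The names `toy_model`, `targeted_objective`, and `untargeted_objective` are hypothetical and not taken from this repository. The L2 norm for $\Vert r\Vert$ and the use of the sigmoid in the targeted variant are assumptions, and the actual project would work on PyTorch tensors with autograd instead.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_model(pixels, weights, biases):
    """Hypothetical linear classifier f: returns class probabilities."""
    logits = [sum(w * p for w, p in zip(row, pixels)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

def cross_entropy(probs, label):
    return -math.log(probs[label])

def l2_norm(r):
    # Assumption: the README does not say which norm is used.
    return math.sqrt(sum(v * v for v in r))

def adversarial_input(x, r):
    # The sigmoid keeps every pixel strictly inside (0, 1),
    # standing in for the box constraint x + r in [0, 1]^m.
    return [sigmoid(xi + ri) for xi, ri in zip(x, r)]

def targeted_objective(r, x, target, c, weights, biases):
    # c * ||r|| + loss_f(Sigmoid(x + r), l): small r, prediction pulled toward l.
    probs = toy_model(adversarial_input(x, r), weights, biases)
    return c * l2_norm(r) + cross_entropy(probs, target)

def untargeted_objective(r, x, c, weights, biases):
    # c * ||r|| - loss_f(Sigmoid(x + r), f(x)): small r, prediction pushed
    # away from the label f(x) predicted for the unperturbed input.
    clean = toy_model(x, weights, biases)
    predicted = max(range(len(clean)), key=clean.__getitem__)
    probs = toy_model(adversarial_input(x, r), weights, biases)
    return c * l2_norm(r) - cross_entropy(probs, predicted)

if __name__ == "__main__":
    x = [0.0, 1.0, 0.5]                      # three "pixels"
    weights = [[1.0, -1.0, 0.0], [-1.0, 1.0, 0.5]]
    biases = [0.0, 0.0]
    r = [0.1, -0.1, 0.0]
    print(targeted_objective(r, x, target=0, c=20, weights=weights, biases=biases))
    print(untargeted_objective(r, x, c=20, weights=weights, biases=biases))
```

Both functions are meant to be minimized over `r`. The sign flip on the loss term is the only difference between the targeted and untargeted cases.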

Many ideas and improvements have not been implemented yet; they are listed in the TODO section.

Features

Multiple Attack Modes:
- Supervised (targeted) attacks: Generate adversarial examples that fool the model into predicting a specific target class
- Unsupervised (untargeted) attacks: Generate adversarial examples that maximize the model's prediction error
Model Support:
- Fully Connected Network (FCNet, tested)
- Convolutional Neural Network (CNNet, not tested)
- Easy to extend to other architectures
Fast Customization through Notebook:
- Edit interactive_result.ipynb for faster customization.

Installation

Directly clone and install through pip:

git clone https://github.com/eryawww/adversarial_attacking_mnist
cd adversarial_attacking_mnist
pip install -r requirements.txt

Usage

Generate Adversarial Examples: the command below downloads the data, trains the MNIST model, and generates the adversarial examples:

python main.py

Configure Parameters: Edit hyper-parameters/hyperparameter.yaml:

lr: 0.5          # Learning rate
c: 20            # Perturbation weight
max_iterations: 10000  # Number of iterations
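
To make the roles of `lr` and `max_iterations` more concrete, here is a hand-written Adam loop in plain Python. It is only an illustrative sketch, not the project's code: `adam_minimize` and `grad_fn` are hypothetical names, and the defaults for `beta1`, `beta2`, and `eps` are the ones `torch.optim.Adam` uses by default.

```python
import math

def adam_minimize(grad_fn, r, lr=0.5, max_iterations=10000,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize an objective over r, given a function returning its gradient."""
    r = list(r)
    m = [0.0] * len(r)  # running mean of gradients
    v = [0.0] * len(r)  # running mean of squared gradients
    for t in range(1, max_iterations + 1):
        g = grad_fn(r)
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1 - beta1) * gi
            v[i] = beta2 * v[i] + (1 - beta2) * gi * gi
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            r[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return r

if __name__ == "__main__":
    # Toy objective (r - 3)^2 with gradient 2 * (r - 3).
    print(adam_minimize(lambda r: [2.0 * (r[0] - 3.0)], [0.0],
                        lr=0.1, max_iterations=10))
```

Because Adam normalizes each update by the gradient's running magnitude, the first step moves every coordinate by roughly `lr` regardless of the gradient's scale. With `lr: 0.5`, a pixel of the perturbation can therefore change by about 0.5 in a single early iteration.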

Faster Customization using Notebooks: Edit interactive_result.ipynb for faster customization.

Project Structure

.
├── adversarial_attack/         # Main package
│   ├── generate.py            # Adversarial example generation
│   ├── mnist_model.py         # Neural network models
│   └── visualize.py           # Visualization utilities
├── data/                      # MNIST dataset storage
├── hyper-parameters/          # Configuration files
│   ├── hyperparameter.yaml    # Default parameters
│   └── sweep.yaml             # W&B sweep configuration
├── models/                    # Pretrained model weights
├── tests/                     # Test suite
└── main.py                    # Entry point

Ablation

Sigmoid + Adam was chosen based on the following observations:

- Direct clamping, $\text{torch.clamp}(x+r, 0, 1)$: the gradient did not flow properly through all pixels.
- Projected Gradient Descent (PGD), projecting (clamping) $r$ after each gradient update: the gradient only flowed to high-value pixels, and the attack generated only some examples.
- Sigmoid + Adam, $\text{Sigmoid}(x+r)$: the gradient flows properly through all pixels. This technique modifies the black pixels heavily and generates almost all examples.
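
One plausible reading of these observations is that the derivative of a clamp is zero wherever $x+r$ leaves $[0, 1]$, while the derivative of a sigmoid is positive everywhere. The plain-Python sketch below compares the two local derivatives and shows a PGD-style projection of $r$. All function names are hypothetical, and treating the clamp derivative as 1 exactly on the boundary is a convention chosen for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def clamp_grad(z, low=0.0, high=1.0):
    """d clamp(z, low, high) / dz: 1 inside the box, 0 outside it."""
    return 1.0 if low <= z <= high else 0.0

def sigmoid_grad(z):
    """d sigmoid(z) / dz = s * (1 - s), which is positive for every finite z."""
    s = sigmoid(z)
    return s * (1.0 - s)

def project_perturbation(x, r, low=0.0, high=1.0):
    """PGD-style projection: shrink r so that x + r lies in [low, high]."""
    return [min(max(xi + ri, low), high) - xi for xi, ri in zip(x, r)]

if __name__ == "__main__":
    for z in (-0.3, 0.5, 1.4):
        print(f"z={z:+.1f}  clamp'={clamp_grad(z):.1f}  sigmoid'={sigmoid_grad(z):.3f}")
    # A black pixel (0.0) pushed negative and a white pixel (1.0) pushed up
    # are both projected back onto the boundary of the box.
    print(project_perturbation([0.0, 1.0], [-0.5, 0.5]))
```

By the chain rule, the loss gradient that reaches $r$ is scaled by these local derivatives. A clamped pixel that has left the box therefore receives no signal from the classification loss, whereas a sigmoid-mapped pixel always receives some.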

Results

The implementation generates adversarial examples that:

- Differ only minimally from the original images (a human will not notice the change)
- Consistently fool the target model; the digit 0 is the hardest to fool, while the other digits are easy

Other results are available in the result directory.

TODO

- Improve the strength of the attack
- Measure the minimum distortion distance for each input-target digit pair to gain additional insight
- Investigate the gradient update: why lower-valued pixels still do not change much
- Test the CNN model
- Investigate defensive mechanisms (adversarial training, batch normalization, dropout)

Conclusion

Adversarial examples can be generated easily. In this experiment, I found that the model can be pushed to predict the digit 6 with minimal distortion regardless of the initial image, suggesting that the model is overfitting to 6. Although the CNN model has not been tested, I expect it to be harder to attack (especially in combination with dropout), based on unsystematic experiments.
