Does numerical stability in PyTorch affect the effectiveness of attacks? #63
Comments
@rowedenny Thanks for raising this issue! I think it is definitely a potential problem, but I haven't encountered it personally. I tried to generate the problem with a quick script, and from its output it seems that the problem only exists in extreme cases. So I'm curious: what are the logit values that cause this problem for you, and which attack did you use?
Another observation is that, if the …
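As a minimal sketch of the kind of extreme case being described (my own construction, not the script from the comment above): with a large enough gap between logits, the float32 cross-entropy gradient underflows to exactly zero, leaving a gradient-based attack like FGSM with no signal.

```python
import torch
import torch.nn.functional as F

# Two logits with an extreme gap; requires_grad so we can inspect the gradient.
logits = torch.tensor([[120.0, 0.0]], requires_grad=True)
target = torch.tensor([0])

loss = F.cross_entropy(logits, target)
loss.backward()

# softmax([120, 0]) saturates to [1, 0] in float32 (exp(-120) underflows),
# so grad = softmax - onehot is exactly zero.
print(loss.item())   # 0.0
print(logits.grad)   # tensor([[0., 0.]])
```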
Thanks for your quick response. I used FGSM with eps = 0.3. I also followed the tutorial of cleverhans, and I see the same issue there. Though this issue was discovered by Carlini, it is not mentioned in cleverhans? So I am concerned about how to evaluate the robustness of my model reliably. In addition, if it is necessary, I would like to provide my model to reproduce the case.
@rowedenny Would be nice if you can provide your model checkpoint; I'd be interested to see what makes the problem happen. BTW, I also tried FGSM; it has similar properties to PGD attacks on this specific model.
Much appreciated. I even doubt whether I am the only one who really encounters this issue. I honestly appreciate your help if you would like to investigate this issue and resolve my concern. I have attached the model checkpoint and a Jupyter notebook in the following Google Drive. You only need to update the MODEL_PATH to your local directory of the model checkpoint. I can consistently reproduce this issue.
@rowedenny Confirming that I can reproduce the problem. In your model, the gap between max and min logit values is larger than in mine; maybe you trained it for a very long time? It seems that something like shifting the logits by their per-row maximum fixes it. It could be worth adding something like this in advertorch, with some considerations in design.
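For context (my addition, not part of the original comment): the max-shift is the standard trick for keeping a naive softmax finite in float32, since exp overflows for inputs above roughly 88.

```python
import torch

x = torch.tensor([100.0, 0.0])

# Naive softmax: exp(100) overflows float32 to inf, giving inf/inf = nan.
naive = torch.exp(x) / torch.exp(x).sum()

# Max-shifted softmax: subtracting the max keeps every exponent <= 0.
shifted = x - x.max()
stable = torch.exp(shifted) / torch.exp(shifted).sum()

print(naive)   # tensor([nan, 0.]) -- overflow
print(stable)  # tensor([1.0000e+00, ~3.7e-44]) -- finite
```

Note that PyTorch's built-in torch.softmax and F.cross_entropy already apply this shift internally, which is why the zero gradient in the earlier sketch comes from genuine float32 underflow at extreme logit gaps rather than from overflow.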
Yes. My model on MNIST is trained for 100 epochs, and if I remember correctly, the tutorial on Cleverhans trains for 6 epochs, which may explain why they did not see this issue. I guess it is a similar case for advertorch.
I am not sure whether this may affect attacks like C&W: if we scale the logits but the attacker is not "aware" of the scale, could the effectiveness of the attack be affected? It's my great honor talking with you, and I am pleased that you finally confirm that I am not the only one who sees this.
@rowedenny Thanks for the kind words, and for bringing up this interesting observation! This also reminds me of another paper, Defensive Distillation is Not Robust to Adversarial Examples, where they actually use the logit scaling method to attack defensive distillation, which I think has similar problems to your model. The first author, the C in C&W, is actually also the one who raised the DEEPSEC issue you quoted at the beginning. I'll dig a bit more into this and see if there's a way to add some functionality to advertorch for this.
I think a quick fix is to shift the logits. Since subtracting the maximum makes the largest entry zero, may I propose to check whether torch.max(logit, dim=1)[0] == 0, and if not, subtract the maximum?
Not sure if this is the optimal solution, but it at least sounds like a reasonable choice to have. I would suggest adding a "loss function wrapper" (could be a decorator) in advertorch.utils, such that it does this preprocessing before the logits go into a common loss function, say CrossEntropyLoss.
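A minimal sketch of such a wrapper (hypothetical names; this is not an actual advertorch API):

```python
import torch
import torch.nn as nn

def with_max_shifted_logits(loss_fn):
    """Hypothetical decorator: max-shift logits before a logits-based loss."""
    def wrapped(logits, target):
        shifted = logits - logits.max(dim=1, keepdim=True)[0]
        return loss_fn(shifted, target)
    return wrapped

# Usage: a drop-in replacement wherever an attack expects a loss function.
loss_fn = with_max_shifted_logits(nn.CrossEntropyLoss(reduction="sum"))
```

An attack that accepts a custom loss_fn could then be handed the wrapped version without any other change to the attack itself.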
Hi @gwding, I think we can inherit the idea from cleverhans that checks whether the prediction is from logits or from probabilities after softmax. I assume it is a similar case here. Please refer to the following code: https://github.com/tensorflow/cleverhans/blob/master/cleverhans/model.py#L228 BTW, I dropped you an email and look forward to hearing from you.
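Along those lines, a rough heuristic (my own sketch, not cleverhans' actual implementation) for checking whether a model's outputs look like post-softmax probabilities rather than raw logits:

```python
import torch

def looks_like_probabilities(outputs, atol=1e-4):
    """Heuristic: rows that are non-negative and sum to ~1 are probably
    post-softmax probabilities rather than raw logits."""
    nonneg = bool((outputs >= 0).all())
    row_sums_one = torch.allclose(
        outputs.sum(dim=1), torch.ones(outputs.size(0)), atol=atol)
    return nonneg and row_sums_one
```

Logits could in principle also satisfy this, so a structural check (e.g., inspecting whether the final layer is a softmax) would be more reliable than inspecting values alone.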
Hello,
Thanks for the amazing work on this PyTorch library; since it is very similar to cleverhans, I recommend it to a lot of my friends.
I am using this lib to evaluate the robustness of my model, and I came across this issue in another repo:
https://github.com/kleincup/DEEPSEC/issues/3
Basically, it reveals some numerical instability in PyTorch. So I tested whether it may also occur in this lib by comparing the output of the model with and without
logits = logits - torch.max(logits, dim=1, keepdim=True)[0]
For a network with four convolutional layers and two fully-connected layers on MNIST, the classification accuracy on FGSM adversarial examples with eps = 0.3 is 24.9%, but when I add the line above to fix the numerical stability, the accuracy on the adversarial examples drops to 2.0%.
To the best of my knowledge, the latter seems to be more reasonable.
I understand it is not related to the implementation of advertorch, yet from what I see, none of the models in test_utils.py takes numerical stability into consideration. So if this issue may also occur in advertorch, may I propose to fix this in the tutorial?
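For anyone wanting to try the same comparison, here is a sketch of how the shift could be applied as a model wrapper before handing the model to an attack. LogitShift is a hypothetical helper, GradientSignAttack is advertorch's FGSM, and model, x, y are assumed to be the trained MNIST model and a batch of inputs and labels.

```python
import torch.nn as nn
from advertorch.attacks import GradientSignAttack

class LogitShift(nn.Module):
    """Hypothetical wrapper: subtract the per-row max from the model's logits."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        logits = self.model(x)
        return logits - logits.max(dim=1, keepdim=True)[0]

# Evaluate with and without the wrapper to compare adversarial accuracy.
adversary = GradientSignAttack(LogitShift(model), eps=0.3)
adv_x = adversary.perturb(x, y)
```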