RayS: A Ray Searching Method for Hard-label Adversarial Attack (KDD2020)
"RayS: A Ray Searching Method for Hard-label Adversarial Attack"
Jinghui Chen, Quanquan Gu
https://arxiv.org/abs/2006.12792
This repository contains our PyTorch implementation of the attack proposed in the paper "RayS: A Ray Searching Method for Hard-label Adversarial Attack" (accepted by KDD 2020).
RayS is a hard-label adversarial attack which only requires the target model's hard-label output (prediction label).
It is gradient-free, hyper-parameter free, and is also independent of adversarial losses such as CrossEntropy or C&W.
Therefore, RayS can be used as a good sanity check for possible "falsely robust" models (models that may overfit to certain types of gradient-based attacks and adversarial losses).
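To make the hard-label setting concrete, the sketch below binary-searches the distance to the decision boundary along one fixed ray, using nothing but predicted labels. It is a simplified illustration under stated assumptions, not the RayS algorithm from the paper: the toy linear model, the helper names `make_hard_label_model` and `boundary_distance_along_ray`, and the fixed direction are all hypothetical, and clipping to the valid pixel range is omitted.

```python
import math
from typing import Callable, List, Sequence

HardLabelFn = Callable[[Sequence[float]], int]


def make_hard_label_model(weights: Sequence[float], bias: float) -> HardLabelFn:
    """Toy two-class linear model that exposes only its predicted label."""
    def predict(x: Sequence[float]) -> int:
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1 if score > 0 else 0
    return predict


def boundary_distance_along_ray(
    predict: HardLabelFn,
    x: Sequence[float],
    label: int,
    direction: Sequence[float],
    max_radius: float = 1.0,
    tol: float = 1e-4,
) -> float:
    """Radius (within tol) at which predict(x + r * direction) stops returning label.

    With direction entries in {-1, +1}, r equals the L_inf norm of the
    perturbation. Returns math.inf if the label does not change at max_radius.
    Assumes the label flips at most once along the ray (true for this toy model).
    """
    def point_at(r: float) -> List[float]:
        return [xi + r * di for xi, di in zip(x, direction)]

    if predict(point_at(max_radius)) == label:
        return math.inf  # no label change found along this ray

    lo, hi = 0.0, max_radius  # invariant: label kept at lo, changed at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if predict(point_at(mid)) != label:
            hi = mid
        else:
            lo = mid
    return hi


model = make_hard_label_model(weights=[1.0, -1.0], bias=0.0)
x0 = [0.2, 0.5]
y0 = model(x0)  # the toy model predicts class 0 here
dist = boundary_distance_along_ray(model, x0, y0, direction=[1.0, -1.0])
```

Every probe inside the loop is one model query, which is why the number of queries is reported alongside the attack result.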
RayS also proposed a new model robustness metric: ADBD (average decision boundary distance), which reflects examples' average distance to their closest decision boundary.
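As a rough illustration of how an average boundary distance relates to robust accuracy, the sketch below assumes that a per-example L_inf decision boundary distance is already available (for example from a hard-label attack). The function names `average_boundary_distance` and `robust_accuracy_at` and the sample distances are hypothetical and are not taken from this repository; how misclassified or unattacked examples are handled is left out.

```python
from statistics import mean
from typing import Sequence


def average_boundary_distance(distances: Sequence[float]) -> float:
    """ADBD-style summary: the mean of the per-example boundary distances."""
    return mean(distances)


def robust_accuracy_at(distances: Sequence[float], epsilon: float) -> float:
    """Fraction of examples whose boundary distance exceeds epsilon.

    An example counts as robust at epsilon when the attack found no
    label change within an L_inf ball of radius epsilon around it.
    """
    return sum(1 for d in distances if d > epsilon) / len(distances)


# Made-up distances for four examples, for illustration only.
sample_distances = [0.01, 0.02, 0.04, 0.05]
adbd_value = average_boundary_distance(sample_distances)      # about 0.03
acc_at_8_255 = robust_accuracy_at(sample_distances, 8 / 255)  # 2 of 4 exceed 8/255
acc_at_4_255 = robust_accuracy_at(sample_distances, 4 / 255)  # 3 of 4 exceed 4/255
```

The mean stays the same whichever epsilon is chosen, whereas the thresholded accuracy changes with it.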
We tested the robustness of recently proposed robust models trained on the CIFAR-10 dataset with the maximum L_inf norm perturbation strength epsilon=0.031 (8/255). The robustness is evaluated on the entire CIFAR-10 test set (10000 examples).
Note:
- The asterisk (*) denotes a model that uses extra data for training.
- Robust Acc (RayS) represents robust accuracy under the RayS attack for L_inf norm perturbation strength epsilon=0.031 (8/255). For truly robust models, this value could be larger than the reported value (obtained using white-box attacks) due to the hard-label limitation. For the current best robust accuracy evaluation, please refer to AutoAttack, which uses an ensemble of four white-box/black-box attacks.
- ADBD represents our proposed Average Decision Boundary Distance metric, which is independent of the perturbation strength epsilon. It reflects the overall model robustness through the lens of decision boundary distance. ADBD can serve as a complement to the traditional robust accuracy metric. Furthermore, ADBD depends only on hard-label output and can be adopted in cases where back-propagation or even soft labels are not available.

| Method | Natural Acc | Robust Acc (Reported) | Robust Acc (RayS) | ADBD |
|---|---|---|---|---|
| WAR (Wu et al., 2020)* | 85.6 | 59.8 | 63.2 | 0.0480 |
| RST (Carmon et al., 2019)* | 89.7 | 62.5 | 64.6 | 0.0465 |
| HYDRA (Sehwag et al., 2020)* | 89.0 | 57.2 | 62.1 | 0.0450 |
| MART (Wang et al., 2020)* | 87.5 | 65.0 | 62.2 | 0.0439 |
| UAT++ (Alayrac et al., 2019)* | 86.5 | 56.3 | 62.1 | 0.0426 |
| Pretraining (Hendrycks et al., 2019)* | 87.1 | 57.4 | 60.1 | 0.0419 |
| Robust-overfitting (Rice et al., 2020) | 85.3 | 58.0 | 58.6 | 0.0404 |
| TRADES (Zhang et al., 2019b) | 85.4 | 56.4 | 57.3 | 0.0403 |
| Backward Smoothing (Chen et al., 2020) | 85.3 | 54.9 | 55.1 | 0.0403 |
| Adversarial Training (retrained) (Madry et al., 2018) | 87.4 | 50.6 | 54.0 | 0.0377 |
| MMA (Ding et al., 2020) | 84.4 | 47.2 | 47.7 | 0.0345 |
| Adversarial Training (original) (Madry et al., 2018) | 87.1 | 47.0 | 50.7 | 0.0344 |
| Fast Adversarial Training (Wong et al., 2020) | 83.8 | 46.1 | 50.1 | 0.0334 |
| Adv-Interp (Zhang & Xu, 2020) | 91.0 | 68.7 | 46.9 | 0.0305 |
| Feature-Scatter (Zhang & Wang, 2019) | 91.3 | 60.6 | 44.5 | 0.0301 |
| SENSE (Kim & Wang, 2020) | 91.9 | 57.2 | 43.9 | 0.0288 |
Please contact us if you want to add your model to the leaderboard.
Import RayS attack by
from general_torch_model import GeneralTorchModel
torch_model = GeneralTorchModel(model, n_class=10, im_mean=None, im_std=None)
from RayS import RayS
attack = RayS(torch_model, epsilon=args.epsilon)
where:
- torch_model is the PyTorch model wrapped in GeneralTorchModel. For models that take transformed images (values outside the range [0,1]), set the normalization parameters accordingly, for instance im_mean=[0.5, 0.5, 0.5] and im_std=[0.5, 0.5, 0.5].
- epsilon is the maximum adversarial perturbation strength.

To actually run the RayS attack, use
x_adv, queries, adbd, succ = attack(data, label, query_limit)
It returns:
- x_adv: the adversarial examples found by RayS,
- queries: the number of queries used to find the adversarial examples,
- adbd: the average decision boundary distance for each example,
- succ: indicates whether each example was successfully attacked.

Sample usage for attacking a robust model:
- python3 attack_robust.py --dataset rob_cifar_trades --query 40000 --batch 1000 --epsilon 0.031

Use the --num 1000 argument to limit the number of examples to be attacked to 1000. The default num is 10000 (the whole CIFAR-10 test set).

To evaluate TensorFlow models with RayS attack:
from general_tf_model import GeneralTFModel
tf_model = GeneralTFModel(model.logits, model.x_input, sess, n_class=10, im_mean=None, im_std=None)
from RayS import RayS
attack = RayS(tf_model, epsilon=args.epsilon)
where:
- model.logits: the logits tensor returned by the TensorFlow model,
- model.x_input: the placeholder for the model input (NHWC format),
- sess: the TF session.

The remaining part is the same as evaluating PyTorch models.
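To illustrate what the im_mean and im_std arguments are for, here is a minimal framework-free sketch of a hard-label wrapper. `HardLabelWrapper`, `logits_fn`, `toy_logits`, and the query counter are hypothetical stand-ins written for this example; they are not the actual GeneralTorchModel or GeneralTFModel code. The assumption is that the attack works on images in [0,1] and the wrapper applies the per-channel normalization the underlying model expects before returning only the predicted class.

```python
from typing import Callable, List, Optional, Sequence

Image = List[List[float]]  # one flat list of pixel values per channel


class HardLabelWrapper:
    """Hypothetical wrapper: normalizes [0,1] inputs and returns labels only."""

    def __init__(
        self,
        logits_fn: Callable[[Image], Sequence[float]],
        n_class: int,
        im_mean: Optional[Sequence[float]] = None,
        im_std: Optional[Sequence[float]] = None,
    ) -> None:
        self.logits_fn = logits_fn
        self.n_class = n_class
        self.im_mean = im_mean
        self.im_std = im_std
        self.num_queries = 0  # counts calls to predict_label

    def preprocess(self, image: Image) -> Image:
        """Apply (pixel - mean) / std per channel, or pass through if unset."""
        if self.im_mean is None or self.im_std is None:
            return image
        return [
            [(p - m) / s for p in channel]
            for channel, m, s in zip(image, self.im_mean, self.im_std)
        ]

    def predict_label(self, image: Image) -> int:
        """Return the index of the largest logit; logits are never exposed."""
        logits = self.logits_fn(self.preprocess(image))
        self.num_queries += 1
        return max(range(self.n_class), key=lambda k: logits[k])


def toy_logits(image: Image) -> List[float]:
    """Toy two-class model that expects inputs in [-1, 1]."""
    total = sum(sum(channel) for channel in image)
    return [total, -total]


wrapped = HardLabelWrapper(
    toy_logits, n_class=2, im_mean=[0.5, 0.5, 0.5], im_std=[0.5, 0.5, 0.5]
)
dark_image = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # three channels, two pixels each
dark_label = wrapped.predict_label(dark_image)
```

With a mean and standard deviation of 0.5 per channel, a pixel value of 0 maps to -1 and a value of 1 maps to 1, so the attack can keep working in [0,1] while the model sees inputs in [-1,1].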
Sample usage for attacking naturally trained models:
- python3 attack_natural.py --dataset inception --epsilon 0.05
- python3 attack_natural.py --dataset resnet --epsilon 0.05
- python3 attack_natural.py --dataset cifar --epsilon 0.031
- python3 attack_natural.py --dataset mnist --epsilon 0.3
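The four values returned by attack(data, label, query_limit) can be aggregated over batches to report overall statistics. The sketch below shows one way this might be done; it is not the logic of attack_robust.py or attack_natural.py. `summarize_attack` and `stub_attack` are hypothetical, plain Python lists stand in for tensors, and the stub returns made-up values so that the example runs without a model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

AttackOutput = Tuple[list, List[int], List[float], List[bool]]
AttackFn = Callable[[Sequence, Sequence[int], int], AttackOutput]


def summarize_attack(
    attack: AttackFn,
    data: Sequence,
    labels: Sequence[int],
    batch_size: int,
    query_limit: int,
) -> Dict[str, float]:
    """Run the attack batch by batch and average its per-example outputs."""
    n = len(data)
    total_queries = 0
    total_distance = 0.0
    n_success = 0
    for start in range(0, n, batch_size):
        batch = data[start:start + batch_size]
        batch_labels = labels[start:start + batch_size]
        _x_adv, queries, adbd, succ = attack(batch, batch_labels, query_limit)
        total_queries += sum(queries)
        total_distance += sum(adbd)
        n_success += sum(1 for s in succ if s)
    return {
        "success_rate": n_success / n,
        "avg_queries": total_queries / n,
        "adbd": total_distance / n,
    }


def stub_attack(data: Sequence, label: Sequence[int], query_limit: int) -> AttackOutput:
    """Stand-in attack: made-up outputs with the same four-part shape."""
    n = len(data)
    x_adv = list(data)                     # inputs returned unchanged
    queries = [min(10, query_limit)] * n   # pretend each example used 10 queries
    adbd = [0.02] * n                      # pretend boundary distance
    succ = [lab == 0 for lab in label]     # pretend only class-0 examples are broken
    return x_adv, queries, adbd, succ


stats = summarize_attack(
    stub_attack,
    data=[[0.0]] * 5,
    labels=[0, 1, 0, 1, 0],
    batch_size=2,
    query_limit=40000,
)
```

Dividing by the total number of examples rather than by the number of batches keeps the averages correct when the last batch is smaller than the others.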
Please check our paper for technical details and full results.
@inproceedings{chen2020rays,
title={RayS: A Ray Searching Method for Hard-label Adversarial Attack},
author={Chen, Jinghui and Gu, Quanquan},
booktitle={Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
year={2020}
}
If you have any questions regarding the RayS attack or the ADBD leaderboard above, please contact [email protected]. Enjoy!