Certifiable Distributional Robustness with Principled Adversarial Training, a paper accepted at ICLR 2018, is a collaborative effort by Aman Sinha, Hongseok Namkoong, and John Duchi. In this paper, the authors highlight the vulnerability of neural networks to adversarial examples and take the perspective of distributionally robust optimization, which guarantees performance under adversarial input perturbations.
Certifiable Distributional Robustness with Principled Adversarial Training
What problem is the paper trying to solve?
Recent works have shown that neural networks are vulnerable to adversarial examples: seemingly imperceptible perturbations to the data can lead to misbehavior of the model, such as misclassification of its outputs. Many researchers have proposed adversarial attack and defense mechanisms to counter these vulnerabilities. While these works provide an initial foundation for adversarial training, there are no guarantees on whether proposed white-box attacks can find the most adversarial perturbation, nor on whether there is a class of attacks such defenses can successfully prevent. On the other hand, verification of deep networks using SMT (satisfiability modulo theories) solvers provides formal guarantees on robustness but is NP-hard in general; this approach requires prohibitive computational expense even on small networks.
The authors take the perspective of distributionally robust optimization and provide an adversarial training procedure with provable guarantees on its computational and statistical performance.
Paper summary
This paper proposes a principled methodology for inducing distributional robustness in trained neural networks, with the aim of mitigating the impact of adversarial examples.
The idea is to train the model to perform well not only on the unknown population distribution, but also on the worst-case distribution within a Wasserstein ball around it. In particular, the authors use the Wasserstein distance to define the ambiguity set. This allows them to apply strong duality results from the literature on distributionally robust optimization and to express the empirical minimax problem as a regularized ERM (empirical risk minimization) problem with a modified, robust surrogate cost.
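In rough notation (a sketch of the formulation rather than a verbatim quotation from the paper; here P_0 is the data-generating distribution, W_c the Wasserstein distance induced by a transport cost c, and gamma the Lagrangian penalty on that distance):

```latex
% Distributionally robust problem over a Wasserstein ball of radius rho
\min_{\theta} \; \sup_{P \,:\, W_c(P, P_0) \le \rho} \; \mathbb{E}_{Z \sim P}\big[\ell(\theta; Z)\big]

% Lagrangian relaxation with penalty gamma: by duality, the worst-case expectation
% becomes an ordinary expectation (under the data distribution) of a robust surrogate loss
\min_{\theta} \; \mathbb{E}_{Z_0 \sim P_0}\big[\phi_\gamma(\theta; Z_0)\big],
\qquad
\phi_\gamma(\theta; z_0) \;=\; \sup_{z} \big\{ \ell(\theta; z) - \gamma\, c(z, z_0) \big\}
```

The practical consequence is that robust training reduces to ordinary training on the surrogate loss phi_gamma, whose inner supremum can be approximated for each example.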
Key takeaways
- The paper provides a method for efficiently guaranteeing distributional robustness with a simple form of adversarial data perturbation (see the sketch after this list).
- The method comes with strong statistical guarantees and fast optimization rates for a large class of problems.
- Empirical evaluations indicate that the proposed methods are in fact robust to perturbations in the data, and they outperform less-principled adversarial training techniques.
- The major benefit of this approach is its simplicity and wide applicability across many models and machine-learning scenarios.
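To make the "simple form of adversarial data perturbation" concrete, here is a minimal PyTorch-style sketch of one training step in this spirit: a few gradient-ascent steps approximately solve the inner maximization sup_z { loss(theta; z) - gamma * c(z, z_0) } for each example, and the model is then updated on the perturbed batch. The function name, hyperparameters, and squared Euclidean transport cost are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def robust_training_step(model, optimizer, x0, y, gamma=1.0, ascent_steps=15, ascent_lr=0.1):
    """One adversarial training step: gradient ascent on the robust surrogate loss,
    followed by an ordinary gradient step on the perturbed batch (illustrative sketch)."""
    # Inner maximization: ascend on  loss(theta; x, y) - gamma * ||x - x0||^2  over x.
    x = x0.clone().detach().requires_grad_(True)
    for _ in range(ascent_steps):
        # Per-example squared Euclidean cost, averaged to match cross_entropy's mean reduction.
        transport_cost = ((x - x0) ** 2).flatten(1).sum(dim=1).mean()
        surrogate = F.cross_entropy(model(x), y) - gamma * transport_cost
        grad, = torch.autograd.grad(surrogate, x)
        x = (x + ascent_lr * grad).detach().requires_grad_(True)

    # Outer minimization: standard optimizer update on the adversarially perturbed inputs.
    optimizer.zero_grad()
    F.cross_entropy(model(x.detach()), y).backward()
    optimizer.step()
    return x.detach()
```

The larger gamma is, the more the perturbed points are pulled back toward the original data, which corresponds to requesting less robustness and keeps the inner problem well behaved for smooth losses.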
Reviewer comments summary
Overall Score: 27/30
Average Score: 9
The reviewers strongly accepted this paper, stating that it is of great quality and originality. One reviewer said that the paper is an interesting attempt, but that some of the key claims seem inaccurate and that comparisons to proper baselines are missing.
Another reviewer said that the paper applies recently developed ideas from the robust optimization literature, in particular distributionally robust optimization with the Wasserstein metric, and shows that under this framework, for smooth loss functions and when not too much robustness is requested, the resulting optimization problem is of the same difficulty as the original one (where no adversarial attack is involved).
The paper also received some criticism, but overall it was well liked by the reviewers.