TL;DR: By regularizing for similar input gradients, we can transfer adversarial robustness from a teacher to a student classifier even with a different training dataset and model architecture. Abstract: Adversarial perturbations are imperceptible …
TL;DR: We propose a comprehensive defense to detect and neutralize backdoor poisoning attacks of different sizes. Abstract: Deep learning models have recently been shown to be vulnerable to backdoor poisoning, an insidious attack where the victim model …