
What are Adversarial AI Attacks and How Do We Combat Them?

by Modzy, May 25th, 2021

Too Long; Didn't Read

Modzy is developing a software platform for organizations and developers to responsibly deploy, monitor, and get value from AI at scale. The field of adversarial machine learning addresses the vulnerability of deep neural networks to adversarial perturbations by developing high-performing deep learning models that are also robust against such attacks. Adversarial AI attacks fall into two categories, white-box and black-box attacks, with poisoning attacks as a third type that targets the training data. Modzy’s robust solutions are based on the Lyapunov Theory of Robustness and Stability of Nonlinear Systems [4, 5].

Deep learning is the main force behind recent advances in artificial intelligence (AI). Deep learning models can perform on par with, and sometimes exceed, humans at a wide variety of tasks. However, deep neural networks are vulnerable to subtle adversarial perturbations applied to their inputs, a weakness known as adversarial AI. These perturbations, which can be imperceptible to the human eye, can easily mislead a trained deep neural network into making wrong decisions.

The field of adversarial machine learning focuses on addressing this problem by developing high-performing deep learning models that are also robust against this type of adversarial attack. At Modzy, we’re conducting cutting-edge research to improve upon past approaches that defend against adversarial attacks, ensuring our models maintain peak performance and robustness when faced with adversarial AI. 

What you need to know

Although deep neural networks have driven impressive breakthroughs in fields such as image classification and object detection, Szegedy et al. [1] discovered that these models can easily be fooled by adversarial attacks. For example, an image manipulated by an adversary with only a few modified pixels can fool an image classifier into confidently predicting the wrong class for that image [2].

This exposes an unpleasant fact: deep learning models do not process information the way humans do. The phenomenon undermines the practicality of many current deep learning models, which are trained solely for accuracy and performance rather than for robustness against these types of attacks.

The research community is actively pursuing solutions to this problem. On the attack side, many methods that exploit the vulnerabilities of trained deep neural networks have been proposed [2, 3]. On the defense side, these attack schemes are used to develop new training and design methodologies that produce deep learning models that are relatively more robust against adversarial AI attacks.

As an example, adversarial training has been proposed as a way to enhance robustness: a deep neural network is trained on an enlarged dataset that contains both the original inputs and their adversarially perturbed counterparts [2]. However, because the adversarial phenomenon described above is still poorly understood, none of the solutions proposed so far generalize well across different domains.
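To make this concrete, here is a minimal sketch of adversarial training in PyTorch, using the fast gradient sign method (FGSM) from [2] to perturb each batch. It is an illustration under assumed placeholders (the model, optimizer, loss function, data loader, and the epsilon value), not a description of Modzy's implementation.

import torch

def fgsm_perturb(model, loss_fn, x, y, epsilon=0.03):
    # Craft FGSM adversarial examples: one signed step along the loss gradient,
    # then clamp back to the assumed valid [0, 1] input range.
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

def adversarial_training_epoch(model, optimizer, loss_fn, train_loader, epsilon=0.03):
    model.train()
    for x, y in train_loader:
        # Augment each batch with its adversarially perturbed counterpart.
        x_adv = fgsm_perturb(model, loss_fn, x, y, epsilon)
        inputs = torch.cat([x, x_adv], dim=0)
        targets = torch.cat([y, y], dim=0)

        optimizer.zero_grad()
        loss_fn(model(inputs), targets).backward()
        optimizer.step()

Training on the concatenated clean and perturbed batches is the simplest variant; stronger schemes replace FGSM with multi-step attacks such as projected gradient descent [3].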

Adversarial AI attacks can be divided into two categories: 

  1. white-box attacks  
  2. black-box attacks

Mathematically speaking, every deep neural network is trained to optimize its behavior on a specific task, such as language translation or image classification. During training, this desired behavior is formulated as an optimization problem that minimizes a loss value, computed by a formula that measures deviations from the desired behavior.

Adversarial examples are inputs that do the opposite: they maximize this loss value and, consequently, the deviation from the desired behavior. Finding these adversarial examples typically requires knowledge of the inner workings of the deep neural network.
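In generic notation (the symbols below are illustrative, not taken from the article), training and attack are two sides of the same objective: the trainer minimizes the expected loss over the data, while the attacker maximizes the loss over a small, norm-bounded perturbation of a single input:

\[
\min_{\theta} \; \mathbb{E}_{(x,y)}\big[\, L(f_{\theta}(x), y) \,\big]
\qquad \text{versus} \qquad
\max_{\|\delta\| \le \epsilon} \; L\big(f_{\theta}(x + \delta), y\big)
\]

Here f_\theta is the trained network, L is the loss, and \epsilon bounds how large the perturbation \delta may be, which is what keeps it imperceptible.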

The strong assumption under the white-box attack framework is that the adversary has full knowledge of the inner workings of the deep neural network and can exploit that knowledge to design adversarial inputs. Under the black-box attack framework, the adversary has only limited knowledge of the network's architecture and must instead estimate the model's behavior, devising adversarial examples from that estimate.
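As a small illustration of the black-box setting, the sketch below estimates the loss gradient purely from model queries using central finite differences, in the spirit of zeroth-order attacks; the model_query function, its signature, and the step size are hypothetical placeholders, and no specific published attack is implied.

import numpy as np

def estimate_gradient(model_query, x, y, step=1e-3):
    # Estimate d(loss)/d(input) coordinate by coordinate, using only the
    # scalar loss values returned by model_query(input, label).
    flat = x.reshape(-1).astype(float)
    grad = np.zeros_like(flat)
    for i in range(flat.size):
        bump = np.zeros_like(flat)
        bump[i] = step
        # Central difference: two queries per input coordinate.
        grad[i] = (model_query((flat + bump).reshape(x.shape), y)
                   - model_query((flat - bump).reshape(x.shape), y)) / (2 * step)
    return grad.reshape(x.shape)

The estimate can then drive the same kind of signed step used in the white-box sketch above; the cost is a number of queries proportional to the input dimension, which is why practical black-box attacks work hard to reduce query counts.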

Another type of attack, the poisoning attack, focuses purely on manipulating the training dataset so that any deep learning model trained on it delivers sub-par performance at inference time.
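One simple, hypothetical form of poisoning is label flipping: silently corrupting a fraction of the training labels before the model ever sees them. The sketch below is only meant to show the shape of the threat; the parameter names are illustrative, and real poisoning attacks can be far more subtle than random flips.

import numpy as np

def poison_labels(labels, num_classes, flip_fraction=0.1, seed=0):
    # Flip a random fraction of integer class labels to a different class.
    rng = np.random.default_rng(seed)
    poisoned = labels.copy()
    n_flip = int(len(labels) * flip_fraction)
    idx = rng.choice(len(labels), size=n_flip, replace=False)
    # Adding a nonzero offset modulo num_classes guarantees the label changes.
    poisoned[idx] = (poisoned[idx] + rng.integers(1, num_classes, size=n_flip)) % num_classes
    return poisoned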

A New Understanding of Adversarial AI Attacks

At Modzy, we developed a new understanding of adversarial AI attacks on deep neural networks by applying the Lyapunov Theory of Robustness and Stability of Nonlinear Systems [4, 5]. Our robust solutions are built on this theory, which dates back more than a century and has been used extensively in control theory to design automated systems such as aircraft and automotive systems; the expectation is that these systems remain stable, robust, and able to maintain the desired performance in unknown environments.

Our robust deep learning models, tested against strong white-box attacks, are trained with a similar expectation, so that they can make correct predictions in unknown environments and under the possibility of adversarial attacks. We also train our deep learning models in a novel way by enhancing the well-known backpropagation algorithm commonly used across industry to train deep learning models. Our robust models are trained to rely on a holistic set of features learned from the input when making predictions.

For example, our image classifiers look at the context of the entire image before classifying an object. This means that modifying a few pixels will not affect the final classification decisions made by our models. Consequently, our robust models closely imitate how humans learn and make decisions.

Adversarial attacks on deep neural networks pose a serious risk to the successful deployment of deep learning models in mission-critical environments. One challenging aspect of adversarial AI is that these small perturbations, while capable of completely fooling a deep learning model, are imperceptible to the human eye. Our growing reliance on deep learning models across the field of artificial intelligence only amplifies the adverse impact adversarial AI can have on society.

We take this risk seriously and are actively developing new ways to enhance the defensive capabilities of our models. Our robust deep learning models guarantee high performance and resilience against adversarial AI and are trained to be deployed into unknown environments.

References:

  1. Szegedy, Christian, et al. “Intriguing properties of neural networks.” arXiv preprint arXiv:1312.6199 (2013).
  2. Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. “Explaining and harnessing adversarial examples.” arXiv preprint arXiv:1412.6572 (2014).
  3. Madry, Aleksander, et al. “Towards deep learning models resistant to adversarial attacks.” arXiv preprint arXiv:1706.06083 (2017).
  4. Rahnama, Arash, Andre T. Nguyen, and Edward Raff. “Connecting Lyapunov Control Theory to Adversarial Attacks.” ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2019.
  5. Rahnama, Arash, Andre T. Nguyen, and Edward Raff. “Robust Design of Deep Neural Networks Against Adversarial Attacks Based on Lyapunov Theory.” IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.