Recently, there has been a lot of buzz around the downsides and negative effects of AI and ML, which has sparked a broader conversation about adversarial ML and adversarial attacks. Among the usual types of attacks, one has stood out, especially in the context of copyright infringement. Data poisoning has been making waves, and people are starting to notice what it could do to the precision, correctness, and reliability of AI and ML models.
Whether done with malicious intent or as a way to push back against AI's growing influence on those who create content and art, data poisoning is not something to take lightly. Considering that AI and ML models are only as good as the data they are trained on, we can see why this poses a serious issue.
An introduction to data poisoning
Data poisoning, a term briefly mentioned in some of our previous blog posts on adversarial ML, is the alteration of existing entries or the injection of false or tampered data into training data, with the aim of changing the outcome of AI and ML models. It influences the data at training time by changing existing records or introducing incorrectly labeled ones. The algorithm learns from corrupted data and consequently generates unwanted or unintended conclusions. The model will make incorrect predictions because it was trained on wrong or corrupted data.
Beware of the adversaries. By manipulating data, they ensure the results of your ML or AI models won't be what you expect them to be. Data poisoning often works in ways that make it hard to see when and how it happened, and by the time you realize something is off, it will probably be too late. These attacks are ultimately costly because it's hard to backtrack to the point of data poisoning. One might argue that machine unlearning is the solution, but it isn't yet advanced enough to be used in such a way.
Data poisoning isn't widely discussed in the context of AI, but it's something to genuinely worry about. It can have drastic effects on a model's performance, as well as on the users who try to benefit from AI. Data poisoning is, at its core, polluting data and models to skew results in the direction adversaries want.
Attacks from all sides
Data poisoning happens in more than one form. It's not always about inputting corrupted or fake data into the training dataset. There are multiple points of attack, each influencing ML models differently. The most basic division is into black-box and white-box attacks, but we can narrow it down further into subtypes.
Availability attack or straightforward attack
Availability attacks are oriented towards injecting as much bad data into the database as possible. The primary goal is to influence the data the model is trained on by creating false data or tampering with existing data.
Integrity attack or backdoor attack
Integrity attacks are far more complex than availability attacks. They leave the database largely alone but create a backdoor that lets adversaries control the model. These are harder to detect because they tamper with the training data subtly, mislabeling or slightly altering particular pieces so they read like the rest of the data while carrying a small change. One such attack, label poisoning, is described below.
Label poisoning (Backdoor poisoning)
Here the attackers inject mislabeled data to influence the model at the inference stage. They purposely feed the model data that is labeled differently from true examples to influence the final results. A notable variant is clean-label poisoning, where the attackers influence the model's classifier by choosing a target instance to attack: they inject poison instances into the training data with the intent of fooling the model into labeling the target instance with the base label at test time.
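To make the label-flipping idea concrete, here is a minimal pure-Python sketch (not from any real attack toolkit; the data and classifier are toy constructions). It trains a nearest-centroid classifier on a clean 1-D dataset, then flips a fraction of one class's training labels and shows how the poisoned labels drag that class's centroid, and therefore the decision boundary, toward the other class:

```python
import random

random.seed(0)

# Toy 1-D data: class 0 is drawn around 0.0, class 1 around 5.0.
def make_data(n):
    rows = []
    for _ in range(n):
        rows.append((random.gauss(0.0, 1.0), 0))
        rows.append((random.gauss(5.0, 1.0), 1))
    return rows

def centroids(train):
    # Per-class mean of the training points.
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for x, y in train:
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(x, cents):
    # Nearest-centroid rule: assign the class whose mean is closest.
    return min(cents, key=lambda y: abs(x - cents[y]))

def accuracy(cents, test):
    return sum(predict(x, cents) == y for x, y in test) / len(test)

train, test = make_data(200), make_data(200)
clean_cents = centroids(train)

# Label poisoning: flip 40% of class-1 training labels to class 0,
# dragging the class-0 centroid toward class 1's region.
poisoned = [(x, 0) if y == 1 and random.random() < 0.4 else (x, y)
            for x, y in train]
poisoned_cents = centroids(poisoned)
```

Even this crude flip visibly shifts the learned class-0 centroid; real attacks are far subtler, changing only a handful of carefully chosen labels so the corruption survives casual inspection.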
Training data poisoning
As its name suggests, this attack targets training data. It doesn't influence the model after the fact, but in the early stages of development: the attacker modifies parts of the training data to manipulate the learning process and the outcome of the AI model.
Model inversion attacks
These attacks are focused on extracting sensitive or private data the model was trained on. They don't necessarily influence the model's results, but rather try to collect information about the dataset the model was trained on.
Stealth attacks
In stealth attacks, adversaries try to plant vulnerabilities in the training data so that they remain undetectable during the development phase. This is done with the purpose of exploiting the model once it's deployed in the real world.
Subpopulation attacks
Subpopulation attacks are relevant for large, diverse datasets. The goal is to compromise a particular subpopulation without altering the model's performance on the rest of the population. The adversary induces the model to produce targeted, incorrect outputs for that subset, effectively poisoning the classifier for one group while leaving the rest intact.
Where does that leave us?
Data poisoning is not a small thing. As inconspicuous as it may seem at first glance, it digs deep and creates issues that affect overall results and AI model performance, with potentially grave consequences. It can require minimal effort yet produce big implications.
Recently there have been talks about respecting copyright and about AI using copyrighted art as a data source. This is how Nightshade and Glaze came into the spotlight. They are data poisoning tools that artists and creators use to alter their work so that it ruins AI models' results, especially in generative AI. The creators whose art and content were "stolen" to train AI are fighting back, and it could prove tricky: tools that change their art can ruin the models that used it for training. The results will be far from expected or correct, and the reliability of those AI models will be left in shambles.
This has started a sort of war between humans and AI. People want to protect their art from being misused or from not being compensated for their use, and data poisoning is the tool they chose. The only question is, are those tools considered malware or not?
It leaves those who work on AI models or products a bit wary. In this situation, how can they be sure that datasets and their labels are clean? How can they know that the training phase has been successful before they see the results, which, of course, may be wrong?
A future outlook on data poisoning
Nobody knows what the future will hold. There are methods to prevent adversarial attacks and data poisoning, but it’s quite difficult to fight them. As AI progresses so will the data poisoning methods. New ways to influence data will arise and make this battle even more difficult.
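As one illustration of what such a prevention method can look like, here is a hedged sketch of a pre-training sanitization step (illustrative only; the function names and thresholds are my own, not an established library API). It flags training points whose modified z-score, computed from the per-class median and median absolute deviation so the statistics are robust to the outliers themselves, is suspiciously large:

```python
import statistics

# Illustrative poisoning filter: drop points that sit far from their
# class's median, using a MAD-based modified z-score (cutoff 3.5 is a
# common robust-statistics heuristic, not a universal constant).
def filter_outliers(data, cutoff=3.5):
    by_label = {}
    for x, y in data:
        by_label.setdefault(y, []).append(x)
    kept = []
    for y, xs in by_label.items():
        med = statistics.median(xs)
        mad = statistics.median(abs(x - med) for x in xs)
        for x in xs:
            # If MAD is zero the class is (near-)constant; keep the point.
            score = 0.0 if mad == 0 else 0.6745 * abs(x - med) / mad
            if score <= cutoff:
                kept.append((x, y))
    return kept

# A blatant poison point: labeled class 0 but placed far away at x = 50.
train = [(0.1, 0), (-0.2, 0), (0.3, 0), (5.1, 1), (4.8, 1), (50.0, 0)]
clean = filter_outliers(train)
```

A filter like this only catches crude, obviously out-of-distribution poison; clean-label and backdoor attacks are specifically designed to sit inside the normal range, which is exactly why defending against them is hard.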
There should definitely be a warning label on AI. Not because it will necessarily do damage, but because it will push out wrong results if affected by adversarial attacks. The consequences range from minimal, like skewed or distorted images, to radical, like accidents involving self-driving cars.
The issue is that many businesses rush with AI implementation. After all, if they aren’t using it, the competition is, and they are falling behind. But, many disregard security concerns and don’t consider data poisoning as a threat. A lot of companies have already integrated generative AI without fully analyzing its possible downfalls. In this AI race, moving forward at a greater speed has become imperative, but many fail to see obstacles ahead, such as adversaries.
What will happen to the many AI models already running on sensitive or manipulated data is a black box. Some may be affected by adversarial attacks; others may not. If data poisoning becomes even more sophisticated, a large number of data points will be affected, and it will become harder and harder to go back and fix the training data without completely scrapping the affected AI and ML models.