June 7, 2024

Understanding the Data Poisoning Attack Landscape

Data poisoning is becoming a major concern in machine and deep learning, and industry surveys rank it among the most serious threats to deployed ML systems. Deep learning models rely on vast amounts of training data, often drawn from massive web-scale datasets such as LAION-400M and COYO-700M as well as synthetic data. Research from NVIDIA and DeepMind shows that poisoning these datasets is practical and can be done for as little as $60. Alarming, right? Even poisoning rates as low as 0.001% can compromise models.

At Valyu, we've been diving deep into adversarial machine learning research. If you're curious about this field, the NIST Adversarial Machine Learning publication is a great place to start. It covers all the key terms, attack methods, and the latest on mitigation.

To understand data poisoning attacks better, it helps to know that for web-scale datasets they can happen in two main ways:

Split-View Data Poisoning Attack

One notable attack is the split-view data poisoning attack on the LAION-400M dataset. Web-scale datasets like this one distribute image URLs rather than the images themselves, and some of the domains hosting those images eventually expire. By purchasing an expired domain, an attacker can replace the original content with malicious data, which is then served to anyone who downloads the dataset after that point.
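As a rough illustration of why this works, here is a minimal sketch of the hash-based integrity check suggested in the cited web-scale poisoning paper (the helper name and hash field below are our own, hypothetical choices): because the dataset ships (URL, caption) pairs and the image bytes are fetched later, re-checking a content hash recorded at curation time catches content that has been swapped out in the meantime.

import hashlib
import urllib.request

def fetch_if_unchanged(url: str, expected_sha256: str) -> bytes | None:
    """Download image bytes and keep them only if they still match the curated hash."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = resp.read()
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        return None  # content changed since curation, e.g. the domain changed hands
    return data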

Front-Running Data Poisoning Attack

Another method is the front-running data poisoning attack, demonstrated against Wikipedia. This attack exploits the predictable timing of Wikipedia's periodic snapshots: by injecting malicious edits just before a snapshot is taken, and before moderators can revert them, the attacker ensures that the content is captured in the dataset.

Categorising Data Poisoning Attacks

Data poisoning attacks can be broadly categorised into two types: backdoor attacks and triggerless poisoning attacks.

  • Backdoor Attacks: These embed a specific trigger in the training data. The model learns to associate the trigger with a particular behaviour, such as misclassifying any input that contains the trigger at inference time (a short illustrative sketch of such a trigger follows this list).
  • Triggerless Poisoning Attacks: These require no modification of the input at inference time. Instead, they subtly perturb the training data so that the model misclassifies specific, unmodified target inputs.
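To make the "trigger" idea concrete, here is an illustrative sketch, not taken from any of the cited papers; the patch size, colour, and position are arbitrary choices:

import torch

def apply_trigger(image: torch.Tensor, patch_size: int = 4) -> torch.Tensor:
    """Stamp a white square into the bottom-right corner of a CHW image tensor."""
    poisoned = image.clone()
    poisoned[:, -patch_size:, -patch_size:] = 1.0  # assumes pixel values in [0, 1]
    return poisoned

# A backdoor attack pairs such patched images with an attacker-chosen label at
# training time; a triggerless attack leaves inference-time inputs untouched.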

A Closer Look at Four Specific Data Poisoning Attacks

Here, we detail four specific types of data poisoning attacks, including their desired effects, mechanisms, and optimisation problems.

1. Feature Collision (FC)

Desired Effect: The model misclassifies the target image as the label of the poisoned (base) images.

Mechanism: Small perturbations are added to base images so that their feature representations move very close to the feature representation of the target image, while the images themselves remain visually close to the originals and keep the base label. A model trained on these poisons associates the target's region of feature space with the base class, so the target image ends up misclassified as a base-class example.

Optimisation Problem:
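Following the original Poison Frogs formulation, the objective can be written roughly as follows (notation ours: f is the pretrained feature extractor, x_t the target image, x_b a base image, and β a hyperparameter balancing the two terms):

x_p = \arg\min_{x} \; \lVert f(x) - f(x_t) \rVert_2^2 + \beta \, \lVert x - x_b \rVert_2^2

The first term pulls the poison towards the target in feature space; the second keeps it visually close to the base image so it retains the base label.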

2. Convex Polytope (CP)

Desired Effect: The model misclassifies a specific target image by making it appear as a combination of multiple poisoned images in the feature space.

Mechanism: The attacker crafts a set of poisoned images whose feature representations form a convex polytope that encloses the feature representation of the target image. A model trained to assign the base label to these poisons then also assigns that label to the target, causing it to be misclassified.

Optimisation Problem:
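In the original Convex Polytope paper the attack is posed approximately as follows (notation ours: x_p^{(j)} is the j-th poison built from base image x_b^{(j)}, the c_j are convex combination coefficients, and ε is the perturbation budget):

\min_{\{c_j\},\, \{x_p^{(j)}\}} \; \frac{1}{2} \, \frac{\big\lVert f(x_t) - \sum_j c_j\, f(x_p^{(j)}) \big\rVert^2}{\lVert f(x_t) \rVert^2} \quad \text{s.t.} \quad \sum_j c_j = 1, \;\; c_j \ge 0, \;\; \lVert x_p^{(j)} - x_b^{(j)} \rVert_\infty \le \epsilon

In words: the target's feature vector is pushed to lie (approximately) inside the convex hull of the poisons' features, while each poison stays within an ℓ∞ budget of its base image.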

3. Clean Label Backdoor (CLBD)

Desired Effect: The model misclassifies images containing a specific trigger at inference time.

Mechanism: Base images are adversarially perturbed to maximise the cross-entropy loss, so their natural features become unreliable, and a trigger patch is then stamped on while the original (clean) labels are kept. The model learns to lean on the trigger instead of the degraded features, ensuring that at inference time the presence of the trigger causes misclassification.

Optimisation Problem:
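A sketch of the clean-label backdoor construction, in our notation (L is the cross-entropy loss, y_b the base label, ε the perturbation budget, and p the trigger patch): each base image is first perturbed adversarially and then stamped with the patch.

\hat{x}_p^{(j)} = \arg\max_{\lVert x - x_b^{(j)} \rVert_\infty \le \epsilon} \; \mathcal{L}\big(f(x),\, y_b\big), \qquad x_p^{(j)} = \hat{x}_p^{(j)} \oplus p

where ⊕ denotes overwriting the patch region of the image with p.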

4. Hidden Trigger Backdoor (HTBD)

Desired Effect: The model misclassifies images that contain a hidden trigger.

Mechanism: This attack is similar to FC but also incorporates a trigger that never appears in the training data, hence "hidden". Poisoned images are crafted to stay close to the base images in input space while being close in feature space to a patched image containing the trigger. When the trigger is applied to an image at inference time, the model is tricked into misclassifying it.

Optimisation Problem:
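One way to write the hidden-trigger objective (notation ours: \tilde{x}_t^{(j)} is a target image with the trigger patch applied, ε the input-space budget) is:

x_p^{(j)} = \arg\min_{x} \; \lVert f(x) - f(\tilde{x}_t^{(j)}) \rVert_2^2 \quad \text{s.t.} \quad \lVert x - x_b^{(j)} \rVert_\infty \le \epsilon

so the poisons look like base images but sit near patched images in feature space, and the patch itself never appears in the training set.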

Challenges in Detecting Data Poisoning

Detecting data poisoning is tough because these attacks are designed to be covert. Most detection methods are also highly domain-specific, such as sample-based approaches used in recommendation systems, and general-purpose detection techniques are still lacking. In practice, this makes mitigation the more dependable line of defence.

Mitigation Strategies

Mitigation strategies can be surprisingly simple. Techniques like horizontal flips, random crops, and data normalisation for image datasets have been shown to significantly reduce the effectiveness of data poisoning attacks. For example, even just applying horizontal flips can cut attack effectiveness by up to 80%.
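As a concrete illustration, here is a minimal sketch of such an augmentation pipeline, assuming PyTorch and torchvision; the CIFAR-10 dataset and the normalisation statistics are our own illustrative choices, not settings taken from the cited benchmark.

import torch
import torchvision
import torchvision.transforms as T

train_transform = T.Compose([
    T.RandomCrop(32, padding=4),                  # random crops
    T.RandomHorizontalFlip(p=0.5),                # horizontal flips
    T.ToTensor(),
    T.Normalize(mean=(0.4914, 0.4822, 0.4465),    # data normalisation
                std=(0.2470, 0.2435, 0.2616)),
])

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_transform
)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)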

Practical Recommendations for ML Engineers

Machine learning engineers can adopt several strategies to mitigate the risks of data poisoning:

  • Data Augmentation: Using techniques like random crops and horizontal flips during training can make models more robust against poisoned data.
  • Optimiser Choice: Switching from the Adam optimiser to SGD has been shown to reduce the effectiveness of poisoning attacks. For instance, combining data normalisation and augmentation with SGD reduced the success rate of Feature Collision attacks by 41% (a minimal training-setup sketch follows this list).
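To make the optimiser recommendation concrete, here is a minimal training-loop sketch, again assuming PyTorch; the model, learning rate, and momentum are illustrative placeholders rather than settings from the cited benchmark, and train_loader is the loader from the augmentation sketch above.

import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(num_classes=10)
criterion = nn.CrossEntropyLoss()

# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # reportedly more susceptible
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,      # switch to SGD
                            momentum=0.9, weight_decay=5e-4)

for images, labels in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()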

We believe adopting such mitigation methods for data poisoning attacks is crucial for maintaining the integrity of machine learning models. By incorporating effective mitigation strategies into training practices, engineers can better protect their models from these attacks.

References:

  • Adversarial Machine Learning - Industry Perspectives
  • Just How Toxic Is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks
  • Poisoning Web-Scale Training Datasets Is Practical
