**Targeted Data Poisoning Attacks in Deep Learning: A Coding Guide**
Hey there, fellow machine learning enthusiasts! Today we’re tackling a crucial topic in machine learning security: targeted data poisoning attacks. We’ll manipulate labels in the CIFAR-10 dataset and observe how the corruption changes model behavior, while keeping the rest of the data pipeline fixed so the clean and poisoned runs stay directly comparable.
To achieve this, we’ll selectively flip a fraction of samples from a target class to a malicious class during training, demonstrating how a small, controlled corruption in the data pipeline can propagate into systematic misclassification at inference time.
**Code:** Check out the full code here:
**Dataset Wrapping**: We’ll implement a custom dataset wrapper that allows controlled label poisoning during training. We’ll flip a configurable fraction of samples from the target class to a malicious class while keeping the test data untouched.
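One way to implement such a wrapper is sketched below. This is illustrative: the class name `PoisonedCIFAR10` and its constructor parameters are assumptions, not taken from the original code, but the behavior matches the description above (flip a configurable fraction of target-class labels, leave everything else untouched).

```python
import numpy as np
from torch.utils.data import Dataset

class PoisonedCIFAR10(Dataset):
    """Wraps a clean dataset and flips a fraction of labels from
    target_class to malicious_label (illustrative sketch)."""

    def __init__(self, base_dataset, target_class, malicious_label,
                 poison_ratio, seed=0):
        self.base = base_dataset
        self.malicious_label = malicious_label
        rng = np.random.default_rng(seed)
        # Indices of all samples belonging to the target class.
        target_idx = [i for i, (_, y) in enumerate(base_dataset)
                      if y == target_class]
        n_poison = int(len(target_idx) * poison_ratio)
        # Randomly pick which target-class samples get the malicious label.
        self.flipped = set(
            rng.choice(target_idx, size=n_poison, replace=False).tolist()
        )

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        x, y = self.base[idx]
        if idx in self.flipped:
            y = self.malicious_label  # poisoned label
        return x, y
```

Because only the training set is wrapped, the test set remains clean, which is what lets the confusion matrices later isolate the attack’s effect.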
**Model Definition**: We’ll define a lightweight ResNet-based model tailored for CIFAR-10 and implement the full training loop. We’ll train the model using standard cross-entropy loss and Adam optimization to ensure steady convergence.
**Evaluation**: We’ll run inference on the test set and collect predictions for quantitative evaluation. We’ll compute confusion matrices to visualize class-wise behavior for both clean and poisoned models, using these visual diagnostics to highlight the focused misclassification patterns introduced by the attack.
**Code Snippets**:
```python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Dataset
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report
CONFIG = {
    "batch_size": 128,
    "epochs": 10,
    "lr": 0.001,
    "target_class": 1,
    "malicious_label": 9,
    "poison_ratio": 0.4,
    "device": "cuda" if torch.cuda.is_available() else "cpu",  # referenced below
}
…
def get_model():
    # ResNet-18 adapted for 32x32 CIFAR-10 inputs:
    # smaller first conv, no initial max-pool.
    model = torchvision.models.resnet18(num_classes=10)
    model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    model.maxpool = nn.Identity()
    return model.to(CONFIG["device"])
…
def train_and_evaluate(train_loader, description):
    model = get_model()
    optimizer = optim.Adam(model.parameters(), lr=CONFIG["lr"])
    criterion = nn.CrossEntropyLoss()
    for _ in range(CONFIG["epochs"]):
        model.train()
        for images, labels in train_loader:
            images = images.to(CONFIG["device"])
            labels = labels.to(CONFIG["device"])
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
    return model
…
def plot_results(clean_preds, clean_labels, poisoned_preds, poisoned_labels, class_names):
    fig, ax = plt.subplots(1, 2, figsize=(16, 6))
    for i, (preds, labels, title) in enumerate([
        (clean_preds, clean_labels, "Clean Model Confusion Matrix"),
        (poisoned_preds, poisoned_labels, "Poisoned Model Confusion Matrix"),
    ]):
        cm = confusion_matrix(labels, preds)
        sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax[i],
                    xticklabels=class_names, yticklabels=class_names)
        ax[i].set_title(title)
    plt.tight_layout()
    plt.show()
…
```
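The snippets above stop short of the inference step that feeds `plot_results`. A minimal way to collect test-set predictions might look like the following; the helper name `get_predictions` is illustrative, not from the original code.

```python
import torch

@torch.no_grad()
def get_predictions(model, loader, device="cpu"):
    """Run inference and return flat lists of predictions and true labels."""
    model.eval()
    all_preds, all_labels = [], []
    for images, labels in loader:
        outputs = model(images.to(device))
        all_preds.extend(outputs.argmax(dim=1).cpu().tolist())
        all_labels.extend(labels.tolist())
    return all_preds, all_labels
```

The returned lists can be passed directly to `confusion_matrix` and `classification_report` from scikit-learn, once for the clean model and once for the poisoned one.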
**Conclusion**: In this tutorial, we’ve demonstrated how label-level data poisoning attacks can degrade class-specific performance without necessarily destroying overall accuracy. We analyzed this behavior using confusion matrices and per-class classification reports, revealing focused failure modes introduced by the attack. This experiment reinforces the importance of data provenance, validation, and monitoring in real-world machine learning systems, particularly in security-critical domains.
