Day 23: Adversarial Examples

When ML Sees What’s Not There

Imagine adding a few pixels of noise to a stop sign… and suddenly a self-driving car thinks it’s a speed limit sign. That’s not science fiction — that’s Adversarial Machine Learning in action.


🎯 What Are Adversarial Examples?

Inputs that are intentionally and subtly modified to fool machine learning models — without changing what a human would perceive.

Examples:

  • 🐼 + invisible noise ➡️ 🦍 (the famous ImageNet misclassification from Goodfellow et al., 2014)

  • 📝 Reworded spam that passes email filters

  • 📸 Camouflaged clothing that bypasses surveillance AI

These attacks exploit the locally linear behavior of models in high-dimensional input spaces: many tiny, carefully aligned nudges add up to a large shift in the model's output. In effect, they hack the math behind ML.
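The canonical way to "hack the math" is the Fast Gradient Sign Method (FGSM) from Goodfellow et al. (2014): push every input dimension a tiny step in the direction that increases the loss. A minimal PyTorch sketch, assuming `model` is any differentiable classifier and `x`, `y` are a batch of inputs and labels (all names are placeholders):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Perturb x by epsilon * sign(gradient of the loss w.r.t. x)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction that increases the loss, then keep pixels in a valid [0, 1] range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

With epsilon in the range of a few pixel intensities (e.g., 2–8/255), the change is imperceptible to humans yet often enough to flip the prediction.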


🔍 Motive of the Attacker

Why craft adversarial examples?

  • 🎯 Evade detection: Slip past spam filters, malware classifiers, or surveillance tools.

  • 🎯 Trigger misclassification: Mislead self-driving cars or biometric systems into dangerous decisions.

  • 🎯 Model probing: Map model boundaries or reverse-engineer behavior.

  • 🎯 Strategic or financial gain: Disrupt AI-driven systems (ads, pricing, fraud detection) for profit or sabotage.


🔐 Security Lens

⚠️ Evasion Attacks

Craft inputs at inference time to bypass detection. → e.g., tweak malware binaries or phishing images

⚠️ Black-box Attacks

Don’t need access to model internals — thanks to transferability, attacks created for one model often work on others.
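A hedged sketch of how transferability is typically exploited: craft examples against a local surrogate model you do control, then replay them against the black-box target. Here `surrogate` and `query_target` are hypothetical stand-ins, and `fgsm_attack` is the helper from the sketch above:

```python
# Black-box attack via transferability (illustrative sketch, no real API implied).
x_adv = fgsm_attack(surrogate, x, y, epsilon=0.03)

clean_preds = query_target(x)      # hypothetical black-box prediction endpoint
adv_preds = query_target(x_adv)    # same inputs plus imperceptible noise

# Fraction of inputs whose prediction changed under the transferred perturbation.
transfer_rate = (clean_preds != adv_preds).float().mean().item()
print(f"Predictions flipped by transfer: {transfer_rate:.1%}")
```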

⚠️ Physical-World Attacks

Stickers on road signs or custom glasses that fool facial recognition. Adversarial ML escapes the lab — and enters reality.


🧪 Real-World Examples

  • 🔧 Tesla Autopilot misled by small adversarial markings on the road (Tencent Keen Security Lab, 2019)

  • 🐢 3D-printed turtle classified as a rifle by Google's InceptionV3 image classifier (Athalye et al., 2018)

  • 🪞 Apple FaceID bypassed by 3D-printed mask (2017 demo)

These are not theoretical flaws — they’ve already been exploited.


🛡 Defenses (Imperfect, But Useful)

  • Adversarial Training – Train on adversarial examples alongside clean data (see the training-loop sketch after this list)

  • Input Sanitization – Remove or normalize noise

  • Certified Defenses – e.g., randomized smoothing

  • Gradient Masking – Obfuscate gradients (but fragile!)

  • Reject Suspicious Inputs – Flag inputs close to decision boundaries

  • Defensive Distillation – Train a second model on the softened outputs of the first (largely broken by later attacks)
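As an example of the first defense, here is a hedged adversarial-training sketch: each batch is augmented with FGSM examples generated against the current model. It reuses the `fgsm_attack` helper above; `model`, `train_loader`, and the hyperparameters are placeholders, not tuned recommendations:

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    for x, y in train_loader:
        # Generate adversarial versions of this batch against the current model.
        x_adv = fgsm_attack(model, x, y, epsilon=0.03)

        optimizer.zero_grad()
        # Train on clean and adversarial views of the same batch.
        loss = 0.5 * F.cross_entropy(model(x), y) \
             + 0.5 * F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```

Stronger variants replace FGSM with multi-step attacks such as PGD, at the cost of slower training.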


📚 Key References

  • Goodfellow et al. (2014) – “Explaining and Harnessing Adversarial Examples”

  • Kurakin et al. (2016) – “Adversarial Examples in the Physical World”

  • Athalye et al. (2018) – “Obfuscated Gradients Give a False Sense of Security”

  • 🔗 Adversarial.js by Kenny Song

  • 🔗 OpenAI blog

  • 🔗 CleverHans Python library


💬 Question for You

How do you test your models for adversarial robustness? Should it be part of every AI model's CI/CD pipeline?
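One pragmatic answer is to treat robust accuracy like any other regression metric and gate releases on it. A hedged pytest-style sketch, reusing the `fgsm_attack` helper from earlier; the model/data loaders and the 60% threshold are placeholders you would replace with your own:

```python
import torch

def robust_accuracy(model, loader, epsilon=0.03):
    """Accuracy on FGSM-perturbed inputs over an evaluation set."""
    correct, total = 0, 0
    for x, y in loader:
        x_adv = fgsm_attack(model, x, y, epsilon)
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

def test_model_is_adversarially_robust():
    model = load_production_model()   # placeholder for your model registry
    loader = load_eval_batches()      # placeholder for a held-out eval set
    assert robust_accuracy(model, loader) >= 0.60  # threshold is illustrative
```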


📅 Coming Up

Day 24 → Data Poisoning Attacks – when the training data becomes the attack vector. 🔥


🔗 Catch Up on Day 22
