Day 23: Adversarial Examples

When ML Sees What's Not There

Imagine adding a few pixels of noise to a stop sign… and suddenly a self-driving car thinks it's a speed limit sign. That's not science fiction; that's Adversarial Machine Learning in action.


🎯 What Are Adversarial Examples?

Adversarial examples are inputs that are intentionally and subtly modified to fool machine learning models, without changing what a human would perceive.

Examples:

  • 🐼 + invisible noise ➡️ 🦍 (the classic ImageNet misclassification from Goodfellow et al., 2014, where a panda is confidently labeled as a gibbon)

  • πŸ“ Reworded spam that passes email filters

  • 📸 Camouflaged clothing that bypasses surveillance AI

These attacks exploit the near-linear behavior of models in high-dimensional input space – essentially hacking the math behind ML.
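To make this concrete, here is a minimal sketch of the Fast Gradient Sign Method (FGSM) in the spirit of Goodfellow et al. (2014). It is an illustrative sketch, not production attack code: `model`, `image`, and `label` are placeholders you would supply, and PyTorch plus the `epsilon` value are my assumptions.

```python
# Minimal FGSM sketch (assumes a PyTorch classifier that outputs logits).
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    """Nudge every pixel by +/- epsilon in the direction that increases the loss."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)   # how wrong is the model right now?
    loss.backward()                               # gradient of the loss w.r.t. the input
    adv_image = image + epsilon * image.grad.sign()
    return adv_image.clamp(0, 1).detach()         # keep pixel values in a valid range
```

The perturbation is capped at `epsilon` per pixel, which is why the change stays invisible to humans yet can flip the prediction of a highly linear, high-dimensional model.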


πŸ” Motive of the Attacker

Why craft adversarial examples?

  • 🎯 Evade detection: Slip past spam filters, malware classifiers, or surveillance tools.

  • 🎯 Trigger misclassification: Mislead self-driving cars or biometric systems into dangerous decisions.

  • 🎯 Model probing: Map decision boundaries or reverse-engineer model behavior.

  • 🎯 Strategic or financial gain: Disrupt AI-driven systems (ads, pricing, fraud detection) for profit or sabotage.


πŸ” Security Lens

⚠️ Evasion Attacks

Craft inputs at inference time to bypass detection → e.g., tweak malware binaries or phishing images.

⚠️ Black-box Attacks

No access to model internals is needed – thanks to transferability, attacks crafted against one model often work on others.
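As a hedged illustration of transferability, the sketch below crafts an example against a local surrogate model (full white-box access) and then checks whether it also fools a separate target model the attacker can only query. `surrogate`, `target`, `image`, and `label` are hypothetical stand-ins; `fgsm_attack` is the function from the earlier sketch.

```python
# Transfer-based black-box attack sketch: attack a surrogate, replay on the target.
import torch

def transfer_attack(surrogate, target, image, label, epsilon=0.03):
    adv = fgsm_attack(surrogate, image, label, epsilon)  # white-box step on the surrogate
    with torch.no_grad():
        clean_pred = target(image).argmax(dim=1)
        adv_pred = target(adv).argmax(dim=1)
    transferred = clean_pred != adv_pred   # True where the target's prediction flipped
    return adv, transferred
```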

⚠️ Physical-World Attacks

Stickers on road signs or custom glasses that fool facial recognition. Adversarial ML escapes the lab and enters reality.


🧪 Real-World Examples

  • 🔧 Tesla Autopilot steered across lanes by adversarial stickers on the road (Tencent Keen Security Lab, 2019)

  • 🐢 3D-printed turtle classified as a rifle by Google's InceptionV3 model (Athalye et al., 2018)

  • 🪞 Apple Face ID bypassed by a 3D-printed mask (Bkav demo, 2017)

These are not theoretical flaws – they have already been demonstrated in practice.


🛑 Defenses (Imperfect, But Useful)

  • ✅ Adversarial Training – Train with adversarial examples (see the sketch after this list)

  • ✅ Input Sanitization – Remove or normalize noise before inference

  • ✅ Certified Defenses – e.g., randomized smoothing, which proves robustness within a small perturbation radius

  • ✅ Gradient Masking – Obfuscate gradients (but fragile – see Athalye et al., 2018)

  • ✅ Reject Suspicious Inputs – Flag inputs that sit close to decision boundaries

  • ✅ Defensive Distillation – Train on softened model outputs (largely broken by stronger attacks)
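As a rough sketch of the first item above, adversarial training simply folds attack generation into the training loop so the model learns from perturbed inputs as well as clean ones. It reuses `fgsm_attack` from the earlier sketch; `model`, `loader`, `optimizer`, and the equal loss weighting are assumptions for illustration.

```python
# Adversarial training sketch: train on clean and FGSM-perturbed batches.
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, epsilon=0.03):
    model.train()
    for images, labels in loader:
        # Attack the *current* model so the adversarial examples stay relevant.
        adv_images = fgsm_attack(model, images, labels, epsilon)
        optimizer.zero_grad()  # discard gradients accumulated while crafting the attack
        loss = 0.5 * F.cross_entropy(model(images), labels) \
             + 0.5 * F.cross_entropy(model(adv_images), labels)
        loss.backward()
        optimizer.step()
```

Stronger variants use iterative attacks such as PGD instead of single-step FGSM, at a higher training cost.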


📚 Key References

  • Goodfellow et al. (2014) – "Explaining and Harnessing Adversarial Examples"

  • Kurakin et al. (2016) – "Adversarial Examples in the Physical World"

  • Athalye et al. (2018) – "Obfuscated Gradients Give a False Sense of Security"

  • 🔗 Adversarial.js by Kenny Song

  • 🔗 OpenAI blog

  • 🔗 CleverHans Python library


💬 Question for You

How do you test your models for adversarial robustness? Should it be part of every AI model's CI/CD pipeline?
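One hedged way to approach the first question: track robust accuracy (accuracy on perturbed inputs) alongside clean accuracy, and fail the pipeline when it regresses. The sketch below reuses `fgsm_attack` from earlier; `model`, `test_loader`, `epsilon`, and the threshold are assumptions, not recommendations.

```python
# Sketch of a robustness check that could sit in a CI/CD pipeline.
import torch

def robust_accuracy(model, test_loader, epsilon=0.03):
    model.eval()
    correct, total = 0, 0
    for images, labels in test_loader:
        adv = fgsm_attack(model, images, labels, epsilon)  # perturb the test batch
        with torch.no_grad():
            correct += (model(adv).argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    return correct / total

# Example gate (placeholder threshold):
# assert robust_accuracy(model, test_loader) >= 0.70, "adversarial robustness regressed"
```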


📅 Coming Up

Day 24 → Data Poisoning Attacks – when the training data becomes the attack vector. 🔥


🔗 Catch Up on Day 22
