Day 23: Adversarial Examples
When ML Sees What's Not There
Imagine adding a few pixels of noise to a stop sign… and suddenly a self-driving car thinks it's a speed limit sign. That's not science fiction; that's Adversarial Machine Learning in action.
🎯 What Are Adversarial Examples?
Inputs that are intentionally and subtly modified to fool machine learning models, without changing what a human would perceive.
Examples:
🐼 Panda image + imperceptible noise ➡️ classified as "gibbon" (the ImageNet misclassification from Goodfellow et al., 2014)
📧 Reworded spam that passes email filters
📸 Camouflaged clothing that bypasses surveillance AI
These attacks exploit the near-linear behavior of models in high-dimensional space: they are essentially hacking the math behind ML.
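To make that "hacking the math" point concrete, here is a minimal sketch of the fast gradient sign method (FGSM) from Goodfellow et al. (2014), written with PyTorch. The `model`, `image`, and `label` names are placeholder assumptions; treat this as an illustrative sketch, not a hardened attack implementation.

```python
# Minimal FGSM sketch (Goodfellow et al., 2014) in PyTorch.
# `model`, `image`, and `label` are placeholders supplied by the caller.
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    """Perturb `image` by epsilon * sign(gradient of the loss w.r.t. the input)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)         # loss against the true label
    loss.backward()                                      # gradient flows back to the pixels
    perturbation = epsilon * image.grad.sign()           # tiny, human-imperceptible step
    return (image + perturbation).clamp(0, 1).detach()   # keep pixel values valid
```

A small epsilon (here 0.03 for images scaled to [0, 1]) is usually enough to flip the prediction while leaving the image visually unchanged.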
🔍 Motive of the Attacker
Why craft adversarial examples?
🎯 Evade detection: Slip past spam filters, malware classifiers, or surveillance tools.
🎯 Trigger misclassification: Mislead self-driving cars or biometric systems into dangerous decisions.
🎯 Model probing: Map decision boundaries or reverse-engineer model behavior.
🎯 Strategic or financial gain: Disrupt AI-driven systems (ads, pricing, fraud detection) for profit or sabotage.
🔐 Security Lens
⚠️ Evasion Attacks
Craft inputs at inference time to bypass detection, e.g., tweak malware binaries or phishing images.
⚠️ Black-box Attacks
No access to model internals is needed: thanks to transferability, attacks crafted against one model often work on others (see the sketch after this section).
⚠️ Physical-World Attacks
Stickers on road signs or custom glasses that fool facial recognition. Adversarial ML escapes the lab and enters reality.
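A rough sketch of how transferability can be measured in a black-box setting: craft examples against a local surrogate model, then count how often they also fool a separate target model. The two torchvision models and the `fgsm_attack` helper from the earlier sketch are stand-in assumptions, not a prescribed setup.

```python
# Transferability sketch: adversarial examples are crafted on a surrogate model the
# attacker controls, then replayed against a different target model whose internals
# (weights, gradients) are never accessed.
# Assumes the fgsm_attack() helper sketched earlier and a DataLoader of (image, label) batches.
import torch
from torchvision import models

surrogate = models.resnet18(weights="IMAGENET1K_V1").eval()          # attacker's local stand-in
target = models.mobilenet_v3_large(weights="IMAGENET1K_V1").eval()   # "black-box" victim

def transfer_rate(loader, epsilon=0.03):
    fooled, total = 0, 0
    for image, label in loader:
        adv = fgsm_attack(surrogate, image, label, epsilon)  # gradients come from the surrogate only
        with torch.no_grad():
            pred = target(adv).argmax(dim=1)                 # the victim is only queried, never inspected
        fooled += (pred != label).sum().item()
        total += label.numel()
    return fooled / total  # fraction of adversarial inputs that transfer
```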
🧪 Real-World Examples
🚧 Tesla Autopilot fooled by small adversarial markings on the road and altered signs (Tencent Keen Security Lab, 2019)
🐢 3D-printed turtle classified as a rifle by Google's InceptionV3 image classifier (MIT LabSix demo)
🎭 Apple Face ID bypassed by a 3D-printed mask (2017 demo)
These are not theoretical flaws; they have already been demonstrated in the real world.
🛡️ Defenses (Imperfect, But Useful)
✅ Adversarial Training: train on adversarial examples (sketched below)
✅ Input Sanitization: remove or normalize perturbations before inference
✅ Certified Defenses: e.g., randomized smoothing
✅ Gradient Masking: obfuscate gradients (but fragile!)
✅ Reject Suspicious Inputs: flag inputs close to decision boundaries
✅ Defensive Distillation: smooth the model's outputs (also shown to be breakable)
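As a rough illustration of the first defense, here is a minimal adversarial-training loop that mixes clean and FGSM-perturbed batches. The names `model`, `train_loader`, and `optimizer` are assumed to exist, and `fgsm_attack` is the helper sketched earlier; real setups typically use stronger attacks such as PGD.

```python
# Minimal adversarial-training sketch: each batch is trained on both its clean and its
# FGSM-perturbed version, so the model learns to resist small perturbations.
# `model`, `train_loader`, `optimizer`, and fgsm_attack() are assumed from context.
import torch.nn.functional as F

def adversarial_training_epoch(model, train_loader, optimizer, epsilon=0.03):
    model.train()
    for image, label in train_loader:
        adv = fgsm_attack(model, image, label, epsilon)   # attack the model as it currently stands
        optimizer.zero_grad()                             # clear grads left over from the attack
        loss = 0.5 * F.cross_entropy(model(image), label) + \
               0.5 * F.cross_entropy(model(adv), label)   # clean loss + adversarial loss
        loss.backward()
        optimizer.step()
```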
📚 Key References
Goodfellow et al. (2014), "Explaining and Harnessing Adversarial Examples"
Kurakin et al. (2016), "Adversarial Examples in the Physical World"
Athalye et al. (2018), "Obfuscated Gradients Give a False Sense of Security"
🔗 Adversarial.js by Kenny Song
🔗 OpenAI blog
🔗 CleverHans Python library
💬 Question for You
How do you test your models for adversarial robustness? Should it be part of every AI model's CI/CD pipeline?
🔜 Coming Up
Day 24: Data Poisoning Attacks, when the training data becomes the attack vector. 🔥
🔙 Catch Up on Day 22