Day 11 Dimensionality Reduction

Today I explored Dimensionality Reduction, a vital step to make sense of high-dimensional data.
Analogy
Imagine packing for a trip. You have a massive wardrobe (your data), but your suitcase only fits 10 items. So you pick versatile clothes: a few that cover most needs (formal, casual, warm, cold).
That's Dimensionality Reduction: compressing a huge dataset into its most meaningful parts, minimizing loss while maximizing utility.
Two Common Techniques
PCA (Principal Component Analysis)
Like folding and layering smartly to save space, PCA finds directions (components) that capture the most variance (info).
Linear method
Great when data lies in neat, straight patterns
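To make the "directions that capture the most variance" idea concrete, here is a minimal sketch of PCA using NumPy's SVD. The synthetic data, variable names, and the choice to keep a single component are my own assumptions for illustration, not a recommended pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption): 200 points lying roughly along a line in 3-D.
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(200, 3))

# PCA: center the data, then take the SVD. Rows of vt are the principal directions.
X_centered = X - X.mean(axis=0)
_, singular_values, vt = np.linalg.svd(X_centered, full_matrices=False)

# Share of the total variance captured by each component.
explained_ratio = singular_values**2 / np.sum(singular_values**2)

# Keep only the first component: 3 features become 1.
X_reduced = X_centered @ vt[:1].T

print("explained variance ratio:", explained_ratio.round(4))
print("reduced shape:", X_reduced.shape)
```

Because the points sit almost on a straight line, nearly all of the variance should land on the first component, which is the "neat, straight patterns" case where PCA shines.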
t-SNE (t-distributed Stochastic Neighbor Embedding)
Like grouping clothes by outfits (shoes + formalwear), t-SNE clusters related data together.
Non-linear
Preserves local neighborhoods but can distort global structure, so distances between clusters and cluster sizes in the plot are not reliable
Ideal for visualizing complex, high-dimensional datasets
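A minimal sketch of producing a 2-D t-SNE embedding, assuming scikit-learn is installed. The two synthetic groups and the perplexity value are my own illustrative choices; real data usually needs some experimentation with perplexity.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic data (an assumption): two groups of 30 points in 20 dimensions.
group_a = rng.normal(loc=0.0, size=(30, 20))
group_b = rng.normal(loc=5.0, size=(30, 20))
X = np.vstack([group_a, group_b])

# perplexity must be smaller than the number of samples (60 here).
tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=0)
embedding = tsne.fit_transform(X)

print("embedding shape:", embedding.shape)
```

The resulting coordinates are meant for plotting. Since t-SNE is stochastic and non-parametric, I would treat the picture as an exploration aid rather than as features to feed a model.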
These techniques help models:
Train faster
Generalize better
Reveal hidden patterns
Security Lens
Information Loss & Blind Spots
You packed for summer, but forgot a raincoat. Rare threats (low variance) may get discarded, making your model blind to anomalies or attacks.
"Low variance" ≠ "low importance", especially in security contexts.
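Here is a small sketch of that blind spot, using NumPy only. Everything in it is a made-up assumption for illustration: the three "traffic" features, the anomaly values, and the idea of checking reconstruction error as a possible countermeasure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "normal" data: two noisy features and one nearly constant feature.
n = 500
normal = np.column_stack([
    rng.normal(0, 10.0, n),
    rng.normal(0, 5.0, n),
    rng.normal(0, 0.1, n),
])
# Rare event: ordinary in the first two features, extreme only in the quiet third one.
anomaly = np.array([1.0, -0.5, 3.0])

# Fit PCA on normal data and keep the top 2 components.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:2]

def project(x):
    """Map points into the 2-D reduced space."""
    return (x - mean) @ components.T

def residual(x):
    """Distance between a point and its reconstruction from the reduced space."""
    reconstruction = project(x) @ components
    return np.linalg.norm((x - mean) - reconstruction, axis=-1)

# In the reduced space the anomaly looks unremarkable.
z_scores = np.abs(project(anomaly)) / project(normal).std(axis=0)

# The discarded direction is where it stands out.
normal_residuals = residual(normal)
anomaly_residual = residual(anomaly)

print("z-scores in reduced space:", z_scores.round(2))
print("largest normal residual:", normal_residuals.max().round(2))
print("anomaly residual:", anomaly_residual.round(2))
```

One possible takeaway: if a pipeline drops components, also monitoring the reconstruction error (what was thrown away) may catch events that the reduced features cannot show.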
Feature Obfuscation by Attackers
Attackers can embed malicious patterns in dimensions likely to be discarded or compressed, bypassing detection pipelines.
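A rough sketch of why this works against a linear reduction like PCA, assuming the attacker knows (or can estimate) which directions get discarded. The data, the payload size, and the names `kept` and `discarded` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training data (an assumption): 5 features with very different variances.
X = rng.normal(size=(300, 5)) * np.array([8.0, 4.0, 2.0, 0.3, 0.1])
mean = X.mean(axis=0)
_, _, vt = np.linalg.svd(X - mean, full_matrices=False)

kept, discarded = vt[:3], vt[3:]  # the pipeline keeps the top 3 components

def reduce_features(x):
    """What the downstream detector actually sees."""
    return (x - mean) @ kept.T

benign = X[0]
# Hypothetical payload placed entirely along the least-variance direction.
payload = 5.0 * discarded[-1]
tampered = benign + payload

shift_seen_by_detector = np.linalg.norm(reduce_features(tampered) - reduce_features(benign))
shift_in_raw_input = np.linalg.norm(tampered - benign)

print("change in raw input:", round(float(shift_in_raw_input), 6))
print("change after reduction:", float(shift_seen_by_detector))
```

Because the principal directions are orthogonal, a change along a discarded direction projects to essentially zero in the kept ones, so the detector's input barely moves even though the raw sample changed noticeably.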
Inference Attacks on Embeddings
It's like sharing a blurred photo of your bag: someone could still guess your travel habits from the outline of items.
t-SNE visualizations can leak structural info: attackers might infer relationships between users, labels, or features. These compressed representations, if exposed, can be mined or reversed to extract sensitive patterns or identities.
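A toy sketch of the leak, with loud assumptions: the "published" 2-D coordinates come from a PCA projection used as a stand-in for a t-SNE plot (to keep the example NumPy-only), the two groups are synthetic, and the attacker is assumed to know the group of just one point per group.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic private data (an assumption): two groups of users in 20 dimensions.
n_per_group, dim = 50, 20
labels = np.repeat([0, 1], n_per_group)  # the sensitive attribute
centers = np.stack([np.zeros(dim), np.full(dim, 4.0)])
X = centers[labels] + rng.normal(size=(2 * n_per_group, dim))

# "Published" 2-D coordinates; PCA here is a stand-in for a shared t-SNE plot.
X_centered = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(X_centered, full_matrices=False)
published = X_centered @ vt[:2].T

# The attacker sees only `published` and knows one point's group on each side.
known_idx = np.array([0, n_per_group])
known_labels = labels[known_idx]

# Guess every other point's group from its nearest known point in the plot.
dists = np.linalg.norm(published[:, None, :] - published[known_idx][None, :, :], axis=2)
guessed = known_labels[dists.argmin(axis=1)]
accuracy = (guessed == labels).mean()

print("fraction of group memberships recovered:", accuracy)
```

The raw features were never shared here, yet the layout of the points alone is enough to recover who belongs with whom. Real embeddings are messier, but the same proximity logic applies.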
Key References
Jolliffe (2002): Principal Component Analysis
Carlini et al. (2020): Extracting Training Data from Large Language Models
Prompt
Have you visualized your model with t-SNE? What insights, or vulnerabilities, did you discover?
Tomorrow
We explore KNN & Clustering, the simplest ML algorithms, and how attackers exploit proximity logic.
Missed Day 10?
Catch up here: https://lnkd.in/gMh3rr8b
#100DaysOfAISec - Day 11 Post #AISecurity #MLSecurity #MachineLearningSecurity #DimensionalityReduction #CyberSecurity #AIPrivacy #AdversarialML #LearningInPublic #100DaysChallenge #ArifLearnsAI #LinkedInTech