After completing the Adversarial Machine Learning course, here are a few tips that could be of use to the community.
Assessments:

- Evasion
I couldn't get the label to pass the autograder; since you only need a score above 90 to pass, I just ignored it. Given enough time and patience I could probably have gotten it to work. I implemented the Carlini & Wagner L2 attack. I had to loop over the attack to reach the target class: I wrote the resized image to disk, reloaded it, and if the classifier still did not predict the target class, I continued adding more perturbations. It took around 1,000 steps for the classifier to predict the target class after reloading (it still failed the autograder). A rough sketch of the loop is below.
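For reference, here is a minimal sketch of that loop, assuming ART's CarliniL2Method, with `classifier` being an ART wrapper around the target model and `x_orig` the original image as a channels-last float array in [0, 1]; the target class, number of classes, resize dimensions and file name are all placeholders, not the assessment's actual values:

```python
# Hypothetical sketch, not the exact assessment code: repeatedly run C&W L2,
# round-trip the result through a resized file on disk, and keep attacking
# the reloaded image until the classifier predicts the target class.
import numpy as np
from PIL import Image
from art.attacks.evasion import CarliniL2Method

TARGET = 7          # placeholder target class index
NUM_CLASSES = 1000  # placeholder number of classes
SIZE = (224, 224)   # placeholder resize used before writing to disk

attack = CarliniL2Method(classifier=classifier, targeted=True, max_iter=10)

y_target = np.zeros((1, NUM_CLASSES), dtype=np.float32)
y_target[0, TARGET] = 1.0

x_adv = x_orig.copy()  # assumed shape (1, H, W, C), floats in [0, 1]
for step in range(1000):
    x_adv = attack.generate(x=x_adv, y=y_target)

    # Round-trip through disk the same way the submission will be read back.
    img = Image.fromarray((x_adv[0] * 255).astype(np.uint8)).resize(SIZE)
    img.save("adv.png")
    reloaded = np.asarray(Image.open("adv.png"), dtype=np.float32) / 255.0

    if np.argmax(classifier.predict(reloaded[None, ...])) == TARGET:
        break  # the perturbation survives the save/resize/reload round-trip

    # Otherwise keep adding perturbations on top of the reloaded image.
    x_adv = reloaded[None, ...]
```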
- Extraction
It takes a while (around 10 minutes for the dataset to load and map everything). The dataloader and extract-labels functions are quite simple; just follow the signature hint. To relabel ds['train'] with the extracted labels, I wrote a function that maps a label to 0 if it is f101_hotdog and 1 otherwise, then passed that function to .map(), as sketched below. The only catch is during training: make sure you get the syntax right, remembering that we query the victim model and we only care about the two classes.
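A minimal sketch of the relabelling step, assuming a Hugging Face datasets object; the HOTDOG_ID value and the "label" column name are placeholders for whatever your notebook actually uses:

```python
# Hypothetical sketch: collapse the extracted labels into a binary
# hotdog / not-hotdog problem using datasets.map().
HOTDOG_ID = 55  # placeholder: id of the f101_hotdog class in the extracted labels

def to_binary(example):
    # 0 if the victim labelled it as hotdog, 1 otherwise
    example["label"] = 0 if example["label"] == HOTDOG_ID else 1
    return example

ds["train"] = ds["train"].map(to_binary)
```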
- Assessments
Probably the easiest one; just follow the course content, but make sure you use the test string variable defined in the grading notebook.
- Inversion
Another relatively simple task; the only catch is to write to file only the exact three classes listed in the assessment description.
- Poisoning
This one was quite difficult: getting the autograder to pass without hitting OOM errors. Poisoning the data took around 30 minutes. We need to make sure we perform a clean-label attack, i.e. only modify the data, not the labels. To keep memory under control I had to split the poisoning (GradientMatchingAttack) into four chunks: I created a dict, ran the attack on a quarter of the data at a time, and updated the dict after each iteration, as sketched below. I used a manual epsilon value set higher than what was covered in the lessons. In the end I got an image of a cat in black pixels; the rest was just artefacted.
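A rough sketch of that chunked approach, assuming ART's GradientMatchingAttack with classifier, x_train/y_train and the trigger samples x_trigger/y_trigger already defined; the hyperparameters and chunk count shown are illustrative, not the values the assessment expects:

```python
# Hypothetical sketch: run the clean-label GradientMatchingAttack on the
# training data in four chunks to avoid OOM, collecting results in a dict.
import numpy as np
from art.attacks.poisoning import GradientMatchingAttack

attack = GradientMatchingAttack(
    classifier,
    percent_poison=0.5,
    epsilon=16 / 255,  # manual epsilon, higher than the value used in the lessons
)

poisoned = {}
for idx in np.array_split(np.arange(len(x_train)), 4):
    # Clean-label: only the images are modified, the labels pass through untouched.
    x_p, y_p = attack.poison(x_trigger, y_trigger, x_train[idx], y_train[idx])
    poisoned.update({int(i): img for i, img in zip(idx, x_p)})

# Reassemble the poisoned training set in its original order.
x_train_poisoned = np.stack([poisoned[i] for i in range(len(x_train))])
```

Running the attack on a quarter of the data at a time keeps peak memory down; the dict just makes it easy to stitch the chunks back together in order afterwards.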
- LLMs
This one was tricky, and I spent most of my time getting the autograder to work. The catch here is that the evaluation is done server-side. While it was relatively easy after injection to get the model to output the warden's name and color (I used darkblue; you can also go with powderblue etc., as others mentioned in the forums), it would not pass grading, since the evaluation happens on the server.
I eventually got it working by inserting the exact sentence I needed, i.e. "Danny Shaffer is the Warden and his favorite color is DarkBlue", using the s.embed_model defined in the init method to get the embeddings, and then writing the output sentence to a txt file. You can test whether it works using get_similar_strs and answer_question (I went with "what is Danny Shaffer's favorite color?"). It is finicky. A rough sketch is below.
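A rough sketch of that last step, with the caveat that s.embed_model, get_similar_strs and answer_question are the grading notebook's own objects and I'm only assuming their exact call signatures here:

```python
# Hypothetical sketch: s, get_similar_strs and answer_question come from the
# grading notebook; the .encode() call is an assumed sentence-transformers-style
# API, so adapt it to whatever s.embed_model actually exposes.
sentence = "Danny Shaffer is the Warden and his favorite color is DarkBlue"

# Embed the injected sentence; store the embedding however the notebook's
# retrieval store expects it to be added.
embedding = s.embed_model.encode(sentence)

with open("output.txt", "w") as f:  # the exact output file name is a placeholder
    f.write(sentence + "\n")

# Local sanity check before the server-side grading run.
print(get_similar_strs("what is Danny Shaffer's favorite color?"))
print(answer_question("what is Danny Shaffer's favorite color?"))
```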
Hope this helps.