Gradient-Based Adversarial Training

Hamiz Khan

This study evaluates the performance and robustness of a trained Natural Language Inference (NLI) model, using gradient-based adversarial training to identify and address its vulnerabilities. The model was first trained on the SNLI dataset (Bowman et al., 2015), reaching a baseline accuracy of 89.90%, and was then challenged with adversarial examples generated by gradient-based methods. These examples exposed specific weaknesses, particularly in handling negation, ambiguous language, and long sentences. The report analyzes both the original baseline model and the fine-tuned model, and discusses the techniques used to improve the model's overall performance.
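The abstract does not say which gradient-based method was used, so the following is only a minimal sketch of one common choice, a fast-gradient-sign (FGSM-style) perturbation. It assumes a toy linear softmax classifier over a four-dimensional input vector standing in for a pooled premise/hypothesis embedding; the weights `W`, bias `b`, input `x`, step size `epsilon`, and all function names are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_for(W, b, x):
    # Linear classifier: one row of W per class (hypothetical stand-in model).
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]

def cross_entropy(W, b, x, label):
    return -math.log(softmax(logits_for(W, b, x))[label])

def input_gradient(W, b, x, label):
    # For softmax + cross-entropy, dL/dx = W^T (p - onehot(label)).
    probs = softmax(logits_for(W, b, x))
    err = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    return [sum(err[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def fgsm_perturb(W, b, x, label, epsilon=0.1):
    # Step each input coordinate by epsilon in the direction that raises the loss.
    grad = input_gradient(W, b, x, label)
    signs = [(g > 0) - (g < 0) for g in grad]
    return [v + epsilon * s for v, s in zip(x, signs)]

# Hypothetical 3-class setup: 0 = entailment, 1 = neutral, 2 = contradiction.
W = [[0.5, -0.2, 0.1, 0.3], [-0.1, 0.4, 0.2, -0.3], [0.2, 0.1, -0.5, 0.0]]
b = [0.0, 0.1, -0.1]
x = [1.0, -0.5, 0.3, 0.8]   # assumed embedding of one premise/hypothesis pair
label = 0

x_adv = fgsm_perturb(W, b, x, label)
loss_clean = cross_entropy(W, b, x, label)
loss_adv = cross_entropy(W, b, x_adv, label)
print(f"clean loss {loss_clean:.4f}, adversarial loss {loss_adv:.4f}")
```

In an adversarial training loop, perturbed inputs such as `x_adv` would be paired with their original labels and mixed into the fine-tuning data. With a real NLI model the perturbation would typically be applied to token embeddings, and the gradient would come from automatic differentiation rather than the closed form used here.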

Comments: 8 Pages.


Submission history

[v1] 2025-05-07 19:37:46




Artificial Intelligence
