A PyTorch adversarial library for attack and defense methods on images and graphs.
An adversarial example library for constructing attacks, building defenses, and benchmarking both.
Manual Prompt Injection / Red Teaming Tool for large language models.
Uses the ChatGPT model to filter out potentially dangerous user-supplied questions.
A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks.
Learn about a type of vulnerability that specifically targets machine learning models.
This paper discusses new methods for generating transferable adversarial attacks on aligned language models, improving LLM security.
A comprehensive overview of prompt injection vulnerabilities and potential solutions in AI applications.
A blog discussing prompt injection vulnerabilities in large language models (LLMs) and their implications.
A resource for understanding prompt injection vulnerabilities in AI, including techniques and real-world examples.