Official implementation of StruQ, which defends against prompt injection attacks using structured queries.
The official implementation of InjecGuard, a tool for benchmarking and mitigating over-defense in prompt injection guardrail models.
A writeup for the Gandalf prompt injection game.
This project investigates the security of large language models by classifying prompts to discover malicious injections.
Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"
The official implementation of a pre-print paper on prompt injection attacks against large language models.
Uses the ChatGPT model to filter out potentially dangerous user-supplied questions.
A prompt injection game to collect data for robust ML research.
A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks.
A benchmark for evaluating prompt injection detection systems.
A GitHub repository showcasing various prompt injection techniques and defenses.