Newsletter
Join the Community
Subscribe to our newsletter for the latest news and updates
Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"
SecAlign is a defensive framework designed to enhance the robustness of large language models (LLMs) against prompt injection attacks. The framework leverages preference optimization techniques, creating a preference dataset that includes both prompt-injected (insecure) inputs and secure outputs. By performing preference optimization on this dataset, SecAlign teaches the LLM to prefer secure outputs, significantly reducing the success rates of prompt injections.