This paper presents new methods for generating universal, transferable adversarial attacks on aligned language models, informing research on LLM security.
This paper introduces a novel method for creating universal and transferable adversarial attacks against aligned large language models (LLMs). The authors automatically generate adversarial suffixes that, when appended to a wide range of prompts, increase the likelihood that an aligned LLM produces objectionable responses. The suffixes are found through a combination of greedy and gradient-based discrete optimization: gradients with respect to the suffix tokens identify promising single-token substitutions, and a greedy search among those candidates picks the swap that most reduces the attack loss. A single optimized suffix works across many prompts (universal) and often carries over to models it was not optimized against (transferable).
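To make the optimization loop concrete, here is a minimal sketch of greedy, gradient-guided suffix search in the spirit of the paper's approach. This is not the authors' implementation: the tiny "language model" (random embeddings plus a linear head), the loss_fn helper, and all constants are hypothetical stand-ins so the example runs self-contained, and it exhaustively scores every top-k swap where the paper instead samples a batch of candidate swaps.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

VOCAB, DIM, SUFFIX_LEN, TOP_K, STEPS = 100, 32, 8, 8, 20

# Toy stand-in for a frozen LLM: embed suffix tokens, mean-pool, project to logits.
embed = torch.randn(VOCAB, DIM)
head = torch.randn(DIM, VOCAB)

def loss_fn(one_hot_suffix, target_token):
    """Negative log-likelihood of a fixed target token given the suffix.

    one_hot_suffix: (SUFFIX_LEN, VOCAB), differentiable w.r.t. the one-hots.
    """
    hidden = (one_hot_suffix @ embed).mean(dim=0)   # (DIM,)
    logits = hidden @ head                          # (VOCAB,)
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([target_token]))

target = 3                                          # token whose likelihood we raise
suffix = torch.randint(0, VOCAB, (SUFFIX_LEN,))     # current discrete suffix

for step in range(STEPS):
    # 1) Gradient of the loss w.r.t. the one-hot encoding of each suffix position.
    one_hot = F.one_hot(suffix, VOCAB).float()
    one_hot.requires_grad_(True)
    loss = loss_fn(one_hot, target)
    loss.backward()

    # 2) Per position, the most negative gradient entries mark the token
    #    substitutions predicted to decrease the loss the most.
    candidates = (-one_hot.grad).topk(TOP_K, dim=1).indices  # (SUFFIX_LEN, TOP_K)

    # 3) Greedy step: evaluate each candidate single-token swap, keep the best.
    best_loss, best_swap = loss.item(), None
    for pos in range(SUFFIX_LEN):
        for tok in candidates[pos].tolist():
            trial = suffix.clone()
            trial[pos] = tok
            with torch.no_grad():
                trial_loss = loss_fn(F.one_hot(trial, VOCAB).float(), target).item()
            if trial_loss < best_loss:
                best_loss, best_swap = trial_loss, (pos, tok)

    if best_swap is None:
        break                                       # no single swap improves the loss
    suffix[best_swap[0]] = best_swap[1]
    print(f"step {step}: loss {best_loss:.4f}")
```

The key design point the sketch preserves is the two-stage structure: the gradient through the one-hot relaxation is used only to shortlist candidate token swaps, while the actual update is chosen by evaluating the true discrete loss, which keeps the search faithful to the model's real behavior on token inputs.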