Universal and Transferable Adversarial Attacks on Aligned Language Models

This paper presents an automated method for generating universal, transferable adversarial attacks on aligned language models, exposing weaknesses that can inform work on LLM security.

Introduction

This paper introduces a novel method for creating universal and transferable adversarial attacks against aligned large language models (LLMs). The authors propose an approach that automatically generates suffixes to be appended to various prompts. By employing a combination of greedy and gradient-based optimization techniques, these adversarial suffixes increase the likelihood that aligned LLMs produce objectionable responses.
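The greedy part of such a search can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it uses no language model and no gradients, and `toy_loss`, `greedy_coordinate_search`, and the integer "tokens" are hypothetical names chosen for illustration. It only shows the general idea of improving a discrete sequence one position at a time, keeping a substitution only when it lowers a loss. Because the toy vocabulary is tiny, every token is tried at each position; a gradient-based variant would instead use gradient information to shortlist promising substitutions.

```python
def toy_loss(suffix, target):
    """Hypothetical stand-in objective: number of positions that differ from target."""
    return sum(1 for s, t in zip(suffix, target) if s != t)


def greedy_coordinate_search(target, vocab_size, max_passes=3):
    """Greedy coordinate search over a discrete sequence (toy illustration only).

    Starts from an all-zero sequence and sweeps positions left to right,
    keeping any single-token substitution that strictly lowers the loss.
    """
    suffix = [0] * len(target)
    best = toy_loss(suffix, target)
    for _ in range(max_passes):
        improved = False
        for pos in range(len(suffix)):
            for tok in range(vocab_size):
                trial = suffix[:pos] + [tok] + suffix[pos + 1:]
                loss = toy_loss(trial, target)
                if loss < best:
                    suffix, best, improved = trial, loss, True
        if not improved:
            break  # no single substitution helps any more
    return suffix, best


if __name__ == "__main__":
    found, loss = greedy_coordinate_search([3, 1, 4, 1, 5], vocab_size=10)
    print(found, loss)
```

On this toy objective each position can be fixed independently, so a single pass is enough. Real objectives are not separable in this way, which is why practical methods repeat the search over many iterations.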

Key Features
  • Automatic Generation: Unlike previous manual methods, the proposed technique automates the generation of adversarial prompts.
  • High Transferability: The generated suffixes remain effective across different models, including black-box models whose internals the attacker cannot inspect.
  • Broader Implications: This work raises critical questions regarding the ability of aligned LLMs to avoid producing undesirable content.
Benefits
  • Enhances understanding of vulnerabilities in LLMs.
  • Provides a foundation for future research in adversarial examples and model alignment.
  • Offers practical implications for improving LLM security measures against such attacks.

Information

  • Publisher: AISecKit
  • Website: arxiv.org
  • Published date: 2025/04/27

Categories

  • AI Research Papers
  • AI Security Monitoring
  • Adversarial Example Detection

Tags

  • AI Ethics
  • Model Robustness
  • Jailbreak Detection
  • Security Auditing
  • LLM
  • Adversarial Examples
