Search
Collection
Category
Tag
Blog
Pricing
Submit

Newsletter

Join the Community

Subscribe to our newsletter for the latest news and updates

Email

AISecKit

Curated AI security tools & LLM safety resources for cybersecurity professionals

Product

Search
Collection
Category
Tag

Resources

Blog
Pricing
Submit

Tools

🔥Marathons Tools

Company

About Us
Privacy Policy
Terms of Service
Sitemap

Copyright © 2025 All Rights Reserved.

Home
Category
JudgeDeceiver

JudgeDeceiver

GitHub repository for optimization-based prompt injection attacks on LLMs as judges.

image for JudgeDeceiver

Introduction

Information

Publisher
AISecKit
Websitegithub.com
Published date2025/04/27

Categories

Input Validation & Filtering
AI Research Papers
Prompt Injection Defense

Tags

Prompt Injection
Model Robustness
Security Auditing
Open Source

More Products

image of agentic-design-patterns-cn

AI Application PlatformsAI Research PapersAI Development Frameworks

agentic-design-patterns-cn

A bilingual Chinese-English translation of 'Agentic Design Patterns' by Antonio Gulli, focusing on intelligent systems design.

AI Reasoning Open Source AI Education AI Standards AI Communities+1

image of TradingAgents-CN

AI Application PlatformsAI Research PapersAI Development Frameworks

TradingAgents-CN

基于多智能体LLM的中文金融交易框架，支持A股/港股/美股分析。

Market Analysis Open Source LLM AI Communities Generative AI+1

prmptinj

Curated + custom prompt injections for AI models, focusing on security and exploit development.

AI Ethics Prompt Injection Compliance Exploit Development Vulnerability Disclosure

JudgeDeceiver

JudgeDeceiver is an open-source tool developed for conducting optimization-based prompt injection attacks on large language models (LLMs) that function as judges. This tool was released alongside the paper presented at ACM CCS 2024, detailing methods to exploit prompt injection vulnerabilities in LLMs.

Key Features:

Environment Setup: Easy setup with Python 3.10 or higher.
Dataset Availability: Access to multiple experimental datasets such as MT-Bench and LLMBar.
Optimization Scripts: Scripts are provided to optimize sequences for effective prompt injection attacks.
Evaluation Framework: Tools to evaluate the performance of injection attacks against different LLMs.

Benefits:

Research Utility: Ideal for researchers studying vulnerabilities in LLMs and prompt injection techniques.
Community Engagement: Contributions and validations from the community are welcomed, enhancing collaborative developments.
Comprehensive Documentation: Step-by-step guides available for users to easily launch attacks and evaluate results.

This repository not only serves as a tool for testing and enhancing model robustness but also acts as a resource for the ongoing research community focused on AI safety and security.