LogoAISecKit
icon of JudgeDeceiver

JudgeDeceiver

GitHub repository for optimization-based prompt injection attacks on LLMs as judges.

Introduction

JudgeDeceiver

JudgeDeceiver is an open-source tool developed for conducting optimization-based prompt injection attacks on large language models (LLMs) that function as judges. This tool was released alongside the paper presented at ACM CCS 2024, detailing methods to exploit prompt injection vulnerabilities in LLMs.

Key Features:
  • Environment Setup: Easy setup with Python 3.10 or higher.
  • Dataset Availability: Access to multiple experimental datasets such as MT-Bench and LLMBar.
  • Optimization Scripts: Scripts are provided to optimize sequences for effective prompt injection attacks.
  • Evaluation Framework: Tools to evaluate the performance of injection attacks against different LLMs.
Benefits:
  • Research Utility: Ideal for researchers studying vulnerabilities in LLMs and prompt injection techniques.
  • Community Engagement: Contributions and validations from the community are welcomed, enhancing collaborative developments.
  • Comprehensive Documentation: Step-by-step guides available for users to easily launch attacks and evaluate results.

This repository not only serves as a tool for testing and enhancing model robustness but also acts as a resource for the ongoing research community focused on AI safety and security.

Newsletter

Join the Community

Subscribe to our newsletter for the latest news and updates