LogoAISecKit
icon of PromptInjectionBench

PromptInjectionBench

A repository for benchmarking prompt injection attacks against AI models like GPT-4 and Gemini.

Introduction

Overview

PromptInjectionBench is a comprehensive repository designed for analyzing prompt injection attacks on various AI models, including OpenAI's GPT-4 and Gemini Pro. The repository automates the benchmarking process using Python, allowing users to send prompts from the Hugging Face Jailbreak dataset to different language models, collecting and tabulating results systematically.

Key Features
  • Model Comparison: Benchmarking capabilities against multiple models, including the latest Gemini-1.5 Pro and Azure OpenAI GPT-4.
  • Structured Outputs: Utilization of structured outputs to obtain more reliable results without complicated pattern matching.
  • Automation: Automated prompt testing and results gathering for efficient analysis of language model vulnerabilities.
  • Docker Support: Easy setup and deployment using Docker containers, streamlining the testing process.
  • LICENSE: Code is provided under the Apache License 2.0, encouraging usage and modifications.
Benefits
  • Enhanced Security Insight: Helps in understanding how different models respond to potential jailbreak prompts, which is crucial for improving model safety and moderation.
  • Community Contribution: Open-source nature allows contributions from the developer community, fostering better security practices over time.
  • Educational Resource: Serves as a valuable educational tool for researchers and developers interested in LLM vulnerabilities.
Highlights

The project actively monitors changes in model behavior over time, providing insights into how different language models handle malicious prompts. Users can easily run their analysis by setting up the environment with required API keys and simply invoking a few commands.

Newsletter

Join the Community

Subscribe to our newsletter for the latest news and updates