Introduction
The jailbreak_llms project is a comprehensive dataset consisting of 15,140 ChatGPT prompts sourced from platforms like Reddit, Discord, and various websites. This dataset includes 1,405 jailbreak prompts, making it the largest collection of in-the-wild jailbreak prompts to date. The data was collected over a year, from December 2022 to December 2023, and is intended for research purposes, particularly in evaluating the effectiveness of jailbreak prompts on large language models (LLMs).
Key Features
- Extensive Dataset: Over 15,000 prompts collected from diverse online platforms.
- Jailbreak Focus: Specifically identifies and categorizes jailbreak prompts.
- Research Utility: Aimed at understanding and mitigating risks associated with LLMs.
- Ethical Considerations: Follows ethical guidelines to ensure responsible use of data.
Benefits
- For Researchers: Provides a valuable resource for studying the vulnerabilities of LLMs.
- For Developers: Helps in developing stronger safeguards against harmful prompts.
- Awareness Raising: Informs the community about potential misuse of LLMs and encourages responsible AI development.
Highlights
- Framework: Utilizes the JailbreakHub framework for measurement studies.
- Evaluation: Includes a question set for evaluating the effectiveness of jailbreak prompts across various scenarios.
- Open Source: Licensed under the MIT license, promoting collaboration and transparency.