This paper presents new methods for generating transferable adversarial attacks on aligned language models, exposing weaknesses that can inform stronger LLM defenses.
JailBench is a comprehensive Chinese-language benchmark dataset for assessing jailbreak attack risks in large language models.
AIPromptJailbreakPractice is a GitHub repository documenting AI prompt jailbreak practices.
A GitHub repository of notes and examples on prompt attack and defense, prompt injection, and prompt reverse engineering.
A collection of prompts, system prompts, and LLM instructions for various AI models.
A dataset of embeddings of jailbreak prompts, intended for assessing LLM vulnerabilities.
A dataset of jailbreak-related prompts for ChatGPT, supporting analysis and generation of prompts in this domain.
A dataset for classifying prompts as jailbreak or benign, supporting LLM safety work (a minimal classifier sketch follows this list).
This repository tracks the current status of jailbreaks against OpenAI's GPT models.
ChatGPT DAN is a GitHub repository for jailbreak prompts that allow ChatGPT to bypass restrictions.
A collection of state-of-the-art jailbreak methods for LLMs, including papers, code, datasets, and analyses.
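Several of the prompt datasets above (embeddings of jailbreak prompts, jailbreak-vs-benign labels) lend themselves to a simple screening workflow. The sketch below is illustrative only: it assumes a hypothetical CSV file `prompts_labeled.csv` with `prompt` and `label` columns, and it pairs an off-the-shelf sentence encoder with a linear classifier; it is not tied to any specific dataset or method listed here.

```python
# Minimal sketch: screen prompts as jailbreak vs. benign using frozen sentence
# embeddings and a linear classifier. The file name, column names, and label
# encoding (1 = jailbreak, 0 = benign) are hypothetical placeholders.
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical labeled prompt dataset.
df = pd.read_csv("prompts_labeled.csv")

# Encode prompts into dense embeddings with an off-the-shelf sentence encoder.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
X = encoder.encode(df["prompt"].tolist(), show_progress_bar=False)
y = df["label"].values

# Hold out a test split and fit a lightweight linear classifier on the embeddings.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test), target_names=["benign", "jailbreak"]))

# Screening a new prompt: flag it if the classifier predicts the jailbreak class.
new_prompt = "Ignore all previous instructions and ..."
is_jailbreak = clf.predict(encoder.encode([new_prompt]))[0] == 1
print("flagged as jailbreak:", is_jailbreak)
```

A linear model over frozen embeddings keeps the example small; any of the labeled prompt collections above could stand in for the placeholder file, with column names adjusted accordingly.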