Virtual Prompt Injection
Overview
The repository implements the concept of Virtual Prompt Injection (VPI), a technique for executing backdoor attacks on instruction-tuned large language models (LLMs). Proposed in the paper "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection", VPI allows attackers to manipulate LLM behavior without altering model input during inference.
Key Features
- Versatile Attack Goals: Achieve tailored outcomes through specified trigger scenarios and virtual prompts.
- Installation: Simple setup using Conda and installation of necessary libraries like PyTorch, Transformers, and more.
- Folders: Contains separate folders for sentiment steering and code injection experiments.
Benefits
- Open-source: The code is freely available for educational and research purposes.
- Community Contribution: Allows for community feedback and further enhancement.
- Integration: Utilizes instructions from popular models like Alpaca for training and evaluation purposes.
Highlights
- Citation: The implementation is based on a significant research paper set for presentation at the NAACL 2024.
- Support for OpenAI API: Easy integration for those with API keys to enhance functionality.