LogoAISecKit
icon of Virtual Prompt Injection

Virtual Prompt Injection

Unofficial implementation of backdooring instruction-tuned LLMs using virtual prompt injection.

Introduction

Virtual Prompt Injection

Overview

The repository implements the concept of Virtual Prompt Injection (VPI), a technique for executing backdoor attacks on instruction-tuned large language models (LLMs). Proposed in the paper "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection", VPI allows attackers to manipulate LLM behavior without altering model input during inference.

Key Features
  • Versatile Attack Goals: Achieve tailored outcomes through specified trigger scenarios and virtual prompts.
  • Installation: Simple setup using Conda and installation of necessary libraries like PyTorch, Transformers, and more.
  • Folders: Contains separate folders for sentiment steering and code injection experiments.
Benefits
  • Open-source: The code is freely available for educational and research purposes.
  • Community Contribution: Allows for community feedback and further enhancement.
  • Integration: Utilizes instructions from popular models like Alpaca for training and evaluation purposes.
Highlights
  • Citation: The implementation is based on a significant research paper set for presentation at the NAACL 2024.
  • Support for OpenAI API: Easy integration for those with API keys to enhance functionality.

Information

  • Publisher
    AISecKit
  • Websitegithub.com
  • Published date2025/04/27

Newsletter

Join the Community

Subscribe to our newsletter for the latest news and updates