LogoAISecKit
icon of InjecGuard

InjecGuard

The official implementation of InjecGuard, a tool for benchmarking and mitigating over-defense in prompt injection guardrail models.

Introduction

InjecGuard

InjecGuard is the first prompt guard model against prompt injection attacks, designed to benchmark and mitigate over-defense issues prevalent in existing models. This repository not only contains the official code implementations but also incorporates various datasets that facilitate thorough evaluations of guardrail models.

Key Features:
  • Innovative Model: InjecGuard tackles the common challenge of over-defense in prompt guard models which falsely classify benign inputs as malicious.
  • NotInject Dataset: A specialized evaluation dataset created to assess the extent of over-defense, helping to improve model accuracy.
  • Open Source: Complete access to model weights and training strategies allows the community to contribute and enhance robustness.
  • Pre-trained Checkpoints: Quickly deploy models using Hugging Face Transformers to facilitate seamless integration into existing workflows.
Benefits:
  • Robust Defense: Achieves state-of-the-art performance in the field, significantly reducing trigger word bias.
  • Comprehensive Evaluations: Includes mechanisms for testing against various datasets ensuring reliable model performance across different conditions.
Highlights:
  • Released on Hugging Face for easy deployment.
  • Extensive documentation and guidelines for usage, training, and evaluation.
  • Effective in real-world applications of AI language models against prompt injection risks.

Newsletter

Join the Community

Subscribe to our newsletter for the latest news and updates