InjecGuard
InjecGuard is a prompt guard model against prompt injection attacks, designed to benchmark and mitigate the over-defense problem prevalent in existing guardrail models. This repository contains the official code implementation and also incorporates various datasets that enable thorough evaluation of guardrail models.
Key Features:
- Innovative Model: InjecGuard tackles the common challenge of over-defense in prompt guard models, which falsely classify benign inputs as malicious.
- NotInject Dataset: A specialized evaluation dataset created to assess the extent of over-defense, helping to improve model accuracy.
- Open Source: Complete access to model weights and training strategies allows the community to contribute and enhance robustness.
- Pre-trained Checkpoints: Models can be loaded directly with Hugging Face Transformers, enabling quick deployment and seamless integration into existing workflows.
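The features above can be exercised through the standard Transformers sequence-classification API. A minimal sketch, where the model id is a placeholder (substitute the released InjecGuard checkpoint name from the repository's documentation) and the benign/injection label mapping is an assumption to be checked against `model.config.id2label`:

```python
def classify(texts, model_id="<released-InjecGuard-checkpoint>"):
    """Score a batch of prompts with a prompt-guard checkpoint.

    Sketch only: the model id above is a placeholder, not a real
    checkpoint name. Heavy imports are kept inside the function so
    the helpers below work without torch/transformers installed.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Highest-scoring class per input.
    return logits.argmax(dim=-1).tolist()


def to_labels(pred_ids, id2label={0: "benign", 1: "injection"}):
    # Assumed two-class mapping; verify against model.config.id2label.
    return [id2label[i] for i in pred_ids]
```

A call like `to_labels(classify(["What's the weather today?"]))` would then return a human-readable verdict per prompt.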
Benefits:
- Robust Defense: Achieves state-of-the-art performance in the field while significantly reducing trigger-word bias.
- Comprehensive Evaluations: Includes mechanisms for testing against various datasets, ensuring reliable model performance across different conditions.
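To illustrate the kind of over-defense metric that a NotInject-style evaluation probes, here is a small hypothetical helper (not the repository's own code) that computes the rate at which benign inputs are wrongly flagged as malicious:

```python
def over_defense_rate(predictions, labels):
    """Fraction of benign inputs (label 0) flagged as malicious (prediction 1).

    Hypothetical illustration of the over-defense concept: a guard model
    with a high rate here is rejecting harmless prompts.
    """
    benign = [p for p, l in zip(predictions, labels) if l == 0]
    if not benign:
        return 0.0
    return sum(p == 1 for p in benign) / len(benign)
```

For example, a model that flags one of two benign prompts yields a rate of 0.5; lowering this number without hurting detection of true injections is the balance InjecGuard targets.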
Highlights:
- Released on Hugging Face for easy deployment.
- Extensive documentation and guidelines for usage, training, and evaluation.
- Effective at protecting real-world applications of AI language models against prompt injection risks.