
llm-security-prompt-injection

This project investigates the security of large language models by classifying input prompts to detect malicious injections.

Introduction

This GitHub project investigates the security of large language models (LLMs), with a primary emphasis on prompt injection attacks. The study involves:

  • Binary Classification: Performing binary classification on a dataset of input prompts to identify malicious prompts that can manipulate LLM behavior.
  • Methodology: Three approaches are analyzed and compared (illustrative sketches follow this list):
    • Classical Machine Learning algorithms (Naive Bayes, Logistic Regression, Support Vector Machine, Random Forest)
    • The pre-trained XLM-RoBERTa model used without fine-tuning
    • XLM-RoBERTa fine-tuned on the dataset
  • Dataset: Utilizes the deepset Prompt Injection Dataset, comprising hundreds of samples in English and other languages, pre-split into training and testing subsets.
  • Results and Analysis: The performance of different classification methods is compared, providing insights into detection capabilities and model accuracy.
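
As a concrete illustration of the classical baseline, the sketch below loads what is assumed to be the Hugging Face deepset/prompt-injections dataset (text and label columns with predefined train/test splits) and fits a TF-IDF + Logistic Regression pipeline. The feature representation and hyperparameters are illustrative assumptions, not the project's exact configuration.

```python
# Minimal classical-ML sketch; the dataset id, TF-IDF features, and
# hyperparameters are assumptions, not the project's exact setup.
from datasets import load_dataset
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

ds = load_dataset("deepset/prompt-injections")
X_train, y_train = ds["train"]["text"], ds["train"]["label"]
X_test, y_test = ds["test"]["text"], ds["test"]["label"]

# Word-level TF-IDF features (unigrams and bigrams) feed the classifier;
# Naive Bayes, SVM, or Random Forest can be swapped in for LogisticRegression.
vectorizer = TfidfVectorizer(max_features=20_000, ngram_range=(1, 2))
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)

preds = clf.predict(vectorizer.transform(X_test))
print(classification_report(y_test, preds))
```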
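
For the no-fine-tuning approach, one plausible reading is that the frozen pre-trained encoder supplies sentence embeddings for a lightweight downstream classifier. The mean-pooling step and the choice of Logistic Regression below are assumptions made for illustration, not necessarily the project's exact setup.

```python
# Sketch of using XLM-RoBERTa without fine-tuning: the frozen encoder produces
# sentence embeddings and a simple classifier is trained on top.
import torch
from datasets import load_dataset
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
encoder = AutoModel.from_pretrained("xlm-roberta-base").eval()

def embed(texts, batch_size=32):
    """Mean-pool the last hidden state of the frozen encoder (no gradient updates)."""
    vecs = []
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            enc = tokenizer(texts[i:i + batch_size], padding=True,
                            truncation=True, max_length=128, return_tensors="pt")
            hidden = encoder(**enc).last_hidden_state
            mask = enc["attention_mask"].unsqueeze(-1)
            vecs.append((hidden * mask).sum(1) / mask.sum(1))
    return torch.cat(vecs).numpy()

ds = load_dataset("deepset/prompt-injections")
clf = LogisticRegression(max_iter=1000).fit(embed(ds["train"]["text"]),
                                            ds["train"]["label"])
print(clf.score(embed(ds["test"]["text"]), ds["test"]["label"]))
```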
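
For the fine-tuned variant, a minimal Hugging Face Trainer sketch might look like the following; the sequence length, epoch count, and batch size are placeholder assumptions rather than the project's reported settings.

```python
# Minimal fine-tuning sketch for XLM-RoBERTa as a binary sequence classifier;
# all hyperparameters here are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize both splits; padding to a fixed length keeps the default collator simple.
ds = load_dataset("deepset/prompt-injections")
ds = ds.map(lambda batch: tokenizer(batch["text"], truncation=True,
                                    padding="max_length", max_length=128),
            batched=True)

args = TrainingArguments(output_dir="xlmr-prompt-injection",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args,
                  train_dataset=ds["train"], eval_dataset=ds["test"])
trainer.train()
print(trainer.evaluate())  # reports loss on the held-out test split

```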
Key Features:
  • Prompt Injection Detection: Identifies malicious input prompts that target LLMs.
  • Robust Methodologies: Compares classical ML algorithms with pre-trained and fine-tuned transformer models to improve detection accuracy.
  • Comprehensive Dataset: Leverages the multilingual deepset dataset with predefined splits to ensure robust training and testing of models.
Benefits:
  • Enhances understanding of security issues pertaining to LLMs.
  • Provides tools and methodologies to improve prompt security in AI applications.
  • Aims to contribute valuable findings to the field of AI security research.
