llm-prompt-injection-filtering

Uses the ChatGPT model to filter out potentially dangerous user-supplied questions.

Introduction

This application utilizes the ChatGPT model to evaluate user-supplied questions, determining their safety and filtering out dangerous queries. The process involves assessing various factors such as relevance, appropriateness, malicious intent, complexity, and manipulativeness of the queries. By assigning scores based on these evaluations, the tool can distinguish between acceptable and unacceptable queries before they are processed further.

Key Features:

Safety Evaluation: Automatically checks if questions are safe to handle.
Multiple Filters: Evaluates queries on relevance, appropriateness, intent, complexity, and manipulativeness.
Threshold-Based Scoring: Implements a scoring system with warning and error thresholds to classify queries.
Sample Queries: Easily test with known good and bad examples to see how queries are categorized.

Benefits:

Prevents Misuse: Helps in safeguarding systems from malicious inputs.
Integrates with ChatGPT: Leverages the capabilities of ChatGPT to enhance question filtering.
User-Friendly: Simple function for evaluating questions, making it convenient for developers.

Back