GreatLibrarian
GreatLibrarian is a scenario-based toolbox that automates the evaluation of large language models (LLMs). Users supply an LLM's API key and a set of test cases in JSON format, and the toolbox handles the evaluation process from there. It consists of several key modules:
- Pre-evaluation Setup: Prepare configurations for the LLM and test cases.
- Automated Evaluation: Automatically interact with the LLM, logging conversations and scoring responses based on defined rules.
- Scoring Rules: Define how test cases are scored, using methods such as keyword matching, blacklist checks, and LLM-based evaluation.
- Report Generation: Generate comprehensive evaluation reports summarizing all findings.
GreatLibrarian is written in Python, which makes it straightforward to integrate and to customize scoring metrics. It is aimed at developers and researchers who want to benchmark their LLMs against varied test scenarios.
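To make the module list more concrete, here is a minimal sketch of the same flow: load test cases from JSON, query a model, score with keyword and blacklist rules, and summarize. It is not GreatLibrarian's actual API. The JSON fields (`prompt`, `keywords`, `blacklist`), the function names, and the `fake_llm` stand-in are all assumptions made for illustration, and the real toolbox may define its test cases and scoring rules differently.

```python
import json

# Hypothetical test-case format; the real GreatLibrarian JSON schema may differ.
TEST_CASES_JSON = """
[
  {"prompt": "What is the capital of France?",
   "keywords": ["Paris"],
   "blacklist": ["London"]},
  {"prompt": "Name a prime number greater than 10.",
   "keywords": ["11", "13"],
   "blacklist": []}
]
"""


def fake_llm(prompt):
    """Stand-in for a real LLM API call so the sketch runs offline."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "Name a prime number greater than 10.": "One example is 17.",
    }
    return canned.get(prompt, "")


def keyword_score(response, keywords):
    """Return 1.0 if any expected keyword appears in the response, else 0.0."""
    text = response.lower()
    return 1.0 if any(k.lower() in text for k in keywords) else 0.0


def blacklist_score(response, blacklist):
    """Return 0.0 if any forbidden term appears in the response, else 1.0."""
    text = response.lower()
    return 0.0 if any(b.lower() in text for b in blacklist) else 1.0


def evaluate(test_cases, llm):
    """Run each test case through the model, log the exchange, and score it."""
    records = []
    for case in test_cases:
        response = llm(case["prompt"])
        records.append({
            "prompt": case["prompt"],
            "response": response,
            "keyword": keyword_score(response, case["keywords"]),
            "blacklist": blacklist_score(response, case["blacklist"]),
        })
    return records


def report(records):
    """Summarize the scores as per-rule averages."""
    n = len(records)
    if n == 0:
        return {"cases": 0, "keyword_avg": 0.0, "blacklist_avg": 0.0}
    return {
        "cases": n,
        "keyword_avg": sum(r["keyword"] for r in records) / n,
        "blacklist_avg": sum(r["blacklist"] for r in records) / n,
    }


records = evaluate(json.loads(TEST_CASES_JSON), fake_llm)
summary = report(records)
print(summary)
```

Because `evaluate` takes the model as a plain callable, swapping `fake_llm` for a function that calls a real API would leave the scoring and reporting code unchanged. An LLM-based scoring rule could be added in the same way, as one more function that maps a response to a score.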