LLM AutoEval
LLM AutoEval is a tool that simplifies evaluating Large Language Models (LLMs) in Google Colab. It provides an automated setup for running evaluations of a chosen model against a selection of benchmark suites.
Key Features:
- Quick Start: Just specify the model name, benchmark, and GPU, then run; see the configuration sketch after this list.
- Customizable Evaluation: Adjust evaluation parameters for tailored benchmarking.
- Benchmark Suites: Choose from multiple benchmark suites, including Nous, Lighteval, and Open LLM, to assess model performance.
- Results Summary: Generate evaluation results and upload them to a GitHub Gist for easy sharing and reference; see the upload sketch after this list.
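
For illustration, here is a minimal sketch of how the quick-start parameters (model name and benchmark) could drive an evaluation run. It assumes EleutherAI's lm-evaluation-harness CLI (`lm_eval`) is installed; the variable names and task list are assumptions for this sketch, not the notebook's actual fields, and GPU selection is left to the hosting environment.

```python
# Minimal sketch: map quick-start parameters onto an lm-evaluation-harness run.
# Variable names and the task list below are illustrative assumptions.
import subprocess

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"  # example model to evaluate
BENCHMARK = "openllm"                            # assumed benchmark label

# Illustrative mapping from benchmark label to harness tasks (not exhaustive;
# the Nous and Lighteval suites use their own task lists and runners).
TASKS = {
    "openllm": "arc_challenge,hellaswag,mmlu,truthfulqa,winogrande,gsm8k",
}

subprocess.run(
    [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={MODEL_ID}",
        "--tasks", TASKS[BENCHMARK],
        "--batch_size", "auto",
        "--output_path", "./results",
    ],
    check=True,
)
```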
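
Likewise, a minimal sketch of uploading a results summary to a GitHub Gist via the REST API, assuming a personal access token with the `gist` scope is available in the `GITHUB_TOKEN` environment variable; the file name and summary content are placeholders, not the tool's actual output format.

```python
# Minimal sketch: post a results summary to the GitHub Gist API.
import json
import os
import urllib.request

summary = "| Benchmark | Score |\n|---|---|\n| example | 0.0 |"  # placeholder content

payload = {
    "description": "LLM AutoEval results",
    "public": False,
    "files": {"results.md": {"content": summary}},
}

request = urllib.request.Request(
    "https://api.github.com/gists",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"token {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    method="POST",
)

with urllib.request.urlopen(request) as response:
    gist_url = json.load(response)["html_url"]
    print(f"Results uploaded to {gist_url}")
```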
Benefits:
- Convenient Use: Designed to get you up and running with minimal setup, perfect for personal use or experimentation.
- Comparison Tools: Compare results against benchmarks from the Open LLM Leaderboard and other datasets.
- Community Contributions: Contributions to the tool's development and improvement are welcome and encouraged.