AutoAudit is a large language model (LLM) designed to enhance cybersecurity through AI-driven threat detection and response.
A curated list of tools, datasets, demos, and papers for evaluating large language models (LLMs).
Sample notebooks and prompts for evaluating large language models (LLMs) and generative AI.
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
A unified toolkit for automatic evaluations of large language models (LLMs).
An open-source project for head-to-head comparison of two LLMs on a given prompt, focusing on backend integration (a minimal sketch of the pattern follows this list).
A study evaluating geopolitical and cultural biases in large language models through dual-layered assessments.
A comprehensive survey on benchmarks for Multimodal Large Language Models (MLLMs).
Open-source evaluation toolkit for large multi-modality models, supporting 220+ models and 80+ benchmarks.
EuroBERT is a multilingual encoder model designed for European languages, trained using the Optimus training library.
A Chinese legal dialogue language model designed to provide professional and reliable answers to legal questions.
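
The head-to-head comparison entry above amounts to sending one prompt to two models and contrasting their outputs. Below is a minimal Python sketch of that pattern, assuming the official `openai` client and an `OPENAI_API_KEY` in the environment; the model names are placeholders, not taken from the project itself.

```python
"""Minimal head-to-head sketch: send one prompt to two models and print
both answers. Assumes the official `openai` package (v1 API) and an
OPENAI_API_KEY in the environment; model names are placeholders."""
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(model: str, prompt: str) -> str:
    """Return a single chat completion from `model` for `prompt`."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    prompt = "Explain the difference between precision and recall in one sentence."
    # Hypothetical pair of models to compare side by side.
    for model in ("gpt-4o-mini", "gpt-3.5-turbo"):
        print(f"--- {model} ---")
        print(ask(model, prompt))
```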