# Awesome LLMs Evaluation Papers
This repository provides a curated list of papers organized according to the survey *Evaluating Large Language Models: A Comprehensive Survey*.
## Key Features
- Comprehensive coverage of evaluation methodologies across various aspects of LLMs.
- Papers categorized into major areas, including Knowledge and Capability Evaluation, Alignment Evaluation, and Safety Evaluation.
- Benchmarks and leaderboards for tracking LLM performance.
- Regular updates with new research contributions.
## Benefits
- Serves as a valuable resource for researchers and practitioners in the field of AI and LLMs.
- Facilitates a better understanding of the capabilities and risks associated with large language models.
- Promotes community involvement in maintaining and expanding the paper list.
## Highlights
- Compiled by the survey's authors, researchers from Tianjin University and other institutions.
- Citing the survey and providing feedback are encouraged to help improve the resource.