LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments.
An LLM-based autonomous agent that conducts deep local and web research on any topic and generates long reports with citations.
Ai迷思录 (应用与安全指南; "AI Myths: Application and Security Guide") is a GitHub repository focusing on AI applications and security.
A research project that assesses and aligns the values of Chinese large language models, with a focus on safety and responsibility.
The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
An open-source framework for evaluating and testing AI and LLM systems for performance, bias, and security issues.
A study evaluating geopolitical and cultural biases in large language models through dual-layered assessments.
A guidebook sharing insights and knowledge about evaluating large language models (LLMs).
A comprehensive collection of papers focused on evaluating large language models (LLMs).