A research project assessing and aligning the values of Chinese large language models, focusing on safety and responsibility.
The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
An open-source framework for evaluating and testing AI and LLM systems for performance, bias, and security issues.
A study evaluating geopolitical and cultural biases in large language models through dual-layered assessments.
A guidebook sharing practical insights on evaluating Large Language Models (LLMs).