
A curated list of tools, datasets, demos, and papers for evaluating large language models (LLMs).

The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".

A Model Context Protocol server for searching and analyzing arXiv papers.

A fully local web research assistant that uses LLMs to generate queries, summarize results, and write reports.

A comprehensive collection of papers focused on evaluating large language models (LLMs).

Gallia is an extensible penetration testing framework focused on the automotive domain.

DeepGit is an advanced research agent designed to help users find the best GitHub repositories.

An AI companion that enhances paper reading with interactive features and a quirky AI professor persona.

The official implementation of a preprint on prompt injection attacks against large language models.