The Best Your Ultimate AI Security Toolkit

Curated AI security tools & LLM safety resources for cybersecurity professionals

AI ModelsAI Application PlatformsAI Conferences & Events

Visit Website

LLM-Evaluation

Details

Sample notebooks and prompts for evaluating large language models (LLMs) and generative AI.

Prompt Engineering Open Source LLM Generative AI Model Evaluation

AI Application PlatformsAI Ethics ResourcesAI Research Papers

Visit Website

LLM-eval-survey

Details

The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".

AI Ethics Open Source LLM Research Papers Generative AI+2

AI ModelsAI Application PlatformsAI Development Frameworks

Visit Website

Evals

Details

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Prompt Engineering AI Ethics Compliance Open Source LLM+2

AI Application PlatformsAI Security Monitoring

Visit Website

Giskard

Details

Open-source framework for evaluating and testing AI and LLM systems for performance, bias, and security issues.

Open Source LLM RAG Model Evaluation Bias Mitigation

AI Application PlatformsAI Development Frameworks

Visit Website

Phoenix

Details

Phoenix is an open-source AI observability platform for experimentation, evaluation, and troubleshooting.

Open Source LLM

AI ModelsAI Application PlatformsAI Development Frameworks

Visit Website

Evalchemy

Details

A unified toolkit for automatic evaluations of large language models (LLMs).

AI Reasoning Open Source LLM Model Evaluation

AI ModelsAI Application PlatformsAI Development Frameworks

Visit Website

llm-comparison-backend

Details

An open-source project for comparing two LLMs head-to-head with a given prompt, focusing on backend integration.

Open Source LLM API Integration

AI ModelsAI Ethics ResourcesAI Research Papers

Visit Website

LLM-Bias-Evaluation

Details

A study evaluating geopolitical and cultural biases in large language models through dual-layered assessments.

AI Ethics Responsible AI LLM Model Evaluation Bias Mitigation

image of Evaluation-Multimodal-LLMs-Survey

AI ModelsAI Application PlatformsAI Research Papers

Visit Website

Evaluation-Multimodal-LLMs-Survey

Details

A comprehensive survey on benchmarks for Multimodal Large Language Models (MLLMs).

Foundation Models Multimodal LLMs AI Reasoning Model Evaluation

AI ModelsAI Development Frameworks

Visit Website

VLMEvalKit

Details

Open-source evaluation toolkit for large multi-modality models, supporting 220+ models and 80+ benchmarks.

Multimodal LLMs Open Source AI Communities Generative AI Model Evaluation

Visit Website

Wallos

Details

Wallos is an open-source personal subscription tracker that simplifies expense management.

Open Source Self-hosted

DevSecOps ToolsCloud Service ProtectionContainer Security

Visit Website

sidekick

Details

Bare metal to production ready in mins; your own fly server on your VPS.

Open Source Self-hosted Secure Deployment

The Best Your Ultimate AI Security Toolkit

All Categories

No Filter

Sort by Time (dsc)

All Categories

No Filter

Sort by Time (dsc)

LLM-Evaluation

LLM-eval-survey

Evals

Giskard

Phoenix

Evalchemy

llm-comparison-backend

LLM-Bias-Evaluation

Evaluation-Multimodal-LLMs-Survey

VLMEvalKit

Wallos

sidekick

LLM-Evaluation

LLM-eval-survey

Evals

Giskard

Phoenix

Evalchemy

llm-comparison-backend

LLM-Bias-Evaluation

Evaluation-Multimodal-LLMs-Survey

VLMEvalKit

Wallos

sidekick

The Best Your Ultimate AI Security Toolkit

All Categories

No Filter

Sort by Time (dsc)

All Categories

No Filter

Sort by Time (dsc)

LLM-Evaluation

LLM-eval-survey

Evals

Giskard

Phoenix

Evalchemy

llm-comparison-backend

LLM-Bias-Evaluation

Evaluation-Multimodal-LLMs-Survey

VLMEvalKit

Wallos

sidekick

Newsletter

Join the Community

LLM-Evaluation

LLM-eval-survey

Evals

Giskard

Phoenix

Evalchemy

llm-comparison-backend

LLM-Bias-Evaluation

Evaluation-Multimodal-LLMs-Survey

VLMEvalKit

Wallos

sidekick