A cross-platform desktop chat app that leverages MCP (Model Context Protocol) to interface with various LLMs.
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more.
A Python SDK providing an observability, monitoring, and evaluation framework for AI agents.
A comprehensive library for implementing LLMs with a unified training pipeline and model evaluation.
A guidebook sharing insights and knowledge about evaluating Large Language Models (LLMs).
Latitude is an open-source prompt engineering platform for building, evaluating, and refining your prompts with AI.
Automatically evaluate your LLMs in Google Colab with LLM AutoEval.
Self-evaluating interview for AI coders.
A customizable framework for efficient large model evaluation and performance benchmarking.
A collection of benchmarks and datasets for evaluating large language models (LLMs).
VideoMind is a Chain-of-LoRA agent for long video reasoning that emulates human-like reasoning processes.
Transforms research papers into engaging three-person podcast discussions.