Tag
Explore by tags

Skywork-R1V
Skywork-R1V is an open-source multimodal model for advanced visual and text reasoning with chain-of-thought (CoT).

Evaluation-Multimodal-LLMs-Survey
A comprehensive survey on benchmarks for Multimodal Large Language Models (MLLMs).

VLMEvalKit
Open-source evaluation toolkit for large multi-modality models, supporting 220+ models and 80+ benchmarks.

DeTikZify
DeTikZify synthesizes TikZ graphics programs for scientific figures from sketches.

Qwen2.5-Omni
Qwen2.5-Omni is an end-to-end multimodal model by Alibaba Cloud, capable of understanding text, audio, images, and video.

VideoMind
VideoMind is a Chain-of-LoRA Agent designed for long video reasoning using human-like processes.

AnimeGamer
AnimeGamer is an infinite anime life simulation tool that predicts game states using multimodal models.

Awesome Pretrained Chinese NLP Models
A collection of high-quality pretrained models and resources for Chinese natural language processing.

MiniMind-V
Train a 26M-parameter vision-language model (VLM) from scratch in just 1 hour; suitable for deep learning enthusiasts.

OmniSVG
OmniSVG is an end-to-end multimodal SVG generator leveraging Vision-Language Models for detailed SVG generation.

LLMFarm
LLMFarm is an iOS and macOS app for running large language models offline using the GGML library.

Jina AI
Jina AI offers advanced search solutions for multilingual and multimodal data using AI technology.