HeadInfer is a memory-efficient inference framework for large language models that reduces GPU memory consumption.
A high-throughput and memory-efficient inference and serving engine for LLMs.
SGLang is a fast serving framework for large language models and vision language models.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
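The three serving frameworks above are typically driven the same way from client code: each can launch an OpenAI-compatible HTTP server, so a client only needs a base URL and a model name. A minimal client sketch, assuming a server is already running locally on port 8000 and using a hypothetical model name:

```python
# Minimal client sketch for an OpenAI-compatible serving endpoint (vLLM, SGLang,
# and LMDeploy can all expose one). The base URL and model name are assumptions.
import requests

BASE_URL = "http://localhost:8000/v1"   # assumed local server address
MODEL = "my-model"                      # hypothetical model name

response = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Summarize what an LLM serving engine does."}],
        "max_tokens": 128,
        "temperature": 0.7,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```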
JailBench is a comprehensive Chinese dataset for assessing jailbreak attack risks on large language models.
A Unified Tokenizer for Visual Generation and Understanding.
CogView4 is a text-to-image generation model from THUDM; the repository also covers its variants and focuses on improving image generation quality.
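A minimal text-to-image sketch, assuming CogView4 is loaded through the diffusers `CogView4Pipeline` with the `THUDM/CogView4-6B` checkpoint (an assumption about the integration path; check the repository for the supported loading method):

```python
# Sketch only: assumes CogView4 is exposed via diffusers' CogView4Pipeline and
# the "THUDM/CogView4-6B" checkpoint; verify the actual entry point in the repo.
import torch
from diffusers import CogView4Pipeline

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = pipe(
    prompt="A watercolor painting of a red panda reading a book",
    num_inference_steps=50,
    guidance_scale=3.5,
).images[0]
image.save("cogview4_sample.png")
```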
Chinese safety prompts for evaluating and improving the safety of LLMs.
Generative model pipeline: tokenizer training, model initialization, model pre-training, and instruction fine-tuning. llama, creek
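The first step of that pipeline, tokenizer training, can be sketched with the Hugging Face `tokenizers` library; the corpus file and vocabulary size below are placeholders, not values from the repository:

```python
# Sketch of the tokenizer-training step using Hugging Face `tokenizers`.
# "corpus.txt" and the vocab size are placeholder values.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=32000,
    special_tokens=["<unk>", "<s>", "</s>", "<pad>"],
)
tokenizer.train(files=["corpus.txt"], trainer=trainer)
tokenizer.save("tokenizer.json")
```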
A GitHub repository of resources and hands-on projects for practicing with large language models (LLMs).
Finetune Llama 4, TTS, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory!
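Memory savings of this kind usually come from loading the base model in 4-bit and training only low-rank adapters. A generic sketch of that pattern with `transformers` and `peft` (not this project's own optimized kernels), using a placeholder base model name:

```python
# Generic 4-bit + LoRA fine-tuning setup with transformers/peft; this is not the
# project's own implementation, and the model name is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-3.2-1B"  # placeholder base model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```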