
HeadInfer is an inference framework for large language models designed to reduce GPU memory consumption.

A high-throughput and memory-efficient inference and serving engine for LLMs.

SGLang is a fast serving framework for large language models and vision language models.
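A minimal sketch of querying an SGLang server through its OpenAI-compatible endpoint; the model name and port below are illustrative assumptions, not taken from this list.

```python
# Query an SGLang server through its OpenAI-compatible API.
# First launch the server (model path and port are illustrative):
#   python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize what a serving framework does."}],
    temperature=0.2,
    max_tokens=128,
)
print(response.choices[0].message.content)
```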

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
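A small offline-inference sketch with LMDeploy's `pipeline` API; the model ID and prompt are placeholder assumptions.

```python
# Minimal offline inference with LMDeploy's pipeline API
# (the model ID below is illustrative).
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2_5-7b-chat")
responses = pipe(["Explain KV-cache quantization in one sentence."])
print(responses[0].text)
```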

JailBench is a comprehensive Chinese dataset for assessing jailbreak attack risks on large language models.

A Unified Tokenizer for Visual Generation and Understanding.

CogView4 is a text-to-image generation model from THUDM; the repository also covers its variants and focuses on improving image generation quality.
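A text-to-image sketch, assuming the Diffusers `CogView4Pipeline` integration and the THUDM/CogView4-6B checkpoint; prompt and sampling settings are illustrative.

```python
# A sketch of text-to-image generation, assuming the Diffusers
# CogView4Pipeline integration and the THUDM/CogView4-6B checkpoint.
import torch
from diffusers import CogView4Pipeline

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = pipe(
    prompt="A watercolor painting of a lighthouse at dawn",
    num_inference_steps=50,
    guidance_scale=3.5,
).images[0]
image.save("lighthouse.png")
```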

Chinese safety prompts for evaluating and improving the safety of LLMs.

Generative model pipeline: tokenizer training, model initialization, model pre-training, and instruction fine-tuning. llama, creek.
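As an illustration of the tokenizer-training step, a minimal sketch using SentencePiece; the corpus path, vocabulary size, and output prefix are assumptions, not details from this entry.

```python
# Train a BPE tokenizer with SentencePiece; corpus.txt, the vocab size,
# and the "tok" output prefix are placeholder assumptions.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus.txt",         # one sentence per line
    model_prefix="tok",         # writes tok.model and tok.vocab
    vocab_size=32000,
    model_type="bpe",
    character_coverage=0.9995,  # useful for Chinese-heavy corpora
)

sp = spm.SentencePieceProcessor(model_file="tok.model")
print(sp.encode("instruction fine-tuning", out_type=str))
```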

A GitHub repository for hands-on practice with large language models (LLMs), collecting various resources and projects.

Finetune Llama 4, TTS, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory!
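A minimal LoRA fine-tuning setup sketch with Unsloth's `FastLanguageModel`; the model name, sequence length, and LoRA rank are illustrative choices.

```python
# A minimal LoRA fine-tuning setup sketch with Unsloth
# (model name, sequence length, and rank are illustrative).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# The returned model and tokenizer can then be passed to a standard
# TRL SFTTrainer loop for instruction fine-tuning.
```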