HeadInfer is an inference framework that reduces GPU memory consumption for large language models.
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.
SGLang is a fast serving framework for large language models and vision language models.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.