A high-throughput and memory-efficient inference and serving engine for LLMs.
vLLM is a fast and easy-to-use library designed for large language model (LLM) inference and serving. Originally developed in the Sky Computing Lab at UC Berkeley, it has evolved into a community-driven project with contributions from both academia and industry.
For more information, visit the vLLM GitHub repository.