
llama-swap

Model swapping for llama.cpp or any local OpenAI compatible server, providing automatic model management.

Introduction


llama-swap is a lightweight, transparent proxy server designed for automatic model swapping with llama.cpp or any local OpenAI compatible server. Written in Go, it is easy to install and configure, requiring only a single binary and a simple YAML configuration file.

Key Features:
  • Automatic Model Swapping: Starts the correct upstream server for the requested model, swapping out the previously running one automatically.
  • Simple Configuration: Uses a single YAML file for configuration, making it user-friendly.
  • Multiple Model Support: Can handle multiple models simultaneously through profiles.
  • Docker Support: Easily deployable using Docker, with pre-built images available.
  • Health Monitoring: Includes health checks and logging capabilities for monitoring server status.
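The YAML file mentioned above maps model names to upstream server commands. Below is a minimal sketch of such a configuration; the general shape (`models`, `cmd`, `proxy`, `ttl`) follows llama-swap's documented style, but the model name, path, and ports here are placeholders, so consult the project's README for the exact schema:

```yaml
# Hypothetical llama-swap configuration (placeholder names, paths, and ports)
models:
  "llama":
    # Command llama-swap runs when a request names this model
    cmd: llama-server --port 9999 -m /models/llama.gguf
    # Where the proxy forwards requests once the server is up
    proxy: http://127.0.0.1:9999
    # Unload the model after 300 seconds of inactivity
    ttl: 300
```

When a request specifies `"model": "llama"`, the proxy launches the configured command, waits for the server to become healthy, and then forwards traffic to the `proxy` address.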
Benefits:
  • Flexibility: Works with any OpenAI compatible server, not just llama-server.
  • Performance Optimization: Supports optimizations such as speculative decoding for improved inference speeds, including on code-generation workloads.
  • Resource Management: Provides control over system resources and automatic unloading of models after a specified timeout.
Highlights:
  • Supports various OpenAI API endpoints including completions, chat completions, embeddings, and more.
  • Easy to deploy on bare metal or via Docker, with pre-built binaries available for multiple operating systems.
  • Community-driven with active contributions and updates.
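Because the proxy speaks the OpenAI API, any standard client works unchanged; the `model` field in the request is what drives the swap. The sketch below builds such a request payload; the model name `"llama"` and the address in the comment are illustrative assumptions, not values from the project:

```python
import json

# Sketch of a client payload for llama-swap's OpenAI-compatible
# /v1/chat/completions endpoint. The "model" field tells the proxy
# which configured upstream server to route (and, if needed, swap) to.
payload = {
    "model": "llama",  # must match a model key in llama-swap's YAML config
    "messages": [{"role": "user", "content": "Say hello."}],
}
body = json.dumps(payload)

# A real client would POST `body` to e.g. http://localhost:8080/v1/chat/completions
print(body)
```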

Information

  • Publisher
    AISecKit
  • Website
    github.com
  • Published date
    2025/04/28
