Resources

Explore practical recipes for deploying LMCache across different model architectures, serving engines, storage backends, and environments, along with roadmap updates and contribution guidelines for the open-source community.

Recipes

Recipes are practical deployment guides that show how to launch LMCache in a specific setup, including supported serving engines, compatible LMCache functionalities, and any known limitations or configuration notes.

Qwen3 MoE

A mixture-of-experts architecture designed to improve scaling efficiency by activating only a subset of model parameters per token.

MiniMax-M2

A large-scale model architecture built for long-context, agentic, and high-throughput inference workloads.

Gemma 4

Google’s open model architecture for efficient instruction-following and multimodal-oriented workloads.

Mistral / Devstral

Mistral-family models for general reasoning, coding, and agentic development workflows.

GPT-OSS

Open-weight GPT-style models for instruction-following and general-purpose inference.

...more Recipes

Explore additional validated architecture recipes as LMCache support expands across the open-source model ecosystem.

Roadmap

Follow LMCache’s quarterly roadmap to see current priorities, planned improvements, and upcoming development milestones.

2026

2027

Contribute to LMCache

Whether you’re fixing a bug, improving docs, adding model support, writing tests, or helping other users, there are many ways to contribute to LMCache.

Contribution Guide

Learn how to open issues, submit pull requests, follow the review process, and contribute code, documentation, tests, or new model support.

Beginner Guide

New to LMCache? Start with good first issues, documentation improvements, small bug fixes, or community support tasks.

AI Guidelines

Guidance on using AI tools when contributing to LMCache.

Tools

LMCache integrates with the leading inference engines, storage backends, and orchestration layers in the AI ecosystem.

KV Cache Size Calculator

Estimate how much memory the KV cache needs for a given model and context length.

KV Cache Visualizations

Visualize how memory requirements scale with context length and model size to identify potential VRAM bottlenecks and cache offloading opportunities.
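
As a rough illustration of the scaling described above, here is a minimal sketch of the standard KV cache size estimate. It assumes ordinary multi-head or grouped-query attention with full attention in every layer. The function name `kv_cache_bytes` and the example model shape are hypothetical and are not taken from the LMCache tools. Architectures that use sliding-window or compressed attention will give different numbers.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, dtype_bytes=2):
    """Estimate KV cache size in bytes for one sequence.

    Assumes every layer stores one key and one value vector per KV head
    per token (standard MHA/GQA); dtype_bytes=2 corresponds to fp16/bf16.
    """
    # The leading factor of 2 accounts for storing both keys and values.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * num_tokens


# Hypothetical model shape: 32 layers, 8 KV heads, head dimension 128.
for context_length in (4096, 32768, 131072):
    size = kv_cache_bytes(32, 8, 128, context_length)
    print(f"{context_length:>7} tokens -> {size / 2**30:.2f} GiB")
```

Under these assumptions the cache grows linearly with context length (about 128 KiB per token for this shape), which is why long contexts are where VRAM limits and offloading opportunities tend to show up first.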

Observability Tool

The observability suite is currently in development.

LMCache Leaderboard 

Track community-contributed performance benchmarks across diverse hardware configurations and inference engines.

For detailed version matrices, configuration options, and known limitations, refer to the LMCache Documentation.

Fresh From The Community

The latest benchmarks, release notes, and technical deep-dives from the LMCache team and contributors.

Tech Explained

2026-05-04

Deepseek V4 explained, and why it matters to your wallet

lmcache

2026-04-28

Stop Calling It KV Cache: It’s Something Much Bigger

Benchmark

2026-04-22

LMCache on Amazon SageMaker HyperPod: Accelerating LLM Inference with Managed Tiered KV Cache

Get Started

Dive In

Read the docs, install in minutes

Join the community

Slack, GitHub, Office Hours

Read the blog

Benchmarks, tutorials, release notes