LMCache Lab: Only Prefilling? We Cut Decoding Latency by 60%!

2025-11-22

[July 23, 2025]() [Benchmark](https://identia.digital/lmcache/en/category/benchmark/), [decoding](https://identia.digital/lmcache/en/tag/decoding-en/), [spec decode](https://identia.digital/lmcache/en/tag/spec-decode-en/), [speculative](https://identia.digital/lmcache/en/tag/speculative-en/)

Author: Kuntai Du

TL;DR: LMCache Lab uses speculative decoding to cut decoding latency for code/text editing by 60%!

—

You may know LMCache Lab for its KV cache optimizations, which make LLM prefilling fast. We are now speeding up decoding as well, so that LLM agents can generate new content faster. In other words, you can save on your LLM serving bill :money_with_wings:

How is decoding made faster?

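We found that speculative decoding can cut token generation time (measured as time per output token) by 60% on code/text editing tasks. Why? Editing tasks tend to reuse word sequences that already exist, and speculative decoding exploits this to speed up generation. There is no need to worry about accuracy: speculative decoding produces exactly the same output as before.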
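Speculative decoding can be sketched as a draft-then-verify loop. The toy below assumes drafts are copied from text already in the context (prompt-lookup style drafting); the post does not say which drafting method LMCache Lab uses, so treat that choice as an assumption. `make_editing_model`, `propose_from_context`, and the token lists are hypothetical stand-ins for illustration, not LMCache or vLLM APIs. A real engine scores all draft positions in one batched forward pass; here that is emulated by counting each verification round as a single pass.

```python
from typing import Callable, List, Sequence, Tuple

Token = str
NextTokenFn = Callable[[Sequence[Token]], Token]
EOS = "<eos>"


def make_editing_model(edited: List[Token], prompt_len: int) -> NextTokenFn:
    """Hypothetical stand-in for the target LLM: deterministically emits
    the edited text, one token per call, then EOS."""
    def next_token(context: Sequence[Token]) -> Token:
        i = len(context) - prompt_len
        return edited[i] if i < len(edited) else EOS
    return next_token


def greedy_decode(next_token: NextTokenFn, prompt: List[Token],
                  max_new_tokens: int) -> Tuple[List[Token], int]:
    """Baseline: one target-model pass per generated token."""
    out, passes = list(prompt), 0
    while len(out) - len(prompt) < max_new_tokens:
        tok = next_token(out)
        passes += 1
        out.append(tok)
        if tok == EOS:
            break
    return out[len(prompt):], passes


def propose_from_context(context: List[Token], ngram: int, k: int) -> List[Token]:
    """Draft up to k tokens by copying what followed the most recent
    earlier occurrence of the last `ngram` tokens."""
    if len(context) <= ngram:
        return []
    tail = context[-ngram:]
    for start in range(len(context) - ngram - 1, -1, -1):
        if context[start:start + ngram] == tail:
            return context[start + ngram:start + ngram + k]
    return []


def speculative_decode(next_token: NextTokenFn, prompt: List[Token],
                       max_new_tokens: int, ngram: int = 2,
                       k: int = 4) -> Tuple[List[Token], int]:
    """Draft from the context, then keep only tokens the target agrees with."""
    out, passes, done = list(prompt), 0, False
    while not done and len(out) - len(prompt) < max_new_tokens:
        draft = propose_from_context(out, ngram, k)
        passes += 1  # one (emulated) batched verification pass
        accepted: List[Token] = []
        for d in draft:
            t = next_token(out + accepted)
            accepted.append(t)  # always the target's own token
            if t != d or t == EOS:
                break  # first mismatch (or end of text) ends this round
        else:
            # Every draft token matched (or there was no draft):
            # the same pass also yields one extra token.
            accepted.append(next_token(out + accepted))
        for t in accepted:
            out.append(t)
            if t == EOS or len(out) - len(prompt) >= max_new_tokens:
                done = True
                break
    return out[len(prompt):], passes


# Toy editing task: rename the variable `a` to `x`.
prompt = "def add ( a , b ) : return a + b".split()
edited = "def add ( x , b ) : return x + b".split() + [EOS]
model = make_editing_model(edited, len(prompt))

baseline, baseline_passes = greedy_decode(model, prompt, 32)
spec, spec_passes = speculative_decode(model, prompt, 32)
print(spec == baseline, baseline_passes, spec_passes)
```

Because only tokens the target model itself would have produced are kept, the output matches plain greedy decoding. The saving comes from verification rounds that accept several copied tokens at once, which is why inputs with long unchanged spans benefit most.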

Benchmarks:bar_chart:

We evaluated our solution by editing the docstrings of Python files in the open-source project vLLM.

A bar chart comparing the time per output token in milliseconds for DeepInfra, Fireworks, vLLM without speculative decoding, and vLLM with speculative decoding, highlighting a 60% reduction in time for vLLM with speculative decoding.

Our solution cuts time per output token by 60% compared with open-source vLLM.
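To put the 60% figure in perspective, the sketch below shows one common way to compute time per output token and converts a relative reduction into a speedup factor. The definition used here (decoding time after the first token, divided by the remaining tokens) is an assumption; the post does not spell out how the benchmark measured it. The function names and timestamps are hypothetical, not benchmark data.

```python
def time_per_output_token_ms(first_token_s: float, last_token_s: float,
                             num_output_tokens: int) -> float:
    """Average gap between consecutive output tokens, in milliseconds.

    Assumed definition: time from the first to the last output token,
    divided by the number of tokens generated after the first one.
    """
    if num_output_tokens < 2:
        raise ValueError("need at least two output tokens")
    return (last_token_s - first_token_s) * 1000.0 / (num_output_tokens - 1)


def speedup_from_reduction(reduction: float) -> float:
    """Convert a fractional latency reduction (0.6 for 60%) into a speedup factor."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


# Hypothetical request: first token at 0.20 s, last at 2.19 s, 200 tokens.
tpot = time_per_output_token_ms(0.20, 2.19, 200)
print(f"time per output token: {tpot:.1f} ms")
print(f"60% lower latency is a {speedup_from_reduction(0.6):.1f}x decoding speedup")
```

A 60% reduction leaves 40% of the original time per token, so each request decodes about 2.5 times as fast.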

Tech :wrench:

????????????????????????????????????

A bar chart illustrating the reduction in time per output token for vLLM using and not using speculative decoding, highlighting a 60% reduction in processing time.

??????????????????

????????????early access?????????????????????????????

??????:raised_hands:

????????????????????????LMIgnite????????LMCache Lab ?????——????????????????????[????](https://lmignite.tensormesh.ai/)???????????????????????????????????
