{"id":926,"date":"2025-11-22T19:11:27","date_gmt":"2025-11-23T03:11:27","guid":{"rendered":"https:\/\/identia.digital\/lmcache\/?p=926"},"modified":"2025-11-22T19:24:05","modified_gmt":"2025-11-23T03:24:05","slug":"lmcache-lab-%e5%8f%aa%e9%92%88%e5%af%b9prefilling%e9%98%b6%e6%ae%b5%ef%bc%9f%e6%88%91%e4%bb%ac%e6%8a%8adecoding%e9%98%b6%e6%ae%b5%e7%9a%84%e5%bb%b6%e8%bf%9f%e4%b9%9f%e7%9c%81%e5%8e%bb60%ef%bc%81","status":"publish","type":"post","link":"https:\/\/identia.digital\/lmcache\/en\/2025\/11\/22\/lmcache-lab-%e5%8f%aa%e9%92%88%e5%af%b9prefilling%e9%98%b6%e6%ae%b5%ef%bc%9f%e6%88%91%e4%bb%ac%e6%8a%8adecoding%e9%98%b6%e6%ae%b5%e7%9a%84%e5%bb%b6%e8%bf%9f%e4%b9%9f%e7%9c%81%e5%8e%bb60%ef%bc%81\/","title":{"rendered":"LMCache Lab: Only the Prefilling Stage? We Also Cut Decoding-Stage Latency by 60%!"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">[July 23, 2025]() [Benchmark](https:\/\/identia.digital\/lmcache\/en\/category\/benchmark\/), [decoding](https:\/\/identia.digital\/lmcache\/en\/tag\/decoding-en\/), [spec decode](https:\/\/identia.digital\/lmcache\/en\/tag\/spec-decode-en\/), [speculative](https:\/\/identia.digital\/lmcache\/en\/tag\/speculative-en\/)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Author: Kuntai Du<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">??????LMCache Lab ????????????\/???????????????60%??<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>&#8212;<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">?????? KV cache?????? LMCache Lab\u2014\u2014??LLM?prefilling?????????????????????????decoding??????LLM?????????????????????????????????????????????? LLM ???????:money_with_wings:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>???decoding?????????<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">???????????????????????token??????????token?????? 
60%?????????\/?????????????????????????????????????????\u2014\u2014??????????????????????????<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Benchmarks:bar_chart:<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">?????????? vLLM ? Python ???docstrings????????????????<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"539\" src=\"https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2025\/11\/overall-1024x539.webp\" alt=\"A bar chart comparing the time per output token in milliseconds for DeepInfra, Fireworks, vLLM without speculative decoding, and vLLM with speculative decoding, highlighting a 60% reduction in time for vLLM with speculative decoding.\" class=\"wp-image-927\" srcset=\"https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2025\/11\/overall-1024x539.webp 1024w, https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2025\/11\/overall-300x158.webp 300w, https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2025\/11\/overall-768x404.webp 768w, https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2025\/11\/overall-1536x809.webp 1536w, https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2025\/11\/overall-1200x632.webp 1200w, https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2025\/11\/overall.webp 1892w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">????????????????????VLLM?????60%<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>??:wrench:<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">????????????????????????????????????<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1024\" height=\"590\" src=\"https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2025\/11\/contrast_qps-1024x590.webp\" alt=\"A bar chart illustrating the reduction in time per output token for vLLM using and not using speculative decoding, highlighting a 60% reduction in 
processing time.\" class=\"wp-image-928\" srcset=\"https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2025\/11\/contrast_qps-1024x590.webp 1024w, https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2025\/11\/contrast_qps-300x173.webp 300w, https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2025\/11\/contrast_qps-768x442.webp 768w, https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2025\/11\/contrast_qps-1536x885.webp 1536w, https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2025\/11\/contrast_qps-1200x691.webp 1200w, https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2025\/11\/contrast_qps.webp 1872w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">??????????????????<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">????????????early access?????????????????????????????<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>??????:raised_hands:<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">????????????????????????LMIgnite????????LMCache Lab ?????\u2014\u2014????????????????????[????](https:\/\/lmignite.tensormesh.ai\/)???????????????????????????????????<\/p>\n","protected":false},"excerpt":{"rendered":"<p>[July 23, 2025]() [Benchmark](https:\/\/identia.digital\/lmcache\/en\/category\/benchmark\/), [decoding](https:\/\/identia.digital\/lmcache\/en\/tag\/decoding-en\/), [spec decode](https:\/\/identia.digital\/lmcache\/en\/tag\/spec-decode-en\/), [speculative](https:\/\/identia.digital\/lmcache\/en\/tag\/speculative-en\/) Author: Kuntai Du ??????LMCache Lab ????????????\/???????????????60%?? &#8212; ?????? KV cache?????? LMCache Lab\u2014\u2014??LLM?prefilling?????????????????????????decoding??????LLM?????????????????????????????????????????????? LLM ???????:money_with_wings: ???decoding????????? ???????????????????????token??????????token?????? 60%?????????\/?????????????????????????????????????????\u2014\u2014?????????????????????????? Benchmarks:bar_chart: ?????????? vLLM ? 
Python ???docstrings???????????????? ????????????????????VLLM?????60% ??:wrench: ???????????????????????????????????? ?????????????????? ????????????early access????????????????????????????? ??????:raised_hands: ????????????????????????LMIgnite????????LMCache Lab ?????\u2014\u2014????????????????????[????](https:\/\/lmignite.tensormesh.ai\/)???????????????????????????????????<\/p>\n","protected":false},"author":271290516,"featured_media":448,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[],"class_list":["post-926","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry"],"_links":{"self":[{"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/posts\/926","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/users\/271290516"}],"replies":[{"embeddable":true,"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/comments?post=926"}],"version-history":[{"count":0,"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/posts\/926\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/media\/448"}],"wp:attachment":[{"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/media?parent=926"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/categories?post=926"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/tags?post=926"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}