{"id":1096,"date":"2026-01-26T13:33:38","date_gmt":"2026-01-26T21:33:38","guid":{"rendered":"https:\/\/identia.digital\/lmcache\/?p=1096"},"modified":"2026-01-26T13:33:39","modified_gmt":"2026-01-26T21:33:39","slug":"gmi-cloud-%e6%90%ba%e6%89%8b-tensormesh-%e5%ae%9e%e7%8e%b0-4-%e5%80%8d-llm-%e6%80%a7%e8%83%bd%e6%8f%90%e5%8d%87","status":"publish","type":"post","link":"https:\/\/identia.digital\/lmcache\/en\/2026\/01\/26\/gmi-cloud-%e6%90%ba%e6%89%8b-tensormesh-%e5%ae%9e%e7%8e%b0-4-%e5%80%8d-llm-%e6%80%a7%e8%83%bd%e6%8f%90%e5%8d%87\/","title":{"rendered":"GMI Cloud Partners with Tensormesh to Achieve a 4x LLM Performance Boost"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><strong>???<\/strong><strong> SSD <\/strong><strong>???<\/strong><strong> KVCache <\/strong><strong>?????????????<\/strong><strong> 4 <\/strong><strong>???????????<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">?? <strong>GMI Cloud<\/strong> ? <strong>Tensormesh<\/strong> ??????????????????<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>????????<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>??? <strong>SSD <\/strong><strong>???<\/strong><strong> KVCache<\/strong> ????????<strong>?<\/strong><strong> Token <\/strong><strong>???<\/strong><strong>Time to First Token, TTFT<\/strong><strong>????<\/strong><strong> 4 <\/strong><strong>?<\/strong>?<\/li>\n\n\n\n<li><strong>????????<\/strong><strong> 3% <\/strong><strong>?????<\/strong><strong> 50%<\/strong>?????????????????????????<\/li>\n\n\n\n<li><strong>?????????<\/strong>?????????????????????? AI ?????????????<\/li>\n\n\n\n<li><strong>??????<\/strong><strong> KV <\/strong><strong>???<\/strong>????????? GPU ??????????????????????????<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">???????<strong>????<\/strong><strong> KV Cache <\/strong><strong>??????????<\/strong><strong> LLM <\/strong><strong>?????????????????<\/strong>?????? 
AI ?????????????????????????????????????????<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">????????<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>1. ??????????????<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">???????????????????????????GMI ????????????????????<strong>??????????<\/strong>????????????????????????<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">???????<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>????<\/strong>???? AI ???????????? 1 ?? 10 ??????<\/li>\n\n\n\n<li><strong>?????????<\/strong>???????????????????????????????????????????<\/li>\n\n\n\n<li><strong>????????<\/strong>??????????????????????????????<\/li>\n\n\n\n<li><strong>??????<\/strong>????????????????????????????????????????????????<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">??????? KV Cache ?????<strong>????????????<\/strong>???????<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>2. ????<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">????????????<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>?? vLLM<\/strong>????? KVCache ???<\/li>\n\n\n\n<li><strong>??<\/strong><strong> LMCache <\/strong><strong>?<\/strong><strong> vLLM<\/strong>???? Tensormesh ?????<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">????????????<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>???????KVCache<\/strong>?????????????<\/li>\n\n\n\n<li><strong>?? + ??? SSD ?????<\/strong>?????? KV Cache ???????<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">?????<strong>?????????<\/strong>?????????????????<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>3. ????<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>?????<\/strong>?? KVCache ?????<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>????? 
<strong>1.4 <\/strong><strong>??<\/strong><strong> TTFT <\/strong><strong>??<\/strong>?<\/li>\n\n\n\n<li>?????????KV Cache ???????????????????????<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>SSD ???KVCache<\/strong>???<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>??? <strong>4 <\/strong><strong>??<\/strong><strong> TTFT <\/strong><strong>??<\/strong>?<\/li>\n\n\n\n<li>??????????????????? <strong>50%<\/strong>?<\/li>\n\n\n\n<li>??????????????????<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>????????????<\/strong><strong> KV <\/strong><strong>???????????????????<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">??????????<strong>???<\/strong><strong> SSD <\/strong><strong>??????<\/strong>????? KVCache ???????????????????<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>4. ????<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>?<\/strong><strong> Token <\/strong><strong>???<\/strong><strong>TTFT<\/strong><strong>?<\/strong><br>TTFT\u2014\u2014????????????\u2014\u2014?????????????? 
<strong>4 <\/strong><strong>?<\/strong>??????????????????<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"936\" height=\"558\" src=\"https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2026\/01\/gmi-ttft.png\" alt=\"Bar graph comparing average TTFT (Time to First Token) in seconds for three configurations: vLLM with LMCache CPU + Disk (0.331s), vLLM with LMCache CPU offloading (0.8148s), and Native vLLM (1.1629s).\" class=\"wp-image-1100\" style=\"width:725px;height:auto\" srcset=\"https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2026\/01\/gmi-ttft.png 936w, https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2026\/01\/gmi-ttft-300x179.png 300w, https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2026\/01\/gmi-ttft-768x458.png 768w\" sizes=\"(max-width: 936px) 100vw, 936px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>?? KV Cache ???<\/strong><br>??????????????KV Cache ???????????????????? 
<strong>50%<\/strong>????????????????????<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img decoding=\"async\" width=\"936\" height=\"558\" src=\"https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2026\/01\/gmi-hitrate-1.png\" alt=\"Bar graph showing average prefix cache hit rate percentages for three configurations: Native vLLM (3.43%), vLLM with LMCache CPU offloading (23.84%), and vLLM with LMCache CPU and Disk (53.21%).\" class=\"wp-image-1103\" style=\"width:724px;height:auto\" srcset=\"https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2026\/01\/gmi-hitrate-1.png 936w, https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2026\/01\/gmi-hitrate-1-300x179.png 300w, https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2026\/01\/gmi-hitrate-1-768x458.png 768w\" sizes=\"(max-width: 936px) 100vw, 936px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>????????<\/strong><br>?? TTFT ?????????????????????????????<strong>???????<\/strong>??????????????????????<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img decoding=\"async\" width=\"936\" height=\"398\" src=\"https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2026\/01\/gmi-ttft-series.png\" alt=\"Line graph displaying TTFT (Time To First Token) over time, comparing three configurations: vLLM with LMCache CPU + Disk, vLLM with LMCache CPU offloading, and Native vLLM. 
The y-axis represents TTFT in seconds, while the x-axis shows time in a specific format.\" class=\"wp-image-1105\" style=\"width:804px;height:auto\" srcset=\"https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2026\/01\/gmi-ttft-series.png 936w, https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2026\/01\/gmi-ttft-series-300x128.png 300w, https:\/\/identia.digital\/lmcache\/wp-content\/uploads\/2026\/01\/gmi-ttft-series-768x327.png 768w\" sizes=\"(max-width: 936px) 100vw, 936px\" \/><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">KV Cache <strong>? LLM ???????<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">???????? GMI ??????????????????????????????<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>???????<\/strong>????? KV ??????????? GPU ????<\/li>\n\n\n\n<li><strong>???????<\/strong>????????AI ?????????????<\/li>\n\n\n\n<li><strong>????????<\/strong>?????????????? QPS?<\/li>\n\n\n\n<li><strong>????????<\/strong>?SSD ?????????????? RAM ???<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">?????GMI Cloud ??????????<strong>??????<\/strong>??????\u201c<strong>???????????????<\/strong>\u201d??????????????????????????????????????<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>?????????????<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>???????????????????????????<\/li>\n\n\n\n<li><strong>??????<\/strong><strong>RAM + SSD<\/strong><strong>?????????????<\/strong>?<\/li>\n\n\n\n<li><strong>LMCache<\/strong> ???????????????? vLLM ??????<\/li>\n\n\n\n<li>?????????????AI ??????? LLM ?????????????<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Tensormesh ??<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\"><strong><a href=\"https:\/\/tensormesh.ai\">Tensormesh<\/a><\/strong> ??? AI ???????????????????????????? 
AI????????????????????????? GPU ??????? <strong>10 ?<\/strong>?????????????????????????Tensormesh ?????????????????? <strong>Laude Ventures<\/strong> ??? <strong>450 ????????<\/strong>?<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>GMI Cloud ??<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\"><strong><a href=\"https:\/\/www.gmicloud.ai\/\">GMI Cloud<\/a><\/strong> ?????? GPU ??????????????? AI ???????????????? <strong>NVIDIA ?????<\/strong>?GMI ????? GPU ????? NVIDIA Blackwell ?? H100 ? H200 GPU ??????GMI ???????????????????????? AI ????????????????<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>??? SSD ??? KVCache ????????????? 4 ??????????? ?? GMI Cloud ? Tensormesh ?????????????????? ???????? ??????????? KV Cache ?????????? LLM ??????????????????????? AI ????????????????????????????????????????? ???????? 1. ?????????????? ???????????????????????????GMI ?????????????????????????????????????????????????????? ??????? ??????? KV Cache ???????????????????????? 2. ???? ???????????? ???????????? ??????????????????????????????? 3. ???? ??????? KVCache ????? SSD ???KVCache??? ???????????? KV ??????????????????? ????????????? SSD ??????????? KVCache ??????????????????? 4. ???? ? 
[&hellip;]<\/p>\n","protected":false},"author":271290521,"featured_media":1101,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[36199],"class_list":["post-1096","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-benchmark-zh"],"_links":{"self":[{"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/posts\/1096","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/users\/271290521"}],"replies":[{"embeddable":true,"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/comments?post=1096"}],"version-history":[{"count":0,"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/posts\/1096\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/media\/1101"}],"wp:attachment":[{"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/media?parent=1096"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/categories?post=1096"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/identia.digital\/lmcache\/wp-json\/wp\/v2\/tags?post=1096"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}