Tensormesh raises $4.5M to squeeze extra inference out of AI server loads

By Naveed Ahmad | 24/10/2025 | Updated: 21/02/2026 | 3 Min Read


With the AI infrastructure push reaching staggering proportions, there’s more pressure than ever to squeeze as much inference as possible out of the GPUs companies already have. And for researchers with expertise in a relevant technique, it’s a good time to raise funding.

That’s part of the driving force behind Tensormesh, launching out of stealth this week with $4.5 million in seed funding. The round was led by Laude Ventures, with additional angel funding from database pioneer Michael Franklin.

Tensormesh is using the money to build a commercial version of the open-source LMCache utility, launched and maintained by Tensormesh co-founder Yihua Cheng. Used well, LMCache can reduce inference costs by as much as 10x, an impact that has made it a staple in open-source deployments and drawn integrations from heavy hitters like Google and Nvidia. Now Tensormesh is planning to parlay that academic reputation into a viable business.

The core of the product is the key-value cache (or KV cache), a memory system that lets a model process complex inputs more efficiently by condensing them down to their key values. In typical architectures, the KV cache is discarded at the end of each query, but Tensormesh co-founder and CEO Junchen Jiang argues that this is an enormous source of inefficiency.
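To make the idea concrete, here is a toy Python sketch of why a KV cache saves work. This is entirely illustrative and not Tensormesh’s or LMCache’s actual code: every name below is invented, and the “projection” is a stand-in for the real per-token key/value computation. The point it demonstrates is that tokens already in the cache are looked up rather than recomputed, so a follow-up query that shares a prefix only pays for its new tokens.

```python
# Toy sketch (hypothetical, for illustration only): a per-conversation
# key-value cache for autoregressive attention. Each new token's K/V
# projection is computed once; past tokens are looked up, not redone.

class ToyKVCache:
    def __init__(self):
        self.keys = []
        self.values = []
        self.projections = 0  # counts how much "work" the model does

    def project(self, token):
        # Stand-in for the model's real key/value projection of one token.
        self.projections += 1
        return (hash(token) % 97, hash(token) % 89)

    def attend(self, tokens):
        # Only tokens beyond the cached prefix need fresh projections.
        for token in tokens[len(self.keys):]:
            k, v = self.project(token)
            self.keys.append(k)
            self.values.append(v)
        return list(zip(self.keys, self.values))

cache = ToyKVCache()
cache.attend(["The", "cat", "sat"])          # 3 projections
cache.attend(["The", "cat", "sat", "down"])  # only 1 more
assert cache.projections == 4

# Discarding the cache between queries (the pattern Jiang criticizes)
# would force all 4 projections again on the follow-up query.
```

Under this (simplified) model, dropping the cache after every query means re-paying for the entire shared prefix each time, which is exactly the waste the article describes.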

“It’s like having a very smart analyst reading all the data, but they forget what they have learned after each question,” says Jiang.

Instead of discarding that cache, Tensormesh’s systems hold on to it, allowing it to be redeployed when the model executes a similar process in a separate query. Because GPU memory is so precious, this can mean spreading the data across several different storage layers, but the reward is significantly more inference power for the same server load.
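The tiering trade-off can also be sketched in a few lines. This is a hypothetical illustration of the general pattern, not the product’s design: a small, fast “GPU” map backed by a larger secondary store, where evicted cache entries are reloaded on reuse instead of being recomputed from scratch. Every identifier and capacity here is invented.

```python
# Hypothetical sketch: two-tier storage for KV cache entries. Hot
# entries live in limited "GPU" memory; older ones spill to a cheaper
# tier (CPU RAM or disk in practice) and are promoted back on reuse.

class TieredKVStore:
    def __init__(self, gpu_capacity):
        self.gpu_capacity = gpu_capacity
        self.gpu = {}        # fast tier, limited size
        self.secondary = {}  # slow tier, effectively unbounded here

    def put(self, prefix, kv_blob):
        if len(self.gpu) >= self.gpu_capacity:
            # Evict the oldest GPU entry to the secondary tier.
            oldest = next(iter(self.gpu))
            self.secondary[oldest] = self.gpu.pop(oldest)
        self.gpu[prefix] = kv_blob

    def get(self, prefix):
        if prefix in self.gpu:
            return self.gpu[prefix], "gpu_hit"
        if prefix in self.secondary:
            # Promote back to GPU for reuse instead of recomputing.
            self.put(prefix, self.secondary.pop(prefix))
            return self.gpu[prefix], "secondary_hit"
        return None, "miss"  # only a miss forces full recomputation

store = TieredKVStore(gpu_capacity=1)
store.put("system-prompt", b"kv-a")
store.put("chat-123", b"kv-b")  # evicts "system-prompt" to secondary
blob, tier = store.get("system-prompt")
assert (blob, tier) == (b"kv-a", "secondary_hit")
```

The hard engineering problem Jiang describes later in the piece is making that promotion path fast enough that reloading a cached entry actually beats recomputing it.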

The change is particularly powerful for chat interfaces, since models need to repeatedly refer back to the growing chat log as the conversation progresses. Agentic systems have a similar issue, with a growing log of actions and goals.

In theory, these are changes AI companies could make on their own, but the technical complexity makes it a daunting task. Given the Tensormesh team’s research on the technique and the intricacy of the component itself, the company is betting there will be plenty of demand for an out-of-the-box product.

“Keeping the KV cache in a secondary storage system and reusing it efficiently without slowing down the whole system is a very challenging problem,” says Jiang. “We’ve seen people hire 20 engineers and spend three or four months to build such a system. Or they can use our product and do it very efficiently.”


