Highlights

  • OctoStack supports AI accelerators from Nvidia Corp. and Advanced Micro Devices Inc., as well as the AWS Inferentia chips offered by Amazon Web Services.
  • The company claims that an inference environment powered by OctoStack delivers four times higher graphics card utilization than an AI cluster built from scratch.

OctoAI Inc. has introduced OctoStack, a software platform that helps businesses run artificial intelligence models on their in-house infrastructure.

Many large language models are delivered through application programming interfaces (APIs) hosted in the cloud. To use them, customers must transmit their data to the infrastructure of the model’s developer, where the model is hosted and the data is processed. Hosting a neural network on internal hardware removes the need to share data with an outside party, which can simplify enterprise cybersecurity and regulatory compliance.
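As a rough illustration of that data flow, the snippet below sends a prompt to a hypothetical cloud-hosted LLM API. The endpoint, payload shape, and credential are illustrative, not any vendor’s actual interface; the point is simply that the prompt, and any business data inside it, leaves the customer’s network.

```python
import requests

# Hypothetical cloud LLM endpoint: whatever appears in `prompt` is
# transmitted to and processed on the provider's infrastructure,
# not the customer's own hardware.
API_URL = "https://api.example-llm-provider.com/v1/completions"

response = requests.post(
    API_URL,
    headers={"Authorization": "Bearer <API_KEY>"},  # placeholder credential
    json={
        "model": "example-model",
        "prompt": "Summarize this internal sales report: ...",
    },
    timeout=30,
)
print(response.json())
```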

OctoAI says its new OctoStack platform makes it simpler to host AI models on an organization’s internal infrastructure. The platform runs on major public clouds, on-premises hardware, and AI-optimized infrastructure-as-a-service platforms such as CoreWeave. It also supports AI accelerators from Nvidia Corp. and Advanced Micro Devices Inc., as well as the AWS Inferentia chips offered by Amazon Web Services.

Part of the platform’s foundation is Apache TVM, an open-source technology created by OctoAI’s founders. It’s a compiler framework that makes it easier to optimize AI models to run on different types of chips.
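For context, here is a minimal sketch of compiling and running a model with Apache TVM’s Relay API. TVM’s interfaces have evolved across versions, so treat this as illustrative rather than as OctoStack’s internal usage:

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Define a tiny one-layer network directly in TVM's Relay IR.
x = relay.var("x", shape=(1, 64), dtype="float32")
w = relay.var("w", shape=(32, 64), dtype="float32")
y = relay.nn.relu(relay.nn.dense(x, w))
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))

# Compile for a generic CPU target; opt_level=3 enables aggressive
# optimizations such as operator fusion.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm")

# Run the compiled module on the local CPU.
dev = tvm.cpu()
runtime = graph_executor.GraphModule(lib["default"](dev))
runtime.set_input("x", np.random.rand(1, 64).astype("float32"))
runtime.set_input("w", np.random.rand(32, 64).astype("float32"))
runtime.run()
print(runtime.get_output(0).numpy().shape)  # (1, 32)
```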

After building the first iteration of a neural network, developers can optimize it in several ways to improve performance. One method, operator fusion, combines several of the computations an AI performs into fewer, more hardware-efficient ones. Another, quantization, reduces the amount of data a neural network must process while keeping its results accurate.
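To make those two ideas concrete, here is a generic sketch in plain NumPy. It illustrates the concepts only; it is not OctoStack’s or TVM’s implementation:

```python
import numpy as np

x = np.random.randn(1024).astype(np.float32)
a, b = np.float32(0.5), np.float32(1.0)

# Unfused: three separate operators (multiply, add, relu), each making
# its own pass over memory and producing an intermediate array.
t1 = x * a
t2 = t1 + b
y_unfused = np.maximum(t2, np.float32(0.0))

# "Fused": a compiler emits the same math as a single loop with no
# intermediate buffers, which is what makes it more hardware-efficient.
y_fused = np.empty_like(x)
for i in range(x.size):
    y_fused[i] = max(x[i] * a + b, 0.0)

assert np.allclose(y_unfused, y_fused)

# Quantization: store float32 weights as int8, shrinking the data the
# network must move and process, at a small cost in precision.
w = np.random.randn(4, 4).astype(np.float32)
scale = np.abs(w).max() / 127.0          # map the largest magnitude to 127
w_int8 = np.round(w / scale).astype(np.int8)
w_restored = w_int8.astype(np.float32) * scale
print("max quantization error:", np.abs(w - w_restored).max())
```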

Such optimizations don’t necessarily transfer across hardware: an AI model tuned for one graphics card may not perform as well on a central processing unit from a different company. OctoStack addresses this through TVM, which helps automate optimizing neural networks for many different chips.
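Continuing the earlier TVM sketch, retargeting the same model to different hardware is, at the API level, a matter of changing the compilation target. Each backend must be enabled in your TVM build, and the "cuda" and "rocm" targets assume the matching GPU stack is installed:

```python
# Reusing `mod` from the earlier sketch: one model definition, compiled
# for several backends. TVM applies hardware-specific optimizations for
# each target rather than reusing one chip's tuning everywhere.
for target in ["llvm", "cuda", "rocm"]:
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target)
        print("compiled for", target)
```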

According to OctoAI, the platform can help customers run their AI infrastructure more efficiently. The company claims that an inference environment powered by OctoStack delivers four times higher graphics card utilization than an AI cluster built from scratch, and that it can cut operating costs by 50%.

Luis Ceze, OctoAI’s co-founder and chief executive officer, said: “Enabling customers to build viable and future-proof Generative AI applications requires more than just affordable cloud inference. Hardware portability, model onboarding, fine-tuning, optimization, load balancing — these are full-stack problems that require full-stack solutions.”

OctoStack supports popular open-source LLMs, including Meta Platforms Inc.’s Llama and the Mixtral mixture-of-experts model from startup Mistral AI. Businesses can also run neural networks they have developed in-house. According to OctoAI, OctoStack makes it possible to gradually update the AI models in an inference environment without significant modifications to the applications they power.
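One common way to achieve that last property, sketched below, is to expose a stable model alias: applications keep calling the same endpoint while operators swap the model behind the alias. The endpoint URL, alias name, and payload here are hypothetical, not OctoStack’s actual API:

```python
import requests

# Hypothetical in-house inference endpoint. The application references a
# stable alias ("chat-default"); operators can repoint that alias from,
# say, an older Llama build to a newer Mixtral build without touching
# this application code.
ENDPOINT = "http://inference.internal.example/v1/completions"

response = requests.post(
    ENDPOINT,
    json={
        "model": "chat-default",  # alias, resolved server-side to a concrete model
        "prompt": "Draft a status update for the migration project.",
    },
    timeout=30,
)
print(response.json())
```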