Highlights –

  • Its platform sheds light on prevalent problems like bias, data integrity, and data drift, all of which have the potential to produce inaccurate predictions.
  • Arize offers a deeper and more detailed level of observability into how a workload is performing than a user might obtain by running Vertex AI or GKE alone.

One of the most critical factors to guarantee the effectiveness and success of Artificial Intelligence (AI) activities within any organization is to gain visibility into how a Machine Learning (ML) model performs.

Arize AI, founded in 2020, aims to offer ML observability. Its platform sheds light on prevalent problems like bias, data integrity, and data drift, all of which have the potential to produce inaccurate predictions. Data drift in particular is a major problem and may have been behind many high-profile ML failures in recent years, including one at Equifax.
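Data drift is typically detected by comparing the distribution a feature had at training time against what the model sees in production. As a rough illustration of the idea (not Arize's actual method), one common statistic is the population stability index (PSI); the following is a minimal sketch for a single numeric feature:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline (training-time) sample
    and a production sample of one numeric feature.

    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 severe.
    """
    # Bin edges come from the baseline so both samples are bucketed identically.
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor empty buckets to avoid log(0).
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)  # distribution the model was trained on
shifted = rng.normal(0.5, 1.0, 10_000)   # production inputs whose mean has drifted

print(psi(baseline, baseline[:5000]))  # same distribution: small PSI
print(psi(baseline, shifted))          # shifted distribution: elevated PSI
```

An observability platform runs this kind of comparison continuously, per feature, and alerts when the drift statistic crosses a threshold.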

The growing need for ML observability has also increased demand for Arize AI's technology. Earlier this month, the business raised USD 38 million in a series B round of funding. Arize hopes to expand its reach and business further by making its platform accessible in the Google Cloud Marketplace.

Aparna Dhinakaran, cofounder and CPO of Arize, said: “Every single company is investing in AI, and they need tools and infrastructure to put models into the real world. Google has an amazing platform with Vertex AI, and we are a complementary solution as Arize AI focuses on observability.”

Not strangers: Google and Arize

Because Arize already runs its technology on the Google Kubernetes Engine (GKE) platform, availability on Google Cloud is not a major leap for the company. GKE is a Google-managed service for running Kubernetes, a widely used container-orchestration system.

Michael Schiff, founding engineer and chief architect at Arize, said: “I knew early on that we were going to be all-in on Kubernetes for a variety of reasons, and I knew that I didn’t want to be in the business of operating a Kubernetes cluster. I would say that GKE has been one of the main things that have allowed us to go from day zero to a series B without what you would call a traditional operations or infrastructure team.”

According to Dhinakaran, with GKE at its core, Arize has been able to support its expanding customer base, streaming billions of AI inference data points into the observability system. She mentioned that Arize has been able to serve its large customers, like Instacart and Etsy, which operate at very large scale, partly because of the infrastructure that GKE offers.

Arize is joining the Google Cloud Marketplace now because of demand and timing, especially given that the business already uses Google infrastructure. Arize introduced its self-serve offering in March 2022. According to Dhinakaran, until now customers have mostly signed up for the company’s Software-as-a-Service (SaaS) product directly through the Arize website.

Dhinakaran pointed out that more users have been requesting integrations and utilizing Google’s Vertex AI MLOps platform in recent months. Vertex AI functions on GKE as well. Users can now get started with Google more efficiently with tighter integration into Vertex AI and other Google Cloud services, thanks to Arize’s availability in the Google Marketplace.

Combining infrastructure observability, Vertex AI, and ML

Drew Bradstock, director of Google Kubernetes Engine product management, believes Arize’s inclusion in the Google Marketplace is advantageous.

Bradstock said: “Vertex actually runs on GKE, and so does Arize for the exact same reason, which is the ability to run large-scale workloads with a very small IT staff.”

Arize offers a deeper and different level of observability into how a workload is performing than what a user might obtain by running Vertex AI or GKE independently. According to Dhinakaran, there are distinctions between being able to monitor and inspect the infrastructure that supports ML and being able to see how ML models are actually operating.

ML observability differs from the Application Performance Management (APM) space.

Dhinakaran explained that when troubleshooting an ML model, the issue is rarely just how fast or slow the model runs on its infrastructure. What matters, she said, is how a model was constructed, the data it was trained on, the way its parameters are set up, and several other ML-specific concerns.
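Answering those ML-specific questions after the fact requires capturing each prediction with enough context to reconstruct it: the model version, the exact features it saw, and (later) the ground truth. The following is a minimal, hypothetical sketch of such an inference record, not Arize's actual SDK:

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional

@dataclass
class InferenceEvent:
    """One prediction, captured with enough context to debug it later."""
    model_id: str       # which model produced this prediction
    model_version: str  # which training run / parameter configuration
    features: dict      # the exact inputs the model saw
    prediction: float   # the model's output
    actual: Optional[float] = None  # ground truth, joined in later if available
    prediction_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: float = field(default_factory=time.time)

def log_inference(event: InferenceEvent) -> str:
    """Serialize the event; a real system would ship this to an observability store."""
    return json.dumps(asdict(event))

# Hypothetical example: a fraud model scoring one transaction.
record = log_inference(InferenceEvent(
    model_id="fraud-detector",
    model_version="2022-06-v3",
    features={"amount": 120.0, "country": "US"},
    prediction=0.87,
))
```

With records like this streamed in at scale, an observability platform can slice performance by model version or feature segment instead of only watching CPU and latency graphs.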

“We want to make it easy for ML engineers to be able to solve model problems when AI isn’t working in the real world,” she added.