Development / Machine Learning

WhyLabs Adds Data to the Observability Equation

4 Nov 2021 9:00am

As a pager-carrying firefighter for Amazon’s machine learning systems, Alessya Visnjic found that the DevOps tools built around debugging applications fell far short of the needs of machine learning operators in production.

“What I discovered is that the tools that I am used to, supporting traditional software applications and the retail website, do not map well to the needs that I have when I’m supporting an [artificial intelligence] application,” she said. “Specifically, the types of failure modes that the AI application has are very difficult to understand and to debug without understanding specifically the data that flows through the application.”

When the company started building out its machine learning (ML) platform SageMaker, she was on the front lines, seeing the challenges that machine learning operators, engineers and data scientists face when they have models in production.

That led her to leave AWS and, along with fellow Amazon Web Services alums Sam Gracie and Andy Dang, create the artificial intelligence observability platform WhyLabs.

“I made a bit of a bet that if AWS and all the cloud companies are successful at democratizing how we can get models to production, making it really easy to develop the models and to deploy the models, then every practitioner who deploys a model to production is going to struggle, grapple with the same challenges that I was struggling with when I was deploying applications at Amazon,” Visnjic said.

Traditional Observability Plus Data

Much has been written about observability into distributed systems; in January, O’Reilly reported 128% year-over-year growth in observability content usage on its platform, compared with 9% growth for monitoring.

However, most observability tools lack an essential element for machine learning practitioners.

“Traditional software systems observability means that you’re seeing how the code is behaving, and you’re seeing how the infrastructure is behaving,” Visnjic said. “With machine learning systems, there’s an additional dimension, which is data.

“If that dataset, say, is buggy, your code may not be changing at all. But if the data changes, then the machine learning system’s behavior is going to change. So machine learning observability captures a lot of the aspects that traditional observability captures, but it adds the dimension of data, and then adds the dimension of how data and code and infrastructure all kind of interoperate.

“[It] helps the debugging of machine learning models take into account how ever-changing real-world data affects the behavior of these machine learning-powered applications.”

WhyLabs alerts users to changes in the data to help point out data drift, model drift and other ML problems related to data.

The current supply chain problems are an example of circumstances that can throw a wrench into a model’s accuracy. Better to know early on that something’s happening before you’re left with a warehouse full of socks that nobody wants to buy or unable to fulfill orders when there’s suddenly a run on paper towels.

The WhyLabs workflow.

A Two-Pronged Approach

There are two parts to WhyLabs’ offering:

The open source library whylogs uses a lightweight agent similar to that used with DevOps tools like Splunk and Datadog. It integrates with existing data pipelines and with all major ML frameworks.

The library can summarize terabytes of data into tiny statistical fingerprints (10 to 100MB, uncompressed) that scale with the number of features in the data rather than the number of rows. Instead of sampling, it summarizes all the data in a single pass, using streaming approximation algorithms such as HyperLogLog.
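To make the single-pass, fixed-memory idea concrete, here is a minimal HyperLogLog sketch in pure Python. This is an illustration of the algorithm, not whylogs’ actual implementation; the class name and parameters are our own.

```python
import hashlib
import math

class HyperLogLog:
    """Approximate distinct-count sketch: one pass over the data, fixed memory."""

    def __init__(self, p=14):
        self.p = p                 # 2**p registers; more registers = lower error
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        # 64-bit hash of the value
        h = int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)   # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)
        # rank = position of the leftmost 1-bit in the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        # harmonic-mean estimator with the standard alpha correction
        alpha = 0.7213 / (1 + 1.079 / self.m)
        z = sum(2.0 ** -r for r in self.registers)
        estimate = alpha * self.m * self.m / z
        # small-range correction (linear counting) when many registers are empty
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return int(estimate)

hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"user-{i}")
print(hll.count())  # close to 100,000, using only ~16K small registers
```

With `p=14` (16,384 registers) the standard error is roughly 1%, which is why a terabyte-scale column can be summarized into a footprint of kilobytes.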

It creates logs similar to other logging software but adds in these statistical profiles of the data itself to help point out drift patterns and other ML problems related to data.
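A statistical profile of this kind can be sketched in a few lines: instead of storing rows, it keeps constant-size running statistics per column. This toy `ColumnProfile` is our own illustrative example, not whylogs’ data format.

```python
import math
from dataclasses import dataclass

@dataclass
class ColumnProfile:
    """Streaming summary of one numeric column: constant memory, one pass."""
    count: int = 0
    nulls: int = 0
    minimum: float = math.inf
    maximum: float = -math.inf
    total: float = 0.0

    def track(self, value):
        if value is None:
            self.nulls += 1          # missing-value counts catch data-quality bugs
            return
        self.count += 1
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        self.total += value

    def summary(self):
        mean = self.total / self.count if self.count else None
        return {"count": self.count, "nulls": self.nulls,
                "min": self.minimum, "max": self.maximum, "mean": mean}

profile = ColumnProfile()
for v in [3.0, None, 7.0, 5.0]:
    profile.track(v)
print(profile.summary())
# {'count': 3, 'nulls': 1, 'min': 3.0, 'max': 7.0, 'mean': 5.0}
```

Profiles logged this way for each batch or time window can then be compared over time, which is what makes drift visible without keeping the raw data.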

Whylogs runs in parallel with AI applications, requiring virtually no computing power beyond what the application already uses.

It scales from local development to multinode clusters, and works well with batch and streaming architectures. Because whylogs infers the schema of the data, it requires no manual configuration. The user needs only an API key to get started, and a single line of code can be used to capture all the data statistics.

Whylogs supports structured and unstructured data, images, video and audio.

Installed in any Python, Java or Spark environment, whylogs can be deployed as a container, run as a sidecar or invoked through various ML tools. Released under the Apache 2.0 open source license, whylogs can be freely used, independent of the larger WhyLabs platform.

The platform, built atop whylogs, provides monitoring and observability of ML applications through a purpose-built user interface that collects information about all your models in one place. It enables AI practitioners to track raw data, feature data, model predictions and more with a comprehensive view of the AI application’s entire pipeline and a visualization of how the statistical properties of each feature evolved over time.

Active monitoring highlights deviations in data quality and drifts to generate timely alerts that can be shared across the organization via Slack, email or other messaging platforms. There’s no limit on the number of data points or model predictions captured for monitoring.
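One common way to turn profile comparisons into drift alerts, used widely in the industry though not necessarily WhyLabs’ exact metric, is the Population Stability Index: bin a baseline sample and a current sample, then score how far the bin proportions have moved. A minimal sketch, with values assumed to lie in [0, 1):

```python
import math
import random

def psi(expected, actual, bins=10, lo=0.0, hi=1.0):
    """Population Stability Index between two samples over fixed bins.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift."""
    def histogram(xs):
        counts = [0] * bins
        for x in xs:
            i = min(int((x - lo) / (hi - lo) * bins), bins - 1)
            counts[i] += 1
        # tiny floor keeps empty bins from causing division by zero
        return [max(c / len(xs), 1e-6) for c in counts]
    p, q = histogram(expected), histogram(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(p, q))

random.seed(0)
baseline = [random.random() for _ in range(10_000)]       # training-time data
same     = [random.random() for _ in range(10_000)]       # same distribution
shifted  = [random.random() ** 2 for _ in range(10_000)]  # skewed toward zero
print(psi(baseline, same))     # near zero: no alert
print(psi(baseline, shifted))  # well above 0.25: fire a drift alert
```

A monitor would compute this per feature per time window against a baseline profile and page the team when the score crosses a threshold.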

Automating Manual Processes

WhyLabs came out of stealth in September 2020, announcing it had raised a $4 million seed funding round from Madrona Venture Group, Bezos Expeditions, Defy Partners and Ascend VC. Maria Karaivanova, a former Cloudflare executive who spent three and a half years as a principal at Madrona, is a fourth co-founder. On Thursday, it announced the close of a $10 million Series A round co-led by Defy Partners and Andrew Ng’s AI Fund.

“While intelligent applications are on the rise, the tools that AI builders rely on are at best immature, causing data scientists and engineering teams to spend precious time and effort on non-value-added work such as data sampling, error detection and debugging,” Madrona’s Tim Porter and Karaivanova wrote in a blog post about its investment in WhyLabs.

After leaving AWS, Visnjic joined the Allen Institute for AI, a Seattle-based research lab started by Paul Allen, the Microsoft co-founder, and spent a year figuring out how to generalize the tools for the typical problems that machine learning models face in production — how to test models and monitor them.

Visnjic spoke with hundreds of practitioners and founded Rsqrd AI, a community of AI builders committed to making AI technology robust and responsible. More recently, WhyLabs is among 25 AI startups that have formed the AI Infrastructure Alliance (AIIA) to establish robust engineering standards and consistent integration points within the AI infrastructure ecosystem.

Most companies she’s talked to that run AI applications have built some solution in-house, Visnjic said. Those manual processes for debugging are expensive because data scientists have to build them, monitor them and maintain them.

The major cloud providers are adding observability features to their machine learning platforms, and a host of startups have entered the space for data monitoring, including Datatron, Bigeye and Monte Carlo.

The cloud provider features pose problems with portability, Visnjic said.

“Essentially, if we have a team that is using both SageMaker and Azure ML, then they will have one feature that would monitor the SageMaker models and the other one will monitor the Azure models, which means that they’ll have to use two tools and try to kind of reconcile the data to understand how the two models are doing, especially if they’re trying to compare and contrast the models.”

WhyLabs also is focused on scalability and ease of use, not just for data scientists but also for site reliability engineers, product managers and other members of ML teams.

The Road Ahead

“We’ve been impressed with how scalable, intuitive, and elegant the platform is — it integrates with existing tools and workflows to support any data type at any scale,” Porter and Karaivanova wrote in their blog post.

“While we expect that cloud platforms and many ML platforms for model building and deployment will offer their own model monitoring, we think customers will also need a truly platform-agnostic AI monitoring solution that provides consistent, best-of-breed insights and observability into model performance regardless of where it is running.”

Added Visnjic: “Within the AI Infrastructure Alliance, we’re building integrations to real-time streaming systems, to federated learning systems, to more machine learning platforms and so on. The goal is to maybe be something similar to Datadog, where we have hundreds of integrations to make it as easy as one line of code to integrate with WhyLabs and to enable observability. So that’s one part of it.

“Another part of it is expanding to more and more data types. … We’re growing our offering with [natural language processing]. With the support of NLP, we want to support kind of various multimodal signal data and so on.

“We believe any given team has models running on many, many different data types,” Visnjic continued. “Even a typical retail organization has models that are running on images, on time-series data, on NLP, on structured data. There should be one place, one platform where they can go and understand the health of these models.

“Then beyond that, we’re focused on making it easy for the user to resolve the issue as soon as it’s identified and potentially proactively identify problems before they affect the customer experience.”

Image by Gundula Vogel from Pixabay