
Red Hat Fills a Gap with OpenShift Data Science

24 Nov 2021 6:00am

Following its initial launch earlier this year, Red Hat has released Red Hat OpenShift Data Science as a “field trial.” The managed cloud service provides enterprises with an environment tailored for artificial intelligence and machine learning (AI/ML) on Red Hat OpenShift.

According to Steve Huels, senior director for AI product management at Red Hat, AI/ML is nothing new for Red Hat, and the origins of Red Hat OpenShift Data Science lie in the company’s experience building AI/ML features into its own platform, with things like Red Hat Insights. Eventually, Red Hat took that experience and codified it into the Open Data Hub open source project, but Huels explained that Red Hat’s customers were looking for a managed service they could buy that would provide these capabilities.

“They were like, this is fantastic, it did everything we wanted to do, can we buy it? And the answer was always around a set of Red Hat components and partner components, but that left a little spot in the middle that nobody was offering in a standalone capacity,” said Huels. “You could always buy a much larger suite from a partner, but it came with a lot of other stuff that maybe you weren’t ready for or didn’t need. Ultimately, that is what led to OpenShift Data Science. It was able to fill this gap that partners were having around a core set of AI/ML components.”

Built on a Subset of Components

Red Hat OpenShift Data Science is built on a subset of the components offered in Open Data Hub, such as JupyterLab, TensorFlow, PyTorch, scikit-learn, pandas, and NumPy. Red Hat integrates these components more deeply with Red Hat OpenShift and backs them with SRE support as part of the managed service. Part of the integration also centers on letting users perform repeatable actions, rather than having to configure and deploy from the ground up each time, explained Huels.

“Part of the challenge also with AI/ML is how do we put this stuff into a repeatable lifecycle process, right? It’s one thing to do it as a one-off experiment, it’s another to be able to take that, do it repeatedly, do it routinely,” Huels said. “This is where Red Hat’s DNA within DevOps really came into play. We’ve been managing things like for developers on the GitOps lifecycle for a long, long time,” he added, pointing to things like Red Hat’s use of Tekton for pipeline management.

Use Cases

As for who should consider using OpenShift Data Science, Huels said that the service has something for everyone, from those just getting started with AI/ML who want an easier way to try out training models and using ML, to those who want to run the latest tools but don’t want to be bothered with managing them. Red Hat puts out a new release every two weeks, which helps data scientists keep up with rapid changes to the components, while operations teams don’t have to do anything out of the ordinary to keep things up to date.

“When it comes time to take that model back in house and deploy it closer to whatever the source system is, the operations side is comfortable with that because the model comes out as a container,” explained Huels. “It helps alleviate that tension where both communities get what they need, and they both feel comfortable with the working environment.”

Another potential use case for OpenShift Data Science, Huels explained, is for organizations that find themselves unable to hire data scientists for one reason or another.

“This helps fill a gap where you can take really good engineers, and they can do some of that modeling themselves,” said Huels. “Instead of being dependent on the data science community, they’re able to do some of that work, and then get it validated by the data scientists, but be able to move forward more rapidly. It helps alleviate some of that talent shortage concern that’s out there, and keep folks moving forward.”

New Features

This latest version of Red Hat OpenShift Data Science comes with several new features, including the Intel oneAPI AI Analytics Toolkit and support for Intel OpenVINO Pro for Enterprise, as well as new integrations for Anaconda Commercial Edition, IBM Watson Studio, Seldon Deploy, and Starburst Galaxy.

As for the service’s release as a “field trial,” Huels said this is something new that Red Hat is starting as a parallel to the usual alpha, beta, and general availability (GA) cycle found with traditional on-prem software. The field trial, he said, provides a “code-ready product” with support and SRE monitoring, but doesn’t put it at the level of GA quite yet. Instead, Red Hat is looking for feedback and wants to confirm that the service has a market fit before it is fully released as GA.

Moving forward, the service is expected to gain support for NVIDIA-accelerated computing with NVIDIA graphics processing units (GPUs) and the addition of Google Cloud Platform (GCP) and Microsoft Azure. More broadly, Huels said that Red Hat wants to expand into providing “a more robust model serving environment.”

“We want to be able to provide the full end-to-end ModelOps, because that really is the automated realization of the full DevOps lifecycle: develop it, publish it, monitor it, rinse and repeat,” said Huels. “These things are true to Red Hat’s DNA. We’ve got a long history there, we can plug these in. It’s just a matter of the standards in the AI/ML world kind of settling, and there being some agreement there.”