IBM’s K8s-Based CodeFlare Framework Takes AI from Laptop to the Cloud

9 Jul 2021 3:00am

IBM has launched CodeFlare, a new open source framework for simplifying the integration and scaling of big data and AI workflows. Built on top of Ray, a distributed computing framework for machine learning applications, CodeFlare takes aim at a common problem, especially in the world of AI: how to let a developer scale from a laptop directly to the cloud and take advantage of high-performance computing without infrastructure expertise.

“What we saw was this convergence of machine learning, simulations, and big data processing, where the big workflow or application you’re trying to build in the end needs to stitch together various aspects of these capabilities and that the struggle is in executing the scalability on the cloud,” said Priya Nagpurkar, director of cloud platform research at IBM Research. “There is still a little bit of struggle and challenges when it comes to the skills and expertise required for the data scientists or the high-performance computing domain experts to run these things scalably. That’s why, from a cloud and platform and runtimes perspective, what we said is, ‘How can we evolve our cloud platform to cater to these new emerging workloads?'”

The AI workflows in question handle a variety of tasks, from data cleaning to feature extraction and model optimization. Currently, explained Nagpurkar, developers and data scientists stitch these workflows together using different frameworks, especially when it comes to handling parallelization and scaling, and each framework potentially uses a different language. CodeFlare looks to simplify this process by narrowing everything down to a single runtime using Python, the language that already serves as a common tool in data science, and by handling the task of scaling those processes as needed.

“The DevOps divide that exists in other places exists here, also,” said Nagpurkar. “I develop on my laptop, I build these pipelines, but then when it comes to, I now need to run them against a big data set and with some scale, it completely breaks down. I basically have to go and parallelize fragments of it separately.”

In a blog post further describing how CodeFlare works, a team of IBM researchers explains that “Cloud native platforms are the obvious choice, but using a container to achieve parallelism is too coarse-grained: from a data scientist perspective, scaling a Python function should not require standing up a container.” The answer, then, was to build on top of Ray, which not only provides scaling at a Python function level, but also offers a distributed object store for sharing of objects.
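To make the granularity argument concrete, here is a minimal sketch of function-level parallelism using only Python's standard library. It is not CodeFlare or Ray code: `concurrent.futures` stands in for a distributed scheduler, and the stage functions (`clean`, `extract_features`, `run_pipeline`) and the sample records are hypothetical. In Ray, the roughly analogous step would be decorating a function with `@ray.remote` and collecting results with `ray.get`.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical pipeline stages, invented for illustration only.
def clean(record: str) -> str:
    """Normalize whitespace and case for one raw record."""
    return " ".join(record.split()).lower()


def extract_features(record: str) -> dict:
    """Turn one cleaned record into a small feature dictionary."""
    words = record.split()
    return {"n_words": len(words), "n_chars": len(record)}


def run_pipeline(records, max_workers=4):
    """Run each stage over all records, one function call per task.

    A local thread pool is used here as a stand-in; a framework such as
    Ray would schedule the same calls across a cluster instead.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        cleaned = list(pool.map(clean, records))
        return list(pool.map(extract_features, cleaned))


if __name__ == "__main__":
    raw = ["  Hello   World ", "Scaling  a Python function"]
    print(run_pipeline(raw))
```

The point of the sketch is the unit of work: each task is a plain function call, so nothing about the stages themselves has to change when the executor behind them does.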

As for how CodeFlare delivers on its promise to take these workflows from laptop to the cloud, the answer underneath it all is still Kubernetes. The framework abstracts away the process of standing up that container, however, making it so that the developer does all their work in Python while still enjoying the benefits of parallelization and scaling made possible by Kubernetes. In its announcement, IBM specifically calls out the company’s new serverless platform, IBM Cloud Code Engine, as well as Red Hat OpenShift, as cloud platforms that CodeFlare can easily deploy to, but Nagpurkar explained that Kubernetes is the common necessary substrate.

“The common denominator there is Kubernetes. That’s where the cloud native and new frameworks come together,” Nagpurkar said. “I think the trend definitely is for the data science, big data, as well as machine learning community to recognize that Kubernetes environments are the way to go to run on cloud. All the problems that Kubernetes solves with multiclusters, scalability, security, isolation, multitenancy, we want to leverage all of these, but Ray as a framework gives us that one more kind of level of abstraction and APIs around tasks and actors.”

As with many developments of this nature, CodeFlare is described as providing consistency for data scientists that will allow them to “focus more on their actual research than the configuration and deployment complexity.” In that same sense, the consistency of operating on Kubernetes will also help to stitch together not just disparate workflows, but also siloed applications, Nagpurkar said.

“We talk about AI being infused in business applications, and so on, and that being the future. Today, it’s still siloed,” Nagpurkar noted. “You do your data science, and you build your models over here with a different set of tools, and then you have your business applications running in Knative, and so on. One thing we’re doing in CodeFlare today is bridging these worlds through eventing for now. If you have, for example, a simple dashboard that’s implemented as a microservice running on Knative, you can easily imagine a CodeFlare pipeline spitting out events that update that thing on Knative.”
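The article does not say what format CodeFlare's events take. As a rough illustration of the idea, the sketch below assumes the CloudEvents 1.0 JSON format, which Knative Eventing uses for the events it delivers. The function names, the `source` and `type` values, and the payload fields are all hypothetical, and the code only builds the event; it does not send it anywhere.

```python
import json
import uuid
from datetime import datetime, timezone


def make_pipeline_event(stage: str, metrics: dict) -> dict:
    """Build a CloudEvents 1.0 envelope for a finished pipeline stage."""
    return {
        "specversion": "1.0",                       # required attribute
        "id": str(uuid.uuid4()),                    # required, unique per event
        "source": "/pipelines/example",             # required, hypothetical value
        "type": "com.example.pipeline.stage.done",  # required, hypothetical value
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": {"stage": stage, "metrics": metrics},  # hypothetical payload
    }


def to_json(event: dict) -> str:
    """Serialize the envelope as the JSON body a consumer would receive."""
    return json.dumps(event, sort_keys=True)


if __name__ == "__main__":
    print(to_json(make_pipeline_event("train", {"accuracy": 0.9})))
```

A dashboard microservice like the one Nagpurkar describes would parse such a body and update its view from the `data` field.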

Feature image via IBM.
