
Alluxio Launches Enterprise Platform for Generative AI Apps

Alluxio introduces new technology for accommodating complex AI pipelines, accelerating training times and optimizing GPU utilization.
Oct 20th, 2023 9:34am

Alluxio, a company tracing its roots to Tachyon, a 2014 open source project from UC Berkeley’s AMPLab, has launched its Alluxio Enterprise AI (AEAI) platform, geared specifically to deep learning and generative AI applications. The new platform runs atop the established Alluxio Data Platform, adding optimizations for GPU resources and for specific machine learning (ML) and deep learning libraries.

Earnest Beginnings Arrive at AI Destination

Although the original Alluxio Data Platform always seemed to resemble a data virtualization platform, it is, in fact, rather different from most platforms with that label. As Adit Madan, Alluxio’s Director of Product Management, told The New Stack in a briefing: “We have different capabilities… including a distributed cache and global access to data, no matter where the data is coming from.” He continued: “This is relevant […] with folks who started adopting a hybrid cloud and multiple object stores […] for people with a multicloud strategy.”

More background: Alluxio Provides Distributed Storage at In-Memory Speeds

Madan explained that Alluxio Enterprise AI follows suit in this regard, but with a special focus on deep learning and large-scale model training. It provides optimizations for ML and deep learning libraries like Spark, PyTorch and TensorFlow, and delivers high-performance I/O over commodity storage, using Alluxio’s new Decentralized Object Repository Architecture (DORA). DORA works over cloud object storage (helping customers avoid egress charges in the process), as well as open source and commercial on-premises storage platforms like the Hadoop Distributed File System (HDFS) and MinIO. DORA also offers workload-specific optimizations for ML training and analytics. Other features include a Kubernetes operator and data preload for fast model deployment, which Alluxio says can reduce production deployment times by 2-3x.

Alluxio Enterprise AI’s process flow for AI model training and inferencing.

In doing all this, Alluxio Enterprise AI has the power to accelerate end-to-end AI pipelines, including those for large language models (LLMs), natural language processing and computer vision. Alluxio can handle complex pipelines in this regard, which for example might include combined cloud and on-premises training jobs using data in both cloud object storage and HDFS, as well as deployment of new and updated models to the cloud, to service inferencing requests (calls to trained models in production) from downstream applications.

GPU Optimization

By extension, though, AEAI does more than accelerate performance; it also optimizes usage of GPUs (Graphics Processing Units, the specialized chips that power demanding AI workloads). The secret to that is awareness not just of multiple data source platforms, but of the ML and Deep Learning libraries that access the data. As Madan put it: “We are tightly integrated with the compute applications on top, so we have awareness of, for example, the PyTorch programs and, inside that, there is the concept of a DataLoader, which is responsible for the I/O access. Based on that, we are able to intelligently detect what is going to be accessed and when it is going to be accessed, to kind of neatly plug into that infrastructure to provide this benefit.”
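Alluxio’s DataLoader integration is proprietary, but the read-ahead idea Madan describes — knowing which samples will be accessed next and fetching them before the training loop asks — can be sketched with Python’s standard library. Everything below (file names, sample sizes, the thread-pool “prefetcher”) is illustrative, not Alluxio’s actual implementation:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def make_dataset(root, n_samples=8):
    """Write a few dummy 'training samples' to disk (a stand-in for object storage)."""
    paths = []
    for i in range(n_samples):
        path = os.path.join(root, f"sample_{i}.bin")
        with open(path, "wb") as f:
            f.write(bytes([i]) * 1024)
        paths.append(path)
    return paths


def read_sample(path):
    with open(path, "rb") as f:
        return f.read()


def prefetching_loader(paths, depth=2):
    """Yield samples while reads for upcoming samples run in background threads --
    the same read-ahead idea Alluxio applies, informed by the DataLoader's
    known access order."""
    with ThreadPoolExecutor(max_workers=depth) as pool:
        # Warm the pipeline with the first `depth` reads.
        futures = [pool.submit(read_sample, p) for p in paths[:depth]]
        for i, _ in enumerate(paths):
            # Kick off the read that is `depth` steps ahead of the consumer.
            if depth + i < len(paths):
                futures.append(pool.submit(read_sample, paths[depth + i]))
            yield futures[i].result()


with tempfile.TemporaryDirectory() as root:
    samples = list(prefetching_loader(make_dataset(root)))
    print(len(samples), len(samples[0]))  # 8 1024
```

In a real deployment the background reads would hit Alluxio’s distributed cache rather than local disk, so the training loop sees local-speed I/O regardless of where the data actually lives.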


Alluxio says its AEAI platform can reduce the percentage of training run time spent in PyTorch’s DataLoader from more than 80% to less than 2%, with GPU utilization rates jumping from as little as 20% to as much as 90% or more as a result. Alluxio Enterprise AI can even train on cloud-based GPUs using on-premises training data. Because GPUs are expensive, and often scarce within a particular organization, this is a highly strategic approach. Optimizing GPU usage can cut costs for AI model inferencing as well as training, but Madan says the primary benefit of GPU optimization is the added capacity it delivers, allowing for more experimentation and the training of more models.
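Those utilization figures follow from simple arithmetic under a simplified two-state model (my assumption, not Alluxio’s: the GPU is either computing or blocked waiting on the DataLoader). Shrinking the data-loading share of wall-clock time both raises utilization and shortens epochs:

```python
def gpu_utilization(dataloader_share):
    """In a two-state model (computing vs. blocked on data), GPU utilization
    is simply the complement of the data-loading share of wall-clock time."""
    return 1.0 - dataloader_share


def epoch_speedup(share_before, share_after):
    """With fixed GPU compute time C, total wall-clock time is C / (1 - share);
    the ratio of totals gives the epoch speedup."""
    return (1.0 - share_after) / (1.0 - share_before)


print(round(gpu_utilization(0.80), 2))          # 0.2  -> the ~20% 'before' figure
print(round(gpu_utilization(0.02), 2))          # 0.98 -> above the quoted 90%
print(round(epoch_speedup(0.80, 0.02), 2))      # 4.9  -> ~4.9x shorter epochs
```

The model ignores overlap between compute and I/O, so it is an upper bound on the benefit, but it shows why cutting DataLoader time from 80% to 2% plausibly moves utilization from the 20% range into the 90s.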

LLM Dividend

That’s already a lot, but there’s one more benefit, specific to the context of generative AI: the Alluxio Enterprise AI platform, according to the company, can also accelerate the fine-tuning of pre-trained foundation models. The fine-tuning process is what allows a generalized LLM to become tailored to the specific data and context of a particular organization. It is, in effect, a customized training pass that is key to making LLMs useful in enterprise settings.

With these new capabilities in place, Alluxio has found a use case for its Data Platform’s architecture and approach in AI that, for some, will be more critical than the data access benefits it already provides. Instead of a query caching and acceleration layer that some mistook for a data virtualization platform, Alluxio Enterprise AI is a processing layer that makes machine learning, deep learning, and LLM training, inference and fine-tuning far more pragmatic and economical for enterprise organizations and smaller, cost-sensitive shops alike.

Getting AI to Enterprise Scale

Whether AEAI will give Alluxio increased traction in the market is hard to predict. But even if it doesn’t, the approach is laudable and could create a new paradigm in AI platforms. Enterprises can’t just hobble along by combining AI cloud services and all-purpose data analytics layers. Instead, they need platforms that eliminate the superfluous layers of processing, calculation and resource usage that can arise from these hodgepodge approaches. That’s the only way to make AI scale, and AI can’t become ubiquitous and pragmatic without that scale.

The cloud can be an economical place to store the masses of data needed to train today’s AI models, but it can be an inefficient place to access that data and get the training done. By highly optimizing file access over cloud object storage, and by letting customers leverage their data and available GPUs wherever they may be, AEAI takes efficiencies that may have seemed merely elegant in the Alluxio Data Platform and leverages them for concrete economic and time-to-market benefits in the growing world of applied AI.

TNS owner Insight Partners is an investor in: The New Stack.