With VantageCloud, Teradata Makes a Bigger Dent
In a splashy event at the New York Stock Exchange last week, Teradata announced its new VantageCloud services, which enable the company’s data warehouse platform to accommodate new workloads and new users. In addition, Teradata is rebranding its library of in-database analytics functions for the new users and scenarios where it hopes they will be applied.
The Core News
In response to competition from the likes of Snowflake and Databricks, Teradata is introducing its VantageCloud Lake platform to serve exploratory and ad-hoc analytics requirements with cloud object storage economics, even as it retains the Teradata Vantage platform’s functionality and enterprise-grade governance. For more traditional data warehouse customers, VantageCloud Enterprise would seem to provide a solution for Global 2000 enterprises and their IT organizations. VantageCloud Lake, meanwhile, targets more of the Global 5000, including both its IT and its line-of-business groups. Both VantageCloud services will leverage cloud object storage and its associated cost model.
VantageCloud Lake is a cloud native SaaS offering, providing self-service access to an elastic, auto-scaled version of the Teradata Vantage platform with usage-based unit pricing; it can even auto-hibernate when usage dips to zero. And despite its end-user-oriented, web-browser-based UI, VantageCloud Lake also features central cost management controls to avoid runaway cloud service bills.
Despite the “Lake” name, the offering is not about switching to Parquet or Iceberg as native formats for table storage. Instead, the focus is on the Lake File System and the Native Object Store, which manage proprietary Teradata tables in cloud storage. Meanwhile, Teradata’s QueryGrid technology will continue to provide connectivity to data stored in more raw, open formats, which can reside in object storage as well.
In-Database Analytics and ML
Teradata is introducing ClearScape Analytics as the brand for its advanced, in-database analytics functionality, which extends SQL with a library of inline functions covering advanced calculations, data preparation and some significant machine learning capabilities. Along with the brand, Teradata is introducing new time series functionality, spanning more than 50 new in-database functions.
The ML capabilities include feature engineering, model evaluation and scoring, as well as the ability to bring in custom Python or R code, open source ML libraries and even custom models. In-database models can also be exported for external serving, and the Vantage ModelOps Extension supports model deployment, data drift and performance monitoring, and a built-in feature store. There’s also integration with third-party ML platforms, including Dataiku, H2O, Microsoft’s Azure Machine Learning, Amazon SageMaker and several of Amazon’s ready-to-run AI services, like Forecast and Comprehend.
Classic Data Warehouse Technology Still Relevant
Let’s unpack this a bit and review some of the relevant history of data warehousing, big data and cloud analytics to see how they all fit together. Teradata is a data warehouse pioneer, having practically invented the category, many of its constituent technologies and its early business models. Maybe that sounds old-fashioned, but the reality is that most of the newer cloud data warehouse contenders use these technologies now, and they’ve had less time to hone and perfect them, so experience here is an asset, not a liability — at least when you base your analysis on the merits.
Today’s now-standard combination of data warehouse technologies consists of the following:
- MPP (massively parallel processing), wherein the effort to fulfill large, complex queries is split up among a bank of worker nodes whose collective output is then integrated and returned to the client.
- Columnar storage, wherein data value storage is organized around the columns in a table — rather than the rows — which streamlines the aggregation of those values.
- Vector processing, wherein batches of values are processed together, rather than sequentially.
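The three techniques above can be sketched in a few lines of Python. This is purely illustrative — real engines like Teradata’s operate on compressed, disk-resident column blocks distributed across many machines — but it shows why each layout and processing choice pays off for aggregation:

```python
import numpy as np

# Row-oriented storage: each record keeps all of its columns together.
rows = [
    {"region": "east", "sales": 100},
    {"region": "west", "sales": 250},
    {"region": "east", "sales": 175},
]
# Aggregating one column means scanning every whole record.
total_row_layout = sum(r["sales"] for r in rows)

# Column-oriented storage: each column is a contiguous array, so the
# same aggregate touches only the values it actually needs.
sales = [100, 250, 175]
total_col_layout = sum(sales)

# Vector processing: the operation is applied to a whole batch of
# values at once rather than one value at a time.
total_vectorized = int(np.asarray(sales).sum())

# MPP in miniature: split the column across "worker nodes", compute
# partial sums on each, then integrate the partial results.
chunks = [sales[:2], sales[2:]]
partials = [sum(chunk) for chunk in chunks]  # each worker's output
total_mpp = sum(partials)                    # gather/integrate step

assert total_row_layout == total_col_layout == total_vectorized == total_mpp
```

The payoff in a real engine comes from the same principles at scale: columnar blocks reduce I/O, vectorized operators reduce per-value overhead, and MPP spreads the scan across nodes.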
All of these technologies were spearheaded by Teradata, either on its own or as a member of a small cohort of early data warehouse platforms. As such, indexes and query optimization heuristics around these capabilities are something Teradata has been able to build, refine and perfect literally over decades, leading to superior performance. That’s why Teradata has been in use at large enterprise organizations for that same amount of time and holds incumbent status. Again, the relevance of these technologies and optimizations has not gone away.
Cloud Economics and Change
But the economic model for data warehousing has changed drastically because of the cloud. On-premises data warehousing was typically based on appliance-like hardware form factors — big cabinets with all the servers, storage and networking preconfigured and ready to plug in and go. The abstraction that provided was actually very cloud-like, but the limits it imposed were not. Those big cabinets created physically finite limits on all the resources contained within them.
With a data warehouse appliance, there could only be a certain amount of processing power and storage. With both of those at a premium, expansion of those resources involved significant capital expense and a lot of operational disruption. This combination of cost and scarcity led most IT organizations to deploy data warehouses in a highly conservative and defensive fashion, with high bars imposed for access, usage and the introduction of new data. This meant that data warehouses could get the job done, but those jobs had to be highly vetted, operationalized and mission-critical.
Big Data as DW Counterpoint
Such a standard was hostile to discretionary, exploratory, ad-hoc analytics. Arguably, this is exactly what led to the big data revolution, especially because it was based on commodity servers, and direct-attached drive storage, both of which were cheap and incrementally expandable and scalable. In addition, the big data paradigm is based on working with data in its raw form, which short-circuits all the modeling that must be done to add new data to the warehouse. As a result, big data clusters were perfect for more agile, ad hoc and exploratory use cases, and they countered the defensive deployment of an enterprise data warehouse with an offensive deployment that encouraged the more experimental use of analytics critical to establishing a data culture, data literacy and data-driven operations.
Interestingly, companies and technologies that Teradata acquired in the previous decade’s big data era underlie some of VantageCloud’s crown jewels. For example, QueryGrid and many of the ClearScape Analytics capabilities were derived from Teradata’s acquisition of Aster Data Systems and its SQL-H and Aster Analytics capabilities, respectively.
VantageCloud Brings Teradata up to Date
Even as Teradata introduced its Vantage architecture, which permitted cloud deployment in addition to on-premises use, the old economic model persisted, as did the barriers to entry of using the platform. The movement from Vantage to VantageCloud — and especially VantageCloud Lake — changes all that. Here’s why:
- VantageCloud uses cloud object storage, rather than high-end, onboard disks in the worker nodes. This means storage is cheaper. It also means it’s infinitely expandable and independent of computing resources.
- VantageCloud Lake’s compute is elastic and auto-scaled, thus decoupling the performance attributes of the Teradata platform from the need to size, configure, optimize and manage a data warehouse cluster.
- VantageCloud Lake allows provisioning of multiple clusters that operate on the same data, allowing for greater user concurrency and isolation, ensuring discretionary analytics work won’t compromise the performance of production workloads.
The end result is that Teradata enters the realm of self-service operation and becomes much more appropriate as a platform for exploratory analytics, at the business unit level, with far less dependency on IT and, ostensibly, faster time to insight and data-driven decisions. At the same time, the workload management, cost controls and data governance that Teradata is known for, and which are quite relevant in the era of cloud data warehousing, can still be applied. Teradata sees this as the best of both worlds.
We’ve seen Databricks create and roll out its Photon engine, specifically to bring data warehouse-grade query performance to its platform. It’s one key reason the company describes its platform as a data lakehouse, rather than a data lake. We’ve also seen Snowflake add governance capabilities and start to grapple with customer sensitivity around cost, as adoption of its platform has increased.
Teradata believes this validates its core technology approach and credibility while VantageCloud Lake gives it the self-service and scale flexibility to let it compete in the more organizationally decentralized analytics environment that digital transformation and trends like data mesh have popularized and justified.
Some challenges do remain. Teradata is not adopting open source file formats for native table processing, while Snowflake recently announced that it will do exactly that (with the Iceberg format) and Databricks leverages formats like Parquet and Delta Lake natively. Accordingly, both companies may claim greater entitlement to the lakehouse moniker.
Price of Admission
Beyond the technology, many of Teradata’s competitors provide entry-level pricing that allows curious developers to get hands-on with their platforms at low or no cost. Teradata, in its targeting of large enterprises, doesn’t have a comparable offering, and it probably should, at least for non-production use. That said, Teradata may see low-cost or freemium models as tarnishing its enterprise reputation. Yes, VantageCloud Lake is designed to accommodate modern capabilities, workloads and usage patterns. But it’s aimed at departmental needs in large organizations, rather than the core needs of smaller organizations or individuals.
That’s a disciplined sales strategy decision. Time will tell if the modern data industry can abide by it.