What news from AWS re:Invent last week will have the most impact on you?
Amazon Q, an AI chatbot for explaining how AWS works.
Super-fast S3 Express storage.
New Graviton 4 processor instances.
Emily Freeman leaving AWS.
I don't use AWS, so none of this will affect me.
Cloud Services / Data

MySQL HeatWave Gets Generative AI and JavaScript, Slew of New Features

Beyond GenAI and JavaScript, "Oracle's other database" gets lakehouse, machine learning, AutoPilot, analytics, OLTP, and multicloud goodies.
Sep 21st, 2023 10:18am by
Featued image for: MySQL HeatWave Gets Generative AI and JavaScript, Slew of New Features
Feature image via Pixabay. 

As the Oracle CloudWorld conference takes place in Las Vegas this week, Oracle‘s MySQL team is announcing a number of enhancements to the HeatWave platform that shore up its core functionality; add capabilities in the realm of generative AI; enhance support for the data lakehouse approach to analytics data management, autonomous operation, and in-database machine learning; and address core programmability and performance on the OLTP side, too.

Developer Goodies

The MySQL team briefed the media by starting on the analytics side, and leaving the developer-oriented features for last. As far as readers of The New Stack are concerned, I say they buried the lede, so I’m going to kick off with what the MySQL team left until last: goodies for developers including JSON acceleration and JavaScript-based stored procedures and functions.

JSON support in that base MySQL platform allows JSON data to be materialized in binary and text columns in tables or in virtual columns. It also allows JSON payloads to be passed to stored procedures and functions as arguments. MySQL supports use of its MongoDB API-compatible XDevAPI on the client side and numerous programming languages can be used in the MySQL shell to manipulate the JSON data on the input or output side. But now JSON data can be brought into HeatWave, where it is stored in binary format, partitioned, compressed up to 3x and scaled across nodes. The MySQL team says simple filter queries can be accelerated up to 20x, aggregation queries up to 22x and large join queries up to 144x.

Moving on from the JavaScript Object Notation format to the JavaScript language itself, stored procedures in HeatWave can now be coded in that language, in addition to the long-supported use of SQL. SQL is a declarative, set-based language, which can make it hard to perform more imperative tasks. JavaScript stored procs and functions eliminate this constraint and are called and used in exactly the same way as SQL-based ones, be it in queries, views, data manipulation language (DML) commands or data definition language (DDL) commands.

Data type conversions between the two languages are implemented implicitly. The JavaScript code executes in a GraalVM virtual machine, which provides for secure/sandboxed use of compute and memory, and which blocks direct network and file system access.

Lakehouse Enhancements

Now let’s move on to HeatWave’s lakehouse capabilities, as there are a few dimensions to it. First off, HeatWave is adding support for the Apache Avro data file format to its existing compatibility with CSV and Apache Parquet formats. The functionality includes support for multiple compression algorithms, across which the team says performance is consistent. Avro support also includes — via HeatWave’s “Autopilot” assistance feature — schema inference, cluster capacity estimation for data load operations, and a time estimate for same.

What’s key in this announcement is that HeatWave now supports an optimized data format for row-oriented data. Compare this with the unoptimized text-based CSV and the column-oriented Parquet format and you can see that Oracle’s MySQL team is paying attention to OLTP workloads, in addition to the analytical workload support that was HeatWave’s original hook. Meanwhile, that analytical side would benefit from support for the Delta, Iceberg and/or Hudi open table formats that build on top of the Parquet standard.

Next on the lakehouse side is support for HeatWave on the Amazon Web Services cloud. This means data in any of the three supported formats that any customer may already have in Amazon’s S3 object storage is now available for processing with HeatWave. Even though HeatWave itself runs in Oracle’s own AWS account, connectivity to data in the customer’s account is still provided. Adding S3 data to HeatWave can be done simply by providing an ENGINE = LAKEHOUSE clause in a CREATE TABLE command, and that command can itself be auto-generated by Autopilot, leveraging the schema inference we’ve already discussed.

AutoML Enhanced, Now Encompasses Generative AI

Moving on to the world of AI, HeatWave’s AutoML (automated machine learning) can leverage this S3 data access, including the new Avro support, to build machine learning models that reside in HeatWave and are trained on HeatWave data. HeatWave AutoML also supports recommendation models, beyond other AutoML platforms’ typical support for classification, regression, clustering/anomaly detection and time-series forecasting models.

With respect to competition, Oracle claims HeatWave’s training times are 25x faster than those for Amazon Redshift, with the implication that HeatWave is a better analytics database for AWS than AWS’ own data warehouse offering. And beyond Redshift, Snowflake’s SnowPark ML, provides a bridge to scikit-learn and doesn’t provide any built-in AutoML, according to the MySQL team.

There’s generative AI support in MySQL AutoML too, and it takes a couple of forms, including support for Large Language Models (LLMs) and a built-in vector store. On the LLM side, HeatWave can use BERT and Tfidf to generate embeddings from the content of text columns in the database and submit them to the AutoML engine, alongside numerical representations of data in conventional scalar data columns. From all these inputs, tuned models are produced.

Documents in object storage factor in as well, as vector embeddings for them can be stored and indexed in the HeatWave vector store. Together, these features lead to more contextual answers to generative AI queries, as data in the vector store can be used to augment the prompts sent to the LLM.

Autonomous Autopilot

Moving on to HeatWave’s Autopilot, which uses AI to implement autonomous operation, or assistance with advanced features, the team has added support for Autopilot indexing, auto unload, auto compression, and adaptive query execution. The last of these, according to the MySQL team, dynamically adjusts data structures and system resources even after query execution has begun, to accommodate the actual distribution of the data observed as the query engine encounters it. The MySQL team reports first-run performance improvement of between 10% and 25% as a result of adaptive query execution.

Autopilot indexing is a machine learning-driven service that recommends secondary indexes for OLTP workloads, and includes suggesting new indexes as well as pointing out superfluous (e.g. unused or duplicate) indexes that should be dropped. Autopilot indexing takes both queries and DML operations — like UPDATE, INSERT and DELETE — into account. The service also predicts both storage requirements and performance, and it provides explanations for its recommendations.

Auto load and unload moves data from a conventional MySQL database into and out of the HeatWave cluster, based on frequency of access, helping developers avoid performing these operations manually. Auto-column compression will mix and match compression algorithms on a per-column basis, finding the right balance between memory usage and performance. The company claims memory savings of between 6% and 25% and performance increases between 6% and 10%. The fact that there can be improvement on both the memory and perf axes, rather than making developers choose between them, is an impressive testimonial to the value of algorithmic optimization.

And More

Other capabilities include a bulk data ingest/load feature, partitioning, analytics functions, SET operations, and availability on multiple clouds (Amazon Web Services, Microsoft’s Azure and Oracle Cloud Infrastructure). These and all the other capabilities discussed here should ensure continued momentum for MySQL HeatWave that Oracle says it has seen in the digital marketing, gaming, healthcare and fintech sectors. This is a real smorgasbord of capabilities, demonstrating that Oracle views MySQL as a strategic asset in its portfolio. Does Oracle Database itself rule the roost? Maybe. But MySQL, with its decades-long ecosystem, its huge community, and its modular, pluggable engine architecture, has found new life in the cloud, in analytics, in machine learning, and now in generative AI.

Group Created with Sketch.
TNS owner Insight Partners is an investor in: The New Stack.
THE NEW STACK UPDATE A newsletter digest of the week’s most important stories & analyses.