Cloud native platform provider Pivotal Software has brought its flagship massively parallel processing analytic database Greenplum into the Kubernetes era.
At last week’s Greenplum Summit in New York, the company unveiled Greenplum for Kubernetes, which “gives data professionals and application developers the ability to deploy, operate, and upgrade self-service clusters wherever Kubernetes is installed, in both cloud and cloud native scenarios,” according to a blog post introducing the new product.
Based on PostgreSQL, Greenplum was designed to manage large-scale analytic data warehouses and business intelligence workloads.
According to the blog post, the key to enabling Greenplum for Kubernetes lies in the Greenplum Operator, which "creates, configures and manages instances of complex stateful applications on behalf of a Kubernetes user, informing how Greenplum should be configured and deployed." This will help users "avoid lower-level configuration tasks." And like many offerings of its kind released for Kubernetes, Greenplum for Kubernetes works wherever Kubernetes is deployed: on Pivotal's own Pivotal Container Service (PKS), on another cloud provider's service such as Google Kubernetes Engine (GKE), or even on-premises.
Adapting to Kubernetes did not change Greenplum's architecture much, and as long as certain parameters were kept in mind, there is "literally almost no impact on the performance," explained Jacque Istok, Pivotal vice president of data.
“The architecture of Greenplum is relatively consistent. At the base level, if you have a single host, a single VM, or a single server, the way Greenplum would parallelize on that server is by installing ten Postgres instances. Then your queries would be distributed across ten Postgres instances all on that same server, all with one-tenth of your data,” he said.
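To make the ten-instance example concrete, here is a minimal Python sketch of hash distribution, assuming a single distribution key and using zlib.crc32 as a stand-in for Greenplum's own internal hash function. The names segment_for, distribute and parallel_sum are hypothetical and are not part of any Greenplum API. The sketch only illustrates the idea Istok describes: each row lands on exactly one instance, and a query such as a SUM can run as partial results on every instance that are then combined.

```python
import zlib
from collections import defaultdict

# Ten Postgres instances on one host, as in Istok's example (illustrative value).
NUM_SEGMENTS = 10


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Pick a segment by hashing the distribution key.

    crc32 is a stand-in; Greenplum uses its own hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments=NUM_SEGMENTS):
    """Assign each row to exactly one segment, so each holds roughly 1/N of the data."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(str(row[key_field]), num_segments)].append(row)
    return segments


def parallel_sum(segments, value_field):
    """Each segment computes a partial sum; a coordinator adds up the partials."""
    partials = [sum(r[value_field] for r in seg_rows) for seg_rows in segments.values()]
    return sum(partials)


if __name__ == "__main__":
    rows = [{"order_id": i, "amount": i % 7} for i in range(1000)]
    segments = distribute(rows, "order_id")
    for seg_id in sorted(segments):
        print(f"segment {seg_id}: {len(segments[seg_id])} rows")
    print("total amount:", parallel_sum(segments, "amount"))
```

Because the combined result equals a plain sum over all rows, the split across instances is invisible to the query author; what changes is that each instance scans only its own share of the data.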
With Kubernetes, Pivotal containerized each of those Postgres instances and then built in the orchestration of moving data between them. "Through the Kubernetes operator, we're allowing for each of those containers to function very similarly to how they were before without the containers," said Istok.
“All SQL databases, including Postgres and Greenplum, generally run as fast as the underlying infrastructure available to them,” he further explained. “Your Greenplum cluster is going to run as well as your underlying infrastructure, which means your Kubernetes architecture needs to include the ability to attach fast network and fast disk I/O in-between and to each of those containers in order to get the same level of experience you would get non-containerized.”
Beyond the new Kubernetes offering, Istok also discussed Pivotal's recent push to bring Greenplum's underlying Postgres version closer to that of the open source PostgreSQL project, as well as its launch of Pivotal Postgres, which is the "open-source PostgreSQL binaries, packaged and commercially supported by Pivotal."
When Greenplum was created in 2005, it forked from Postgres 8.2. Greenplum 5 moved it to Postgres 8.3, and Greenplum 6 is now based on Postgres 9.4. The advantage is that Greenplum inherits the base features introduced in Postgres over those releases, which lets the team focus on innovating its own features instead.
“That’s significant, first and foremost, because 9.4 brings all the innovation from 8.3 on, but it also allows us to function more like a traditional single processor, single instance database and it also allows us to take advantage of practically the latest Postgres connectivity that exists in the ecosystem — that would be both open source and commercial,” said Istok. “It took us about two and a half years to just uplevel one version. What we’ve learned over that time is how to really accelerate it. About a year later, we’re able to get several versions ahead. For Greenplum 7, when it comes out a year from now, we’ll be that much closer to current.”
As for specific new capabilities in Greenplum 6, Istok singled out two features as highlights over Greenplum 5. First, Greenplum 6 has adopted the write-ahead log (WAL), which he says allows them to "begin to offer advanced capabilities such as point in time recovery and site-to-site replication." Second, Greenplum 6 introduces row-level locking for updates and deletes, which he says has led to a 50x performance increase for mixed workloads compared with Greenplum 5.
While Greenplum 6 is currently available only in beta, it is expected to be generally available in June 2019.
Pivotal is a sponsor of The New Stack.