PostgreSQL 16 Expands Analytics Capabilities
The recent release of PostgreSQL 16 is significant for a number of reasons. It enables more flexible access control mechanisms, which have immediate consequences for deployments involving Managed Service Providers (MSPs).
Version 16 also supports logical replication from hot standby servers, a capability that can “allow for new architectures,” said Adam Wright, Senior Product Manager at EnterpriseDB (EDB). EDB was one of the foremost code contributors to PostgreSQL 16, an open source relational database management system.
Most importantly, however, the latest version of PostgreSQL includes analytics functionality for facilitating complicated aggregation and windowing queries. This enhancement, when paired with the database’s extensions for managing geospatial and vector data, is perhaps the latest indicator of the increasing relevance of transactional databases for analytics.
“I think what you’re starting to see is the need for specialty data warehousing is starting to get lower,” Wright reflected. “You might have extreme ends of the data warehousing market, where having a specialty system is necessary. But that’s really starting to become the extreme end and, for a lot of use cases, you can just use Postgres and not need that specialty system.”
Although PostgreSQL is widely used in transactional systems, its implementation in version 16 of what Wright called the “any value function” (the ANY_VALUE aggregate) has definite analytics overtones. This function is part of the SQL:2023 standard. “This function is really mainly used for analytical databases,” Wright said. “Complex aggregations/windowing queries is kind of the subheading for what you can use that for.”
This particular function allows administrators and developers to do calculations across a set of rows in a table. “You might compare how a calculation for one row is done against another row and get some aggregation of those two,” Wright explained. “So, things like getting a running total and doing that easily through a few lines of SQL.” This feature is valuable for use cases such as comparing product types for stock ordering in retail, particularly across different locations represented in a large table.
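The running total Wright mentions is commonly written with a window function. The sketch below is illustrative only: it uses SQLite (bundled with Python, and assumed to be version 3.25 or later) as a stand-in for PostgreSQL, since the SUM(...) OVER (...) syntax is the same in both, and the sales table and its columns are hypothetical.

```python
import sqlite3

# SQLite stands in for PostgreSQL here (an assumption for the sake of a
# self-contained example); the window-function syntax is standard SQL.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, day INTEGER, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", 1, 5), ("north", 2, 3), ("north", 3, 7),
        ("south", 1, 2), ("south", 2, 4),
    ],
)

# For each store, add up units from the first day through the current row.
running_totals = conn.execute(
    """
    SELECT store, day, units,
           SUM(units) OVER (PARTITION BY store ORDER BY day) AS running_total
    FROM sales
    ORDER BY store, day
    """
).fetchall()

for row in running_totals:
    print(row)
```

The server computes the per-store totals, so the application receives only the finished rows rather than raw data it would have to loop over itself.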
According to Wright, it’s often more efficient to manage this task at the database level than at the application level. With the latter approach, administrators or developers would have to write more code than they otherwise would, as well as write multiple functions. Then, they’d have to bring the data back and compare it, instead of filtering it on the database server. However, “if I’m on the database server and I write this aggregation query, I’m comparing different rows in the table,” Wright mentioned. “Once that’s all done, I’m going to stream back only the records that are necessary to the application.”
Vector and Geospatial Data
A PostgreSQL extension, PostGIS, enables users to store and query geospatial data. Wright referenced another extension for managing vector workloads, which covers storing vector data and supporting vector operators. “It’s just another use case that’s available in Postgres that you don’t need to go to a specialty database vendor and get another contract, and another support, and have to onboard whatever you may need to actually support those workloads,” Wright commented.
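Wright does not name the vector extension. Assuming he means something like pgvector, a typical vector operator ranks stored rows by their distance to a query vector. The plain Python sketch below only illustrates that idea with Euclidean distance; the item names and vectors are made up, and a real extension would do this inside the database, usually with an index.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical stored embeddings, keyed by item name.
items = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.9, 0.1],
}
query = [1.0, 0.0]

# Order items from nearest to farthest, which is what a distance
# operator in an ORDER BY clause would do on the server.
nearest = sorted(items, key=lambda name: l2_distance(items[name], query))
print(nearest)
```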
Although the extensions Wright mentioned are not part of the new capabilities unveiled in PostgreSQL 16, they attest to the database’s growing usefulness for analytics, which coincides with the newly released aggregation and windowing functionality.
The new edition also includes enhancements to its logical replication capabilities, the most substantial of which is “that you’re able to do logical replication from a standby server,” Wright remarked. With this paradigm, users can do logical replication from what Wright termed a hot standby, which he described as involving a physical replication of every change in PostgreSQL data to a target system, frequently for High Availability.
“With logical replication, instead of getting all the changes from the database, you might just want to replicate a couple of tables, a sales table, an orders table, and feed them to these different systems,” Wright observed. Logical replication enables users to specify the tables that are replicated which, when combined with the ability to do so from standby systems, broadly expands the possibilities for replication and architectural approaches. “You can have cascading multiple replications and things like that,” Wright said. “You can do everything from one system, but can also get subsets of the data from Postgres more easily, get into read-only systems… There’s just a lot more architectures that are going to be supported because of these native features.”
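As a rough sketch of the table-level selection Wright describes, the Python helpers below only assemble the SQL statements involved; they do not connect to a database. The helper functions, the publication and subscription names, and the standby host are all hypothetical, and the server settings that logical replication requires are not shown.

```python
def publication_sql(name, tables):
    """Build a CREATE PUBLICATION statement limited to the given tables."""
    # Identifiers are interpolated as-is, so pass trusted names only.
    return f"CREATE PUBLICATION {name} FOR TABLE {', '.join(tables)};"


def subscription_sql(name, conninfo, publication):
    """Build a CREATE SUBSCRIPTION statement pointing at a publisher."""
    return (
        f"CREATE SUBSCRIPTION {name} "
        f"CONNECTION '{conninfo}' PUBLICATION {publication};"
    )


# Run on the primary: a standby is read-only, so the publication is
# defined there and reaches the standby through physical replication.
primary_stmt = publication_sql("sales_pub", ["sales", "orders"])

# Run on the downstream system; the connection string targets the
# standby rather than the primary (hypothetical host and database).
subscriber_stmt = subscription_sql(
    "sales_sub", "host=standby.example dbname=shop", "sales_pub"
)

print(primary_stmt)
print(subscriber_stmt)
```

Only the sales and orders tables are published, so the downstream system receives changes to those two tables and nothing else.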
PostgreSQL 16 also refines the degree and scope of control associated with superusers, who have extensive privileges for data and system access. In previous versions, superusers had the latitude to do almost anything, including tampering with the underlying “operating system as the service that’s running as Postgres,” Wright admitted. “This is a big problem for managed data services.” Consequently, managed service providers (including some of the hyperscalers) would “fork” or replicate the database, users, and settings from the original cluster to another.
According to Wright, this process may lead to “bugs and security issues.” To address this, the more refined privilege controls in the most recent version of PostgreSQL allow for “granular management of privileges and [let you] delegate tasks that are needed to manage the database for DBAs, but not give them things that let them break out of the database,” Wright said. “Or, you can manage a role but not necessarily manage the data for that role.”
The Database Space
The new additions to PostgreSQL 16 are, for the most part, indicative of developments that are impacting the database space as a whole. Systems that were conventionally used for transactional purposes are taking on more analytics responsibilities. PostgreSQL’s native support of an aggregate function for writing aggregation and windowing queries — and extensions for workloads pertaining to geospatial and vector data — is perhaps a harbinger of a future in which the traditional divide between transactional and analytics databases is not so pronounced.