Pivotal is claiming up to 100x performance improvements with the upgrade of its Big Data Suite, including major component updates to the Pivotal HD enterprise-grade Apache Hadoop distribution.
The Hadoop stack update provides better stability, management, security, monitoring and data processing, according to the company. The Pivotal Query Optimizer has also been added to both Greenplum Database, its analytical MPP data warehouse, and HAWQ, its SQL-on-Hadoop analytic engine.
Pivotal Query Optimizer is designed to effectively determine the cost of processing a query across a number of machines and processors in a cluster.
“In the past, queries have been executed in a very static methodology,” explained Michael Cucchi, senior director of outbound product for Pivotal. “You write a database query engine and ask a question of it, it breaks down the query, then runs off and executes it. [Pivotal Query Optimizer] takes into account the cluster environment being run and all the conditions across the cluster.”
The optimizer performs numerous optimizations simultaneously to execute queries much faster than earlier solutions. In addition to the performance gains, it is highly configurable, providing flexibility in the way it runs various queries, he said.
“The real breakthrough is having such a powerful optimization engine working in real time to optimize queries as they occur,” Cucchi said.
These advancements are designed to help customers manage the massive amounts of data being produced in their mobile, cloud, social and Internet of Things projects.
This is the first version of Pivotal HD based on the Open Data Platform, a coalition of Big Data vendors unveiled in February.
The new release updates existing Hadoop components for scripting and query (Pig and Hive), non-relational database storage (HBase), and basic coordination and workflow orchestration (ZooKeeper and Oozie). It adds the Apache Spark stack, including the related components Spark SQL, Spark Streaming, MLlib and GraphX, and it adds further Hadoop components for improved security (Ranger, Knox), monitoring (Nagios and Ganglia, in addition to Ambari) and data processing (Tez).
Around 18 companies have joined the Open Data Platform, according to Cucchi, including Hortonworks, IBM and Teradata. The group announced that it will standardize around core technologies including Apache Hadoop 2.6, inclusive of HDFS, YARN and MapReduce, as well as Apache Ambari software for managing Hadoop environments at scale.
Most notably absent from the alliance are vendors Cloudera and MapR, which openly declined to join. MapR criticized the initiative in a blog post, saying it’s redundant to the Apache Software Foundation, its core is “vendor-biased,” and that the platform is “solving problems that don’t need solving” while “benefitting Hortonworks marketing and providing a graceful market exit for Greenplum Pivotal.”
Pivotal also announced plans in February to open-source its Big Data Suite. It did so with in-memory database GemFire three weeks ago; the resulting Geode project is in incubation with the Apache Software Foundation. The company plans to propose HAWQ to the ASF in the third quarter and to open-source Greenplum Database in the fourth quarter.
MapR’s criticisms had included whether other Hadoop resource managers and frameworks such as Apache Spark and Mesos would be excluded. The addition of Spark seems to answer that in part, just as adding support for Parquet files last year seems another attempt to address its critics.
Cucchi continues to maintain that Greenplum gains performance and flexibility benefits from its proprietary file system, and that since HAWQ runs natively on the Hadoop cluster, it scales linearly along with that cluster.
Tomer Shiran, vice president of product management at MapR Technologies and a member of the Apache Drill Project Management Committee, has suggested that these projects are being open-sourced because they haven’t gained the market traction that Pivotal had hoped.
The company still will be able to make money off them, in the view of analyst Janakiram MSV, because it knows how to monetize its enterprise editions, which offer extra features and a more cohesive, reliable way of deploying and maintaining them.
Pivotal is a sponsor of The New Stack.
Feature image via Flickr Creative Commons.