MariaDB has released into general availability ColumnStore 1.0, a storage engine allows users to run analytics and transactional processes simultaneously with a single front end on the MariaDB 10.1 database.
While the open source community pushed back against the licensing model of MariaDB’s MaxScale database proxy, ColumnStore is open source, licensed under GPLv2.
ColumnStore is a fork of InnoDB database engine, the default storage engine for MySQL. Despite the popularity of SQL alternatives, the company notes that every OLTP (online transaction processing) and analytics solution has been building a SQL layer.
So it set out to combine InfiniDB’s OLAP (online analytical processing) engine to MariaDB’s OLTP (online transactional processing) engine to enable users to run analytic queries on production data in near real-time.
It’s certainly not the first database to combine the two functions. Big players such as SAP, Oracle, Microsoft, IBM, and Teradata offer this capability along with upstarts including VoltDB, NuoDB, Clustrix, MemSQL, Splice Machine, CockroachDB and others. Premium MySQL support provider Percona was talking about column stores back in 2010.
“What we heard from customers was that they wanted to use their own toolset. They were a Tableau customer or Pentaho customer. It’s really hard to use those effectively with things like Hadoop. ColumnStore provides ANSI SQL, the world’s most popular query language,” said David Thompson, MariaDB vice president of engineering.
How it Works
InnoDB or the other default MySQL storage engine MyISAM provide reasonable performance on analytical query workloads up to 100,000 rows or tables under a million rows. Performance is harder to tune and maintain beyond that, Thompson wrote in a blog post.
ColumnStore is suitable for reporting or analysis on much larger data sets. Mobile applications of customer Pinger, for example, process millions of text messages and phone calls with ColumnStore, in addition to more than 1.5 billion rows of logs per day.
ColumnStore has a data-ingestion rate of up to 1 million rows per second.
It has two components: User module (UM) and production module (PM). UM runs on MariaDB server as execution manager (ExeMgr). User modules assign tasks to parallel performance modules that access the columnar distributed storage layer below. As the data volume grows, you to scale both sets of modules.
Being multithreaded, UM can handle the results coming back in a different order, and multi-task as multiple PMs work in parallel and make a final aggregation before sending the results back to the client. This improves performance. Redundancy and high availability are built in.
With ColumnStore, the company touts:
- Lower cost of ownership
- The ability for MariaDB users to turn on the columnar engine for a particular dataset and run analytics from the same front end as their transactional system.
- Providing ANSI SQL capabilities including queries for complex joins, aggregation and window functions. It also works with Hadoop’s HDFS file system.
- Out-of-the-box connection with BI tools through ODBC/JDBC, as well as the standard MariaDB connectors.
- It also works with R for advanced statistical analysis.
It includes SSL support, audit and authentication plug-ins and role-base access.
Acknowledging that SQL is not the best choice for machine learning and data discovery, it plans to integrate Apache Spark libraries like MLLib into the product. It also plans to add a regression window function and full integration with MaxScale database proxy tool.
“Whilst it is true that some others … have been able to offer both transactional and analytic capabilities through the use of in-memory approaches that give transactional databases the necessary speed to also run analytics, MariaDB has particularly good compatibility with MySQL, which is a huge base of users for it to go after,” said 451 Research big data analyst Jason Stamper.
Running analytics nearer to the operational database has several advantages — reduced latency of analytic queries, less complexity than managing separate analytic and transactional databases, and the potential to reduce or eliminate the need for an extract, transform and load (ETL) stage that moves data from operational to analytic databases, and can be expensive, he explained, noting big players Oracle Database, IBM DB2 and Microsoft SQL Server among those offering in-memory options.
Open source vs. BSL
The license sets a data at which the code will become open source, but until then it’s proprietary. That set off a furor in the open source world, which prompted a fork and competing project from a Percona employee.
Thompson said the company has decided storage engines should be open source, but the company might use the BSL in the future to “create value at the top.”
There are many open source licensing options, Stamper pointed out, adding that he doesn’t think it reasonable to say that all MariaDB technologies should be open source.
“After all, the company needs to generate revenue to invest in MariaDB Server — it isn’t a not-for-profit organization, and maintaining compatibility with MySQL comes at an engineering price as MySQL is being improved all the time.
“MariaDB needs to keep up, if not out-innovate it. … It can’t make such innovations and continue to improve its products going forward, with a pure open source (or free) model, and it is not alone in this. EnterpriseDB, which offers support and additional features for PostgreSQL, and also Oracle’s MySQL, have an almost identical approach. Basic features are free and open source, but at some point support and additional enterprise features need investment from customers, or the whole business model falls apart,” he said.
Feature image via Pixabay.