Drill Now a Top-Level Apache Project
The Apache Software Foundation has elevated Drill to a top-level project, putting it alongside the likes of Hadoop, Spark and httpd. Drill entered the Apache Incubator in August 2012 and was first released to the public last August. Its team, including engineers at MapR, Intuit, Mesosphere and Hortonworks, has been issuing monthly updates.
Projects are promoted to top-level status when it is clear that there is a big community and the project is here to stay. This attracts more users as well as more contributors, explained Tomer Shiran, a member of the Apache Drill Project Management Committee.
He called the new status “a major milestone for Drill,” adding, “We are excited that Drill will indeed be a game changer for Hadoop application developers and BI analysts alike.”
The upgraded status gives the project more visibility — it has its own website now at drill.apache.org and Apache’s press team is involved. At the same time, Drill has become an autonomous project that makes its own decisions about releases, processes and more.
Shiran heads the product management team at MapR, which has integrated Drill into its big data platform. He explains in a blog post that Drill grew out of a need for faster application development, using volumes of data from myriad sources in multiple forms.
Drill was designed as a schema-free SQL query engine for multiple data sources, including JSON, Parquet and HBase. It not only allows rapid application development on Apache Hadoop, but also lets enterprise BI analysts explore the data themselves, freeing IT staff from having to structure the data for them.
“Enabling additional business units to use our Hadoop big data and correlating across our other big data sources from a single ANSI SQL interface will be incredibly powerful and require very little retooling. This will result in less custom application work and more efficient knowledge sharing throughout the organization,” Scott Russman, director of software development at managed security services firm Solutionary, said at Drill’s initial release.
HP, SAP and NuoDB are among those that have recently unveiled SQL-on-Hadoop offerings. However, Drill lets you analyze Hadoop data without ETL or creating schemas first; it generates schemas on the fly and keeps files in their original formats rather than converting them into tables or pre-specified formats before they’re loaded into the database system.
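To make the idea of generating schemas on the fly more concrete, here is a minimal Python sketch. It is not Drill's implementation and does not use any Drill API; the sample records, `infer_schema` and `select` are all hypothetical names invented for illustration. It only shows the general approach of discovering fields while reading raw JSON, rather than declaring a table layout up front.

```python
import json

# Hypothetical sample data: newline-delimited JSON whose records
# do not all share the same fields (an assumption for illustration).
RAW = """\
{"id": 1, "name": "alice", "visits": 3}
{"id": 2, "name": "bob", "email": "bob@example.com"}
{"id": 3, "visits": 7.5}
"""


def infer_schema(lines):
    """Scan the records and collect, per field, the value types seen."""
    schema = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


def select(lines, columns):
    """Project the requested columns; fields a record lacks become None."""
    for line in lines:
        if line.strip():
            record = json.loads(line)
            yield tuple(record.get(col) for col in columns)


schema = infer_schema(RAW.splitlines())
rows = list(select(RAW.splitlines(), ["id", "email"]))

print(schema)
print(rows)
```

The file is never converted or loaded into a predefined table: the set of columns, and the types found in each, emerge from the data at read time, and records that lack a field simply yield an empty value for it.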
And unlike technologies such as Hive and Impala, it’s not limited to Hadoop. Its storage plugins include the local file system, HDFS, MongoDB and HBase, with others under development. A single query can gather data from multiple sources.
As for Drill’s future, “We have a lot of new functionality coming. 1.0 will be released in Q1. New data sources like Cassandra and RDBMS are being added. We’re also adding more advanced SQL capabilities. We’re at a point now where we are working closely with some of the more sophisticated users and addressing new use cases like sparse JSON data,” Shiran said.