Facebook’s Presto Big Data Query Engine Moves to The Linux Foundation
“We want to put the power in the community’s hands to make decisions and influence the technology’s direction,” she said.
Facebook open sourced the technology in 2013.
This effort is entirely separate from another foundation formed in January for the same Big Data SQL query engine, said Michael Dolan, vice president of strategic programs at The Linux Foundation. That entity involved Presto’s creators, Martin Traverso, Dain Sundstrom and David Phillips. The other Presto foundation had not responded to a press query as of the date of this post.
The Linux Foundation-led effort will work toward becoming a more neutral, community-focused open source project, one that extends beyond Facebook’s own plans for the technology, Dolan said.
“Over the next year, I think you’ll see the community coming together around forming norms and ways to build a consensus on the project,” he said. “And as more and more companies have taken a look at Presto and become dependent on it, we expect to see more voices coming into the community as well.”
This group’s core companies, Facebook, Alibaba, Twitter and Uber, use Presto across thousands of machines and at petabyte scale, according to Nezih Yigitbasi, software engineering manager of the Presto team at Facebook.
There are plenty of competitors to Presto, including Apache Drill, Apache Impala, Spark SQL, Apache HAWQ, and one of the more recent open source options, the GPU-accelerated BlazingSQL.
Presto solves the problem of having to choose between a fast but expensive commercial offering and a free but slower option that requires excessive hardware, according to the group.
In comparison to other open source choices, Presto is usually more efficient in terms of resources, according to Yigitbasi. It’s noted for high performance and scale.
“At Facebook, over a thousand employees use Presto, running several million queries and processing petabytes of data per day,” he said.
A single query can be used across data from multiple different storage systems without the need to move the data. It can query across data sources both on-prem and in cloud repositories including HDFS, Amazon S3, Kafka, Cassandra, Postgres, Oracle and Redis.
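To make the idea of a single query spanning several storage systems concrete, here is a minimal sketch. It relies on Presto's convention of addressing a table as catalog.schema.table. The catalog, schema, table and column names are hypothetical and chosen only for illustration, and the script simply builds and prints the SQL text rather than connecting to a cluster.

```python
# Sketch only: every catalog, schema, table, and column name here is hypothetical.
# Presto addresses a table as catalog.schema.table, so one statement can
# join data that lives in different storage systems without copying it first.


def qualified(catalog: str, schema: str, table: str) -> str:
    """Return the fully qualified name Presto uses for a table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query() -> str:
    """Build one SQL statement that joins tables from two catalogs."""
    # Hypothetical: click logs kept in HDFS or S3, exposed through a catalog named "hive".
    clicks = qualified("hive", "web", "clicks")
    # Hypothetical: customer records kept in Postgres, exposed through a catalog named "postgresql".
    customers = qualified("postgresql", "public", "customers")
    return (
        "SELECT c.name, count(*) AS click_count\n"
        f"FROM {clicks} AS k\n"
        f"JOIN {customers} AS c ON k.customer_id = c.id\n"
        "GROUP BY c.name"
    )


if __name__ == "__main__":
    print(build_federated_query())
```

In a real deployment the catalog names would come from the cluster's own configuration, and the statement would be submitted through a Presto client rather than printed.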
And it’s very reliable, Yigitbasi said.
Because of its use at these big companies, “everything is battle-tested and proven to work correctly.”
The Linux Foundation is a sponsor of The New Stack.
Feature image via Pixabay