Crate Built a Distributed SQL Database System To Run Within Containers
Crate Technology has designed a database system for supporting Docker containers and microservices. The technology stresses ease of use, speed and scalability while retaining the ability to use SQL against very large data sets.
Crate was built to run in ephemeral environments, said Christian Lutz, Crate CEO. It was the ninth official Docker image in the Docker Registry and has been downloaded more than 350,000 times in the past six months. It can be managed with Docker tools, or with Kubernetes or Mesos.
“We believe that microservices architecture will definitely win,” Lutz said. “What you do with application containers already in production will very quickly create a requirement for databases [in containers] as well. If you don’t have the architecture for that, you can’t run in a scalable way in containers. You can take a database that doesn’t work in a container — obviously, SQL works in a container — but you have one node and one volume, but you can’t add hundreds of nodes. That’s why I think it will be very important when people start to put the database into containers.”
Solomon Hykes, founder and chief technology officer of Docker, is among Crate’s investors. The company just raised $4M from Dawn Capital and has garnered additional investment from Sunstone Capital, DFJ Esprit and Speedinvest. With this funding, it plans to open offices in San Francisco/Silicon Valley and Europe.
“We saw this challenge when people run a scalable, open source big data back end, which requires you to combine a bunch of technologies that you have to keep in sync and manage. And you need strong experience and good people to be able to run and scale,” Lutz said.
The architecture does not use the master/slave configuration found in most relational databases, which sets the stage for running the database as a distributed system. It’s meant to let companies maintain their existing investments in SQL yet avoid expensive proprietary hardware and licensing fees through the use of commodity hardware and cheap cloud services.
“We built with the architecture called ‘shared nothing,’ which means you don’t have any master. Any node can be a master,” he explains. “This makes it very fast because you don’t have to wait for a master to acknowledge, and it makes it super easy to scale because you add 10 machines, 100 machines to horizontally increase your speed and performance. Usually, this is not achieved with standard SQL.”
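The masterless idea Lutz describes can be illustrated with a toy model. The following is a minimal sketch, not Crate's actual implementation: the `Node` and `Cluster` classes, the node names and the hash-modulo placement rule are all assumptions made for illustration. It shows only that any node can coordinate a request, because every node can compute which peer owns a given key.

```python
import hashlib


class Node:
    """Hypothetical peer node that stores only the rows it owns."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class Cluster:
    """Toy shared-nothing cluster: no master, every node can coordinate."""

    def __init__(self, node_names):
        self.nodes = {name: Node(name) for name in node_names}
        self.order = list(node_names)

    def owner_of(self, key):
        # Deterministic placement: hash the key, then pick a node by modulo.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.order)
        return self.nodes[self.order[index]]

    def write(self, coordinator, key, value):
        # Any known node may accept the request; it forwards to the owner.
        if coordinator not in self.nodes:
            raise KeyError(f"unknown node: {coordinator}")
        owner = self.owner_of(key)
        owner.rows[key] = value
        return owner.name

    def read(self, coordinator, key):
        if coordinator not in self.nodes:
            raise KeyError(f"unknown node: {coordinator}")
        return self.owner_of(key).rows.get(key)


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.write("node-a", "sensor-1", 21.5)
print(cluster.read("node-c", "sensor-1"))
```

Plain modulo placement remaps most keys whenever a node is added, so a real system would typically map keys to a fixed number of shards and move whole shards between machines instead.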
It’s designed to be largely self-configuring and self-healing, so if one node fails, it automatically fails over to another node. “We wanted a database that just never goes down,” Lutz said.
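Failover of this kind generally depends on keeping more than one copy of each record. The sketch below is a simplified illustration under stated assumptions, not Crate's mechanism: `ReplicatedStore`, its `copies` parameter and the byte-sum placement rule are hypothetical. Each key is written to two nodes, and a read skips any node that has been marked as failed.

```python
class ReplicatedStore:
    """Toy store that keeps each key on several nodes (hypothetical design)."""

    def __init__(self, node_names, copies=2):
        if copies > len(node_names):
            raise ValueError("need at least as many nodes as copies")
        self.names = list(node_names)
        self.copies = copies
        self.data = {name: {} for name in self.names}
        self.up = {name: True for name in self.names}

    def holders(self, key):
        # Deterministic placement: the primary and the next node(s) in the ring.
        start = sum(key.encode("utf-8")) % len(self.names)
        return [self.names[(start + i) % len(self.names)]
                for i in range(self.copies)]

    def write(self, key, value):
        # Write to every live holder. Re-syncing a recovered node is not modeled.
        for name in self.holders(key):
            if self.up[name]:
                self.data[name][key] = value

    def fail(self, name):
        self.up[name] = False

    def read(self, key):
        # Fall over to the next holder when an earlier one is down.
        for name in self.holders(key):
            if self.up[name]:
                return self.data[name].get(key)
        raise RuntimeError("no live copy of " + key)


store = ReplicatedStore(["n1", "n2", "n3"])
store.write("reading-42", "ok")
store.fail(store.holders("reading-42")[0])
print(store.read("reading-42"))
```

The sketch leaves out everything that makes failover hard in practice, such as detecting failures, re-replicating lost copies and resolving conflicting writes.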
Crate brings together a number of open source components, including the Elasticsearch search and analytics engine, Facebook’s Presto SQL parser engine, Lucene indexing and search technology, and the Netty asynchronous event-driven network application framework, according to Jason Stamper, analyst at 451 Research.
While companies could put those elements together themselves, it’s a lot of work and even then, the data platform would struggle with real-time writes rather than just reads and lack the distributed query engine of Crate, Stamper said.
The company was founded in 2013 by Lutz, Bernd Dorn and Jodok Batlogg, the former chief technology officer at StudiVZ (known as Germany’s Facebook). When Stamper met with the German company in September 2014, it was positioning the distributed database as best suited to the storage and analysis of time-series data, such as that produced by sensors or machines in the Internet of Things. In 2015, however, the company started to position the database as the perfect complement to Docker.
Stamper points to a number of databases that can be used with Docker. Couchbase Server can run under Docker using Triton, Joyent’s Docker container service, and MariaDB also supports Docker containers. PostgreSQL, MySQL, Redis and MongoDB can all be containerized, though they are not designed to be easily configured and run within containers, Lutz argued.
The company has spent the past year working through customer issues related to using Crate in production. Early customers include travel booking service AVUXI, which uses Crate to analyze 20 million geolocation events a day, and cloud security vendor Skyhigh Networks, which runs 4 to 5 billion records per day, Lutz says.
Sekhar Sarukkai, Skyhigh’s chief scientist and co-founder, says Crate’s real-time SQL aggregations, its simple scalability and high availability make it a key element of his company’s stack.
One of the trends Lutz is seeing among customers is the convergence of operational data and analytical data in a single database, rather than the use of separate databases for the two. “I think it’s not enough anymore to run batch-oriented reports. It’s about knowing in real-time, what’s happening with your data,” Lutz said.
Lutz said Crate is ideal for customers in IoT and high-growth mobile and web applications – “companies that have to store very fast a lot of data at the same time allow tens of thousands or hundreds of thousands of concurrent users to use this data.”
He likes to compare Crate to a racetrack.
“If you design the data store with Cassandra, you decide what kind of queries you’re doing, how many will concurrently run and all these parameters, and it’s a racetrack where you can run very fast and efficiently query data. But the moment you change things, the nightmare begins,” he said.
“Crate lets you make changes ad hoc. You don’t have to already know what you’re going to search for. You can change your schema as you develop your application. You can add columns and indexes at any time and it makes it a very flexible and easy-to-use product.”
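In SQL terms, the ad hoc changes Lutz describes correspond to altering a table that already holds data. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in, not Crate itself, and the `events` table and its columns are made up for illustration. It adds a column and an index after rows have already been written.

```python
import sqlite3


def evolve_schema():
    """Add a column and an index to a table that already contains data."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, device TEXT)")
    conn.execute("INSERT INTO events (device) VALUES ('sensor-1')")

    # Later in development: a new attribute and a new lookup pattern appear.
    conn.execute("ALTER TABLE events ADD COLUMN temperature REAL")
    conn.execute("CREATE INDEX idx_events_device ON events (device)")
    conn.execute(
        "INSERT INTO events (device, temperature) VALUES ('sensor-2', 21.5)"
    )

    rows = conn.execute(
        "SELECT device, temperature FROM events ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


print(evolve_schema())
```

Rows written before the change simply report no value for the new column, which is what lets the schema evolve alongside the application.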
The database market is growing rapidly with a rash of new entrants including SQL database CockroachDB, NoSQL Riak KV and others. Antony Falco, CEO and co-founder of Orchestrate, which was acquired by CenturyLink Cloud, talked with The New Stack about the proliferation of databases of late.
Lutz says many databases are built around a specific use case and enterprises don’t see the difference between them. One of the challenges for NoSQL companies in particular, he says, is differentiating themselves from the pack.
Docker and Joyent are sponsors of The New Stack.