Flash-oriented Aerospike Now Supports GeoJSON
Before co-founding Aerospike, Brian Bulkowski worked for an advertising behavioral analytics company and saw that almost half the employees were there to maintain MySQL. Thinking there had to be a better way — and seeing the rise of flash storage — he and Srini V. Srinivasan created the company in 2009. Originally called Citrusleaf, it was renamed Aerospike in 2012.
“We built the company knowing that flash was going to happen and believing we could build the first truly Flash-optimized database that doesn’t just treat it like a file system, doesn’t treat it like RAM, but really treats flash like flash,” said Bulkowski, who is now chief technology officer for the company.
Asked to further explain, he said, “You can’t just map it as RAM; it needs more parallelism. Flash fits well with the log-structured pattern within databases, so you can use some of the tricks you’d normally use, but certainly not all of them. And you certainly can’t do what some people have done and simply use a file system or basically memory map for this. We found that going directly to the device level, we could get a massive performance improvement.”
The company gained early traction with advertising companies but has since expanded its customer base to include the likes of Williams-Sonoma, Alcatel-Lucent and Kayak.
“Imagine the cases where you’ve got a huge analytics tier, you’re figuring out insights, you’re figuring out who likes what, you’re doing predictions. Then you need to move that to a real-time application because you’re on the Internet. Perhaps you’re doing product recommendations like Williams-Sonoma does with Aerospike. Maybe you’re trying to figure out what cab should pick up who based on likes. Or maybe you’re doing advertising personalization.
“Those are cases where you need a lot of database horsepower where you might start looking at a NoSQL database instead of a relational database, and cases where you might use Aerospike,” Bulkowski says.
The latest release of the database system, Aerospike version 3.7, muscles up on geospatial support.
“At Aerospike, our marching orders are ‘speed at scale.’ We think geospatial data deserves the same treatment,” Alvin Richards, vice president of product, writes in a blog post, saying we’re entering the age of the “Internet of Moving Things.”
With the new release, Aerospike can store GeoJSON objects and execute various types of queries against them. It has added Google’s S2 library and geohashing to encode and index points and regions. Queries can be combined with a User Defined Function (UDF) to filter the results. A search for bars, restaurants or churches near you, for example, can be further refined to only those that are open.
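To make the idea concrete, here is a minimal Python sketch of the pattern described above: a point is encoded as a geohash, and a radius query over GeoJSON features is refined by a filter function. It is an illustration only and does not use Aerospike’s client API, its S2-based index or its UDF mechanism. The function names, the `PLACES` sample data and the `keep` parameter are all hypothetical, and the radius query is a simple linear scan rather than an indexed lookup.

```python
import math

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # bits are interleaved, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def points_within_radius(features, lat, lon, radius_km, keep=None):
    """Return names of GeoJSON Point features near (lat, lon).

    `keep` is a hypothetical stand-in for a server-side filter function.
    """
    hits = []
    for feature in features:
        f_lon, f_lat = feature["geometry"]["coordinates"]  # GeoJSON order is [lon, lat]
        if haversine_km(lat, lon, f_lat, f_lon) > radius_km:
            continue
        if keep is not None and not keep(feature):
            continue
        hits.append(feature["properties"]["name"])
    return hits


def _place(name, lon, lat, is_open):
    """Build a GeoJSON Feature for the made-up sample data below."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name, "open": is_open},
    }


# Hypothetical sample data, invented for this sketch.
PLACES = [
    _place("Cafe A", -122.4180, 37.7755, True),
    _place("Bar B", -122.4200, 37.7740, False),
    _place("Diner C", -122.2712, 37.8044, True),
]
```

Calling `points_within_radius(PLACES, 37.7749, -122.4194, 1.0, keep=lambda f: f["properties"]["open"])` keeps only the nearby places that are marked open. A production system would avoid the linear scan by looking up candidate cells in a spatial index first, which is the role the article attributes to S2 and geohashing.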
The company has also posted demo code on GitHub in which the user manages a fleet of drones delivering letters and packages.
“It’s a category that’s crucial, but it’s the performance that’s been lacking,” said Bulkowski of the geospatial features in existing databases. He says the S2 library “allows us to use the super fast storage layer we have, then use RAM indexes for the geospatial layer inside the database using a lot of our core C technology.”
He says Aerospike can handle “tens to hundreds of thousands of requests per second even on a geospatial query.
“We’re especially fast on updates, so people who need to track something, like cab companies or logistics companies where the location is constantly changing, a lot of the old systems were analytics-based, so you could query at a reasonable speed, but you couldn’t write fast enough. So that’s the new age in NoSQL systems, these engagement systems.”
For developers, the company says its recently announced Async C Client 4.0 offers batching capabilities that dramatically improve the efficiency of bulk reads and writes.
Other new features include:
- Server-side list operations — Aerospike can store and retrieve lists and maps, and can allow developers to directly manipulate lists with a set of commands on the server while maintaining predictable performance at scale.
- Advanced public cloud stability — New algorithms improve cluster stability in environments such as Google Compute Engine and Amazon EC2. Related features include improved management of data migration, the ability to set a stop-writes threshold on a per-set basis and the ability to dynamically change unicast heartbeat addresses.
“What we found is that cloud environments in particular have very challenging networking and operational environments that we know. So we had to build new clustering algorithms that handle not just the fairly well-designed and maintained networks within enterprises, but also the noisy, congested networks that occur on public clouds,” according to Bulkowski, who said Aerospike has a record of very high availability and the ability to do live maintenance operationally.
“What we call adaptive clustering learns from the environment that it’s in to keep your database up even in these environments.”
Millions of Reads and Writes
India-based early adopter Anya Soft, which focuses on intelligence for the consumer packaged goods industry, says its mobile app requires more speed than its current MongoDB infrastructure provides.
Anya Soft CEO Rakesh Reddy said there are actually two mobile apps: a simple mobile Point of Sale (mPOS) system for supermarkets and another providing an interface giving the general public access to data on available stock, price and offers in each store using its software. Both are free.
Asked how they use geospatial data, Reddy said that working with a latency target of under 200 milliseconds, no other database serves its purpose better than Aerospike.
“We map the exact location of every store connected to us, which would be in the millions, and we map every transaction occurring in those stores in real time by each product. And we give general consumers on a real-time basis the availability and price details of various products that they want to buy or just check the availability of for a later buy,” he said.
“Besides this, we produce actual and analytical data for each store or each street or each precinct, for a set of products that a particular manufacturer might be producing or that of a set of the manufacturers’ competitors, on a real-time basis, primarily tagging the data to geo-locations, providing sales guys with perspective and better understanding of the ground-level data.
“We are literally talking about handling several million reads and writes on several million nodes per second. Simply put, none better than Aerospike exists in the market today that can handle such amounts of data with such a small latency window.”