What’s Next: Managing Data in Cloud Native Environments

As information becomes critical to customer experience and business processes, managing data at scale is increasingly important. Tolerance for data loss, and for downtime that blocks access to critical information, is low, and efforts to make software faster and more resilient have focused on simplifying the management and scaling of large datasets that span multiple locations.
In this video presentation, we look at the latest tools challenging traditional approaches to meet the needs of data-intensive applications. We spoke with Wayne Duso, vice president of engineering at Amazon Web Services; Jordan Tigani, chief product officer at SingleStore; and Kabir Shahani, chief executive officer at Amperity. The interviews were recorded at the AWS re:Invent conference late last year. TNS Publisher Alex Williams led the interviews.
Watch our recap here and read our lightly edited transcript of the video:
Alex Williams (host): Hey everyone, Alex Williams here with The New Stack, and today we’re talking about three things we heard at AWS re:Invent about data. Let’s take a look at those three themes: Amazon Glacier and the accessibility of data in deep cold storage, application-first thinking, and how WebAssembly and machine learning are the new hotness. Number one: Glacier. AWS announced more features for Glacier, its cold storage service, including the ability to pull data in close to real time.
Wayne Duso, Amazon Web Services: We’ve come up with this storage class for S3 Glacier that allows customers to retrieve that data once, twice, four times a year on average, in milliseconds, by leveraging technologies that allow us to retrieve it that quickly. And it also ties into S3 Intelligent-Tiering capabilities. If you put data into S3 and you turn Intelligent-Tiering on, that data will transition through the various classes, all the way down through deep archive. So as your data becomes colder, it can move further and further down. And you can put policies around that so that, for instance, certain data will only go as low as, say, this new instant retrieval, or instant access, tier.
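In practice, that tiering is configured on the bucket. Below is a minimal sketch using the AWS SDK for Python (boto3); the bucket name, object key, configuration ID and 90-day threshold are placeholders, and which tiers you opt into will depend on your own retention policies.

```python
import boto3

s3 = boto3.client("s3")

# Store new objects in the Intelligent-Tiering storage class so S3 moves them
# between access tiers automatically as they grow colder. (Objects can also be
# written directly to Glacier Instant Retrieval with StorageClass="GLACIER_IR".)
s3.put_object(
    Bucket="example-bucket",            # placeholder bucket
    Key="reports/2021/q4.parquet",      # placeholder key
    Body=b"...",
    StorageClass="INTELLIGENT_TIERING",
)

# Opt the bucket into the asynchronous archive tier after 90 days, but not into
# deep archive: the kind of "only go as low as" policy Duso describes.
s3.put_bucket_intelligent_tiering_configuration(
    Bucket="example-bucket",
    Id="archive-after-90-days",
    IntelligentTieringConfiguration={
        "Id": "archive-after-90-days",
        "Status": "Enabled",
        "Tierings": [
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
        ],
    },
)
```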
Williams: Number two: persistence. Veeam is establishing itself in the data backup and recovery market. Its secret sauce? Treat the application as the application; don’t treat all the individual elements as unique silos. Apps use many different kinds of storage, and their data may be structured or unstructured. You might have object storage. And you see it in services such as the VMware Data Mover, which applies to both persistent and non-persistent data. It’s the application that matters.
Number three, the new hotness: SingleStore is a distributed, relational SQL database management system that is now integrating support for WebAssembly. WebAssembly is a hot topic for those working in at-scale environments. The New Stack writer Mary Branscombe describes WebAssembly as a small, fast, efficient and very secure stack-based virtual machine that doesn’t care what CPU or OS it runs on. It’s designed to execute portable bytecode, compiled from code originally written in C, C++, Rust, Python or Ruby, at near-native speed. WebAssembly doesn’t only run in the browser; it started on the client but is proving very useful on the server as well. It makes it easier to work with data where it lives, so you can move the compute to the data itself.
Jordan Tigani, SingleStore: This lets you put your business logic inside the database, and it means you don’t have to jump through hoops. In most other databases, if I want to write in Go, Rust or JavaScript, I have to align the programming language I’m using with the language the database supports. With Wasm, you can run Python, you can run Rust, just about anything, and get something close to bare-metal performance.
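To make that concrete, here is a rough sketch of registering and calling a WebAssembly UDF in SingleStore from Python. It assumes SingleStore’s CREATE FUNCTION ... AS WASM DDL and a MySQL-compatible driver (pymysql); the host, credentials, module files and the normalize_email function are placeholders, and the exact syntax may differ between SingleStore versions.

```python
import pymysql

# Connect over the MySQL wire protocol; host and credentials are placeholders.
conn = pymysql.connect(
    host="svc-example.singlestore.com",
    user="admin",
    password="...",
    database="demo",
    local_infile=True,  # needed to upload the .wasm/.wit files from this client
)

with conn.cursor() as cur:
    # Register a function compiled to WebAssembly (from Rust, C, Go, etc.) so it
    # runs next to the data instead of pulling rows out to an application server.
    cur.execute("""
        CREATE OR REPLACE FUNCTION normalize_email AS WASM
        FROM LOCAL INFILE 'normalize_email.wasm'
        WITH WIT FROM LOCAL INFILE 'normalize_email.wit'
    """)

    # Once registered, it can be used like any built-in SQL function.
    cur.execute("SELECT normalize_email(email) FROM customers LIMIT 5")
    print(cur.fetchall())

conn.close()
```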
Williams: And in machine learning, the case is being made for models to ingest data from multiple sources without requiring a schema. No more mapping out all the entities and all the sources and drawing a schematic with lines between boxes. Twenty years of that kind of work was enough, especially if the underlying data really can’t be used.
Kabir Shahani, Amperity: It turns out that a lot of these systems have bad data, dirty data, missing data. So we built a system that’s data first: looking at all those systems and the underlying atomic-level pieces of information in each of those sources, traversing all of that, and using machine learning to understand the patterns in that data. One of the things we did was commercialize a bunch of research from the University of Washington; the world’s leading expert in probabilistic databases is at the University of Washington. And we use his research to figure out how we could train machines to intuit what that data is telling us about that customer and about that human being.
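Amperity’s actual system is large-scale, machine-learned identity resolution, but the underlying idea, scoring how likely two imperfect records describe the same person rather than relying on exact keys and a fixed schema, can be illustrated with a toy sketch. The fields, weights and threshold below are invented for illustration and are not Amperity’s model.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Rough string similarity in [0, 1], tolerant of typos and formatting noise."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted evidence that two customer records refer to the same person."""
    weights = {"email": 0.5, "name": 0.3, "phone": 0.2}  # illustrative weights
    return sum(w * similarity(rec_a.get(field, ""), rec_b.get(field, ""))
               for field, w in weights.items())

# Two records for the same customer from different source systems,
# with dirty and missing data.
crm_record = {"name": "Katherine Jones", "email": "kjones@example.com", "phone": ""}
pos_record = {"name": "Kathy Jones", "email": "k.jones@example.com", "phone": "555-0142"}

score = match_score(crm_record, pos_record)
print(f"match score: {score:.2f}")  # treat scores above ~0.6 as a probable match
```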
Williams: In summary, number one: we’re seeing a real need for data accessibility in deep cold storage systems, in particular Amazon Glacier. Number two: application-first thinking is really taking root. And number three: all the new stuff, from WebAssembly to machine learning models. Moving compute to the data is a path that’s beginning to emerge.