
4 Open Source Data Predictions for 2021

29 Dec 2020 9:00am, by

DataStax sponsored this post.

Patrick McFadin
Patrick is the VP of Developer Relations at DataStax, where he leads a team devoted to making users of Apache Cassandra successful. He has also worked as Chief Evangelist for Apache Cassandra and as a consultant for DataStax, where he helped build some of the largest and most exciting deployments in production.

Over the years, I’ve worn all kinds of different hats in the world of technology. I cut my teeth doing digital communications in the US Navy, physically moving tapes of sonar data to minicomputers for analysis while floating on a destroyer in the North Atlantic. Data was important but old data wasn’t — and tapes were the fastest way to move it. After deciding to become a civilian again, I joined the internet-all-the-things wave in the 1990s, working as an Oracle DBA and developer for over 15 years. Never once did anyone want less data or slower transfer. It wasn’t open source though, which eventually led me to Apache Cassandra. I worked as a consultant and chief evangelist before taking the lead of DataStax developer relations.

I’m a huge believer in open source data. At a very core level, information wants to be free — not free as in beer, but free as in freedom. Open source makes data a commodity and gives users much more control over their environment.

If you are looking for some upside for 2020, it was the rapidly evolving relationship we have with data — and particularly how people are working with data on Kubernetes and creating stateful workloads there. More data and faster. Faster implementations. Faster insights.

Instead of worrying about the infrastructure needed to support applications, dev teams have the opportunity to focus on working with the data itself, which is super exciting to me because that’s what I’ve been after for most of my career. We have always gone through a lot of effort to make data usable, either physically, like moving tapes, or by building systems. I’m seeing how those days could be numbered and I couldn’t be more thrilled.

The story on open source data is just getting started. With that in mind, here are some of my predictions for what we’ll see in 2021.

1. New Ways of Storing Data on Kubernetes

Stateful applications require storage, which is where state is managed. Up to this point, storing data has been pretty loose in Kubernetes. But recently, there have been some significant changes to Kubernetes and the community wants the ability to pin storage to nodes.

Next year, I predict that we’re going to start seeing some really clever ways of doing storage in Kubernetes that maybe weren’t available on a physical box. Regardless of whether bona fide solutions emerge next year, we will at least see some real progress in this arena.
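To make "pinning storage to nodes" concrete, here is a minimal sketch of one way Kubernetes can already express it: a local PersistentVolume whose `nodeAffinity` ties it to a single host. The helper name `local_persistent_volume`, the volume name, node name, disk path and the `local-storage` class name are all assumptions made up for illustration; only the manifest field names come from the Kubernetes API.

```python
import json


def local_persistent_volume(name, node_name, path, size="10Gi"):
    """Build a PersistentVolume manifest that is pinned to one node.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "volumeMode": "Filesystem",
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": "local-storage",  # assumed class name
            # A disk that physically lives on one machine...
            "local": {"path": path},
            # ...so the volume may only be used from that machine.
            "nodeAffinity": {
                "required": {
                    "nodeSelectorTerms": [
                        {
                            "matchExpressions": [
                                {
                                    "key": "kubernetes.io/hostname",
                                    "operator": "In",
                                    "values": [node_name],
                                }
                            ]
                        }
                    ]
                }
            },
        },
    }


if __name__ == "__main__":
    # Example values are invented; substitute your own node and path.
    manifest = local_persistent_volume(
        "cassandra-data-0", "worker-1", "/mnt/disks/ssd0"
    )
    print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed manifest could be saved to a file and applied as-is. Whether a hand-written volume like this is the right choice depends on your cluster; it is shown only to illustrate the idea.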

2. The DBA Role Will Continue to Evolve

Over the last few years, we’ve seen the role of the DBA (Database Administrator) evolve into the role of the SRE (Site Reliability Engineer). In 2021, we will see this evolution accelerate, with DBAs increasingly developing new skills and upleveling.

Looking ahead, SREs will continue to think about how to deploy a data service and how each service interacts with the rest of the stack. Since a Kubernetes deployment needs observability and monitoring to trace the calls from your database to your application, this is an area where I think DBAs/SREs will continue to focus.

I worked as a DBA for a long time. As a DBA, your job is to block developers from doing bad things to your database. That mentality is going to change moving forward. It has to. We’ll see more DBAs upleveling their existing skills and helping organizations understand the role of data and the effect it has on the business from a quality and cost perspective.

3. OpenTelemetry Takes Center Stage

As Kubernetes continues to evolve, we’ll also see OpenTelemetry enter further into the mainstream.

We’ve all had conversations about how databases have their own world of monitoring. But that has to stop. No longer can we think of “an island of data.” With OpenTelemetry, you can see the river of data that flows throughout your environments. Data has a lifecycle, and seeing every step of it will help SREs make better decisions.

OpenTelemetry is a collection of tools, APIs, and SDKs to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) for analysis in order to understand your software’s performance and behavior. It has garnered broad language support, including Java, C#, Go, JavaScript, Python, Rust and C++, as well as integrations with MySQL, Redis, Django, Kafka, Jetty, Akka, RabbitMQ, Spring, Flask and others.

4. Innovation, Innovation, Innovation

In November, DataStax released K8ssandra, an open source cloud native distribution of Apache Cassandra™ on Kubernetes. The solution is designed to make data cloud native.

To me, this is incredible. We’re finally getting to the point where the things we need to run our applications can be deployed in essentially the same way you open up Chrome on your desktop.

In 2021, this reality is going to click for dev teams around the world. The lightbulbs will start turning on and I can’t wait to see what they are going to create.

Instead of spending weeks or months preparing the backbone of an application, you can spin it up today. All of a sudden, you can wrap up an entire project in a couple of weeks. Think about the implications.

These nuggets were taken from an episode of the Open||Source||Data podcast I recently recorded with my colleague Sam Ramji, DataStax Chief Strategy Officer. To learn more about the future of open source data, listen to the episode.

Feature image via Pixabay.
