
Socialize Your Data? Why DataOps Improves Data Ethics

3 Jun 2020 11:16am, by Antonios Chalkiopoulos
Antonios Chalkiopoulos is the CEO and co-founder of Lenses.io. With nearly 20 years of engineering experience, Antonios has led and contributed to big data and digital transformation projects in finance, media and government for organizations including Barclays, BSkyB and the Ministry of Education for Greece. Antonios has a BSc in Computer Science from Hull University, United Kingdom and a Master of Science in Distributed Systems and High Speed Networks from the University of Oxford.

One of the most effective ways to increase the success and adoption of streaming data architectures is to socialize your real-time data.

Offering transparent and secure access over business and technical events to every single person within your organization might sound daunting initially, but the benefits and evidence show that this is the single most important step in truly democratizing your data projects.

The more eyes you have over your data, the more you can bring people and data together to deliver the best possible digital customer experience.

Adding visibility, and the “right lenses” on real-time data, can pay significant dividends for security, data quality and business processes. Here are some notable examples.

  • A junior quality assurance (QA) team member, in his first week at a company, discovered a data quality issue (some events on a Kafka topic had null rather than the proper payload) that was affecting a production business process used in marketing. The resulting poor digital customer experience was rectified before the end of his first week.
  • A security engineer discovered that an application was leaking security tokens into the logs at the DEBUG level, opening up the possibility of a security compromise of one of the central databases holding customer data.
  • A support engineer in an IoT company discovered that an industrial machine was showing unusual temperature peaks, and issued a technical service request before the problem escalated and affected the factory’s production line.
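The first example above, null payloads on a topic, is the kind of check that anyone with visibility over the data can run. The sketch below is illustrative only: the `Event` type, the `marketing-events` topic name and the sample records are hypothetical stand-ins rather than a real Kafka client, and it assumes that a null or empty value is always an error (on a compacted topic a null value can be a legitimate tombstone).

```python
from typing import Iterable, List, NamedTuple, Optional


class Event(NamedTuple):
    """Hypothetical stand-in for a record read from a topic."""
    topic: str
    offset: int
    value: Optional[bytes]


def find_null_payloads(events: Iterable[Event]) -> List[int]:
    """Return the offsets of events whose payload is missing or empty."""
    return [event.offset for event in events if not event.value]


# Made-up sample data; a real check would read from the topic itself.
sample = [
    Event("marketing-events", 0, b'{"campaign": "spring"}'),
    Event("marketing-events", 1, None),
    Event("marketing-events", 2, b""),
]

bad_offsets = find_null_payloads(sample)
print(bad_offsets)
```

Reporting offsets rather than a simple count makes it easier to hand the finding to the team that owns the producing application.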

Democratizing Your Real-Time Data

Democratizing your data is now the holy grail for many modern organizations. I recommend solutions with a low knowledge threshold, such as SQL, that help make data accessible to a wider array of people in the business. But democratizing data isn’t enough.

We need to push our thinking beyond democratizing. We need to be thinking and talking more about data ethics. And to achieve that, we need to both democratize and socialize our data in real time.

Role-based access control (RBAC), although an industry standard, is just not enough. True benefits can only be realized when we consider the whole application and data fabric. How data is stored, how it is transmitted and how it is processed are all important to socialize in an organization, so that through transparency we can all aim for data ethics.

And the data landscape is very rich: Multiple personas, multiple data storage and messaging technologies, multiple data processing technologies and multiple business processes are already in place.

Is your CDO (Chief Data Officer) or CIO (Chief Information Officer) pushing your Data Product Engineering into socializing your data and tackling the real data challenges in a future-proof way?

The DataOps Approach

DataOps is a new approach delivering data agility and data intensity, by focusing on socializing your data, in order to bring people and data together and realize efficient data usage and ethics.

Let’s get a bit technical and consider the authentication layer. A large number of technologies and solutions have been successfully adopted: LDAP, AD, SAML, OAuth, Kerberos and mutual certificates, to name a few. They bring many benefits, such as single sign-on via SAML or group assignment in AD or Okta.

Is Authentication and RBAC the Ideal Abstraction?

Think of a new employee joining a team. Are we promoting data intensity when it takes weeks to give him or her access across a multitude of development and production environments (e.g. DEV/SIT/UAT/PROD), on-prem and in the cloud? How much time and how many resources do we spend providing access to our Kafka, Elastic or MongoDB environments?

How Can We Realize an Ideal Future?

DataOps principles put people and data first. People are stewards of data: responsible, accountable or informed. They own the data. Data should be logically defined (e.g. via data-centric security), and access to it should then be delegated by the appropriate owners.

In this DataOps world, people are not concerned with technicalities (e.g. creating, managing or revoking digital certificates) but focus on operating at the data level, by logically grouping and tagging their data in meaningful ways.

That is the way of DataOps. It’s about abstracting away from the underlying technology and delivering a future-proof, data-centric (not technology-centric) solution. In that world, a new employee just needs to join a group to automatically inherit the right amount of access across the entire data and application fabric that is in the scope of his or her projects. It’s what fits the new event-driven data-mesh approach.
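The idea of joining a group and inheriting access to logically tagged data can be sketched in a few lines. This is a minimal illustration, not any product's API: the `Dataset` and `Group` types, the tag names and the rule that a dataset is visible when it shares at least one tag with the group are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class Dataset:
    name: str       # e.g. a topic, an index or a collection
    tags: Set[str]  # logical labels assigned by the data owner


@dataclass
class Group:
    name: str
    allowed_tags: Set[str]
    members: Set[str] = field(default_factory=set)


def accessible_datasets(
    user: str, groups: Iterable[Group], datasets: Iterable[Dataset]
) -> Set[str]:
    """Return the names of datasets the user can reach through group tags."""
    granted: Set[str] = set()
    for group in groups:
        if user in group.members:
            granted |= group.allowed_tags
    # Assumed policy: one shared tag is enough to grant visibility.
    return {d.name for d in datasets if d.tags & granted}


# Hypothetical datasets spread across different technologies.
datasets = [
    Dataset("payments-topic", {"payments"}),
    Dataset("orders-index", {"orders"}),
    Dataset("hr-collection", {"hr"}),
]
checkout_team = Group("checkout-team", allowed_tags={"payments", "orders"})

before = accessible_datasets("new.hire", [checkout_team], datasets)
checkout_team.members.add("new.hire")  # the only onboarding step
after = accessible_datasets("new.hire", [checkout_team], datasets)
print(sorted(before), sorted(after))
```

In this model access is expressed against tags rather than against individual systems, so adding a new topic or collection with an existing tag would require no change to the group or its members.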

MongoDB is a sponsor of The New Stack.
