Data Engineering Survey Shows Security, Access Challenges
Immuta recently publicized the findings of its 2023 State of Data Engineering Survey. The report details common access and security obstacles organizations encounter while attempting to maximize investments in data-driven processes. The 2023 version, which was conducted by Vanson Bourne, is the third edition of this annual survey.
With responses from approximately 600 participants in Europe, the Middle East, Africa, and America, the report reveals that data access is a common inhibitor for organizations. Specifically, it indicates “how they’re wasting 6-10 hours per week to manage [data access] and it’s driving them nuts,” said Steve Touw, Immuta Chief Technology Officer. “Because of that, people are missing out on business results. And, they also feel like they don’t have the power in the cloud to be able to do what they need to do to make this all work.”
Other challenges uncovered in the survey involve data ownership, federated models of data access and data governance, and the influx of regulatory compliance demands. The findings suggest that these difficulties can be overcome by solutions that scale policy management, abstract access control policies from compute environments, and federate policy management.
Data Access and Security Challenges
According to Touw, the survey focused on data engineering because it’s typically part of the data platform team that supplies organizations with data. Traditionally, data platform teams consisted of IT personnel tasked with creating and managing policies for secure data access. The data platform teams “need to create those data products, if you will, that the downstream consumers, like the ML/AI engineers, need to do their jobs,” Touw explained. “That’s where all this complex layer of policy management comes into play.” The survey identified several areas of complexity for such policy management, including:
- Data Ownership: The findings show that simply determining who owns a particular dataset, and obtaining that person’s approval for data access, is far from straightforward. “It’s kind of all over the map,” Touw said. “It’s spread across IT, security, data compliance, privacy, [and] legal.” Moreover, a federated approach to data ownership and access, typified by the data mesh architecture, aggravates this issue. “Many of these organizations want to move to a federated governance model, where individual teams that own their own data products need to manage their own policies, but they struggle with the IT teams in charge,” Touw observed. “So, how do they federate?”
- Technological Implementation: The sheer number of sources, and different technologies involved for each environment, adds to the underlying complexity. According to Touw, managing this dimension isn’t easy because of “legacy approaches like Role-Based Access Control (RBAC) and role explosion.”
- Traditional Responses: Organizations have typically responded to these challenges either by failing to control data access adequately or by controlling it too strictly. “Most take the conservative approach, which is why there’s a finding in [the report] about missed business results,” Touw commented.
The survey’s results suggest modern data access approaches can overcome these obstacles. Attribute-Based Access Control (ABAC), in which access decisions are made dynamically at runtime based on attributes of the user, the data, and the request context, helps in three ways:
- Scalability: Policy management scales more easily with ABAC than with RBAC, which requires constantly creating new user roles and permissions for each requirement. “You can deal with maybe one or two roles, but it typically takes hundreds or thousands of roles,” Touw mentioned.
- Write Once, Deploy Anywhere: Properly implemented ABAC also reduces the time spent writing policies as executable code in each data source. Instead of coding policies for each source, “You build the policy once and it works in all places, rather than dealing with individual implementations in every database,” Touw specified. Users can write one policy and deploy it in Snowflake, Databricks, and other sources.
- Federation: ABAC is also well suited to federating the policy management of secure data access controls. It lets individual domains publish data products with controlled access, as in the data mesh architecture. Business units “can build rules and those rules apply only to those tables, but you layer on organizational rules that span all the data domains,” Touw noted.
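To make the layering Touw describes more concrete, here is a minimal Python sketch of attribute-based evaluation. It is an illustration only, not Immuta's implementation or anything described in the survey: the `Request` type, the attribute names (`department`, `region`, `privacy_trained`, `tags`), and the rules themselves are all hypothetical. Organization-wide rules apply to every table, domain rules apply only to that domain's tables, and access is granted only when every applicable rule passes.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Request:
    user: dict    # hypothetical user attributes, e.g. department, region
    table: dict   # hypothetical data attributes, e.g. domain, tags

Rule = Callable[[Request], bool]

def pii_requires_training(req: Request) -> bool:
    # Hypothetical organization-wide rule: tables tagged "pii"
    # are visible only to users who completed privacy training.
    if "pii" not in req.table.get("tags", set()):
        return True
    return req.user.get("privacy_trained", False)

# Organization rules span all data domains.
ORG_RULES: list[Rule] = [pii_requires_training]

# Each domain team manages rules for its own tables only (hypothetical).
DOMAIN_RULES: dict[str, list[Rule]] = {
    "finance": [lambda req: req.user.get("department") == "finance"],
    "marketing": [lambda req: req.user.get("region") == req.table.get("region")],
}

def is_allowed(req: Request) -> bool:
    # The decision is made at request time from attributes,
    # not from a precomputed role assignment.
    rules = ORG_RULES + DOMAIN_RULES.get(req.table.get("domain"), [])
    return all(rule(req) for rule in rules)

alice = {"department": "finance", "privacy_trained": True}
bob = {"department": "finance", "privacy_trained": False}
carol = {"department": "marketing", "region": "EU", "privacy_trained": True}

ledger = {"domain": "finance", "tags": {"pii"}}
campaigns = {"domain": "marketing", "tags": set(), "region": "EU"}

print(is_allowed(Request(alice, ledger)))     # finance user, trained
print(is_allowed(Request(bob, ledger)))       # finance user, untrained
print(is_allowed(Request(carol, campaigns)))  # region matches
print(is_allowed(Request(carol, ledger)))     # wrong department
```

In this sketch, adding a new domain means adding one entry to `DOMAIN_RULES` rather than creating a role for every combination of department, region, and sensitivity level, which is the kind of role growth the Scalability point refers to.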
The results of the 2023 State of Data Engineering Survey also imply there’s a convergence of different aspects of data access. “Access control and security are merging,” Touw reflected. “I think these privacy regulations are one of the reasons for that. Now, the data team, the data owners, don’t need to only think about outsiders stealing their information. They have to implement all these rules for inside employees, too.”