
Satori Launches Universal Data Permissions Scanner

Users can point and click at an array of sources to quickly glean which users have access to which data assets.
May 11th, 2023 7:00am by
MongoDB is a sponsor of The New Stack.

Satori recently unveiled its Universal Data Permissions Scanner (UDPS), which was designed to simplify visibility into data access. The open source offering allows organizations to point it at different repositories and scan them to see which users have access to which data assets.

At present, UDPS supports Databricks, Google BigQuery, Snowflake, Amazon Redshift, Amazon S3, and MongoDB. The sheer scale of these sources, the number of users they serve, and the amount of data they hold can make even the basics of who can access what hard to understand.

According to Satori CTO Yoav Cohen, “The fundamental question UDPS answers, which is hard to answer for most companies, is who has access to my data and how did they get access? This is a hard problem because each database has its own authorization model.”

The authorization models in use include Role-Based Access Control (RBAC), Attribute-Based Access Control (ABAC), group permissions, and more. And even when two sources share the same access model, as BigQuery and Snowflake do, their implementations are often "completely different," Cohen pointed out.
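To make the contrast between these models concrete, here is a minimal Python sketch of a role-based check next to an attribute-based one. The `User` class, the grant table, and the department-matching policy are hypothetical simplifications for illustration only. They do not reflect how BigQuery, Snowflake, or UDPS actually store or evaluate permissions.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified models for illustration only; real databases
# store and evaluate authorization very differently.


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)
    attributes: dict = field(default_factory=dict)


# RBAC: privileges are granted to roles, and users gain them through membership.
ROLE_GRANTS = {
    "analyst": {("SELECT", "sales.orders")},
    "admin": {("SELECT", "sales.orders"), ("DELETE", "sales.orders")},
}


def rbac_allows(user: User, privilege: str, asset: str) -> bool:
    """True if any of the user's roles holds the privilege on the asset."""
    return any((privilege, asset) in ROLE_GRANTS.get(role, set()) for role in user.roles)


# ABAC: a policy compares attributes of the user with attributes of the asset.
ASSET_ATTRIBUTES = {"sales.orders": {"department": "sales"}}


def abac_allows(user: User, privilege: str, asset: str) -> bool:
    """Hypothetical policy: read-only access when the departments match."""
    asset_department = ASSET_ATTRIBUTES.get(asset, {}).get("department")
    return (
        privilege == "SELECT"
        and asset_department is not None
        and user.attributes.get("department") == asset_department
    )


alice = User("alice", roles={"analyst"})
bob = User("bob", attributes={"department": "sales"})
print(rbac_allows(alice, "SELECT", "sales.orders"))  # True
print(rbac_allows(alice, "DELETE", "sales.orders"))  # False
print(abac_allows(bob, "SELECT", "sales.orders"))    # True
print(rbac_allows(bob, "SELECT", "sales.orders"))    # False
```

Under each model, the same question ("can Bob read this table?") is answered by entirely different logic. That is why a scanner that spans many sources needs a separate connector for each one.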

The point-and-scan functionality of UDPS, together with the homogeneous output it produces across sources, reduces the complexity of understanding data access. Organizations can use this information to strengthen data governance, data protection, data privacy, and regulatory compliance.

An Open Source Engine

UDPS is a permissions scanner engine that runs as a Python command-line utility, and Satori has supplied connectors for each of the sources listed above. Once users point the scanner at their database of choice, "you just have to give it the credentials to scan those tables in the database where authorization information like permissions is stored," Cohen said. "Then, the tool performs the scan and provides the output with all the authorization paths for all the users to all the data assets."

Those assets are often specific tables, but they also include the many data objects stored in cloud storage buckets. Expressing every result as an authorization path between a user and the source data effectively standardizes the output across sources. The entire process "leverages the existing functionality in the databases," Cohen mentioned.
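One way to picture how a common authorization path standardizes results is to map two differently shaped grant listings into a single record type. The sketch below is a hypothetical illustration, not UDPS's actual schema or connector code. The `AuthzPath` fields, the simplified Snowflake-style rows, and the read/write permission levels are all assumptions. The BigQuery side borrows the `role`/`members` shape of an IAM policy binding and the predefined `roles/bigquery.dataViewer` and `roles/bigquery.dataEditor` roles. Role inheritance is deliberately ignored here.

```python
from dataclasses import dataclass


# Hypothetical normalized record: one path from an identity to a data asset.
@dataclass(frozen=True)
class AuthzPath:
    source: str      # which system the grant came from
    identity: str    # user or service account
    via: tuple       # intermediate grants (roles) along the path
    asset: str       # table, dataset, bucket, ...
    permission: str  # normalized level: "read" or "write"


# Assumed mappings from source-specific privileges to common levels.
SNOWFLAKE_LEVELS = {"SELECT": "read", "INSERT": "write", "UPDATE": "write"}
BIGQUERY_LEVELS = {"roles/bigquery.dataViewer": "read", "roles/bigquery.dataEditor": "write"}


def from_snowflake_style(memberships, grants):
    """memberships: (user, role) pairs; grants: (role, privilege, object) rows.
    Only direct role membership is handled; the role hierarchy is ignored."""
    paths = []
    for user, role in memberships:
        for grant_role, privilege, obj in grants:
            if grant_role == role and privilege in SNOWFLAKE_LEVELS:
                paths.append(AuthzPath("snowflake", user, (role,), obj, SNOWFLAKE_LEVELS[privilege]))
    return paths


def from_iam_bindings(bindings, resource):
    """bindings: IAM-policy-style dicts with "role" and "members" keys."""
    paths = []
    for binding in bindings:
        level = BIGQUERY_LEVELS.get(binding["role"])
        if level is None:
            continue  # role not covered by this simplified mapping
        for member in binding["members"]:
            paths.append(AuthzPath("bigquery", member, (binding["role"],), resource, level))
    return paths


snowflake_paths = from_snowflake_style(
    memberships=[("alice", "ANALYST")],
    grants=[("ANALYST", "SELECT", "SALES.PUBLIC.ORDERS")],
)
bigquery_paths = from_iam_bindings(
    [{"role": "roles/bigquery.dataViewer", "members": ["user:alice@example.com"]}],
    resource="my-project.sales",
)
for p in snowflake_paths + bigquery_paths:
    print(p.source, p.identity, p.asset, p.permission)
```

Both inputs end up as the same kind of record, so downstream tooling never needs to know which database a grant came from.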

The Authorization Path

Because the authorization path gives the scanner's output the same shape for every source, it is pivotal for understanding data access. This benefit is particularly meaningful because "every database has its own way of describing and defining authorization," Cohen commented. "You have Role Based Access Control in BigQuery and you have roles. Then you have Snowflake and you've got completely different roles."

Sources also differ in how roles govern data usage and in which people are assigned to those roles. Because UDPS produces the same authorization path output despite these and other distinctions, "you can aggregate those outputs into a single view and have a unified view of authorization across your data landscape," Cohen remarked. "This is super important because the security team doesn't care if it's BigQuery or Snowflake. They just want to understand who has access to what data."
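Once every source reports in the same shape, building the unified view Cohen describes is essentially a merge. The following is a minimal sketch that assumes each scan has already been converted into hypothetical `Access(source, identity, asset, permission)` records. It also assumes identities have been reconciled to a common name across systems. In practice, the same person may appear as different principals in different databases.

```python
from collections import defaultdict
from typing import NamedTuple


class Access(NamedTuple):
    # Hypothetical normalized record produced by a per-source scan.
    source: str
    identity: str
    asset: str
    permission: str


def unified_view(*scans):
    """Merge several scans into identity -> set of (source, asset, permission)."""
    view = defaultdict(set)
    for scan in scans:
        for record in scan:
            view[record.identity].add((record.source, record.asset, record.permission))
    return dict(view)


def who_can_access(view, source, asset):
    """Sorted identities holding any permission on the asset in the given source."""
    return sorted(
        identity
        for identity, grants in view.items()
        if any(s == source and a == asset for s, a, _ in grants)
    )


snowflake_scan = [Access("snowflake", "alice", "SALES.ORDERS", "read")]
bigquery_scan = [
    Access("bigquery", "alice", "analytics.events", "write"),
    Access("bigquery", "bob", "analytics.events", "read"),
]
view = unified_view(snowflake_scan, bigquery_scan)
print(sorted(view["alice"]))
print(who_can_access(view, "bigquery", "analytics.events"))
```

Keying the view by identity answers "what can this person reach?" directly. The reverse question, "who can reach this asset?", is a simple filter over the same structure.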

Manual Methods

Security teams also want this information quickly. The automation UDPS provides is preferable to ad hoc, manual methods of determining user access, which become far more burdensome when applied to what Cohen termed "thousands of users across tens of thousands of data assets." Beyond the inconsistent way sources present authorization information, that information is often not stored in a readily consumable form.

This complexity can slow down even something as simple as determining whether a particular user has access to a certain table. "I would go into Snowflake and… I would have to map all the roles that he's a member of and all the roles that they inherit because in Snowflake they have role hierarchy," Cohen explained. "Then I'd have to find out which role provides you permissions to the customer's table." Such a recursive process is far from straightforward, is time-consuming, and doesn't scale to modern enterprise deployments of these predominantly cloud-based sources.
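The recursive lookup Cohen describes is a graph traversal. You start from the user's directly granted roles, follow every role they inherit, and record each chain that ends at a role holding the privilege. The sketch below does this breadth-first over a hard-coded, hypothetical snapshot of Snowflake-style metadata. A real scanner would first read the grants from the database, for example with Snowflake's `SHOW GRANTS` commands. This is an illustration of the technique, not UDPS's implementation.

```python
from collections import deque

# Hypothetical, hard-coded snapshot of Snowflake-style authorization metadata.
USER_ROLES = {"alice": {"ANALYST", "MARKETING"}}
# role -> roles granted to it, whose privileges it therefore inherits
ROLE_INHERITS = {
    "ANALYST": {"REPORTING"},
    "REPORTING": {"CUSTOMER_READ"},
    "MARKETING": set(),
    "CUSTOMER_READ": set(),
}
# role -> (privilege, object) pairs granted directly to that role
ROLE_PRIVILEGES = {
    "CUSTOMER_READ": {("SELECT", "CUSTOMERS")},
    "MARKETING": {("SELECT", "CUSTOMERS")},
}


def authorization_paths(user, privilege, table):
    """Return every role chain through which `user` holds `privilege` on `table`."""
    results = []
    # Breadth-first search; each queue entry is the chain of roles walked so far.
    queue = deque([role] for role in sorted(USER_ROLES.get(user, set())))
    while queue:
        chain = queue.popleft()
        role = chain[-1]
        if (privilege, table) in ROLE_PRIVILEGES.get(role, set()):
            results.append(chain)
        for inherited in sorted(ROLE_INHERITS.get(role, set())):
            if inherited not in chain:  # defensive guard against cycles
                queue.append(chain + [inherited])
    return results


for chain in authorization_paths("alice", "SELECT", "CUSTOMERS"):
    print("alice -> " + " -> ".join(chain) + " -> SELECT on CUSTOMERS")
```

The search collects every path instead of stopping at the first match, which mirrors the question "how did they get access?" In large hierarchies the number of paths can grow quickly, so a production tool would likely need to bound or deduplicate them.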

An Evolving Landscape

With contributions from the open source community and Satori, UDPS has the potential to make such manual approaches obsolete. It also addresses a limitation of traditional data access platforms: users can understand access to data that has been onboarded into the platform, but have difficulty doing so for data that has not.

UDPS can assist with this issue. “As an industry, we’re expanding our scope of visibility to not just help companies focus on what’s under the flashlight, but also things in the periphery,” Cohen said.
