Piiano Flows Scans for Sensitive Data Leaks in Git Code
When Florida-based augmented reality vendor Magic Leap acquired his cybersecurity company NorthBit, Gil Dabah became Magic Leap's head of security, overseeing security for the work of 700 engineers. This was when the European Union's privacy regulation, the General Data Protection Regulation (GDPR), was taking effect, and companies were scrutinizing how data was protected and stored, and who had access to what.
“So imagine 700 engineers on a daily basis. They push like tens of thousands of lines of code, and you need to be able to answer what’s happening with the data, the customer data that eventually we have in our servers, right?” he said. “And it’s very hard to do that.”
That challenge became the genesis of his new company, Piiano, which takes its name from its focus on securing sensitive private data.
“This is where we got the idea [that] if there are other CISOs, or security managers out there in enterprises at very big companies … and they are chasing hundreds or more of developers, they are losing the battle. They will never be able to catch up to understand what’s happening with the data that they collect about customers,” he said.
Detecting Data Leaks
The Israeli company's latest release is Piiano Flows, a static code analyzer built specifically to detect leaks of personally identifiable information (PII), credentials and financial information in code stored in online repositories. The company points to the Duolingo hack earlier this year as an example of the kind of data loss it aims to prevent.
Available for free, Flows continuously tracks when, where and how sensitive data are used and stored during development, so teams can deal with security problems proactively, before code reaches production.
With so many apps collecting and storing so many kinds of data, it's hard to keep track of how developers are using each type. It's also easy to make mistakes, such as leaving in debugging logs that record sensitive data or inadvertently exposing that data through public or third-party APIs.
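The debug-log mistake is easy to picture. The following is a minimal Python sketch, not a description of how Flows works: a leftover `logger.debug` call writes a signup payload, email address included, to the logs, and a hypothetical `RedactEmails` logging filter masks anything that looks like an email address. The function names, the payload and the single regex are illustrative assumptions; a data-focused scanner is meant to flag the leaky call itself before it ships rather than patch the output at runtime.

```python
import io
import logging
import re

# Hypothetical, deliberately simple email pattern; real PII detection needs far more than one regex.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class RedactEmails(logging.Filter):
    """Mask anything that looks like an email address before the record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg and args into the final string
        record.msg = EMAIL_RE.sub("[REDACTED EMAIL]", message)
        record.args = None  # args are already merged into msg
        return True  # keep the record, just redacted


def make_logger(stream: io.StringIO, redact: bool) -> logging.Logger:
    logger = logging.getLogger(f"signup.redact={redact}")
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    if redact:
        handler.addFilter(RedactEmails())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def handle_signup(logger: logging.Logger, payload: dict) -> None:
    # A leftover debug statement: harmless-looking, but it writes the raw payload to the logs.
    logger.debug("signup payload: %s", payload)


leaky, safe = io.StringIO(), io.StringIO()
payload = {"name": "Ada", "email": "ada@example.com"}
handle_signup(make_logger(leaky, redact=False), payload)
handle_signup(make_logger(safe, redact=True), payload)
print(leaky.getvalue().strip())  # signup payload: {'name': 'Ada', 'email': 'ada@example.com'}
print(safe.getvalue().strip())   # signup payload: {'name': 'Ada', 'email': '[REDACTED EMAIL]'}
```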
Unlike traditional static application security testing (SAST) tools, which scan for vulnerabilities, Flows scans code for data, Dabah said.
“Companies like Snyk, Checkmarx, Coverity, they scan for vulnerabilities and they’re running in development. And what we’re doing differently … is focusing on data, but during development, so we give you everything you need from finding the issues inside the code, all the way into remediating them or rectifying them with our data protection APIs,” he said. Those can be encryption APIs or tokenization APIs.
Flows uses a proprietary natural language processing (NLP) model and taint analysis algorithms to highlight any code that touches sensitive data, whether incoming, outgoing or stored. The aim is to help find privacy and security issues, as well as blind spots that surface at runtime.
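Flows' own algorithms are proprietary, so the following is only a generic Python sketch of taint analysis: mark values that come from sensitive sources, propagate the marks through assignments, and report whenever a marked value reaches a sink such as a log or a third-party API. The `Step` representation and the source and sink names are invented for illustration, and the sketch handles only straight-line code, with no branches, loops or function calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    kind: str                     # "source", "assign" or "sink"
    target: str                   # variable written, or the sink's name
    inputs: tuple[str, ...] = ()  # variables read
    label: str = ""               # for sources: the kind of sensitive data produced


def find_leaks(program: list[Step]) -> list[tuple[str, str]]:
    """Forward taint propagation: return (sink, data type) pairs where sensitive data reaches a sink."""
    taint: dict[str, set[str]] = {}  # variable -> sensitive data types it may carry
    leaks = []
    for step in program:
        if step.kind == "source":
            taint[step.target] = {step.label}
        elif step.kind == "assign":
            # A derived value carries every label of every input it was computed from.
            taint[step.target] = set().union(*(taint.get(v, set()) for v in step.inputs))
        elif step.kind == "sink":
            for v in step.inputs:
                for label in sorted(taint.get(v, set())):
                    leaks.append((step.target, label))
    return leaks


program = [
    Step("source", "email", label="email"),           # read from a signup form
    Step("assign", "greeting", ("email",)),            # greeting = "Hi " + email
    Step("assign", "plan", ()),                        # plan = "free" (a constant)
    Step("sink", "debug_log", ("greeting", "plan")),   # log.debug(greeting, plan)
    Step("sink", "analytics_api", ("plan",)),          # third_party.track(plan)
]
print(find_leaks(program))  # [('debug_log', 'email')]
```

Only the debug log is reported: `greeting` inherits the email taint, while `plan` is a constant and may safely go to the analytics API.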
Why NLP? Sensitive data in code can look like just another variable, he explained. Say a variable holds a name: even a customer name might or might not need to be kept private. NLP looks at the words surrounding a variable to determine what type of data it holds, he said; the words surrounding a country, for example, might indicate it's part of an address that perhaps should not be made public. Flows then provides remediation recommendations prioritized by risk.
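Piiano's NLP model isn't public, so here is a deliberately crude stand-in that illustrates only the idea of using context. It is a keyword heuristic, not NLP: it splits identifiers into words and checks the names of neighboring fields. The cue-word lists and the labels it returns are assumptions made up for this sketch.

```python
import re

# Hypothetical cue words; a trained model would learn such signals instead of listing them.
ADDRESS_CUES = {"street", "city", "zip", "postal", "address", "shipping", "billing"}
PERSON_CUES = {"customer", "user", "patient", "employee", "contact"}


def words(identifier: str) -> list[str]:
    """Split snake_case and camelCase identifiers into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", identifier).replace("_", " ")
    return spaced.lower().split()


def classify(field: str, sibling_fields: list[str]) -> str:
    """Guess what a field holds from its own name plus the names around it."""
    own = set(words(field))
    context = own | {w for s in sibling_fields for w in words(s)}
    if "name" in own:
        return "personal name" if context & PERSON_CUES else "other name"
    if "country" in own:
        return "address part" if context & ADDRESS_CUES else "standalone country"
    return "unknown"


print(classify("name", ["customerId", "email"]))     # personal name
print(classify("name", ["sku", "price"]))            # other name
print(classify("country", ["street", "zipCode"]))    # address part
print(classify("country", ["currency", "taxRate"]))  # standalone country
```

The same field name gets different labels depending on its neighbors, which is the point Dabah makes about context.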
You can connect Flows to code repositories on GitHub, GitLab, Assembla and Bitbucket, or run it locally on your code base as a standalone container. Its access is limited to repository code; it doesn't reach production environments or production data stores containing sensitive customer data.
Assume You’ll Be Breached
Piiano's initial product, Piiano Vault, is built on the premise that organizations must assume their perimeter will be breached. So how do you protect data after a breach?
Just as a redacted document is of little use to an unauthorized reader, Piiano aims to make data unusable to a hacker who gets hold of it. Vault does this with techniques such as field-level encryption, tokenization, masking and granular access control, and it provides out-of-the-box support for regulatory requirements such as data subject access rights, consent, retention, minimization, traceability and more.
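Vault's internals aren't described here, so the following is a generic Python sketch of two of the named techniques, tokenization and masking, with a single role check standing in for granular access control. `TokenVault`, its methods and the `billing` role are hypothetical, and an in-memory dict is not how a production vault would store data.

```python
import secrets


class TokenVault:
    """Toy tokenization: raw values live only in the vault; other systems store opaque tokens."""

    def __init__(self) -> None:
        self._by_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)  # random, so it reveals nothing about the value
        self._by_token[token] = value
        return token

    def detokenize(self, token: str, caller_role: str) -> str:
        # Granular access control, reduced here to one hypothetical role check.
        if caller_role != "billing":
            raise PermissionError(f"role {caller_role!r} may not read raw values")
        return self._by_token[token]


def mask(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters, e.g. for display in a support tool."""
    hidden = max(len(value) - visible, 0)
    return "*" * hidden + value[hidden:]


vault = TokenVault()
card = "4111111111111111"
token = vault.tokenize(card)  # what an orders table would store instead of the card number
print(token.startswith("tok_"), mask(card))            # True ************1111
print(vault.detokenize(token, caller_role="billing") == card)  # True
```

A stolen copy of the orders table then holds only random tokens, and a support agent sees only the masked form.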
“Piiano Vault is a data privacy vault. You can think of it as data protection APIs to make it simple. It means that any developer from the application level can just encrypt the data, just one second before they store it in a database,” Dabah said, adding that the encrypted data can be stored in any database. “Now, the good thing is if you do it from the application level, any attacker coming into the database directly or the file system or the hard disk is not going to be able to read the data.”
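The pattern Dabah describes, encrypting a field in the application just before it is written, can be sketched in a few lines. This is a generic illustration of application-level field encryption, not Piiano's API. It assumes the third-party `cryptography` package (its Fernet recipe provides symmetric authenticated encryption) and uses an in-memory SQLite table; the table and function names are hypothetical, and key handling is deliberately simplified.

```python
import sqlite3

from cryptography.fernet import Fernet  # third-party: pip install cryptography

# Simplification: in practice the key would come from a key-management service,
# never be stored alongside the data it protects.
fernet = Fernet(Fernet.generate_key())

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email BLOB)")


def save_customer(email: str) -> int:
    # Encrypt in the application, right before the value reaches the database.
    ciphertext = fernet.encrypt(email.encode("utf-8"))
    cur = db.execute("INSERT INTO customers (email) VALUES (?)", (ciphertext,))
    db.commit()
    return cur.lastrowid


def load_customer_email(customer_id: int) -> str:
    (ciphertext,) = db.execute(
        "SELECT email FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return fernet.decrypt(ciphertext).decode("utf-8")


customer_id = save_customer("ada@example.com")
raw = db.execute("SELECT email FROM customers").fetchone()[0]
print(b"ada@example.com" in raw)          # False: the database holds only ciphertext
print(load_customer_email(customer_id))  # ada@example.com
```

Someone reading the table or the database file directly sees only ciphertext; only code holding the key can recover the email.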
He also calls it data protection as a service rather than encryption as a service because “the default [in data protection] is a modern way to store data in a secure way that also gets you out-of-the-box functionality for privacy. So you don’t need to implement all the software stuff that GDPR requires,” he said. Vault is also built with the California Consumer Privacy Act (CCPA) and other data privacy regulations in mind.
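To give a sense of the “software stuff” in question, here is a hypothetical Python sketch of three privacy duties mentioned above: answering a data subject access request, erasing a person's data and enforcing a retention period. `PrivacyStore`, `Record` and their methods are invented for illustration and are not Vault's API; a real system would also have to cover backups, logs and downstream copies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Record:
    subject_id: str  # the person the data is about
    field: str
    value: str
    collected_at: datetime


class PrivacyStore:
    """Hypothetical store sketching three duties privacy laws place on applications."""

    def __init__(self, retention: timedelta) -> None:
        self.retention = retention
        self.records: list[Record] = []

    def add(self, record: Record) -> None:
        self.records.append(record)

    def access_request(self, subject_id: str) -> dict[str, str]:
        """Right of access: export everything held about one person."""
        return {r.field: r.value for r in self.records if r.subject_id == subject_id}

    def erase(self, subject_id: str) -> int:
        """Right to erasure: delete one person's data and report how many records went."""
        before = len(self.records)
        self.records = [r for r in self.records if r.subject_id != subject_id]
        return before - len(self.records)

    def purge_expired(self, now: datetime) -> int:
        """Retention: drop records older than the retention period."""
        before = len(self.records)
        self.records = [r for r in self.records if now - r.collected_at <= self.retention]
        return before - len(self.records)


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
store = PrivacyStore(retention=timedelta(days=30))
store.add(Record("u1", "email", "ada@example.com", now - timedelta(days=2)))
store.add(Record("u1", "phone", "+1-555-0100", now - timedelta(days=45)))
store.add(Record("u2", "email", "bob@example.com", now - timedelta(days=1)))

print(store.access_request("u1"))  # {'email': 'ada@example.com', 'phone': '+1-555-0100'}
print(store.purge_expired(now))    # 1 (u1's phone number is older than 30 days)
print(store.erase("u2"))           # 1
print(store.access_request("u1"))  # {'email': 'ada@example.com'}
```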
Dabah maintains that data protection must begin at the code level.
“Doing everything from the code, that’s the only place where you have the full context from the software into what’s going on. Eventually software writes and reads data to and from the database, right?” he said.
“So if we do the encryption here, anybody that gets directly to the database just cannot do it effectively. So that’s what we’re calling shifting left data security specifically. Because what we’re saying is that it’s a shared responsibility that every developer that is writing the application should be able to protect the data directly from the application and not trusting any firewall or worse, API security.”