Five hundred million. That’s how many individuals received a notice that their personal information had been compromised in the recent Marriott data breach. The seemingly endless disclosure of major breaches (another 100 million from Quora was announced as I started writing this article) is causing an awakening among consumers and regulators.
While Marriott’s database was hacked and malicious actors had unfettered access to its data, many companies struggle simply to maintain control of the private data that their employees, partners and customers entrust to them. The sad fact is that customers no longer trust organizations to protect their data and are therefore very concerned about the type and volume of private data that organizations hold. It’s not enough to claim security best practices. Customers want to know what private data companies hold and why.
Data protection is the responsibility of all of the technical teams at a company. But data storage administration and configuration are crucial in ensuring that the private data is protected, whether it be customer PII or your research team’s IP. Here are five tips to help ensure that data is handled responsibly.
1. Know What, Where, and How Much Private and Sensitive Data Is Held by the Organization
This is often easier said than done. Traditional solutions like Data Loss Prevention have promised to find and classify our data, but scalability issues, the inability to identify data in motion and lack of accuracy continue to plague DLP offerings.
Modern technologies such as Docker containers and Kubernetes clusters running in auto-scaling cloud platforms such as AWS, Azure and GCP can eliminate scalability issues. We often find the largest volume and highest rate of data collection to be in big data lakes. It can be very useful to use the compute power built into such data lakes, in the form of MapReduce jobs, to scan, label and classify data at scale.
Data knows no boundaries. Private and sensitive data can be anywhere. Effective discovery means having visibility into your data, whether it’s structured or unstructured, in a traditional DBMS or big data lake, out in the cloud or in your data center.
So, while you’re in the process of discovering data, it’s not enough to look only where it should be. You must have the capability to search, discover and classify data everywhere it resides.
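As a rough illustration of what scanning and labeling can look like, here is a minimal Python sketch. The two regular expressions, the source names and the `classify` and `scan` helpers are assumptions made for this example, not a production classifier; real discovery tools need validation logic and far broader coverage.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def classify(text):
    """Return the set of labels whose pattern appears in the text."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}


def scan(records):
    """Count, per (source, label), how many records contain that label.

    `records` is an iterable of (source_name, text) pairs, so the same
    function can be fed rows from a database, files or log lines.
    """
    counts = Counter()
    for source, text in records:
        for label in classify(text):
            counts[(source, label)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ("chat_log", "my ssn is 123-45-6789"),
        ("crm", "contact: bob@example.com"),
        ("chat_log", "thanks for the help"),
    ]
    for (source, label), count in sorted(scan(sample).items()):
        print(source, label, count)
```

Because `scan` only needs (source, text) pairs, the same per-record logic could in principle run inside a map step on a data lake, with the counts summed in a reduce step.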
2. Map the Data Journey
Data is often the currency of business today, which means that data is constantly moving throughout the customer or product journey. Data is either a byproduct of customer activity or is actively requested and collected. Data is bought and sold to other organizations. And as a result of this data in motion, private data can be exposed in channels that aren’t designed to hold it.
While I wouldn’t ever put my private details such as account number or password into the chat box that seems to pop up on every website offering to help, my mother does this all the time. While providing such a service is important and valuable to the business, monitoring data traveling through these channels is critical to ensuring that private and sensitive data is kept in the proper location and scrubbed from areas such as chat logs.
It’s imperative to identify all the places where data moves into or out of your systems. Watching data as it moves across all touch points can provide verification that data is flowing in compliance with regulations, policy, contracts or other obligations. Monitoring data in motion can help you stay ahead of any problems.
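Scrubbing a channel such as a chat log can be sketched in a few lines of Python. The patterns below, including the assumed 10-to-16-digit shape of an account number and the placeholder strings, are hypothetical and chosen only for illustration.

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Assumption: account numbers are bare runs of 10 to 16 digits.
ACCOUNT = re.compile(r"\b\d{10,16}\b")


def scrub(message):
    """Replace anything that looks like an SSN or account number."""
    message = SSN.sub("[REDACTED-SSN]", message)
    return ACCOUNT.sub("[REDACTED-ACCOUNT]", message)


if __name__ == "__main__":
    print(scrub("Hi, my account is 1234567890123 and my SSN is 123-45-6789"))
```

Applying a function like this before messages are written to the log keeps the sensitive values out of a channel that was never designed to hold them.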
3. Check to See That the Data That Should Be Encrypted Actually Is
Encryption is certainly not a panacea for all sensitive data issues. It can be a powerful mitigating control, but only if the data that should be encrypted actually is. If all Social Security numbers (SSNs) in a table are meant to be encrypted, are they actually? You have to check to make sure.
This should start with checking the accuracy of the initial discovery and classification effort. If it’s just assumed that all SSNs in a table are in the correct column and that column is encrypted, can you be sure all SSNs are encrypted? Data often finds its way into unexpected places. This leads not just to problems of encryption, but also to misclassification and miscategorization. And this misplaced data can be the most vulnerable, as it’s often not watched as closely, which leads to the next point.
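One simple way to run such a check is to look for values that still look like plaintext SSNs in every column, not only the one that is supposed to hold them. The Python sketch below assumes rows arrive as dictionaries and that encrypted values no longer match the nine-digit SSN shape; both the `find_plaintext_ssns` helper and the sample data are hypothetical.

```python
import re

# Nine digits, with or without the usual hyphens.
PLAINTEXT_SSN = re.compile(r"\d{3}-?\d{2}-?\d{4}")


def find_plaintext_ssns(rows):
    """Return (row_index, column) pairs whose value looks like a plaintext SSN.

    Every column is checked, because misplaced data often sits outside
    the column that was classified and encrypted.
    """
    findings = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            if isinstance(value, str) and PLAINTEXT_SSN.fullmatch(value.strip()):
                findings.append((index, column))
    return findings


if __name__ == "__main__":
    rows = [
        {"name": "Ann", "ssn": "gAAAAABx9Qk2"},          # looks encrypted
        {"name": "Bob", "ssn": "123-45-6789"},           # plaintext in the SSN column
        {"name": "Cy", "ssn": "gAAAAABz7Lm4", "notes": "987654321"},  # misplaced
    ]
    print(find_plaintext_ssns(rows))
```

A pattern match is only a heuristic: it can flag other nine-digit values and will miss SSNs embedded in longer free text, so findings still need review.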
4. Don’t Stop with Users
Sensitive data is not always attached to users, so don’t limit your search to user-based data. Also consider derived sensitive data, as seemingly innocent data points can lead to very private information.
Organizations generally focus on user accounts and the data associated with those accounts. But as discussed earlier, data is often misplaced. Is an SSN any less sensitive because it’s not linked in the database to a first and last name? Of course not. On the other hand, seemingly non-sensitive data can become sensitive when it is linked to a user.
For example, it’s unlikely you have religion listed with employee names in your HR database. But you probably do have requested days off. It’s often easy to derive a religious preference from the PTO days an employee requests. And while this data might not be used or even understood by the employer, it will certainly be understood and used by a third party who might have access to this seemingly innocent data. Sensitive data is sensitive data and should be treated as such.
5. Know Your Data Obligations
Private and sensitive data comes with obligations from regulations, external requirements and internal policies. How do you know if you’re meeting all of these?
You’re most likely familiar with obligations in the form of internal policies. These policy obligations might be regarding which data elements should be encrypted, what data should be backed up, and the service level agreements on the restoration of such data.
And you might also be familiar with regulatory obligations like the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), Sarbanes-Oxley (SOX), the Payment Card Industry Data Security Standard (PCI-DSS) and others.
However, obligations can also be contractual. Are you buying, selling, or otherwise transacting data with third parties or partners? There are typically contracts in effect that place obligations on that data. Obligations can also be public statements, such as a privacy statement made on the company’s website. A data privacy strategy should include visibility into such obligations and evidence that they are being met. Knowing the relationship of data to the obligations on that data can certainly make life easier when questions arise.
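The relationship between data and its obligations can be recorded explicitly. The Python sketch below is one possible shape for such a record; the `Obligation` and `DataElement` classes, the obligation names and the control names are all hypothetical, and they do not describe what any particular regulation or contract actually requires.

```python
from dataclasses import dataclass, field


@dataclass
class Obligation:
    name: str
    source: str    # e.g. "policy", "regulation", "contract", "public statement"
    requires: set  # names of the controls this obligation calls for


@dataclass
class DataElement:
    name: str
    controls: set = field(default_factory=set)       # controls actually in place
    obligations: list = field(default_factory=list)  # obligations on this data


def unmet(element):
    """Return {obligation name: missing controls} for one data element."""
    gaps = {}
    for obligation in element.obligations:
        missing = obligation.requires - element.controls
        if missing:
            gaps[obligation.name] = missing
    return gaps


if __name__ == "__main__":
    policy = Obligation("Internal encryption policy", "policy", {"encrypted"})
    contract = Obligation(
        "Partner data contract", "contract", {"encrypted", "retention_limit"}
    )
    card = DataElement("card_number", {"encrypted"}, [policy, contract])
    print(unmet(card))
```

Keeping the controls in place next to the obligations that require them gives a direct answer when someone asks which obligations a data element is, or is not, currently meeting.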
Building a system that protects private data is crucial. Whether you’re spinning up a new development environment for a new venture or simply conducting an audit to ensure compliance with the shifting regulations and privacy laws, how you structure your data storage and management technologies can have a significant impact on your company’s success. Making sure you’re protecting all of your data from various sources at all times is essential — and failing to do so can be costly.
The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Docker.