Database administrators (DBAs) ask themselves the same questions every day, just with different data: Where did this data originate? Is it U.S. data or European data? Is it sensitive? In an increasingly regulated world, knowing the answers to these questions could mean the difference between smooth sailing and millions of dollars in fines. DBAs face data discovery challenges on a daily basis, and the difficulties will only compound as data volumes continue to grow.
Coupled with this, emerging regulations will stack on top of existing data protection laws, creating even more complexity. To get ahead of further data discovery struggles, DBAs will need to be on the front lines of implementing a comprehensive database management strategy now, so that all data is accounted for and meets the applicable regulatory requirements.
To help DBAs along the sensitive data protection journey, here are four keys to smart data discovery:
- Understand that GDPR and incoming data regulations impact you.
This sounds obvious, but when the European Union’s General Data Protection Regulation (GDPR) took effect, many organizations realized too late that it did in fact apply to them, and the result was stalled services in Europe. It’s tempting to take the same attitude toward the California Consumer Privacy Act (CCPA), but the risks of not taking steps toward compliance now are greater.
With less than a year until CCPA takes effect, and with the New York laws in the works, organizations that get ahead of the curve now will avoid consequences later. To get ready for the tasks ahead, it’s important to proactively assess your data, including how much of it needs to be active and who has access to it. Once you fully grasp how much data you’re working with, you can then focus on how much you really know about that data.
- Understand how different regulations define data differently – and know how your data fits into those categories.
In short, know your data. If you don’t know what kind of information your databases contain, you can’t take the right steps toward compliance. Is the information sensitive? It may turn out to be company data meant to stay within the organization, or even among certain employees of the organization. It may be sensitive data pertaining to a customer, or personal data pertaining to an individual. GDPR has definitions for personal and sensitive data, while other regulations like HIPAA have their own, and it wouldn’t be surprising if the CCPA and New York data privacy legislation introduced even more ways of categorizing sensitive data.
Understanding what it means for data to be sensitive is ultimately just one side of the coin in knowing your data. To really know your organization’s data, you have to know where it lives. This means that, first and foremost, you should be able to track data back to its original source. The original source includes not only its geographical location (whether the data was sourced in the U.S. or abroad) but also the device, system, application or database from which it was collected. Sometimes data is streamed in through a network unique to the enterprise; other times it flows in from a web app or a particular endpoint in the system.
Knowing where your data comes from and how it’s categorized will enable you to have more flexibility and agility when it comes to organizing your data for compliance. This begins with location, and the different ways that location can be interpreted.
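As a rough illustration of pairing a sensitivity category with an origin for each column, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `SENSITIVITY_RULES` patterns, the `ColumnRecord` fields and the category labels are hypothetical and do not come from GDPR, HIPAA or CCPA. Matching on column names is also crude, so real classification rules should be defined with your compliance team and verified against the data itself.

```python
import re
from dataclasses import dataclass

# Hypothetical name patterns, checked in order. Illustration only:
# these are not taken from any regulation's definitions.
SENSITIVITY_RULES = [
    ("sensitive", re.compile(r"(ssn|passport|health|diagnosis|religion)", re.I)),
    ("personal", re.compile(r"(name|email|phone|address|birth)", re.I)),
]


@dataclass(frozen=True)
class ColumnRecord:
    """One database column plus where its data was originally collected."""
    table: str
    column: str
    source_system: str  # assumed labels, e.g. "web_app" or "crm"
    source_region: str  # assumed labels, e.g. "US" or "EU"


def classify(column_name: str) -> str:
    """Return the first matching category, or "internal" if none match."""
    for label, pattern in SENSITIVITY_RULES:
        if pattern.search(column_name):
            return label
    return "internal"


def build_inventory(records):
    """Combine each column's category with its recorded origin."""
    return [
        {
            "table": r.table,
            "column": r.column,
            "category": classify(r.column),
            "source_system": r.source_system,
            "source_region": r.source_region,
        }
        for r in records
    ]


if __name__ == "__main__":
    sample = [
        ColumnRecord("customers", "email_address", "web_app", "EU"),
        ColumnRecord("patients", "patient_diagnosis", "clinic_db", "US"),
        ColumnRecord("orders", "order_total", "web_app", "US"),
    ]
    for row in build_inventory(sample):
        print(row)
```

An inventory like this makes it possible to answer questions such as "which personal data originated in the EU?" with a simple filter rather than a manual search.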
- Don’t operate alone: Seek the help of other internal teams.
You can’t know everything about your data and its whereabouts without the help of internal teams, such as DevOps, data protection and other application and infrastructure teams. Collaborating across departments will empower you to understand the movement and origins of the data.
Data is rarely static; it moves around and is used for analysis in different places. The various departments all handle data in different ways at different stages, so working together to retain visibility will keep data, and your organization, protected from malfeasance. The effort to unite forces may require a cultural shift from working in silos to collaborating.
One strategy to achieve this is to assign someone on each team to tend to the needs of data protection and compliance. This way, there’s a protocol in place to check compliance during the process. For example, you can designate a member of the DevOps team as the “data protection (DP) lead,” who is responsible for coordinating data processes with you during the software development stage. The DP leads of the various departments can then maintain open communication with each other and with the organization’s lead DBA, keeping everyone in lockstep on where data moves. Sometimes data is streamed in through systems outside of the main database, such as Office 365 or SharePoint. As with other data, it’s important to have a DP lead who can fortify data management in these systems, too.
Also, if your company is subject to GDPR and has an appointed Data Protection Officer, work with them to ensure that the protective measures you implement correctly follow GDPR.
- Utilize automation to track data movement, and access, in real time.
Traditionally, data has been stored in one place, the database, with backup copies on physical media. Fast-forward to today’s digital era, and data is continuously replicated to other locations, including the cloud, and even across multicloud deployments. That continuous movement makes it more difficult to identify and protect personal data. With so much passing of the baton, certain data can end up under different authorizations or with unintended owners. This can happen at any stage in the pipeline, or in any department.
To combat this challenge, consider using automation to monitor the movement of data, including the details of who has access to it. Automation can greatly reduce the repetitive processes of managing data, and can take pressure off data teams so that they can pivot their focus toward results-driven initiatives like analytics. Automation can also make it easier to track structural changes in the database, as well as changes to user access rights, in real time as data moves to different locations.
Simply put, automation ties up loose ends far faster than manual tracking, and it more efficiently ensures that only the right personnel can view or manipulate the data. Beyond day-to-day data management concerns, most organizations’ top priority is to protect their assets. Automated routine checks can flag instances where people outside the organization may have received access to data, giving employees time to take steps to secure that data before a breach occurs.
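To make the idea of an automated routine check concrete, here is a minimal Python sketch that compares two snapshots of access grants and flags new grants to accounts outside the organization. The `Grant` structure, the `INTERNAL_DOMAIN` value and the rule that "external" means an email address outside that domain are all assumptions for illustration. In practice, the snapshots would come from your database’s own privilege catalog or audit logs, and the definition of an outsider would follow your identity system.

```python
from typing import NamedTuple

# Hypothetical: accounts are identified by email, and anything outside
# this domain counts as external to the organization.
INTERNAL_DOMAIN = "example.com"


class Grant(NamedTuple):
    """One access right: who can do what on which table."""
    user: str
    table: str
    privilege: str


def new_grants(previous: set, current: set) -> set:
    """Grants present in the current snapshot but not the previous one."""
    return current - previous


def is_external(user: str) -> bool:
    """True if the account does not belong to the internal domain."""
    return not user.lower().endswith("@" + INTERNAL_DOMAIN)


def flag_external_grants(previous: set, current: set) -> list:
    """Newly added grants held by external accounts, in sorted order."""
    return sorted(g for g in new_grants(previous, current) if is_external(g.user))


if __name__ == "__main__":
    before = {Grant("ana@example.com", "customers", "SELECT")}
    after = before | {
        Grant("bob@vendor.io", "customers", "SELECT"),
        Grant("cy@example.com", "orders", "UPDATE"),
    }
    for grant in flag_external_grants(before, after):
        print("Review:", grant)
```

Running a comparison like this on a schedule turns access review into a short list of exceptions to investigate, rather than a full manual audit each time.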
Government entities and consumers are increasingly scrutinizing data privacy and protection measures, and it’s more necessary than ever that DBAs be prepared to track and manage their data, wherever it lives. As the components of data infrastructure evolve across multiple clouds and departments, so do the responsibilities and strategies of the DBA. Ultimately, the goal is to empower all DBAs to discover data, protect it, and ensure it’s used properly to further the business.
Don’t know where your data lives? It’s time to find out.
The New Stack is a wholly owned subsidiary of Insight Partners. TNS owner Insight Partners is an investor in the following companies: Real.