
IBM Offers Companies Help on Pending EU Data Governance

22 Jun 2017 1:51pm

Companies in the European Union are counting down the days until May 25, 2018. That’s when the EU’s new General Data Protection Regulation (GDPR) takes effect. This update to the EU’s sweeping data protection rules extends its protections to all companies that deal in data generated by EU citizens, and it steps up requirements for the storage and security of personal information.

As such, American enterprises dealing in EU personal information will also have to count down to May 25, 2018. That’s why IBM has just released a new set of data protection services and software solutions, and even formed a consortium designed to kickstart the Apache Software Foundation’s Atlas data governance framework.

IBM had reasons for choosing Apache Atlas as an open source data governance framework, explained Seth Dobrin, vice president and chief data officer of IBM Analytics.

“We wanted an open source operating system for our Unified Governance Platform and there were really three viable options: start from scratch, use Atlas, or use Ground. In reality, starting from scratch would be a last resort. The two remaining options, Ground and Atlas, are both rather immature and it came down to two things: the ecosystem around Atlas was more mature (i.e. Ranger and Knox), and some of our clients were already building off of Atlas. But in reality both options need a lot of work to make [them] enterprise ready,” said Dobrin.

But why does it take a consortium to improve an Apache project?

“We are forming a consortium to solve a real set of problems. That set of problems is all around governance. The real aim is to utilize an Apache project as the operating system for our Unified Governance Platform. The biggest roadblock to integrating the data environment of an enterprise is getting a seamless integration of proprietary metadata platforms,” said Dobrin. “We plan to leverage Apache Atlas as a metadata virtualization layer to allow the enterprise to consolidate their understanding of metadata without having to consolidate all the metadata. The goal for IBM then becomes to build value-added capabilities on top of this OS. It’s very similar to how we have leveraged Apache Spark as the OS for our analytics platform.”
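To make the idea of metadata virtualization concrete, here is a minimal Python sketch of a catalog that answers searches by asking each metadata store in turn, rather than copying everything into one repository. It is only an illustration of the general pattern Dobrin describes. It does not use the Apache Atlas API, and every name in it (`AssetRecord`, `InMemorySource`, `VirtualCatalog`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol, Tuple


@dataclass(frozen=True)
class AssetRecord:
    """One search hit, tagged with the store it came from (hypothetical type)."""
    source: str
    name: str
    tags: Tuple[str, ...]


class MetadataSource(Protocol):
    """Anything that can answer a metadata search (assumed interface)."""
    name: str

    def search(self, term: str) -> Iterable[AssetRecord]: ...


class InMemorySource:
    """Stand-in for a proprietary metadata platform; not a real connector."""

    def __init__(self, name: str, assets: Dict[str, Tuple[str, ...]]):
        self.name = name
        self._assets = assets

    def search(self, term: str) -> Iterable[AssetRecord]:
        term = term.lower()
        for asset, tags in self._assets.items():
            # Match on the asset name or on any of its tags.
            if term in asset.lower() or any(term in t.lower() for t in tags):
                yield AssetRecord(self.name, asset, tags)


class VirtualCatalog:
    """Fans a query out to every source; the metadata itself stays where it is."""

    def __init__(self, sources: Iterable[MetadataSource]):
        self._sources = list(sources)

    def search(self, term: str) -> List[AssetRecord]:
        results: List[AssetRecord] = []
        for source in self._sources:
            results.extend(source.search(term))
        # Sort so callers see a stable, combined view across stores.
        return sorted(results, key=lambda r: (r.source, r.name))


warehouse = InMemorySource("warehouse", {"customers": ("pii", "sales"), "orders": ("sales",)})
lake = InMemorySource("lake", {"clickstream.parquet": ("web",), "customer_emails.csv": ("pii",)})
catalog = VirtualCatalog([warehouse, lake])

for hit in catalog.search("pii"):
    print(hit.source, hit.name)
```

The point of the design is that the catalog holds no metadata of its own. Each store keeps its records, and the consolidated view exists only at query time.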

Hortonworks is also participating in the effort.

Dobrin said that IBM is submitting some proprietary code and committing half a dozen developers to the project. The company “plans to grow that number; Hortonworks has committed 10 developers, and other clients are committing developers, specifications and testers.”

“The outcome,” said Dobrin, explaining the hoped-for end game, “is an open source metadata virtualization that allows enterprises to get a holistic view of their data landscape at the metadata level. This provides the ability to stand up a searchable, ‘shop-for-data’ data catalog. We then intend to leverage our machine learning capabilities to automate the heck out of our mundane tasks related to metadata discovery, data quality and master data management.”

The breadth of the EU’s new data protection laws also prompted IBM to release its own new data governance package: the Unified Governance Software Platform. The platform offers metadata analysis, data lineage tracking, integration services and policy enforcement to help enterprises keep personal data secure. The system is built in part atop the IBM StoredIQ data visibility tool, which received a pack of modules for identifying sensitive data across 15 different EU languages.
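For a rough sense of what identifying sensitive data involves at its simplest, the following Python sketch scans text for two kinds of personal information using regular expressions. This is not how StoredIQ works, which the article does not describe. The patterns and the `find_sensitive` function are assumptions made for illustration, and a production tool would need checksum validation, many more categories, and language-specific rules.

```python
import re
from typing import Dict, List

# Deliberately simple, illustrative patterns (assumptions, not a vendor rule set).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Shape of an IBAN only: country code, two check digits, account part.
    # The check digits are not verified here.
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}


def find_sensitive(text: str) -> Dict[str, List[str]]:
    """Return every pattern label that matched, with the matching strings."""
    found: Dict[str, List[str]] = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found


sample = "Contact anna@example.de, IBAN DE89370400440532013000."
print(find_sensitive(sample))
```

Pattern matching of this kind only flags candidates. Deciding whether a match really is personal data, and what policy applies to it, is the harder governance problem the platform is aimed at.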

IBM is hoping to apply its experience in machine learning and data science to governance as well. The IBM Data Science Experience online data collaboration and workbook platform launched inside a London data center today, bringing the service to the EU and UK for the first time.

In its cloud offerings, IBM also announced today that it has added JSON support to IBM DB2, the company’s flagship database. This announcement came alongside a new release of DB2 on the IBM Cloud, which eliminates the need to allocate storage and compute power by hand. Instead, this new version of DB2 on IBM Cloud includes simple slider bars for configuring the horsepower underneath.

DB2 users will also get their first chance today to preview the new Hybrid Transactional/Analytical Processing (HTAP) capability. This addition to the DB2 database enables secondary index support in BLU Acceleration, DB2’s in-memory technology. This should improve the performance of some queries running on data stores and analytics warehouses.

Feature image via Pixabay.

