Doing Data Sharing Right

Cambridge Analytica is a name most people recognize today, and not in a positive way. The now-defunct company is a poster child for data sharing gone wrong.

For data professionals, it is also a reminder of how hard agile data sharing between organizations is in today’s connected, data-driven business world. The business value unlocked by efficient, agile data sharing keeps growing, but, somewhat paradoxically, the penalties for getting it wrong keep getting stricter. The reputational damage from leaked or misused data can wipe out billions and can sometimes destroy a company. And privacy regulations like the GDPR and the CCPA add legal teeth to this challenge.

And yet, between emailed spreadsheets, FTP uploads and shared Dropbox folders, very few companies are doing data sharing right.

So let’s look at some principles of good data sharing and how Elten (www.elten.io) can help you put them into practice.

I. De-Identify

In many cases, the business need behind a share can be met without disclosing identities to the other party. In such cases, the shared data should be de-identified using redaction, masking, rounding and similar techniques. With Elten, these techniques and tools, previously available only to data engineers, are now available to every business user in a simple, spreadsheet-like interface.

Elten analyzes the data, suggests suitable de-identification techniques and lets the user apply them in a few clicks.
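As an illustration of what redaction, masking and rounding each do to a record, here is a minimal Python sketch. The field names and rules (drop the name, keep only the email domain, bucket age by decade, truncate the ZIP code) are hypothetical choices for this example. This is not Elten’s interface.

```python
def redact(value):
    """Redaction: remove the value entirely."""
    return "[REDACTED]"


def mask_email(email):
    """Masking: keep the first character of the local part and the domain."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain


def round_down(value, step):
    """Rounding/generalization: map a number to the start of its bucket."""
    return (value // step) * step


def deidentify(record):
    """Apply one technique per (hypothetical) field; non-identifying fields pass through."""
    return {
        "name": redact(record["name"]),
        "email": mask_email(record["email"]),
        "age": round_down(record["age"], 10),  # e.g. 37 -> 30
        "zip": record["zip"][:3] + "XX",       # keep only the ZIP prefix
        "purchase_total": record["purchase_total"],
    }


print(deidentify({
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "age": 37,
    "zip": "94110",
    "purchase_total": 129.5,
}))
```

The recipient still gets what it needs for, say, spend analysis by age band and region. Keep in mind that combinations of generalized fields can still narrow a record down, so the right rules depend on the data and the recipient.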

II. Share your data ‘in-place’, don’t hand out copies

Once you hand out a copy of your data, you lose all control over, and visibility into, its usage. Don’t do that. Modern cloud architectures allow data to be used ‘in place’. With Elten, you can make your data available for ‘in-place’ use by data scientists, analysts and similar users, all without “giving them a copy”.

With this approach, you effectively make your data available as a service. It remains available to its ‘subscribers’ only for as long as necessary. Techniques like geo-fencing ensure that access is restricted not only to the right people, but also to the right geographies. When the subscription expires, access is revoked automatically and no trace of the shared data remains.
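The subscription model above (time-limited access, a geographic restriction, automatic revocation) can be sketched as an owner-side gate. Subscribers submit an analysis to run against the data instead of receiving the rows. The class and method names are hypothetical and are not Elten’s API. The sketch also assumes that the caller’s region arrives as a trusted string, whereas a real system would have to establish it itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Subscription:
    subscriber: str
    allowed_regions: frozenset
    expires_at: datetime


class SharedDataset:
    """Owner-side gate: subscribers run analyses here instead of receiving a copy."""

    def __init__(self, rows):
        self._rows = rows  # stays in the owner's environment in this model
        self._subs = {}

    def grant(self, subscriber, regions, days):
        expires = datetime.now(timezone.utc) + timedelta(days=days)
        self._subs[subscriber] = Subscription(subscriber, frozenset(regions), expires)

    def run(self, subscriber, region, analysis, now=None):
        """Run analysis(rows) for an active subscriber in an allowed region."""
        now = now or datetime.now(timezone.utc)
        sub = self._subs.get(subscriber)
        if sub is not None and now >= sub.expires_at:
            del self._subs[subscriber]  # expiry revokes access automatically
            sub = None
        if sub is None:
            raise PermissionError(f"{subscriber}: no active subscription")
        if region not in sub.allowed_regions:  # geo-fence check
            raise PermissionError(f"{subscriber}: access from {region} is not allowed")
        # The analysis works on copies of the rows; only its result is returned.
        return analysis([dict(r) for r in self._rows])


ds = SharedDataset([{"sales": 10}, {"sales": 5}])
ds.grant("analyst", ["EU"], days=30)
print(ds.run("analyst", "EU", lambda rows: sum(r["sales"] for r in rows)))
```

In this simplified model, an analysis could still return every row. Real in-place platforms therefore also restrict what can be run or exported.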

III. Sandbox derived data

Derived data, meaning data that results from analyzing the base data, can be as sensitive as the base data or even more so, and needs the same protection. Elten’s sandboxing for Apache Spark ensures that any data derived from the shared data is protected just like the original data.

With this approach, when the data subscription ends, no trace remains of either the original shared data or any data derived from it.
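Elten implements this for Apache Spark. The sketch below does not use Spark, and it is not Elten’s mechanism. It models the underlying idea in plain Python as lineage tagging: every derived dataset inherits the share IDs of all its inputs, so revoking one share can locate and delete everything descended from it. All names are hypothetical.

```python
class Sandbox:
    """Tracks which shares each dataset descends from, so revocation purges derivatives."""

    def __init__(self):
        self._datasets = {}  # name -> (rows, set of share IDs it derives from)

    def load_share(self, name, share_id, rows):
        self._datasets[name] = (list(rows), {share_id})

    def derive(self, name, inputs, transform):
        """Create a dataset from others; it inherits the union of their lineage."""
        rows_in = [self._datasets[i][0] for i in inputs]
        lineage = set().union(*(self._datasets[i][1] for i in inputs))
        self._datasets[name] = (transform(*rows_in), lineage)

    def revoke(self, share_id):
        """Delete every dataset that descends from share_id; return their names."""
        doomed = [n for n, (_, lin) in self._datasets.items() if share_id in lin]
        for n in doomed:
            del self._datasets[n]
        return doomed

    def names(self):
        return sorted(self._datasets)


sb = Sandbox()
sb.load_share("orders", "share-1", [{"amt": 5}, {"amt": 7}])
sb.load_share("fx_rates", "share-2", [{"eur_usd": 1.1}])
sb.derive("total", ["orders"], lambda rows: [{"total": sum(r["amt"] for r in rows)}])
sb.derive("report", ["total", "fx_rates"], lambda a, b: a + b)
print(sb.revoke("share-1"), sb.names())
```

The derived `report` depends on both shares, so revoking either one removes it. This is the conservative choice when derived data can be as sensitive as its inputs.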

IV. Watermark the shared data

When the same data is shared with multiple parties, it becomes difficult to trace its lineage and origins. Yet that tracing is necessary for proving culpability in case of a data leak, or for enforcing data licensing agreements. Elten’s patent-pending data watermarking technology does this for you. Given a snippet of data, Elten can tell you whether the data was ever shared out by your organization and, if so, with whom, when and why.

And this is done without storing a copy of the shared data. Learn more at www.elten.io.
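Elten’s patent-pending method is not described here, so this sketch substitutes a different, generic technique: keyed-hash watermarking of a numeric column, in the style of published relational-watermarking schemes. The sketch makes several assumptions:

- Each row has a unique string `id`.
- Each row has an integer `amount` whose least significant bit can change by one unit without harm.
- Each recipient gets its own secret key; the keys shown are hypothetical.

Detection needs only those keys, not a stored copy of the shared rows.

```python
import hashlib
import hmac

# Hypothetical per-recipient secrets. The owner keeps these keys (plus share
# metadata such as date and purpose), not copies of the shared rows.
RECIPIENT_KEYS = {
    "partner-a": b"secret-key-for-partner-a",
    "partner-b": b"secret-key-for-partner-b",
}
GAMMA = 2  # about 1 in GAMMA rows carries a mark


def _digest(key: bytes, row_id: str) -> int:
    """Keyed hash (HMAC-SHA256) of a row's ID, as a large integer."""
    mac = hmac.new(key, row_id.encode("utf-8"), hashlib.sha256)
    return int.from_bytes(mac.digest(), "big")


def watermark(rows, recipient):
    """Return a marked copy of rows for one recipient.

    For rows selected by the keyed hash, the lowest bit of 'amount' is set to a
    key-dependent bit, so each value changes by at most 1 unit.
    """
    key = RECIPIENT_KEYS[recipient]
    marked = []
    for row in rows:
        row = dict(row)
        d = _digest(key, row["id"])
        if d % GAMMA == 0:
            bit = (d >> 8) & 1
            row["amount"] = (row["amount"] & ~1) | bit
        marked.append(row)
    return marked


def detect(snippet, threshold=0.9, min_checked=20):
    """Return recipients whose marks are present in a leaked snippet.

    Unmarked data agrees with a key's expected bits only about half the time,
    so a high agreement rate over enough rows points to that recipient.
    """
    suspects = []
    for recipient, key in RECIPIENT_KEYS.items():
        checked = matched = 0
        for row in snippet:
            d = _digest(key, row["id"])
            if d % GAMMA == 0:
                checked += 1
                matched += (row["amount"] & 1) == ((d >> 8) & 1)
        if checked >= min_checked and matched / checked >= threshold:
            suspects.append(recipient)
    return suspects


rows = [{"id": f"order-{i}", "amount": 1000 + 7 * i} for i in range(1000)]
leak = watermark(rows, "partner-a")[100:400]
print(detect(leak))
```

This toy mark is fragile. A recipient who rounds or perturbs `amount` erases it, and production schemes are designed to survive such edits.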

V. Monitor and keep a paper trail for everything

Make sure there is a detailed paper trail for all your data shares and accesses. With Elten you have a detailed record of exactly who accessed which version of your shared data, how, from where, etc.
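One way to make such a paper trail trustworthy is to chain each log entry to the previous one by hash, so that any later edit to an entry is detectable. Hash chaining is a general technique chosen for this sketch, not a description of how Elten stores its records. The fields mirror the text: who, which share and version, how, and from where.

```python
import hashlib
import json
from datetime import datetime, timezone


class AuditLog:
    """Append-only access log; each entry includes the hash of the one before it."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _hash(body):
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record(self, user, share_id, version, method, location):
        body = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "share_id": share_id,
            "version": version,
            "method": method,      # how the data was accessed
            "location": location,  # from where
            "prev_hash": self.entries[-1]["hash"] if self.entries else self.GENESIS,
        }
        entry = dict(body, hash=self._hash(body))
        self.entries.append(entry)
        return entry

    def verify(self):
        """Return False if any entry was altered, removed or reordered."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or self._hash(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
log.record("alice@partner.example", "share-42", "v3", "SQL query", "Frankfurt, DE")
log.record("bob@partner.example", "share-42", "v3", "notebook", "Dublin, IE")
print(log.verify())
```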

Elten’s upcoming continuous monitoring feature will also analyze access patterns to automatically identify malicious usage.
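The feature itself is not yet described, but a very simple form of access-pattern analysis can be sketched for illustration. The sketch flags a user whose access count today far exceeds their own historical baseline. The threshold `k` and the one-access floor on the standard deviation are arbitrary assumptions for this example. Real systems would consider many more signals.

```python
from statistics import mean, pstdev


def flag_anomalies(history, today, k=3.0):
    """Flag users whose access count today exceeds their baseline mean by k std devs.

    history: user -> list of past daily access counts
    today:   user -> today's access count
    """
    flagged = []
    for user, count in today.items():
        past = history.get(user, [])
        if len(past) < 2:
            continue  # not enough history to form a baseline
        # Floor the spread at 1 so very steady users aren't flagged for tiny bumps.
        threshold = mean(past) + k * max(pstdev(past), 1.0)
        if count > threshold:
            flagged.append(user)
    return flagged


history = {"alice": [10, 12, 11, 9, 10], "bob": [5, 6, 5, 4, 5]}
print(flag_anomalies(history, {"alice": 11, "bob": 60, "carol": 100}))  # ['bob']
```

A user with no history (here `carol`) is skipped rather than flagged. A stricter policy might route such users to manual review instead.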

More about Elten at www.elten.io. Be sure to try out the free version.
