Big Data, Analytics, and Privacy: Do No Harm

January 19, 2011 at 7:33 pm

Part 2 in an ongoing series on data privacy.

By Mary Ludloff

You may be wondering why we focus so much on data privacy. After all, since PatternBuilders is an analytics solution provider, one might think that we should be leading the charge toward data transparency. And we are.

But you cannot have a discussion about big data and analytics without also considering data privacy. In a recent article in Mashable, Alistair Croll asks, “Who owns your data?” Now, you may remember that in a previous post I said that “you” own your data, and I stand by that statement. But Alistair has a point:

“But as we use the Internet for ‘free,’ we have to remember that if we’re not paying for something, we’re not the customer. We are in fact the product being sold — or, more specifically, our data is.”

Words to “surf” by.

At the individual level, data is collected with pretty much every keystroke you make. That’s why we all need to understand that our privacy is at the “mercy” of all those websites we visit. Every one of them has a privacy policy, but few of us read them. Ours is simple and clear: we don’t sell or share your information, but we do analyze it to determine how to improve the user experience on our website and how to better engage with you.

But for any company in the big data and big analytics space, that is just one layer of privacy we need to consider. The other is far more pervasive, confusing, and ever-changing. Every data set or source we encounter daily for analysis was gathered from somewhere, for some purpose, and with some assumption of privacy at its most granular level.

Now, when we deal with corporate data sets culled from in-house applications, the privacy policy is usually clear and easy for us to adhere to. But privacy assumptions for data sets outside the boundaries of the enterprise are often not so clear-cut.

That is why we have a privacy code of conduct. It is equally simple and clear: Do No Harm. Why “harm,” as opposed to borrowing one of the tenets of Google’s business philosophy, “You can make money without doing evil”?

Well, to us, “do no evil” implies purposeful intent. Harm, on the other hand, reminds us that data collection and use are amorphous and that, as a company in this business, we must constantly ask ourselves the following questions:

  • How was the data collected?
  • What is its intended use?
  • What is the privacy policy for this particular data set?
  • What do we need to do within our offering to ensure that we adhere to this policy?

Our Do No Harm privacy code of conduct comes into play when we cannot definitively answer these questions. Given the number of data sets available, the increased use of web scraping as a data collection method, and companies like Rapleaf that are in the business of finding out all they can about “you” and then selling that data to others, we often find ourselves trying to determine whether some of the data we are working with may infringe upon an individual’s privacy. If it might, we have two options: anonymize the data (our next post will address how we technically do this) or don’t use it at all. Our decision is usually based on the harm we perceive.
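To make the “anonymize” option a little more concrete, here is a purely illustrative sketch of one common first step, pseudonymizing direct identifiers with a keyed hash (HMAC-SHA256). This is not PatternBuilders’ technique, which is left for a later post; the record layout and the field names `name` and `email` are hypothetical assumptions. Note that pseudonymization on its own is not full anonymization: the remaining fields (a city, a zip code, a birth date) can still re-identify someone when combined with other data sets.

```python
import hashlib
import hmac

# Hypothetical choice of fields treated as direct identifiers.
IDENTIFIER_FIELDS = ("name", "email")


def pseudonymize(record: dict, key: bytes, fields=IDENTIFIER_FIELDS) -> dict:
    """Return a copy of `record` with identifier fields replaced by keyed hashes.

    HMAC-SHA256 maps the same input to the same token (so records can still be
    joined and counted), but the token cannot be recomputed without the secret key.
    The original record is left unchanged.
    """
    out = dict(record)
    for field in fields:
        if out.get(field) is not None:
            value = str(out[field]).encode("utf-8")
            out[field] = hmac.new(key, value, hashlib.sha256).hexdigest()
    return out


if __name__ == "__main__":
    secret = b"keep-this-key-out-of-the-data-set"  # hypothetical key
    row = {"name": "Alice", "email": "alice@example.com", "city": "Boston"}
    print(pseudonymize(row, secret))  # name/email become 64-char hex tokens; city is untouched
```

Keeping the key separate from the data is the design point: anyone holding only the pseudonymized data set cannot simply hash a list of known names to reverse the tokens.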

Huh? You might be thinking that this is about as clear as mud, but here’s a case in point. Consider the WikiLeaks release of 250,000 U.S. diplomatic cables. For now, let’s put aside where you (or I) stand on the individual’s right to privacy versus the state’s right to secrecy. Let’s just consider the data set. We are always looking for interesting data sets that we can aggregate with others and then use to illustrate the power of analytics. For our purposes, the cables were a perfect choice. There was just one problem that we kept circling back to: our fear of doing harm, because we were not at all sure how the information was collected or whether sensitive information had been redacted. How can you anonymize something when you don’t know which privacy issues need to be addressed? In the end, our decision was easy to make: we did not use the data set.

As a company in the Big Data and Big Analytics space, we deal with the issue of privacy every day. And we should, because if we don’t, who will? There’s plenty of opinion on how privacy will play out in this space: some argue for government regulation, others for self-regulation, and still others argue that those of us who use the Internet will force a standard of conduct for data privacy. I am not sure how this will all play out. That being said, I am sure that PatternBuilders will Do No Harm.

Are you concerned about how your data is being used? What are your thoughts on data privacy, and who should regulate it? Does the company you work for have a privacy policy covering all the data sets it owns?

Entry filed under: Data.

