Big Data, Analytics, and Privacy: Do No Harm
Part 2 in an ongoing series on data privacy.
You may be wondering why all the focus on data privacy. After all, since PatternBuilders is an analytics solution provider, one might think we should be leading the charge toward data transparency. And we are.
But you cannot have a discussion about big data and analytics without also considering data privacy. In a recent Mashable article, Alistair Croll asks, “Who owns your data?” Now, you may remember that in a previous post I said that “you” own your data, and I stand by that statement. But Alistair has a point:
“But as we use the Internet for “free,” we have to remember that if we’re not paying for something, we’re not the customer. We are in fact the product being sold — or, more specifically, our data is.”
Words to “surf” by.
But for any company in the big data and big analytics space, that is just one layer of privacy we need to consider. The other is far more pervasive, confusing, and ever-changing: every data set or source we encounter for analysis was gathered from somewhere, for some purpose, and with some assumption of privacy at its most granular level.
That is why we have a privacy code of conduct. It is both simple and clear: Do No Harm. Why “harm” rather than borrowing one of Google’s business philosophy tenets, “You can make money without doing evil”?

Well, to us, “do no evil” implies purposeful intent. “Harm,” on the other hand, reminds us that data collection and use are amorphous, and that we as a company in this business must constantly ask ourselves the following questions:
- How was the data collected?
- What is its intended use?
- What do we need to do within our offering to ensure that we adhere to this policy?
Our Do No Harm privacy code of conduct comes into play when we cannot definitively answer these questions. Considering the number of data sets available, the increased use of web scraping as a collection method, and companies like Rapleaf that are in the business of finding out all they can about “you” and then selling that data to others, we often find ourselves trying to determine whether some of the data we are working with may infringe on an individual’s privacy. If it does, we have two options: anonymize the data (our next post will address how we technically do this) or don’t use it at all. Our decision usually rests on what we perceive the harm to be.
Huh? You might be thinking that this is about as clear as mud, so here’s a case in point. Consider the WikiLeaks release of 250,000 U.S. diplomatic cables. For now, let’s put aside where you (or I) stand on the individual’s right to privacy versus the state’s right to secrecy. Let’s just consider the data set. We are always looking for interesting data sets that we can aggregate with others and then use to illustrate the power of analytics. For our purposes, the cables were a perfect choice. There was just one problem that we kept circling back to: our fear of doing harm, because we were not at all sure how the information was collected or whether sensitive information had been redacted. How can you anonymize something when you don’t know what privacy issues need to be addressed? In the end, our decision was easy to make: we did not use the data set.
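The next post will cover how we anonymize data in practice, but as a rough, hypothetical sketch of one common technique (pseudonymization: replacing identifying fields with salted hashes while leaving analytic fields untouched), consider the following. The field names, record, and salt here are invented for illustration; this is not PatternBuilders’ actual schema or method.

```python
import hashlib

# Hypothetical record; field names are illustrative only.
record = {"name": "Jane Doe", "email": "jane@example.com", "purchase_total": 42.50}

SALT = "replace-with-a-secret-salt"   # kept secret and stored separately from the data
IDENTIFIERS = {"name", "email"}       # fields that could identify an individual

def pseudonymize(rec):
    """Replace identifying fields with truncated salted hashes; keep the rest."""
    out = {}
    for key, value in rec.items():
        if key in IDENTIFIERS:
            out[key] = hashlib.sha256((SALT + str(value)).encode()).hexdigest()[:16]
        else:
            out[key] = value
    return out

anon = pseudonymize(record)
```

Note that this only works when you know which fields are sensitive, which is exactly why the diplomatic cables were unusable: with no way to tell what needed redacting, there was nothing to pseudonymize against.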
As a company in the Big Data and Big Analytics space, we deal with the issue of privacy every day. And we should, because if we don’t, who will? There is plenty of opinion on how privacy will play out in this space: some argue for government regulation, others for self-regulation, and still others that those of us who use the Internet will force a standard of conduct for data privacy. I am not sure how this will all play out. That said, I am sure that PatternBuilders will Do No Harm.