Privacy, Big Data, Civil Rights, and Personalization Versus Discrimination: When does someone else’s problem become ours?
There has been a great deal of media attention lately on the benefits of big data (just look at our @bigdatapbi Twitter stream). Certainly, PatternBuilders has been busy helping financial markets become more efficient, working with data scientists on various research projects, and helping other businesses with their big data initiatives. In fact, a number of companies (like ours) are making significant strides in reducing the costs associated with legacy big data systems, helping to move big data out of the early adopter phase and into the mainstream. But as technology advances, there is usually some “bad” thrown in with all that good. Such is the case with big data and privacy.
Two thought-provoking articles on privacy were published this month—both considering privacy through a civil rights prism. In “Big data is our generation’s civil rights issue, and we don’t know it,” Alistair Croll writes:
“‘Personalization’ is another word for discrimination. We’re not discriminating if we tailor things to you based on what we know about you — right? That’s just better service.”
Croll then points out the myriad ways in which one person’s “personalization” is another person’s “discrimination:”
“We’re seeing the start of this slippery slope everywhere from tailored credit-card limits like this one to car insurance based on driver profiles. In this regard, big data is a civil rights issue, but it’s one that society in general is ill-equipped to deal with.”
Anders Sandberg, in “Asking the right question: big data and civil rights,” added to the discussion, asking this question: “What kinds of ethics do we need to safeguard civil rights in a world of big data?” Sandberg goes on to note that the crux of Croll’s argument is this:
“… this is not just the regular privacy debate: this is about what kind of information is allowed to be inferred about us and how different agents are allowed to act on it.”
If you have not had the time to read either of these articles, you should: both authors point out how easy it is to make inferences about us (often without our “knowledge”), offer great examples to support their points, and finally, consider ways in which “we” can address the “problem.”
I agree with Croll and Sandberg on all essential points but am left wondering whether anyone will care. Don’t misunderstand me—there is a subset of people, organizations, companies, and institutions that care very deeply about this, but I keep coming back to this question: When will we reach a tipping point where mainstream America joins the debate? Has personalization become so convenient that we forget there is another side to consider? Or are we, individually and collectively, assuming that the discrimination downside is someone else’s problem?
I consider myself a double-minority due to gender (female) and race (Hawaiian, but often assumed to be Hispanic or African-American). Discrimination, benign and not so benign, is something I’ve lived with all my life. Let me give you some examples (all true stories):
- Benign. “You really look pretty today.” (Said to me by a salesperson in a management meeting—I knew the salesperson, but the venue was not appropriate.) “I’ve been watching you and just cannot figure out what race you are.” (Said to me on numerous occasions over the years, and it always gives me pause—why is it so important for a stranger to know what race I am?)
- Not-so-benign. Listening to disparaging remarks about my gender and understanding that if I complained, I would not be promoted. Being followed by a sales clerk in a store because she thought I might steal the jeans I was taking to try on—and then trying them on while she waited just outside the door.
If you have not personally experienced discrimination in any form—whether it’s gender, race, religion, sexual identity, or fill-in-the-blank—the idea that your data can be used against you is an academic exercise. There is no emotion attached to the issue. Don’t misunderstand me: I am not saying that just because you haven’t walked in my shoes you don’t understand what is at stake. Nor am I saying that I have never been guilty of acting, or thinking, in a discriminatory manner—I have and I am not proud of it. But I do think that when a harm is not experienced firsthand, it is easier to remain “one step removed” from addressing the problem.
Let me put this another way. While studies tell us that we are increasingly concerned about online privacy, why are we still quite willing to trade our “data” for work, play, convenience, business, safety, or security—often without a thought? As a self-professed privacy geek who is very aware of just how deep data collection and usage (through first and third parties) goes, even I am sometimes brought up short. For example:
- Lately my inbox has been deluged with offers for fast payday loans, 20+ per day. You may think that I might be short of cash. I suspect it’s because I googled the phrase underbanked+payday loans. Why? Because I listened to a podcast and was curious.
- Offline, in my “real” mailbox, I am being deluged by direct mail pieces from every financial manager under the sun and it’s not because they think I’m short of cash. Rather, another piece of data about me that is publicly available suggests that I am someone who needs wealth management services.
Both examples have data to support their fundamental hypotheses but both happen to be wrong. In other words, if I google cancer does it follow that I have cancer? Or if I visit state-sponsored Chinese websites and then websites run by Chinese dissidents, does it follow that I am a dissident? Or better yet, if you (or I) looked at the multitude of personal data out there on a person, would we not fall into the trap of making assumptions about them without ever meeting them? And if we do that, how can we be surprised that businesses and organizations do it too?
The issue with data, particularly personal data, is this: context is everything. And if you are not able to personally question me, you are guessing the context. Now for advertising, however personalized, the harm is benign. And as a card-carrying member of the marketing community, I can tell you that we know the more data we have about you, the better our guesses about what we can persuade you to buy. But we also know that some of that data is probably “dirty” and we don’t care. To us, that’s just the cost of doing business.
Now, this is where, as both Croll and Sandberg point out, it gets tricky. What if that data is used to strip you of all anonymity? As Terence and I pointed out in “Privacy and Big Data,” three data points—gender, five-digit zip code, and birth date—can be used to uniquely identify us 87% of the time. It follows that, with all this data we are putting out there, it will be far easier to predict not only what we’ll buy, but who we might vote for (whether we “like” it or not), how healthy we are, how financially stable we are (via credit scores or now the highly secret E-Score), or whether we might be more likely to commit a crime.
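The re-identification risk behind that statistic is easy to demonstrate. Here is a minimal sketch—using a tiny, fabricated set of records, not real data—that counts how many people are uniquely pinned down by the gender/zip/birth-date combination alone:

```python
from collections import Counter

# Toy records: (gender, five-digit zip, birth date). Fabricated for illustration.
people = [
    ("F", "94107", "1970-03-12"),
    ("M", "94107", "1970-03-12"),
    ("F", "94107", "1970-03-12"),  # same combination as the first record
    ("M", "10001", "1985-07-04"),
    ("F", "60614", "1962-11-30"),
]

# Count how often each (gender, zip, birth date) combination occurs.
counts = Counter(people)

# Anyone whose combination occurs exactly once is uniquely re-identifiable
# from these three "anonymous" fields alone.
unique = [p for p in people if counts[p] == 1]
fraction_unique = len(unique) / len(people)

print(f"{len(unique)} of {len(people)} records are unique: {fraction_unique:.0%}")
# → 3 of 5 records are unique: 60%
```

In a real population the zip/birth-date space is so sparse relative to the number of residents that most combinations occur exactly once—which is why such innocuous-looking fields behave like a fingerprint.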
As I write this on my laptop, I am being “modeled” based on what I tweet or share, where I am surfing, what I am Googling, the streaming videos that I am watching, and the podcasts I am listening to. Of course my smartphone and the other devices that I use are just adding to my own personal data deluge. Some of the assumptions made about me may be absolutely on point while others may be way off base, but how will I know, when the process is not transparent? As Sandberg points out:
“… analyzing questions can be done silently and secretly. It can be nearly impossible to tell that an agent has inferred sensitive information and uses it. Sometimes active measures are taken to keep analyzed people in the dark but in many cases the response to the questions can be invisible – nobody notices offers they do not get. And if these absent opportunities start following certain social patterns (for example not offering them to certain races, genders or sexual preferences) they can have a deep civil rights effect – just consider the case of education.”
So, not only do I not know what data is being used to make inferences about me, I may not know that I am actively being discriminated against. But here’s what I do know: If you have a digital presence you have already felt the sting, known or unknown, of benign and not-so-benign forms of discrimination. Or as a recent article in Time notes:
“Information is currency, but we tend to forget that… Most of the time, this information is used to sell you stuff. This has the potential to be sneaky — if it knows enough about you, a company can figure out what type of ad is most likely to sway you — but a lot of it isn’t inherently bad and might be helpful… But that’s not what concerns lawmakers and privacy experts. They worry that people’s virtual selves could get them written off as undesirable, whether the depiction is correct or not. There’s also the question of accuracy in general. Some consumer groups estimate that up to 25% of credit reports have errors, and those errors can lead to difficulty getting a loan or other type of credit. Without any way to look at our consumer profiles, people have no idea what marketers and other interested parties see and how they’re judging us.”
This is no longer someone else’s problem—it is all of ours. My question to you is this: what are we going to do about it?
Now if you’ve read this far, you may assume that I am a bit militant about digital privacy. You would be wrong. Like Croll and Sandberg, I too believe that big data has enormous benefits, and I don’t want to see access to it limited or forbidden, as the New York Times recently reported.
So what to do? Sandberg sums it up this way:
“Croll suggests that we should link our data with how it can be used. While technological solutions to this might sometimes be possible, and some standards like creative commons licenses are being spread, he thinks – and I agree – that the real monitoring and enforcement will be social, legal and ethical.”
Sandberg goes on to talk about transparency as a way to monitor what is legal and ethical versus what is not:
“One approach would be reciprocity: demanding information back about how data is gathered and processed, allowing proper responses. If the government wants to monitor us, we should demand equal or greater oversight of government. If companies collect information about us we should be allowed to know what they have and what they use it for. Except that governments and companies are quite recalcitrant in doing this. At least companies might plausibly claim they would lose competitive edge if they told about what they were doing.”
The problem with transparency is this: in order for it to work, everyone must be, well, transparent about data collection and use. Just this past year we have seen numerous examples of companies, government agencies, and other organizations failing to be (just look at my twitter feed—@mludloff—or any of my posts on data privacy, as both provide many examples). That being said, the beauty of living in a data-driven world is that we can also detect when legal or ethical lines are crossed, use social media to amplify those issues, and vote with our pocketbooks.
But any solution begins and ends with us:
- What does privacy mean in the digital age?
- How can we manage the collection and use of our personal data?
- How do we enforce it?
- How do we punish entities when ethical and legal lines are crossed?
- How do we reward those entities that are truly transparent about data collection and use?
This is no longer someone else’s problem. What happens next is up to all of us. As Croll said at the end of his article:
“This should be fun.”
Coming soon from our blog:
- A look at big data and real estate—some correlations to ponder.
- Privacy and Big Data—we’re updating our book and talking at Strata East.
- The big data market—our view of the big data ecosystem.