An excerpt from The Mathematical Corporation by Josh Sullivan and Angela Zutavern
Reprinted with permission from PublicAffairs, an imprint of Perseus Books LLC, a subsidiary of Hachette Book Group Inc. All rights reserved.
The standard-bearer among organizations protecting personal data is the U.S. Census Bureau. Bureau chief scientist John Abowd, on leave from his professorship in economics, statistics, and information science at Cornell, cites the growing difficulty of protecting personal data in a world where people are getting better and better at re-identifying “anonymized” entries. “What was implausible in the twentieth century is now plausible,” he says.
One of the Census Bureau’s projects is the Longitudinal Employer-Household Dynamics, or LEHD, Program, which publishes trends in employment, hiring, job creation and destruction, and earnings, by geography, with breakdowns by age, sex, and industry. Abowd helped create the data set by stitching together Census Bureau data with data from the fifty states. From the Census Bureau’s perspective, nothing short of zero release of the core personal data is acceptable, because those data include the origin and destination addresses of individual citizens as they commute to and from work.
Abowd is a leader in creating protections for such data, in particular by modifying data sets so that, when released, the aggregate insights remain correct but the personal facts have been perturbed so that essentially nothing can be inferred about any individual person. The approach is called differential privacy, and the Census Bureau was among the first to use it, in some products of the LEHD Program. Differential privacy gives the Census Bureau the high data security the U.S. public demands.
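The bureau’s production systems are far more sophisticated than anything we can show here, but a minimal sketch of the core idea, the textbook Laplace mechanism, illustrates how it works: add random noise, calibrated to a privacy parameter epsilon, to each published statistic so that no single record can be confidently inferred. The function name, counts, and epsilon value below are our own illustrative choices, not the Census Bureau’s actual code.

```python
import numpy as np

def laplace_count(true_count, epsilon, sensitivity=1.0):
    """Release a count with epsilon-differential privacy.

    The Laplace mechanism adds noise with scale sensitivity/epsilon.
    For a simple count, one person can change the total by at most 1,
    so sensitivity = 1. Smaller epsilon means stronger privacy.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Hypothetical tabulation: commuters between two census blocks.
print(laplace_count(4217, epsilon=0.5))
```

Because a single person can change a count by at most one, noise on the order of 1/epsilon is enough to mask any individual’s presence while leaving large aggregates, such as county-level employment counts, essentially intact.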
That said, Abowd notes the tradeoff inherent in rigorous security: the better you protect the data, the less useful they are for decision-making. You will always face the question of how far to go in making personal data absolutely irretrievable. If, for example, you can get a big boost in public benefit by accepting slightly less rigorous protection, maybe over-the-top privacy safeguards are self-defeating. Which comes first, personal or public welfare?
“The hard question,” Abowd says, “is how do you give meaningful information while limiting the extent to which you can learn about an individual record above what can be learned from other information out there?” When it comes to weighing total protection against maximum welfare, he notes, there is no technically feasible way of having all of one and all of the other.
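Abowd’s tradeoff can be made concrete by extending the sketch above: for the Laplace mechanism with sensitivity 1, the expected absolute error of a released count is exactly 1/epsilon, so every gain in privacy (a smaller epsilon) is paid for in accuracy. The epsilon values here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stronger privacy (smaller epsilon) means a noisier, less useful release:
# the Laplace noise scale, and hence the mean absolute error, is 1/epsilon.
for epsilon in (0.01, 0.1, 1.0):
    errors = np.abs(rng.laplace(scale=1.0 / epsilon, size=100_000))
    print(f"epsilon={epsilon:>5}: mean absolute error ~ {errors.mean():.1f}")
```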
To serve public welfare, Abowd makes sure that, despite the Census Bureau’s rigorous protections, the data remain useful. In one case, a company thinking about expanding a manufacturing plant in rural South Carolina was having second thoughts: it worried that the targeted town didn’t have the five hundred skilled workers it needed. But the LEHD data set showed that enough workers lived within a fifty-minute commute of the area. So the company enlarged its plant — and it found all the employees it needed.
Your organization may not face the rigorous privacy requirements of the Census Bureau. But the same ethical questions confront you. When is your data handling ethical and when is it not? As it turns out, in most organizations ethical choices relate to a small number of age-old wrongs, typically forms of lying, deceiving, stealing, or harming. So when does each of these come into play in your organization? If we agree that lying, deceiving, stealing, and harming are unethical, which situations in your business become question marks?
A media lab in the Netherlands assembled a database of Dutch citizens’ birth dates, professions, addresses, sexual orientations, and even medical histories using only publicly available data. How personal, private, and usable are these data? Can you ethically use them? We think not.
A Danish graduate student assembled a database of 70,000 user profiles from the OkCupid online dating site. The records included usernames, gender, location, age, desired relationship type, personality traits, and other details. Amid a firestorm of criticism, the student removed the data from his open repository, even as he defended the data as public and the release as ethical. Is using these data okay?
We think using the data is unethical in both cases; let’s explore why. In most cases, people set out to act ethically. But as you strive to run an organization, build a business, or serve a government or nonprofit mission, you will always find yourself walking into gray areas. Ethical dilemmas arise, and so does the risk of making decisions you’ll regret. Almost always these situations stem simply — and innocently — from temptations to make work easier, accomplish an assigned task, make extra sales, or better serve an important constituency.
Say you let slide a practice that gives customers an incorrect impression about your use of their data. This is easy to do. Customers, however, might view it as deceiving them into agreeing to unexpected uses of their data. Or say you go slow in disclosing a data breach. This, too, has business justifications. But your customers might perceive it as exacerbating the risk that their savings will be stolen. Or say you withhold data about error-prone physicians in your hospital system. Your customers, or patients, would see this as putting them at risk of harm.
Ethical questions might reside in every task of every business, government entity, and nonprofit organization. The challenge is how to draw the line in a foggy reality. When is data brokering questionable, even if the practice is disclosed in explicit “terms of use”? When is not sharing health data a harm?
None of these are legal questions; they are ethical ones, although the law can be a guide to what society thinks about various business practices. Because laws usually codify issues lawmakers felt strongly about, illegal acts are generally considered unethical as well. That said, the law sets a low bar for what’s “right”; it often allows behavior that falls short of what’s considered upstanding and expected.
You can use two major schools of ethical reasoning as touchstones in ethical decision-making: action based and consequence based. In action-based reasoning, whether an action is right or wrong depends on the act itself, isolated from other factors. (You would decide deception is wrong, for instance, even if you engage in it to create a better service for your customers or society.) In consequence-based reasoning, whether an action is wrong depends on its consequences: decisions that provide the greatest good for the greatest number, so-called utilitarianism, are considered ethical. (Deception is okay when, for example, you can expect that allowing hometown bias in an algorithm will enlarge your community’s “fair share” of state funding.)
Action-based (or duty-based) ethics stems from the philosophy of Immanuel Kant. Consequence-based ethics was developed by John Stuart Mill and Jeremy Bentham. What’s ethical often depends on which school of thought you side with — consciously or unconsciously — and if and when you decide to switch between the two.
If your approach is action based, then you would decide that tracking customer movements is always wrong, even if done anonymously. Period. But if you’re utilitarian minded, you might concede that tracking is intrusive yet argue that its benefits to consumers and society make it an ethical choice in some cases. A lot of people borrow from both schools of thought, often depending on the situation, although this may do more to confuse decision-making than to clarify it. The same goes for borrowing from other schools, such as “rights-based” ethics, which holds that actions are right or wrong depending on whether they violate human rights.
Whether you’re on a startup journey or in an organization like the Census Bureau with thousands of employees, management must apply ethical reasoning to arrive at ethical answers. What does your internal compass tell you? Which kind of reasoning do you feel is correct? Presuming you follow the law, you still have to clarify the ethical philosophies on which you and your organization stand.
What are the big areas you should be thinking about? We urge you to focus on the following four and decide in advance how you will reason about the ethical questions each raises:
- Transparency in data handling
- Protection of privacy
- Data ownership
- Data security
In the journey to becoming a mathematical organization, applying ethical reasoning in advance is not an exercise in perfecting your ethics just to feel good about yourself. It is an exercise in introspection that sharpens your strategic decision-making and helps prevent mistakes.