Privacy has long dominated our social and legal debates about technology. The Federal Trade Commission and other central regulators aim to strengthen protections against the collection of personal data. Data minimization is the default set in Europe by the GDPR, and a new bill before the U.S. Congress, the American Data Privacy and Protection Act, similarly seeks to further privacy’s primacy.
Privacy is important when it protects people against harmful surveillance and public disclosure of personal information. But privacy is just one of our democratic society’s many values, and prohibiting safe and equitable data collection can conflict with other equally valuable social goals. While we have always faced difficult choices between competing values—safety, health, access, freedom of expression, and equality—advances in technology make it increasingly possible to anonymize and secure data in ways that balance individual interests with the public good. Privileging privacy, instead of openly acknowledging the need to balance it against fuller and more representative data collection, obscures the many ways in which data is a public good. Too much privacy—just like too little privacy—can undermine the ways we use information for progressive change.
We rightfully fear surveillance when it is designed to use our personal information in harmful ways. Yet a default assumption that data collection is harmful is simply misguided. We should focus on regulating misuse rather than banning collection. Take, for example, perhaps the most controversial technology that privacy advocates avidly seek to ban: facial recognition. Twenty cities and counties around the U.S. have passed bans on government use of facial recognition. In 2019, California enacted a three-year moratorium on the use of facial recognition technology in police body cameras. The two central concerns about facial recognition technology are its deficiencies in recognizing the faces of minority groups—leading, for example, to false positive searches and arrests—and its expansion of population surveillance more generally. But contemporary proposals for unnuanced bans on the technology will stall improvements to its accuracy and hinder its safe integration, to the detriment of vulnerable populations.
These outright bans ignore that surveillance cameras can help protect victims of domestic violence against trespassing by abusers, help women create safety networks when traveling on their own, and reduce instances of abuse of power by law enforcement. Facial recognition is increasingly aiding the fight against human trafficking and helping locate missing people—particularly missing children—when the technology is paired with AI that creates maturation images to bridge the missing years. There are also many beneficial uses of facial recognition for the disability community, such as assisting people with impaired vision and supporting the diagnosis of rare genetic disorders. While class action and ACLU lawsuits and reform proposals stack up, we need balanced policies that allow facial recognition under safe conditions and restrictions.
We also need to recognize that privacy can conflict with better, more accurate, and less biased automation. In the contemporary techlash, in which algorithms are condemned as presenting high risks of bias and exclusion, the tension between protecting personal data and the robustness of datasets must be acknowledged. For an algorithm to become more accurate and less biased, it needs training data that is demographically representative. Take health and medicine, for example. Historically, clinical trials and health-data collection have privileged male and white patients. The irony of privacy regulation as a solution to exclusion and exploitation is that it fails to address the source of much bias: partial and skewed data collection. Advances in synthetic data technology, which allows systems to artificially generate the data an algorithm needs to train on, can help alleviate some of these tensions between data collection and data protection. Consider facial recognition again: we need more representative training data to ensure that the technology becomes equally accurate across identities. And yet we need to be deliberate and realistic about the need for real data for public and private innovation.
An overemphasis on privacy can hamper advances in scientific research, medicine, and public health compliance. Big data collected and mined by artificial intelligence is enabling earlier and more accurate diagnosis, advanced imaging, increased access to and reduced costs of quality care, and the discovery of new connections between data and disease that point to novel treatments and cures. Put simply, if we want to support medical advances, we need more data samples from diverse populations. AI advances in radiology have resulted not only in better imaging but also in reduced radiation doses and faster, safer, and more cost-effective care. The patients who stand to gain the most are those who have less access to human medical experts.
In its natural state—to paraphrase the tech activist slogan “Information wants to be free” (and channeling the title of my own book Talent Wants to Be Free)—data wants to be free. Unlike finite, tangible resources like water, fuel, land, or fish, data doesn’t run out as it is used. At the same time, data’s advantage stems from its scale. We can find new proteins for drug development, teach speech-to-text bots to understand myriad accents and dialects, and teach algorithms to screen mammograms or lung X-rays when we can harness the robustness of big data—millions, sometimes billions, of data points. During the COVID-19 pandemic, governments tracked patterns of the disease’s spread and fought against those providing false information and selling products under fraudulent claims about cures and protections. The Human Genome Project is a dazzling, paradigmatic leap in our collective knowledge and health capabilities enabled by massive data collection. But there is much more health information that needs to be collected, and privileging privacy may be bad for your health.
In health care, this need for data is perhaps intuitive, but the same holds true if we want to understand—and tackle—the root causes of other societal ills: pay gaps, discriminatory hiring and promotion, and inequitable credit, lending, and bail decisions. In my research on gender and racial pay gaps, I’ve shown that more widespread information about salaries is key. Similarly, freely sharing information online about our job experiences can improve workplaces, and some privacy initiatives may inadvertently backfire, resulting in statistical discrimination against more vulnerable populations. For example, empirical studies suggest that ban-the-box policies, which restrict criminal background checks in hiring, may have led to increased racial discrimination in some cities.
Privacy—and its pervasive offshoot, the NDA—has also evolved to shield the powerful and rich against the public’s right to know. Even now, with regard to the right to abortion, the legal debates around reproductive justice reveal privacy’s weakness. A more positive discourse about equality, health, bodily integrity, economic rights, and self-determination would move us beyond the sticky question of what is and is not included in privacy. As I recently described in a lecture about Dobbs v. Jackson Women’s Health Organization, abortion rights are far more than privacy rights; they are health rights, economic rights, equality rights, dignity rights, and human rights. In most circumstances, data collection should not be prevented but safeguarded, shared, and employed to benefit all.
While staunch privacy advocates emphasize tools like informed consent and opt-out methods, these policies rely on a fallacy of individual consent. Privacy scholars agree that consent forms—those ubiquitous boilerplate clickwrap policies—are rarely read or negotiated. Research also reveals that most consumers are largely indifferent to privacy settings. The behavioral literature calls this the privacy paradox: in practice, people are regularly willing to engage in a privacy calculus, giving up privacy for perceived benefits. So privileging privacy is both over- and under-inclusive: it neglects the fuller array of values and goals we must balance, and it fails to provide meaningful assurances for individuals and communities with an undeniable history of being oppressed by the state and the privileged elite. The dominance of privacy policy can distort nuanced debates about distributive justice and human rights as we continue to build our digital knowledge commons. Collecting the data needed to tackle our toughest social issues is a critical mandate of democracy.