Why the 2020 U.S. Census Bureau Opted for Differential Privacy

Written by mediabias | Published 2025/05/21
Tech Story Tags: algorithmic-transparency | differential-privacy | disclosure-avoidance-system | sociotechnical-systems | handoff-model | government-data-privacy | ai-governance | algorithmic-data-protection

TL;DR: The U.S. Census Bureau adopted differential privacy (DP) to address growing threats to data confidentiality. This "handoff" shifted technical methods, data outputs, and the experts involved, moving from traditional statistical disclosure techniques to privacy-preserving algorithms. The change reduced data accuracy in some areas but aligned with broader trends in tech-driven governance and privacy protection.

Table of Links

Abstract and 1. Introduction

2. Related Work

3. Theoretical Lenses

3.1. Handoff Model

3.2. Boundary objects

4. Applying the Theoretical Lenses and 4.1 Handoff Triggers: New tech, new threats, new hype

4.2. Handoff Components: Shifting experts, techniques, and data

4.3. Handoff Modes: Abstraction and constrained expertise

4.4 Handoff Function: Interrogating the how and 4.5. Transparency artifacts at the boundaries: Spaghetti at the wall

5. Uncovering the Stakes of the Handoff

5.1. Confidentiality is the tip of the iceberg

5.2. Data Utility

5.3. Formalism

5.4. Transparency

5.5. Participation

6. Beyond the Census: Lessons for Transparency and Participation and 6.1 Lesson 1: The handoff lens is a critical tool for surfacing values

6.2 Lesson 2: Beware objects without experts

6.3 Lesson 3: Transparency and participation should center values and policy

7. Conclusion

8. Research Ethics and Social Impact

8.1. Ethical concerns

8.2. Positionality

8.3. Adverse impact statement

Acknowledgments and References

4 APPLYING THE THEORETICAL LENSES

In this section, we employ the handoff model to highlight the reconfigurations which took place during the Census Bureau’s change to a new disclosure avoidance system (DAS) as they adopted DP. Specifically, we analyze changes in the elements which relate directly to the DAS’s primary function: protecting the confidentiality of census responses. In addition, we consider the artifacts that the Bureau introduced as part of the DP implementation process.

4.1 Handoff Triggers: New tech, new threats, new hype

From Census Bureau documents and communications [e.g., 5–7, 10], we identify three triggers that spurred the handoff, i.e., the adoption of DP. The first trigger enabled the handoff: the development of DP in 2006 presented a promising new method [43]. Second, the Bureau became aware of mounting evidence that increased computational power and data access might lead to new threats to privacy and confidentiality [7, 69, 128]. These concerns encompassed both future, potentially dangerous threats via reconstruction and/or re-identification attacks [40] and current, realizable ones. For instance, Latanya Sweeney showed in 2000 that almost 90 percent of respondents to the 1990 U.S. Census could be identified using zip code, birth date, and gender alone [119]. In response to potential future threats, the Bureau conducted experimental attacks on its own statistical releases. While the experimental details could not be made public, the Bureau claimed their attack demonstrated that the re-identification of census records was indeed a credible threat [7, 56]. Third, the social environment served as another trigger: specifically, the 'tech for good' hype. A heightened cultural interest in framing and 'solving' policy problems using stylized computational methods preceded the Bureau's decision to adopt DP. This interest was evidenced in part by the rise of programs such as Code for America in 2009, the United States Digital Service in 2014, and the Mechanism Design for Social Good initiative in 2016 [53], as well as by technical communities' heightened attention to questions of fairness, accountability, transparency, and ethics of technology, from the FATML origins in 2014 to the establishment of ACM venues such as FAccT (previously FAT*) and AIES in 2018 [2, 14]. DP saw increased uptake in tandem with this hype, promising to encode values such as transparency, accountability, and privacy into a mathematical formulation [108, 138].
In such an environment, the adoption of a theoretical computer science methodology to bolster privacy protection in the national Census, one which would also increase transparency into the Bureau's processes, aligned perfectly with prevailing trends.
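The Sweeney-style attack described above can be made concrete with a toy linkage sketch: joining a nominally anonymized release to a public roster on shared quasi-identifiers (ZIP code, birth date, sex). All names, records, and field values below are invented purely for illustration.

```python
# Toy re-identification via record linkage on quasi-identifiers.
# All data here is fabricated for illustration.

anonymized_release = [
    {"zip": "48104", "birth_date": "1961-07-02", "sex": "F", "attribute": "A"},
    {"zip": "48104", "birth_date": "1985-03-14", "sex": "M", "attribute": "B"},
]

public_roster = [
    {"name": "Alice Example", "zip": "48104", "birth_date": "1961-07-02", "sex": "F"},
    {"name": "Bob Example", "zip": "48109", "birth_date": "1990-11-30", "sex": "M"},
]

def link(release, roster):
    """Match records whose quasi-identifiers agree exactly."""
    quasi = ("zip", "birth_date", "sex")
    matches = []
    for r in release:
        for p in roster:
            if all(r[k] == p[k] for k in quasi):
                matches.append((p["name"], r["attribute"]))
    return matches

# Alice's record is re-identified even though the release contains no names.
print(link(anonymized_release, public_roster))
```

The attack needs no sophistication at all: an exact join on three widely available fields suffices, which is precisely why quasi-identifiers undermine naive de-identification.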

4.2 Handoff Components: Shifting experts, techniques, and data

The sociotechnical ecosystem surrounding the handoff of the DAS comprises many actor-components, most of which remained largely unchanged throughout the transition to DP. For instance, the stakeholders and users who depend upon Census data products and the external agencies and groups (demographers, community groups, etc.) with whom the Bureau collaborates remained relatively stable throughout the handoff. However, the handoff also introduced new components, shifting the experts and technologies involved in delivering the DAS’s confidentiality function. Below we compare the DAS components before and after the shift from statistical disclosure limitation (SDL) to DP.

4.2.1 Technical Methods. A key shift in the DAS was the substitution of SDL tools with DP mechanisms, a new set of confidentiality-preserving tools built on a definition of privacy from theoretical computer science. Under the previous SDL methods, Bureau statisticians protected census response data through methods like suppressing and swapping individual records. Under DP, however, randomly generated “noise” is algorithmically added to census data to preserve confidentiality. This transition introduced two new subcomponents of particular note. First, the tunable epsilon (ε) parameter is a direct measure of privacy loss in DP, where small ε reflects low privacy loss (i.e., high privacy, low accuracy) while large ε reflects high privacy loss (i.e., low privacy, high accuracy). Second is post-processing, a new step added to the data pipeline under DP to further modify the confidential data after the injection of randomized noise. This step ensures all final data products are non-negative integers, in order to assuage human interpreters who might be confused or put off by, for instance, a table reporting -48.12 people residing within a particular geography.
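The ε trade-off and the post-processing step can both be sketched with the Laplace mechanism, a standard pure-DP mechanism for count queries. This is an illustrative toy, not the Bureau's production DAS; the counts and ε values are invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_count(true_count, epsilon):
    """Release a count under pure DP via the Laplace mechanism.
    A count query has sensitivity 1, so the noise scale is 1/epsilon."""
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

def post_process(noisy_count):
    """Census-style post-processing: round and clip so the published
    figure is a non-negative integer. Post-processing cannot increase
    privacy loss, but it can further perturb accuracy."""
    return max(0, round(noisy_count))

true_count = 23  # e.g., people of a given age in one census block
for eps in (0.1, 1.0, 10.0):
    noisy = laplace_count(true_count, eps)
    print(f"epsilon={eps:5.1f}  noisy={noisy:9.2f}  published={post_process(noisy)}")
# Small epsilon -> large noise (strong privacy, weak accuracy);
# large epsilon -> little noise (weak privacy, high accuracy).
```

Note how `post_process` turns a nonsensical draw like -48.12 into a publishable 0, exactly the cosmetic repair the paragraph above describes.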

4.2.2 Data Invariants. The Census’s outputs have significant consequences: because certain statistics inform resource allocation, accurate representation of population is important to a number of stakeholders, including voting rights advocates, state and municipal governments, tribal leadership, and even disaster recovery and public health personnel [95, 97, 137]. For the 2010 census and those prior, some particularly significant counts (such as total state populations) were held invariant under the DAS;[2] in other words, they were not manipulated from their value ‘as counted’ [93, 94]. However, invariants are incompatible with traditional DP (zero noise requires an infinite privacy budget), meaning that any count held invariant complicates DP’s confidentiality guarantees [3, 93]. As a result, the 2020 DAS reduced the number of counts that would be held invariant, most notably no longer publishing the population of census blocks as counted. Thus, the reported population, as well as demographic characteristics such as race and age, of all geographies smaller than a state would be altered with DP before publication.
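The tension between invariants and DP can be made concrete with a toy sketch (explicitly not the Bureau's actual TopDown algorithm): block counts receive Laplace noise, then are post-processed so that they still sum to an invariant state total. The invariant total itself receives no noise, which is why, formally, it corresponds to an infinite privacy budget. All counts and the ε value are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_blocks_with_invariant_total(block_counts, epsilon):
    """Toy sketch: noise each block count, then adjust so the blocks
    sum exactly to the invariant (unnoised) state total."""
    invariant_total = sum(block_counts)
    noisy = [c + rng.laplace(scale=1.0 / epsilon) for c in block_counts]
    # Spread the discrepancy evenly, then round/clip to non-negative ints.
    adjustment = (invariant_total - sum(noisy)) / len(noisy)
    adjusted = [max(0, round(v + adjustment)) for v in noisy]
    # Rounding and clipping may perturb the sum; repair it one unit at a time.
    diff = invariant_total - sum(adjusted)
    i = 0
    while diff != 0:
        step = 1 if diff > 0 else -1
        if adjusted[i] + step >= 0:
            adjusted[i] += step
            diff -= step
        i = (i + 1) % len(adjusted)
    return adjusted

blocks = [120, 5, 0, 340, 27]
published = noisy_blocks_with_invariant_total(blocks, epsilon=0.5)
print(published, sum(published) == sum(blocks))
```

Every individual block count is perturbed, yet the state-level total is reported exactly as counted: the invariant is preserved at the cost of the confidentiality guarantee attaching only to the sub-state figures.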

4.2.3 Experts. Finally, the adoption of DP meant the introduction of a new class of experts to the DAS: theoretical computer scientists, specifically those well-versed in DP formulations. As a result, computer scientists were slotted into DAS design processes, for instance: serving alongside social scientists, policy researchers, political advocates, and corporate leaders on DP working groups for two census oversight committees; and completing contractual work to directly assist in implementing DP for the 2020 Demographic and Housing Characteristics tabulations [55].

Authors:

(1) AMINA A. ABDU, University of Michigan, USA;

(2) LAUREN M. CHAMBERS, University of California, Berkeley, USA;

(3) DEIRDRE K. MULLIGAN, University of California, Berkeley, USA;

(4) ABIGAIL Z. JACOBS, University of Michigan, USA.


This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.


Published by HackerNoon on 2025/05/21