Why the 2020 U.S. Census Bureau Opted for Differential Privacy

Written by mediabias | Published 2025/05/21
Tech Story Tags: algorithmic-transparency | differential-privacy | disclosure-avoidance-system | sociotechnical-systems | handoff-model | government-data-privacy | ai-governance | algorithmic-data-protection

TL;DR: The U.S. Census Bureau adopted differential privacy (DP) to address growing threats to data confidentiality. This "handoff" shifted technical methods, data outputs, and the experts involved, moving from traditional statistical disclosure techniques to privacy-preserving algorithms. The change reduced data accuracy in some areas but aligned with broader trends in tech-driven governance and privacy protection.

Table of Links

Abstract and 1. Introduction

2. Related Work

3. Theoretical Lenses

3.1. Handoff Model

3.2. Boundary objects

4. Applying the Theoretical Lenses and 4.1 Handoff Triggers: New tech, new threats, new hype

4.2. Handoff Components: Shifting experts, techniques, and data

4.3. Handoff Modes: Abstraction and constrained expertise

4.4 Handoff Function: Interrogating the how and 4.5. Transparency artifacts at the boundaries: Spaghetti at the wall

5. Uncovering the Stakes of the Handoff

5.1. Confidentiality is the tip of the iceberg

5.2. Data Utility

5.3. Formalism

5.4. Transparency

5.5. Participation

6. Beyond the Census: Lessons for Transparency and Participation and 6.1 Lesson 1: The handoff lens is a critical tool for surfacing values

6.2 Lesson 2: Beware objects without experts

6.3 Lesson 3: Transparency and participation should center values and policy

7. Conclusion

8. Research Ethics and Social Impact

8.1. Ethical concerns

8.2. Positionality

8.3. Adverse impact statement

Acknowledgments and References

4 APPLYING THE THEORETICAL LENSES

In this section, we employ the handoff model to highlight the reconfigurations which took place during the Census Bureau’s change to a new disclosure avoidance system (DAS) as they adopted DP. Specifically, we analyze changes in the elements which relate directly to the DAS’s primary function: protecting the confidentiality of census responses. In addition, we consider the artifacts that the Bureau introduced as part of the DP implementation process.

4.1 Handoff Triggers: New tech, new threats, new hype

From Census Bureau documents and communications [e.g., 5–7, 10], we identify three triggers that spurred the handoff, i.e., the adoption of DP. The first trigger enabled the handoff: the development of DP in 2006 presented a promising new method [43]. Second, the Bureau became aware of mounting evidence that increased computational power and data access might lead to new threats to privacy and confidentiality [7, 69, 128]. These concerns encompassed both future, potentially dangerous threats via reconstruction and/or re-identification attacks [40] and current, realizable ones. For instance, Latanya Sweeney showed in 2000 that almost 90 percent of respondents to the 1990 U.S. Census could be identified using zip code, birth date, and gender alone [119]. In response to potential future threats, the Bureau conducted experimental attacks on its own statistical releases. While the experimental details could not be made public, the Bureau claimed their attack demonstrated that the re-identification of census records was indeed a credible threat [7, 56]. Third, the social environment served as another trigger: specifically, the 'tech for good' hype. A heightened cultural interest in framing and 'solving' policy problems using stylized computational methods preceded the Bureau's decision to adopt DP. This interest was evidenced in part by the rise of programs such as Code for America in 2009, the United States Digital Service in 2014, and the Mechanism Design for Social Good initiative in 2016 [53], as well as by technical communities' heightened attention to questions of fairness, accountability, transparency, and ethics of technology, from the FATML origins in 2014 to the establishment of ACM venues such as FAccT (previously FAT*) and AIES in 2018 [2, 14]. DP saw increased uptake in tandem with this hype, promising to encode values such as transparency, accountability, and privacy into a mathematical formulation [108, 138].
In such an environment, the adoption of a theoretical computer science methodology to bolster privacy protection in the national Census, one which would also increase transparency into the Bureau's processes, aligned perfectly with prevailing trends.
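The Sweeney-style attack described above can be made concrete with a toy linkage sketch: joining a nominally anonymized release to a public roster on shared quasi-identifiers (ZIP code, birth date, sex). All names, records, and field values below are invented purely for illustration.

```python
# Toy re-identification via record linkage on quasi-identifiers.
# All data here is fabricated for illustration.

anonymized_release = [
    {"zip": "48104", "birth_date": "1961-07-02", "sex": "F", "attribute": "A"},
    {"zip": "48104", "birth_date": "1985-03-14", "sex": "M", "attribute": "B"},
]

public_roster = [
    {"name": "Alice Example", "zip": "48104", "birth_date": "1961-07-02", "sex": "F"},
    {"name": "Bob Example", "zip": "48109", "birth_date": "1990-11-30", "sex": "M"},
]

def link(release, roster):
    """Match records whose quasi-identifiers agree exactly."""
    quasi = ("zip", "birth_date", "sex")
    matches = []
    for r in release:
        for p in roster:
            if all(r[k] == p[k] for k in quasi):
                matches.append((p["name"], r["attribute"]))
    return matches

# Alice's record is re-identified even though the release contains no names.
print(link(anonymized_release, public_roster))
```

The attack needs no sophistication at all: an exact join on three widely available fields suffices, which is precisely why quasi-identifiers undermine naive de-identification.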

4.2 Handoff Components: Shifting experts, techniques, and data

The sociotechnical ecosystem surrounding the handoff of the DAS comprises many actor-components, most of which remained largely unchanged throughout the transition to DP. For instance, the stakeholders and users who depend upon Census data products and the external agencies and groups (demographers, community groups, etc.) with whom the Bureau collaborates remained relatively stable throughout the handoff. However, the handoff also introduced new components, shifting the experts and technologies involved in delivering the DAS’s confidentiality function. Below we compare the DAS components before and after the shift from statistical disclosure limitation (SDL) to DP.

4.2.1 Technical Methods. A key shift in the DAS was the substitution of SDL tools with DP mechanisms, a new set of confidentiality-preserving tools built on a definition of privacy from theoretical computer science. Under the previous SDL methods, Bureau statisticians protected census response data through methods like suppressing and swapping individual records. Under DP, however, randomly generated “noise” is algorithmically added to census data to preserve confidentiality. This transition introduced two new subcomponents of particular note. First, the tunable epsilon (ε) parameter is a direct measure of privacy loss in DP, where small ε reflects low privacy loss (i.e., high privacy, low accuracy) while large ε reflects high privacy loss (i.e., low privacy, high accuracy). Second is post-processing, a new step added to the data pipeline under DP to further modify the confidential data after the injection of randomized noise. This step ensures all final data products are non-negative integers, in order to assuage human interpreters who might be confused or put off by, for instance, a table reporting -48.12 people residing within a particular geography.
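The ε trade-off and the post-processing step can both be sketched with the Laplace mechanism, a standard pure-DP mechanism for count queries. This is an illustrative toy, not the Bureau's production DAS; the counts and ε values are invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_count(true_count, epsilon):
    """Release a count under pure DP via the Laplace mechanism.
    A count query has sensitivity 1, so the noise scale is 1/epsilon."""
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

def post_process(noisy_count):
    """Census-style post-processing: round and clip so the published
    figure is a non-negative integer. Post-processing cannot increase
    privacy loss, but it can further perturb accuracy."""
    return max(0, round(noisy_count))

true_count = 23  # e.g., people of a given age in one census block
for eps in (0.1, 1.0, 10.0):
    noisy = laplace_count(true_count, eps)
    print(f"epsilon={eps:5.1f}  noisy={noisy:9.2f}  published={post_process(noisy)}")
# Small epsilon -> large noise (strong privacy, weak accuracy);
# large epsilon -> little noise (weak privacy, high accuracy).
```

Note how `post_process` turns a nonsensical draw like -48.12 into a publishable 0, exactly the cosmetic repair the paragraph above describes.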

4.2.2 Data Invariants. The Census’s outputs have significant consequences: because certain statistics inform resource allocation, accurate representation of population is important to a number of stakeholders, including voting rights advocates, state and municipal governments, tribal leadership, and even disaster recovery and public health personnel [95, 97, 137]. For the 2010 census and those prior, some particularly significant counts (such as total state populations) were held invariant under the DAS;[2] in other words, they were not manipulated from their value ‘as counted’ [93, 94]. However, invariants are incompatible with traditional DP (zero noise requires an infinite privacy budget), meaning that any count held invariant complicates DP’s confidentiality guarantees [3, 93]. As a result, the 2020 DAS reduced the number of counts that would be held invariant, most notably no longer publishing the population of census blocks as counted. Thus, the reported population, as well as demographic characteristics such as race and age, of all geographies smaller than a state would be altered with DP before publication.
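The tension between invariants and DP can be made concrete with a toy sketch (explicitly not the Bureau's actual TopDown algorithm): block counts receive Laplace noise, then are post-processed so that they still sum to an invariant state total. The invariant total itself receives no noise, which is why, formally, it corresponds to an infinite privacy budget. All counts and the ε value are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_blocks_with_invariant_total(block_counts, epsilon):
    """Toy sketch: noise each block count, then adjust so the blocks
    sum exactly to the invariant (unnoised) state total."""
    invariant_total = sum(block_counts)
    noisy = [c + rng.laplace(scale=1.0 / epsilon) for c in block_counts]
    # Spread the discrepancy evenly, then round/clip to non-negative ints.
    adjustment = (invariant_total - sum(noisy)) / len(noisy)
    adjusted = [max(0, round(v + adjustment)) for v in noisy]
    # Rounding and clipping may perturb the sum; repair it one unit at a time.
    diff = invariant_total - sum(adjusted)
    i = 0
    while diff != 0:
        step = 1 if diff > 0 else -1
        if adjusted[i] + step >= 0:
            adjusted[i] += step
            diff -= step
        i = (i + 1) % len(adjusted)
    return adjusted

blocks = [120, 5, 0, 340, 27]
published = noisy_blocks_with_invariant_total(blocks, epsilon=0.5)
print(published, sum(published) == sum(blocks))
```

Every individual block count is perturbed, yet the state-level total is reported exactly as counted: the invariant is preserved at the cost of the confidentiality guarantee attaching only to the sub-state figures.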

4.2.3 Experts. Finally, the adoption of DP meant the introduction of a new class of experts to the DAS: theoretical computer scientists, specifically those well-versed in DP formulations. As a result, computer scientists were slotted into DAS design processes, for instance: serving alongside social scientists, policy researchers, political advocates, and corporate leaders on DP working groups for two census oversight committees; and completing contractual work to directly assist in implementing DP for the 2020 Demographic and Housing Characteristics tabulations [55].

Authors:

(1) AMINA A. ABDU, University of Michigan, USA;

(2) LAUREN M. CHAMBERS, University of California, Berkeley, USA;

(3) DEIRDRE K. MULLIGAN, University of California, Berkeley, USA;

(4) ABIGAIL Z. JACOBS, University of Michigan, USA.


This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.


Published by HackerNoon on 2025/05/21