Theoretical Analysis of Fair Data Pruning with Gaussian Mixtures


Too Long; Didn't Read

Explore our detailed theoretical analysis of fair data pruning for binary classification using a mixture of Gaussians

Abstract and 1 Introduction

2 Data Pruning & Fairness

3 Method & Results

4 Theoretical Analysis

5 Discussion, Acknowledgments and Disclosure of Funding, and References

A Implementation Details

B Theoretical Analysis for a Mixture of Gaussians


Consider the Gaussian mixture model and the hypothesis class of linear decision rules introduced in Section 4. Here we give a more formal treatment of the assumptions and claims made in that section.
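Because the model itself is only referenced here, the sketches below fix one concrete parameterization as a working assumption (the paper's own conventions may differ): class c ∈ {0, 1} has prior ϕc and conditional law N(µc, σc²) with µ0 < µ1, and a decision rule t ∈ ℝ ∪ {±∞} predicts class 0 when x ≤ t and class 1 otherwise. Under this convention, the statistical risk of a threshold t is

```latex
\[
R(t) \;=\; \phi_0 \Pr[X_0 > t] + \phi_1 \Pr[X_1 \le t]
     \;=\; \phi_0\left(1 - \Phi\!\left(\frac{t - \mu_0}{\sigma_0}\right)\right)
         + \phi_1\,\Phi\!\left(\frac{t - \mu_1}{\sigma_1}\right),
\]
```

where Φ denotes the standard normal CDF; the later sketches refer back to this form.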

B.1 Average risk minimization


Theorem B.1. If Equation 6 holds, define t*(ϕ0/ϕ1) as in Equation 3. Then, t*(ϕ0/ϕ1) is the statistical risk minimizer for the Gaussian mixture model if



Proof. For a decision rule t ∈ ℝ ∪ {±∞}, the statistical risk in the given Gaussian mixture model is given in Equation 1. The global minimum is achieved either at ±∞ or on ℝ. In the latter case, the minimizer is a solution to ∂R(t)/∂t = 0:
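Under the parameterization assumed above, this stationarity condition reads

```latex
\[
\frac{\partial R(t)}{\partial t}
  \;=\; -\frac{\phi_0}{\sigma_0}\,\varphi\!\left(\frac{t - \mu_0}{\sigma_0}\right)
        + \frac{\phi_1}{\sigma_1}\,\varphi\!\left(\frac{t - \mu_1}{\sigma_1}\right) \;=\; 0
  \quad\Longleftrightarrow\quad
  \frac{\phi_0}{\sigma_0}\,e^{-\frac{(t - \mu_0)^2}{2\sigma_0^2}}
  \;=\; \frac{\phi_1}{\sigma_1}\,e^{-\frac{(t - \mu_1)^2}{2\sigma_1^2}},
\]
```

where φ is the standard normal density.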



Rearranging and taking the logarithm on both sides yields
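Under the same assumed parameterization, that step gives

```latex
\[
\ln\frac{\phi_0}{\sigma_0} - \frac{(t - \mu_0)^2}{2\sigma_0^2}
  \;=\; \ln\frac{\phi_1}{\sigma_1} - \frac{(t - \mu_1)^2}{2\sigma_1^2}
  \quad\Longleftrightarrow\quad
  \frac{(t - \mu_0)^2}{2\sigma_0^2} - \frac{(t - \mu_1)^2}{2\sigma_1^2}
  \;=\; \ln\frac{\phi_0\,\sigma_1}{\phi_1\,\sigma_0}.
\]
```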



which is a quadratic equation in t with solutions
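Concretely, expanding the squares in the assumed parameterization yields at² + bt + c = 0 with

```latex
\[
a = \frac{1}{2\sigma_0^2} - \frac{1}{2\sigma_1^2}, \qquad
b = \frac{\mu_1}{\sigma_1^2} - \frac{\mu_0}{\sigma_0^2}, \qquad
c = \frac{\mu_0^2}{2\sigma_0^2} - \frac{\mu_1^2}{2\sigma_1^2} - \ln\frac{\phi_0\,\sigma_1}{\phi_1\,\sigma_0},
\qquad
t = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \quad (a \neq 0).
\]
```

As a quick sanity check, in the equal-variance case σ0 = σ1 = σ the equation degenerates to a linear one with the familiar solution t = (µ0 + µ1)/2 + σ² ln(ϕ0/ϕ1)/(µ1 − µ0).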



By repeating the same steps with an inequality instead of an equality, we conclude that ∂R(t)/∂t > 0 if and only if
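In the assumed parameterization, the resulting condition is

```latex
\[
\frac{\partial R(t)}{\partial t} > 0
  \quad\Longleftrightarrow\quad
  \frac{\phi_1}{\sigma_1}\,\varphi\!\left(\frac{t - \mu_1}{\sigma_1}\right)
  > \frac{\phi_0}{\sigma_0}\,\varphi\!\left(\frac{t - \mu_0}{\sigma_0}\right)
  \quad\Longleftrightarrow\quad
  a\,t^2 + b\,t + c > 0,
\]
```

with a, b, c as in the sketch above; this sign pattern is what singles out which root of the quadratic is the minimizer and when the infimum is attained at ±∞ instead.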


B.2 Worst-class optimal priors


For the second part, we start by proving the following lemma.


B.3 The Effect of SSP

Our setting allows us to adapt and analyze a state-of-the-art baseline pruning algorithm, Self-Supervised Pruning (SSP) [Sorscher et al., 2022]. Recall that, in Section 4, we adopted a variant of SSP that removes samples within a margin M > 0 of their class means. SSP performs k-means clustering in the embedding space of an ImageNet pre-trained self-supervised model and defines the difficulty of each data point by the cosine distance to its nearest cluster centroid, or prototype. In the case of two 1D Gaussians, this score corresponds to measuring the distance to the closest mean.
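As a quick numerical illustration of this 1D variant (a sketch, not the paper's implementation; the means, variances, priors, and margin M below are made-up values), one can prune synthetic two-Gaussian data by removing points within M of their class mean and check where the empirical risk of a threshold classifier is minimized:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy parameters (assumed for illustration, not taken from the paper):
# class 0 ~ N(0, 1), class 1 ~ N(3, 2^2), equal priors, pruning margin M = 0.5.
mu = np.array([0.0, 3.0])
sigma = np.array([1.0, 2.0])
phi = np.array([0.5, 0.5])
n, M = 100_000, 0.5

# Sample labels from the priors, then features from the class-conditional Gaussians.
y = rng.choice(2, size=n, p=phi)
x = rng.normal(mu[y], sigma[y])

# 1D analogue of the SSP score: distance to the nearest class mean.
# The variant analyzed here prunes "easy" points within margin M of their class mean.
keep = np.abs(x - mu[y]) > M
x_ssp, y_ssp = x[keep], y[keep]

def empirical_risk(xs, ys, ts):
    """Misclassification rate of the rule 'predict class 1 iff x > t' for each t in ts."""
    preds = xs[None, :] > ts[:, None]          # shape (len(ts), len(xs))
    return (preds != ys[None, :].astype(bool)).mean(axis=1)

ts = np.linspace(mu[0], mu[1], 601)
t_before = ts[np.argmin(empirical_risk(x, y, ts))]
t_after = ts[np.argmin(empirical_risk(x_ssp, y_ssp, ts))]
print(f"empirical risk minimizer: before SSP ~ {t_before:.2f}, after SSP ~ {t_after:.2f}")
```

With a small margin, the two minimizers should land close together, which is the behavior made precise in the rest of this subsection.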


We claimed and demonstrated in Figure 8 that the risk minimizer remains around t*(ϕ0/ϕ1) even after pruning. In this section, we make these claims more precise. To this end, note that, for a Gaussian variable with variance σ², the removed probability mass is
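If the pruning rule indeed removes every point within M of its class mean, that mass is (with Φ again the standard normal CDF)

```latex
\[
\Pr\!\left[\,|X - \mu| \le M\,\right]
  \;=\; \Phi\!\left(\frac{M}{\sigma}\right) - \Phi\!\left(-\frac{M}{\sigma}\right)
  \;=\; 2\,\Phi\!\left(\frac{M}{\sigma}\right) - 1.
\]
```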



Now, for sufficiently small M, we can assume that the average risk minimizer lies within the interval (µ0 + M, µ1 − M) (recall that the average risk minimizer of the original mixture lies between µ0 and µ1 owing to the assumption in Equation 7). In this case, the right tail of the easier class and the left tail of the harder class, i.e., the tails that are misclassified, are unaffected by pruning; hence, the average risk after SSP, R′(t), should remain proportional to the original risk R(t). More formally, consider the class-wise risks after SSP:
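A sketch of those class-wise risks under the assumptions above: for t ∈ (µ0 + M, µ1 − M), pruning removes only probability mass near the means and none of the misclassified tails, so each class-conditional risk is the unpruned one rescaled by the kept fraction 2Φ(−M/σc):

```latex
\[
R'_0(t) \;=\; \frac{1 - \Phi\!\left(\frac{t - \mu_0}{\sigma_0}\right)}{2\,\Phi\!\left(-\frac{M}{\sigma_0}\right)},
\qquad
R'_1(t) \;=\; \frac{\Phi\!\left(\frac{t - \mu_1}{\sigma_1}\right)}{2\,\Phi\!\left(-\frac{M}{\sigma_1}\right)},
\qquad t \in (\mu_0 + M,\ \mu_1 - M).
\]
```

On this interval, each R′c(t) is therefore a constant multiple of its unpruned counterpart, which is the sense in which the post-pruning risk tracks the original one.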


B.4 Multivariate Isotropic Gaussians


The average risk given the class priors is thus
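In the isotropic setting assumed for this sketch, N(µc, σc²I) with linear rules x ↦ 1{w⊤x > b} and ‖w‖ = 1 (notation chosen here, not taken from the paper), projecting onto w leaves each class a univariate Gaussian with mean w⊤µc and unchanged variance σc², so the risk takes the form

```latex
\[
R(w, b) \;=\; \phi_0\left(1 - \Phi\!\left(\frac{b - w^{\top}\mu_0}{\sigma_0}\right)\right)
          + \phi_1\,\Phi\!\left(\frac{b - w^{\top}\mu_1}{\sigma_1}\right).
\]
```

Since the projected separation w⊤(µ1 − µ0) is largest for w ∝ µ1 − µ0, only the geometry along the line joining the two means matters, and for that direction the expression above is exactly the univariate objective with threshold t = b.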



Therefore, this problem also reduces to the minimization in the univariate case from Section B.2.


This paper is available on arXiv under the CC BY 4.0 DEED license.

Authors:

(1) Artem Vysogorets, Center for Data Science, New York University (amv458@nyu.edu);

(2) Kartik Ahuja, Meta FAIR;

(3) Julia Kempe, New York University, Meta FAIR.

