B Theoretical Analysis for a Mixture of Gaussians
Consider the Gaussian mixture model and the hypothesis class of linear decision rules introduced in Section 4. Here we give a more formal treatment of the assumptions and claims made in that section.
B.1 Average risk minimization
Theorem B.1. If Equation 6 holds, define t∗(ϕ0/ϕ1) as in Equation 3. Then, t∗(ϕ0/ϕ1) is the statistical risk minimizer for the Gaussian mixture model if
Proof. For a decision rule t ∈ ℝ ∪ {±∞}, the statistical risk in the given Gaussian mixture model is given in Equation 1. The global minimum is achieved either at ±∞ or on ℝ. In the latter case, the minimizer is a solution to ∂R(t)/∂t = 0:
Rearranging and taking the logarithm on both sides yields
which is a quadratic equation in t with solutions
By repeating the same steps for an inequality rather than equality, we conclude that 0 < ∂R(t)/∂t if and only if
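To make the algebra explicit, the following is a sketch under the (assumed) convention that x ≥ t is assigned to class 1, so that R(t) = ϕ0 P[X0 ≥ t] + ϕ1 P[X1 < t] with Xy ∼ N(µy, σy²); the exact parameterization of Equations 1 and 3 may differ.

```latex
% Stationarity condition for R(t) = \phi_0\Pr[X_0 \ge t] + \phi_1\Pr[X_1 < t],
% where X_y \sim \mathcal{N}(\mu_y, \sigma_y^2) and \varphi is the standard
% normal density:
\begin{align*}
  \frac{\partial R(t)}{\partial t}
    = -\frac{\phi_0}{\sigma_0}\,\varphi\!\left(\frac{t-\mu_0}{\sigma_0}\right)
      + \frac{\phi_1}{\sigma_1}\,\varphi\!\left(\frac{t-\mu_1}{\sigma_1}\right) = 0
  \;\Longleftrightarrow\;
  \log\frac{\phi_0}{\sigma_0} - \frac{(t-\mu_0)^2}{2\sigma_0^2}
    = \log\frac{\phi_1}{\sigma_1} - \frac{(t-\mu_1)^2}{2\sigma_1^2}.
\end{align*}
% For \sigma_0 \ne \sigma_1 this is quadratic in t; for \sigma_0 = \sigma_1
% = \sigma it degenerates to the single root
\begin{align*}
  t^* = \frac{\mu_0 + \mu_1}{2}
      + \frac{\sigma^2}{\mu_1 - \mu_0}\,\log\frac{\phi_0}{\phi_1}.
\end{align*}
```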
B.2 Worst-class optimal priors
For the second part, we start by proving the following lemma.
B.3 The Effect of SSP
Our setting allows us to adapt and analyze a state-of-the-art baseline pruning algorithm, Self-Supervised Pruning (SSP) [Sorscher et al., 2022]. SSP performs k-means clustering in the embedding space of an ImageNet pre-trained self-supervised model and defines the difficulty of each data point as the cosine distance to its nearest cluster centroid, or prototype; in the case of two 1D Gaussians, this score corresponds to the distance to the closest mean. Recall that, in Section 4, we adopted a variant of SSP that removes samples within a margin M > 0 of their class means, as sketched below.
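The following is a minimal numerical sketch of this variant; the means, variances, and margin are illustrative placeholders, not values from the paper.

```python
import numpy as np
from math import erf, sqrt

# Sketch of the SSP variant from Section 4: two 1D Gaussian classes, and
# "easy" samples within margin M of their own class mean are removed. In
# SSP proper, difficulty is the cosine distance to the nearest k-means
# prototype, which for two 1D Gaussians reduces to |x - nearest mean|.
rng = np.random.default_rng(0)
mu0, sigma0 = -1.0, 1.0   # illustrative class-0 parameters
mu1, sigma1 = 1.0, 1.5    # illustrative class-1 parameters
n = 100_000
x0 = rng.normal(mu0, sigma0, n)  # class-0 samples
x1 = rng.normal(mu1, sigma1, n)  # class-1 samples

M = 0.3
keep0 = np.abs(x0 - mu0) > M  # prune class-0 points within M of mu0
keep1 = np.abs(x1 - mu1) > M  # prune class-1 points within M of mu1

# Empirical removed mass per class vs. the closed form 2*Phi(M/sigma) - 1,
# using the identity 2*Phi(z) - 1 = erf(z / sqrt(2)).
for sigma, keep in [(sigma0, keep0), (sigma1, keep1)]:
    removed_theory = erf(M / (sigma * sqrt(2)))
    print(f"removed: {1 - keep.mean():.4f}   theory: {removed_theory:.4f}")
```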
We claimed and demonstrated in Figure 8 that the optimal risk minimizer remains around t∗(ϕ0/ϕ1) even after pruning. In this section, we make these claims more precise. To this end, note that, for a Gaussian variable with variance σ², the probability mass removed within margin M of the mean is 2Φ(M/σ) − 1, where Φ denotes the standard normal CDF.
Now, for sufficiently small M, we can assume that the average risk minimizer lies within the interval (µ0 + M, µ1 − M) (recall that the average risk minimizer of the original mixture lies between µ0 and µ1 owing to the assumption in Equation 7). In this case, the right tail of the easier class and the left tail of the more difficult class (the tails that each class misclassifies) are unaffected by pruning, and hence the average risk after SSP, R′(t), should remain proportional to the original risk R(t). More formally, consider the class-wise risks after SSP:
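As a sketch of this proportionality claim, under the same threshold convention as above and with pruning removing the central mass 2Φ(M/σy) − 1 of each class and renormalizing (our assumptions; the paper's display may be parameterized differently):

```latex
% Class-wise risks after SSP for a threshold t in (mu_0 + M, mu_1 - M):
% the misclassified tails are untouched, and the kept mass of class y is
% 1 - (2\Phi(M/\sigma_y) - 1) = 2\Phi(-M/\sigma_y).
\begin{align*}
  R'_0(t) = \frac{1 - \Phi\!\left(\frac{t-\mu_0}{\sigma_0}\right)}
                 {2\,\Phi(-M/\sigma_0)},
  \qquad
  R'_1(t) = \frac{\Phi\!\left(\frac{t-\mu_1}{\sigma_1}\right)}
                 {2\,\Phi(-M/\sigma_1)}.
\end{align*}
% When \sigma_0 = \sigma_1, the two normalizers coincide, so
% R'(t) = \phi_0 R'_0(t) + \phi_1 R'_1(t) \propto R(t) on this interval and
% the minimizer t^*(\phi_0/\phi_1) is unchanged by pruning.
```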
B.4 Multivariate Isotropic Gaussians
The average risk given the class priors is thus
Therefore, this problem also reduces to the minimization in the univariate case from Section B.2.
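The reduction rests on a one-line projection fact; the following sketch assumes the linear rule thresholds w⊤x at t with ∥w∥ = 1 and class-conditional laws N(µy, σy² Id), which may differ from the paper's exact parameterization.

```latex
% Projecting an isotropic Gaussian onto a unit vector w gives a univariate
% Gaussian, so the risk of the linear rule "assign class 1 iff w^T x >= t"
% is a univariate risk in the projected coordinate:
\begin{align*}
  w^\top X_y \sim \mathcal{N}\!\left(w^\top \mu_y,\; \sigma_y^2\right)
  \quad\Longrightarrow\quad
  R(w, t) = \phi_0\,\Pr\!\left[w^\top X_0 \ge t\right]
          + \phi_1\,\Pr\!\left[w^\top X_1 < t\right],
\end{align*}
% which, for any fixed w, has exactly the form of the univariate risk R(t)
% with means w^\top \mu_0 and w^\top \mu_1 and the same variances.
```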
This paper is available on arXiv under a CC BY 4.0 DEED license.
Authors:
(1) Artem Vysogorets, Center for Data Science, New York University (amv458@nyu.edu);
(2) Kartik Ahuja, Meta FAIR;
(3) Julia Kempe, New York University, Meta FAIR.