Latent Space Metrics for Single-Cell RNA-seq Model Evaluation


C LATENT SPACE METRICS

In this work, we compare the learned latent spaces both qualitatively and quantitatively. For qualitative comparison, we refer to 2-D UMAP visualizations (McInnes et al., 2018). However, because UMAP is a stochastic mapping and the distances between datapoints in the visualization do not reflect the true distances in the latent space (McInnes et al., 2018), we also turn to quantitative latent space measurements pertinent to single-cell data. In particular, we focus on five quantitative metrics used in single-cell integration benchmarking that measure how well the latent space clusters cell types and how well it mixes samples across batches (Luecken et al., 2022). The metrics for cell-type separation are detailed in Bio Conservation Metrics (Section C.1), and the metrics for batch mixing are detailed in Batch Correction Metrics (Section C.2).

In our experiments, following the convention in Luecken et al. (2022), we average the batch correction metrics to obtain an average batch metric score, and we average the bio-conservation metrics to obtain an average bio metric score. While Luecken et al. (2022) propose an overall latent space score obtained through a weighted average of these two scores, we deviate from this approach. We observed that models that failed to learn meaningful information could still yield high batch mixing scores (due to indiscriminate mixing) and consequently lead to misleading total scores. Hence, we choose to report the average batch metric score and the average bio metric score separately.
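The effect described above can be illustrated with a minimal sketch. The metric values below are made-up numbers for a hypothetical model that mixes everything indiscriminately, and the 0.6/0.4 weighting is an assumption taken from the benchmarking convention of Luecken et al. (2022), not something stated in this appendix.

```python
from statistics import mean

def average_scores(bio_metrics, batch_metrics):
    """Return (average bio score, average batch score), reported separately."""
    return mean(bio_metrics.values()), mean(batch_metrics.values())

def overall_score(bio_score, batch_score, bio_weight=0.6):
    """Weighted average of the two scores (weights are an assumption)."""
    return bio_weight * bio_score + (1.0 - bio_weight) * batch_score

# Hypothetical, illustrative values for a model that learned nothing useful
# but mixes all cells together.
degenerate_bio = {"NMI": 0.05, "ARI": 0.0, "cellASW": 0.5}
degenerate_batch = {"batchASW": 0.95, "graph_connectivity": 0.9}

bio, batch = average_scores(degenerate_bio, degenerate_batch)
total = overall_score(bio, batch)
print(f"bio={bio:.3f} batch={batch:.3f} overall={total:.3f}")
```

In this toy case the overall score lands in the middle of the range even though the bio score is close to its floor, which is why the two averages are easier to interpret when reported side by side.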

C.1 BIO CONSERVATION METRICS

We measure the latent space’s ability to separate cells by cell type with three different bio metrics: normalized mutual information (NMI), adjusted Rand index (ARI), and cell-type average silhouette width (cellASW).

The NMI and ARI metrics require comparing cell-type information with learned clusterings from the latent space. To help make the metrics comparable for different models, we define the learned clusters with the Leiden clustering method with default resolution = 1 (Traag et al., 2019) on the latent space projections. We also considered k-means clustering on the latent space projections but found that the resulting metrics were sometimes not reflective of the perceived clusters (e.g. when clearly-defined clusters are long and thin and close together width-wise, k-means outputs poor metrics).

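Normalized Mutual Information (NMI). NMI compares the overlap of two clusterings, here the cell-type labels T and the learned clusters C, taking on values between 0 and 1, where 0 indicates no overlap and 1 indicates perfect overlap. Normalizing by the arithmetic mean of the two entropies,

$$\mathrm{NMI}(T, C) = \frac{2\, I(T; C)}{H(T) + H(C)},$$

where

$$I(T; C) = \sum_{t \in T} \sum_{c \in C} p(t, c) \log \frac{p(t, c)}{p(t)\, p(c)}$$

is the mutual information of T and C and

$$H(T) = -\sum_{t \in T} p(t) \log p(t)$$

denotes the entropy of T (and is similarly defined for C). Here p(t) is the fraction of cells with cell-type label t, p(c) is the fraction of cells in cluster c, and p(t, c) is the fraction of cells with both.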
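As a rough illustration of how NMI can be computed from two label lists, here is a minimal standard-library sketch. It assumes the arithmetic-mean normalization (one of several common choices) and treats the case where both clusterings are a single cluster as a score of 1 by convention; the function names are ours, not the authors'.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of the empirical label distribution."""
    n = len(labels)
    return -sum((k / n) * math.log(k / n) for k in Counter(labels).values())

def mutual_information(t, c):
    """Mutual information (in nats) between two labelings of the same cells."""
    n = len(t)
    count_t, count_c = Counter(t), Counter(c)
    joint = Counter(zip(t, c))
    # p(t,c) * log(p(t,c) / (p(t) p(c))) written with raw counts
    return sum(
        (n_tc / n) * math.log(n * n_tc / (count_t[a] * count_c[b]))
        for (a, b), n_tc in joint.items()
    )

def nmi(t, c):
    """NMI with arithmetic-mean normalization (an assumption of this sketch)."""
    h_t, h_c = entropy(t), entropy(c)
    if h_t + h_c == 0.0:
        return 1.0  # both labelings are one cluster; convention, not derived
    return 2.0 * mutual_information(t, c) / (h_t + h_c)

cell_types = ["B", "B", "T", "T"]
print(nmi(cell_types, [0, 0, 1, 1]))  # clusters match the cell types
print(nmi(cell_types, [0, 1, 0, 1]))  # clusters are independent of cell types
```

In practice a library routine such as scikit-learn's `normalized_mutual_info_score` would normally be used instead of hand-written code.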

Adjusted Rand Index (ARI). ARI also compares two clusterings, but it (1) counts pairwise agreements between the clusterings instead of the element-wise comparisons used in NMI, and (2) adjusts for chance. The score usually takes values between 0 and 1, and may extend down to −0.5 for very different clusterings (Luecken et al., 2022).

For a given set S of N samples, the Rand index (RI) by itself captures the proportion of sample pairs on which the two clusterings T and C agree. More formally,

$$\mathrm{RI} = \frac{a + b}{a + b + c + d} = \frac{a + b}{\binom{N}{2}},$$

where

• a is the number of pairs that are in the same cluster in T and in the same cluster in C

• b is the number of pairs that are in different clusters in T and in different clusters in C

• c is the number of pairs that are in the same cluster in T and in different clusters in C

• d is the number of pairs that are in different clusters in T and in the same cluster in C

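ARI is the corrected-for-chance version of RI:

$$\mathrm{ARI} = \frac{\mathrm{RI} - \mathbb{E}[\mathrm{RI}]}{\max(\mathrm{RI}) - \mathbb{E}[\mathrm{RI}]},$$

where the expectation is taken over random clusterings with the same cluster sizes, so that a random assignment scores 0 on average.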
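The pair counts and the chance correction can be sketched with the standard library alone. This is an illustrative implementation under the usual fixed-cluster-size (permutation) model, with trivial inputs mapped to 1 by convention; the helper names are hypothetical.

```python
from collections import Counter
from math import comb

def pair_counts(t, c):
    """Return (a, b, c, d): pair agreement counts between labelings t and c."""
    total = comb(len(t), 2)
    a = sum(comb(k, 2) for k in Counter(zip(t, c)).values())  # same in both
    same_t = sum(comb(k, 2) for k in Counter(t).values())     # equals a + c
    same_c = sum(comb(k, 2) for k in Counter(c).values())     # equals a + d
    c_only = same_t - a   # same cluster in t, different clusters in c
    d_only = same_c - a   # different clusters in t, same cluster in c
    b = total - a - c_only - d_only  # different in both
    return a, b, c_only, d_only

def rand_index(t, c):
    a, b, c_only, d_only = pair_counts(t, c)
    return (a + b) / (a + b + c_only + d_only)

def adjusted_rand_index(t, c):
    a, b, c_only, d_only = pair_counts(t, c)
    total = a + b + c_only + d_only
    if total == 0:
        return 1.0  # fewer than two samples; convention only
    same_t, same_c = a + c_only, a + d_only
    expected = same_t * same_c / total        # expected value of a by chance
    maximum = 0.5 * (same_t + same_c)         # largest attainable value of a
    if maximum == expected:
        return 1.0  # trivial clusterings; convention only
    return (a - expected) / (maximum - expected)

labels = [0, 0, 1, 1]
print(rand_index(labels, [0, 1, 0, 1]), adjusted_rand_index(labels, [0, 1, 0, 1]))
```

The second clustering in the usage line splits every true cluster evenly, which is the kind of case where RI stays positive while ARI drops below zero.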

Cell-Type Average Silhouette Width (Cell type ASW). The cell-type average silhouette width measures how compact the clusters are by comparing intra-cluster distances with inter-cluster distances. A score of 1 indicates well-separated and compact clusters, a score of 0.5 indicates overlapping clusters, and a score of 0 indicates misaligned clusters. The clusters in this case are defined by the cell types.

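For a cell n of cell type C_j, its silhouette score is defined as

$$s(n) = \frac{b(n) - a(n)}{\max\{a(n), b(n)\}},$$

where a(n) is the average (Euclidean) distance between cell n and the other cells of the same cell type, and b(n) is the minimum, over the other cell types, of the average (Euclidean) distance between cell n and the cells of that type. More formally,

$$a(n) = \frac{1}{|C_j| - 1} \sum_{m \in C_j,\, m \neq n} d(n, m), \qquad b(n) = \min_{k \neq j} \frac{1}{|C_k|} \sum_{m \in C_k} d(n, m),$$

where d(n, m) denotes the Euclidean distance between cells n and m in the latent space.

Then, the average silhouette width for each cell-type cluster C_j is defined as the average of the silhouette scores of the cells of that type:

$$\mathrm{ASW}_{C_j} = \frac{1}{|C_j|} \sum_{n \in C_j} s(n).$$

Cell type ASW simply rescales the average silhouette width over all cell types so that instead of taking values between −1 and 1, it takes values between 0 and 1:

$$\text{Cell type ASW} = \frac{1}{2} \left( \frac{1}{M} \sum_{j=1}^{M} \mathrm{ASW}_{C_j} + 1 \right),$$

where M is the total number of cell types.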
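A minimal sketch of the computation described above, using only the standard library. It assumes Euclidean distances, at least two cell types, a per-type average followed by an average over types, and the linear rescaling from [−1, 1] to [0, 1]; assigning a score of 0 to a cell that is alone in its type is a convention of this sketch.

```python
import math
from collections import defaultdict

def silhouette_scores(points, labels):
    """Per-cell silhouette (b - a) / max(a, b); needs at least two labels."""
    groups = defaultdict(list)
    for i, lab in enumerate(labels):
        groups[lab].append(i)
    scores = []
    for n, lab in enumerate(labels):
        own = [m for m in groups[lab] if m != n]
        if not own:
            scores.append(0.0)  # singleton cluster; convention only
            continue
        a = sum(math.dist(points[n], points[m]) for m in own) / len(own)
        b = min(
            sum(math.dist(points[n], points[m]) for m in idx) / len(idx)
            for other, idx in groups.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

def cell_type_asw(points, cell_types):
    """Average silhouette per cell type, averaged over types, scaled to [0, 1]."""
    per_type = defaultdict(list)
    for s, lab in zip(silhouette_scores(points, cell_types), cell_types):
        per_type[lab].append(s)
    mean_asw = sum(sum(v) / len(v) for v in per_type.values()) / len(per_type)
    return (mean_asw + 1.0) / 2.0

latent = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(cell_type_asw(latent, ["A", "A", "B", "B"]))  # labels follow the two groups
print(cell_type_asw(latent, ["A", "B", "A", "B"]))  # labels cut across the groups
```

The quadratic-time loops are only suitable for small examples; library routines such as scikit-learn's `silhouette_samples` are the usual choice for real data.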

C.2 BATCH CORRECTION METRICS

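Batch Average Silhouette Width (Batch ASW). Much like Cell type ASW, Batch ASW is based on silhouette widths, with the clusters now defined by batch labels rather than cell types. However, for batches we want the clusters to overlap rather than separate, so the formula is adjusted so that a score of 1 reflects well-mixed batches and a score of 0 reflects poorly mixed batches. This is done by first introducing the absolute silhouette width for a cell n,

$$s_{\text{batch}}(n) = |s(n)|,$$

where s(n) is the silhouette score of cell n computed with respect to batch labels, so that 0 represents a perfectly mixed batch and any other value represents some deviation from being well-mixed.

The Batch ASW for a cell type j is then

$$\text{Batch ASW}_j = \frac{1}{|C_j|} \sum_{n \in C_j} \left(1 - s_{\text{batch}}(n)\right),$$

where C_j is the set of cells of cell type j. The overall Batch ASW is given by

$$\text{Batch ASW} = \frac{1}{M} \sum_{j=1}^{M} \text{Batch ASW}_j,$$

where M is the total number of cell types.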
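A minimal sketch of a Batch ASW computation, under stated assumptions: silhouettes are computed on batch labels separately within each cell type, each cell contributes one minus its absolute silhouette, and cell types observed in a single batch are skipped. These choices follow our reading of the benchmarking convention and are not confirmed by this appendix; the function names are hypothetical.

```python
import math
from collections import defaultdict

def silhouette_scores(points, labels):
    """Per-point silhouette (b - a) / max(a, b); needs at least two labels."""
    groups = defaultdict(list)
    for i, lab in enumerate(labels):
        groups[lab].append(i)
    scores = []
    for n, lab in enumerate(labels):
        own = [m for m in groups[lab] if m != n]
        if not own:
            scores.append(0.0)  # singleton group; convention only
            continue
        a = sum(math.dist(points[n], points[m]) for m in own) / len(own)
        b = min(
            sum(math.dist(points[n], points[m]) for m in idx) / len(idx)
            for other, idx in groups.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

def batch_asw(points, cell_types, batches):
    """Mean over cell types of the mean of 1 - |silhouette on batch labels|."""
    by_type = defaultdict(list)
    for i, cell_type in enumerate(cell_types):
        by_type[cell_type].append(i)
    per_type = []
    for idx in by_type.values():
        sub_batches = [batches[i] for i in idx]
        if len(set(sub_batches)) < 2:
            continue  # assumption: skip cell types present in one batch only
        s = silhouette_scores([points[i] for i in idx], sub_batches)
        per_type.append(sum(1.0 - abs(v) for v in s) / len(s))
    if not per_type:
        raise ValueError("no cell type spans at least two batches")
    return sum(per_type) / len(per_type)

types = ["T"] * 4
batches = [0, 0, 1, 1]
print(batch_asw([(0, 0), (1, 1), (0, 1), (1, 0)], types, batches))    # interleaved
print(batch_asw([(0, 0), (0, 1), (10, 0), (10, 1)], types, batches))  # separated
```

The interleaved layout scores noticeably higher than the separated one, matching the intended reading that values near 1 indicate well-mixed batches.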

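Graph Connectivity. The graph connectivity score represents how well the kNN graph connects cells of the same type. If there is good batch mixing, we would expect cells of the same type to be clustered together, forming well-connected same-cell-type subgraphs. Conversely, when batch effects are not corrected for, cells of the same type could be dispersed across the latent space and not connected by the kNN graph.

This idea is formally represented by the following graph connectivity metric:

$$\mathrm{GC} = \frac{1}{M} \sum_{j=1}^{M} \frac{|\mathrm{LCC}(G_j)|}{|N_j|},$$

where G_j is the subgraph of the kNN graph restricted to cells of type j, |LCC(G_j)| is the number of nodes in its largest connected component, |N_j| is the number of cells of type j, and M is the total number of cell types.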
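As a minimal sketch of a score of this kind, the code below takes an already-built kNN graph as a list of undirected edges (how the graph is constructed is left out and would be an assumption), restricts it to each cell type, and averages the fraction of cells that fall in the largest connected component. The function name and input format are hypothetical.

```python
from collections import defaultdict, deque

def graph_connectivity(edges, cell_types):
    """Mean over cell types of |largest connected component| / |cells of type|.

    edges: iterable of (i, j) index pairs, treated as undirected kNN edges.
    cell_types: list where cell_types[i] is the label of cell i.
    """
    adjacency = defaultdict(set)
    for i, j in edges:
        if cell_types[i] == cell_types[j]:  # keep edges inside one cell type
            adjacency[i].add(j)
            adjacency[j].add(i)

    nodes_by_type = defaultdict(list)
    for node, label in enumerate(cell_types):
        nodes_by_type[label].append(node)

    ratios = []
    for nodes in nodes_by_type.values():
        unseen = set(nodes)
        largest = 0
        while unseen:
            # Breadth-first search over one connected component.
            queue = deque([unseen.pop()])
            size = 1
            while queue:
                u = queue.popleft()
                for v in adjacency.get(u, ()):
                    if v in unseen:
                        unseen.remove(v)
                        queue.append(v)
                        size += 1
            largest = max(largest, size)
        ratios.append(largest / len(nodes))
    return sum(ratios) / len(ratios)

labels = ["A", "A", "A", "B", "B"]
print(graph_connectivity([(0, 1), (1, 2), (3, 4), (2, 3)], labels))  # all connected
print(graph_connectivity([(0, 1), (3, 4), (2, 3)], labels))  # cell 2 cut off from A
```

In the second call, cell 2 is linked only to a cell of another type, so the type-A subgraph splits and the score drops below 1.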

This paper is available on arxiv under CC BY-SA 4.0 DEED license.

Authors:

(1) Sarah Zhao, Department of Statistics, Stanford University, (smxzhao@stanford.edu);

(2) Aditya Ravuri, Department of Computer Science, University of Cambridge (ar847@cam.ac.uk);

(3) Vidhi Lalchand, Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard (vidrl@mit.edu);

(4) Neil D. Lawrence, Department of Computer Science, University of Cambridge (ndl21@cam.ac.uk).