Interpretable Latent Spaces: BGPLVM & Biological Factors

by Amortize, May 21st, 2025

Too Long; Didn't Read

Discover how our BGPLVM incorporates expert-labeled biological information, such as cell-cycle phase and treatment condition.

Abstract and 1. Introduction

2. Background

2.1 Amortized Stochastic Variational Bayesian GPLVM

2.2 Encoding Domain Knowledge through Kernels

3. Our Model and 3.1 Pre-Processing and Likelihood

3.2 Encoder

4. Results and Discussion and 4.1 Each Component is Crucial to Modified Model Performance

4.2 Modified Model achieves Significant Improvements over Standard Bayesian GPLVM and is Comparable to SCVI

4.3 Consistency of Latent Space with Biological Factors

5. Conclusion, Acknowledgement, and References

A. Baseline Models

B. Experiment Details

C. Latent Space Metrics

D. Detailed Metrics

4.3 CONSISTENCY OF LATENT SPACE WITH BIOLOGICAL FACTORS

An advantage of our model is its ability to incorporate biologically interpretable information, which improves both latent space interpretability and overall performance. In particular, we compared our learned latent space with previous expert-labelled inferences on the innate immunity dataset of Kumasaka et al. (2021). Pretraining on well-initialized latents and then fine-tuning our model with a PerSE-ARD+Linear kernel allowed us to recover latents consistent with the biologically motivated latents inferred in Kotliar et al. (2019) (Figure 4, top row), while also separating cells by treatment condition (Figure 4, bottom row). Moreover, as the color gradations in the two rightmost UMAP plots of the bottom row indicate, the model's learned latent space distinguishes the immune response pseudotime directions. This shows how informative initializations can be incorporated into amortized BGPLVM encoder-decoder models.
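The paragraph names a PerSE-ARD+Linear kernel without spelling out its form here (Section 2.2, Encoding Domain Knowledge through Kernels, is the authoritative description). As a rough illustration only, the NumPy sketch below assumes one plausible reading: a periodic squared-exponential kernel on a single cell-cycle latent coordinate, multiplied by an ARD squared-exponential kernel on the remaining latent dimensions, plus a linear kernel on those remaining dimensions. The function names, default hyperparameters, and the choice of column 0 as the cell-cycle coordinate are hypothetical, and the paper's actual composition may differ.

```python
import numpy as np


def periodic_se(t1, t2, lengthscale=1.0, period=1.0):
    """Periodic squared-exponential kernel on a 1-D coordinate (e.g. cell-cycle pseudotime)."""
    diff = t1[:, None] - t2[None, :]
    return np.exp(-2.0 * np.sin(np.pi * diff / period) ** 2 / lengthscale ** 2)


def se_ard(x1, x2, lengthscales):
    """Squared-exponential kernel with one lengthscale per input dimension (ARD)."""
    z1, z2 = x1 / lengthscales, x2 / lengthscales
    sq = (z1 ** 2).sum(1)[:, None] + (z2 ** 2).sum(1)[None, :] - 2.0 * z1 @ z2.T
    return np.exp(-0.5 * np.maximum(sq, 0.0))  # clip tiny negative round-off


def linear(x1, x2, variances):
    """Linear kernel with a separate variance per input dimension."""
    return (x1 * variances) @ x2.T


def perse_ard_linear(x1, x2, period=1.0, per_lengthscale=1.0,
                     ard_lengthscales=None, lin_variances=None, outputscale=1.0):
    """Hypothetical PerSE-ARD+Linear kernel.

    ASSUMPTION: column 0 is the cell-cycle coordinate; the periodic kernel on it
    multiplies an SE-ARD kernel on columns 1:, and a linear kernel on columns 1:
    is added. This is an illustrative reading, not the paper's confirmed form.
    """
    d_rest = x1.shape[1] - 1
    if ard_lengthscales is None:
        ard_lengthscales = np.ones(d_rest)
    if lin_variances is None:
        lin_variances = np.ones(d_rest)
    t1, t2 = x1[:, 0], x2[:, 0]      # cell-cycle coordinate
    r1, r2 = x1[:, 1:], x2[:, 1:]    # remaining latent dimensions
    k_prod = periodic_se(t1, t2, per_lengthscale, period) * se_ard(r1, r2, ard_lengthscales)
    return outputscale * k_prod + linear(r1, r2, lin_variances)


# Usage: 50 cells with a 4-D latent space, column 0 drawn as a cyclic pseudotime in [0, 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 0] = rng.uniform(0.0, 1.0, size=50)
K = perse_ard_linear(X, X)
```

Because the cell-cycle coordinate enters only through a periodic kernel, cells at the end and at the start of the cycle are treated as close neighbours, which is what allows that latent dimension to be read as a cyclic pseudotime. Keeping the linear term off that coordinate in this sketch preserves the periodicity.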


Figure 4: (Top row) Plots of log means and log variances (both parametrized by the same GP) versus the learned cell-cycle pseudotime dimension for three genes (UBE2C, CDC6, FN1). Squares depict log variances and circles depict log means of the library-normalized data, both colored by the phases annotated in Kumasaka et al. (2021). Our model's learned cell-cycle phases roughly correspond to the phases labelled in Kumasaka et al. (2021). (Bottom row) UMAP plots of our model's learned latent space, excluding directions identified with hidden technical effects (e.g. batch and plate border effects). Cells are colored by treatment condition (left) and by the primary (middle) and secondary (right) pseudotime directions.


This paper is available on arxiv under CC BY-SA 4.0 DEED license.

Authors:

(1) Sarah Zhao, Department of Statistics, Stanford University (smxzhao@stanford.edu);

(2) Aditya Ravuri, Department of Computer Science, University of Cambridge (ar847@cam.ac.uk);

(3) Vidhi Lalchand, Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard (vidrl@mit.edu);

(4) Neil D. Lawrence, Department of Computer Science, University of Cambridge (ndl21@cam.ac.uk).

