Encoding Biological Knowledge in GPLVM Kernels for scRNA-seq

Written by amortize | Published 2025/05/20

TL;DR: Learn how specialized kernel designs in GPLVMs can incorporate prior biological knowledge, such as batch effects and cell-cycle phases.

Table of Links

Abstract and 1. Introduction

2. Background

2.1 Amortized Stochastic Variational Bayesian GPLVM

2.2 Encoding Domain Knowledge through Kernels

3. Our Model and 3.1 Pre-Processing and Likelihood

3.2 Encoder

4. Results and Discussion and 4.1 Each Component is Crucial to Modified Model Performance

4.2 Modified Model achieves Significant Improvements over Standard Bayesian GPLVM and is Comparable to SCVI

4.3 Consistency of Latent Space with Biological Factors

5. Conclusion, Acknowledgement, and References

A. Baseline Models

B. Experiment Details

C. Latent Space Metrics

D. Detailed Metrics

2.2 ENCODING DOMAIN KNOWLEDGE THROUGH KERNELS

A key benefit of GPLVMs is that prior information can be encoded into the generative model, especially through the kernel design, which allows for more interpretable latent spaces and requires less training data. Here, we highlight kernels tailored to scRNA-seq data that correct for batch and cell-cycle nuisance factors, as introduced by Lalchand et al. (2022a).

Batch correction kernel formulation. To correct for confounding batch effects within the GP formulation, Lalchand et al. (2022a) proposed a kernel structure that includes an additive linear kernel term to capture random effects.
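The kernel equation itself is not reproduced in this text. As a minimal sketch of the additive structure described here, assuming a squared exponential kernel on the latent coordinates and a linear kernel on one-hot batch indicators, the code below builds such a kernel in NumPy. The function names, the hyperparameter names, and the choice of a squared exponential base kernel are illustrative assumptions, not the exact formulation of Lalchand et al. (2022a).

```python
import numpy as np


def sq_exp_kernel(X1, X2, variance=1.0, lengthscale=1.0):
    """Squared exponential kernel between the rows of X1 and X2."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)


def linear_kernel(B1, B2, variance=1.0):
    """Linear kernel on batch indicator vectors (assumed one-hot here)."""
    return variance * B1 @ B2.T


def batch_corrected_kernel(X1, B1, X2, B2,
                           se_variance=1.0, lengthscale=1.0, batch_variance=1.0):
    """Additive kernel: latent-space term plus a linear term on batch indicators."""
    return (
        sq_exp_kernel(X1, X2, se_variance, lengthscale)
        + linear_kernel(B1, B2, batch_variance)
    )


# Toy data: 6 cells, 2 latent dimensions, 3 batches (all values are made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
batch_ids = np.array([0, 0, 1, 1, 2, 2])
B = np.eye(3)[batch_ids]  # one-hot batch indicators

K = batch_corrected_kernel(X, B, X, B)
print(K.shape)
```

With one-hot indicators, the linear term contributes its variance only to pairs of cells from the same batch, so pairs from different batches are related purely through the latent-space term. This is one way an additive term can absorb batch-level variation instead of forcing it into the latent space.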

Cell-cycle phase kernel. When certain genes strongly reflect cell-cycle phase effects and obscure key biological factors, a kernel designed to explicitly address a cell-cycle latent variable can effectively mitigate these effects. This motivates adding a periodic kernel to the kernel formulation above. In particular, we specify the first latent dimension as a proxy for cell-cycle information and model the kernel accordingly.
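The kernel equation is not reproduced in this text. The sketch below shows one way such a kernel could be assembled, assuming the standard exponentiated sine-squared periodic kernel on the first latent dimension, a squared exponential kernel on the remaining dimensions, and a linear kernel on one-hot batch indicators. Combining the three terms as a plain sum, and all function and parameter names, are assumptions made for illustration rather than the authors' exact formulation.

```python
import numpy as np


def periodic_kernel(t1, t2, variance=1.0, lengthscale=1.0, period=2.0 * np.pi):
    """Exponentiated sine-squared kernel on 1-D inputs t1 and t2."""
    diff = t1[:, None] - t2[None, :]
    return variance * np.exp(
        -2.0 * np.sin(np.pi * diff / period) ** 2 / lengthscale**2
    )


def sq_exp_kernel(X1, X2, variance=1.0, lengthscale=1.0):
    """Squared exponential kernel between the rows of X1 and X2."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)


def cell_cycle_kernel(X1, B1, X2, B2, period=2.0 * np.pi, batch_variance=1.0):
    """Sum of a periodic term (first latent dim), an SE term (other dims),
    and a linear term on batch indicators. The additive form is an assumption."""
    k_cycle = periodic_kernel(X1[:, 0], X2[:, 0], period=period)
    k_rest = sq_exp_kernel(X1[:, 1:], X2[:, 1:])
    k_batch = batch_variance * B1 @ B2.T
    return k_cycle + k_rest + k_batch


# Toy data: three cells in one batch that differ only in the first latent dim.
period = 2.0 * np.pi
X = np.array([
    [0.3, 1.0],                 # reference cell
    [0.3 + period, 1.0],        # exactly one full cycle later
    [0.3 + period / 2.0, 1.0],  # half a cycle later
])
B = np.eye(2)[np.array([0, 0, 0])]  # one-hot batch indicators, all in batch 0

K = cell_cycle_kernel(X, B, X, B, period=period)
print(K)
```

In this toy setup, the cell one full period away is as similar to the reference cell as the reference cell is to itself, while the cell half a period away is less similar. That wrap-around behavior is what makes a periodic kernel a natural fit for a latent dimension meant to track cell-cycle phase.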

This paper is available on arXiv under the CC BY-SA 4.0 license.

Authors:

(1) Sarah Zhao, Department of Statistics, Stanford University (smxzhao@stanford.edu);

(2) Aditya Ravuri, Department of Computer Science, University of Cambridge (ar847@cam.ac.uk);

(3) Vidhi Lalchand, Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard (vidrl@mit.edu);

(4) Neil D. Lawrence, Department of Computer Science, University of Cambridge (ndl21@cam.ac.uk).


Published by HackerNoon on 2025/05/20