Amortized BGPLVM: Improved Dimensionality Reduction for scRNA-seq

by Amortize, May 21st, 2025

Too Long; Didn't Read

This paper introduces an amortized BGPLVM that significantly improves single-cell RNA-seq dimensionality reduction by adapting to count data, batch effects, and library size.

Abstract and 1. Introduction

2. Background

2.1 Amortized Stochastic Variational Bayesian GPLVM

2.2 Encoding Domain Knowledge through Kernels

3. Our Model and 3.1 Pre-Processing and Likelihood

3.2 Encoder

4. Results and Discussion and 4.1 Each Component is Crucial to Modified Model Performance

4.2 Modified Model achieves Significant Improvements over Standard Bayesian GPLVM and is Comparable to SCVI

4.3 Consistency of Latent Space with Biological Factors

5. Conclusion, Acknowledgement, and References

A. Baseline Models

B. Experiment Details

C. Latent Space Metrics

D. Detailed Metrics

5 CONCLUSION

This paper identifies a misalignment in the generative model of current GPLVMs used on single-cell data and proposes an amortized BGPLVM better adapted to scRNA-seq dimensionality reduction. Drawing on commonly used single-cell-specific methods, including scVI, LDVAE, and Splatter single-cell simulations, our proposed model addresses three main aspects of single-cell data by (1) accounting for count data with an approximate Poisson likelihood, (2) incorporating batch effect modelling in both the encoder and the GP kernel, and (3) normalizing the library size in a pre-processing step. We demonstrate the importance of aligning modelling choices with domain-specific knowledge: the model achieves performance comparable to scVI on both a simulated dataset and a real-world COVID dataset, in both UMAP visualizations and commonly used latent space metrics.
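The three ingredients listed above can be illustrated with a small NumPy sketch. It is not the authors' implementation. The function names, the choice of the median library size as the normalization target, the exact (rather than approximate) Poisson log-likelihood, and the one-hot batch encoding appended to the encoder input are all assumptions made for illustration.

```python
import math
import numpy as np


def normalize_library_size(counts, target=None):
    """Scale each cell (row) so that its total count equals `target`.

    Assumption: when `target` is None, the median library size is used.
    """
    counts = np.asarray(counts, dtype=float)
    library = counts.sum(axis=1, keepdims=True)
    if target is None:
        target = float(np.median(library))
    # Cells with zero total count are left as all zeros.
    safe_library = np.where(library > 0, library, 1.0)
    return counts / safe_library * target


def poisson_log_likelihood(counts, rate):
    """Sum of exact Poisson log-probabilities; `rate` must be positive."""
    counts = np.asarray(counts, dtype=float)
    rate = np.asarray(rate, dtype=float)
    # log(k!) computed as lgamma(k + 1), element by element.
    log_factorial = np.vectorize(math.lgamma)(counts + 1.0)
    return float(np.sum(counts * np.log(rate) - rate - log_factorial))


def append_batch_one_hot(features, batch_ids, n_batches):
    """Concatenate a one-hot batch indicator to each cell's features
    (a hypothetical way to expose batch labels to an encoder)."""
    one_hot = np.eye(n_batches)[np.asarray(batch_ids)]
    features = np.asarray(features, dtype=float)
    return np.concatenate([features, one_hot], axis=1)


# Toy data: 3 cells x 2 genes, with cells assigned to 2 batches.
raw_counts = np.array([[1, 3], [2, 2], [10, 30]])
normalized = normalize_library_size(raw_counts)
encoder_input = append_batch_one_hot(normalized, [0, 1, 1], n_batches=2)
```

In this toy example every normalized cell sums to the median library size, so the third cell, sequenced ten times deeper than the first, ends up with the same expression profile as the first.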

ACKNOWLEDGMENTS

The authors would like to thank Emma Dann, Natsuhiko Kumasaka, and the rest of the team at Sanger for help and guidance with our initial project and for providing the data and code on which we based this study. AR is supported by the Accelerate Programme for Scientific Discovery. During the time of this work, SZ was supported by the Churchill Scholarship.

REFERENCES

Sumon Ahmed, Magnus Rattray, and Alexis Boukouvalas. Grandprix: scaling up the bayesian gplvm for single-cell data. Bioinformatics, 35(1):47–54, 2019.


Florian Buettner, Kedar N Natarajan, F Paolo Casale, Valentina Proserpio, Antonio Scialdone, Fabian J Theis, Sarah A Teichmann, John C Marioni, and Oliver Stegle. Computational analysis of cell-to-cell heterogeneity in single-cell rna-sequencing data reveals hidden subpopulations of cells. Nature biotechnology, 33(2):155–160, 2015.


Kieran Campbell and Christopher Yau. Bayesian gaussian process latent variable models for pseudotime inference in single-cell rna-seq data. bioRxiv, pp. 026872, 2015.


Junyue Cao, Malte Spielmann, Xiaojie Qiu, Xingfan Huang, Daniel M Ibrahim, Andrew J Hill, Fan Zhang, Stefan Mundlos, Lena Christiansen, Frank J Steemers, et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature, 566(7745):496–502, 2019.


Graham Heimberg, Rajat Bhatnagar, Hana El-Samad, and Matt Thomson. Low dimensionality in gene expression data enables the accurate extraction of transcriptional programs from shallow sequencing. Cell systems, 2(4):239–250, 2016.


James Hensman, Nicolo Fusi, and Neil D Lawrence. Gaussian processes for big data. arXiv preprint arXiv:1309.6835, 2013.


Brian Hie, Joshua Peters, Sarah K Nyquist, Alex K Shalek, Bonnie Berger, and Bryan D Bryson. Computational methods for single-cell rna sequencing. Annual Review of Biomedical Data Science, 3:339–364, 2020.


Matthew D Hoffman, David M Blei, Chong Wang, and John Paisley. Stochastic variational inference. Journal of Machine Learning Research, 2013.


Dylan Kotliar, Adrian Veres, M Aurel Nagy, Shervin Tabrizi, Eran Hodis, Douglas A Melton, and Pardis C Sabeti. Identifying gene expression programs of cell-type identity and cellular activity with single-cell rna-seq. Elife, 8:e43803, 2019.


Natsuhiko Kumasaka, Raghd Rostom, Ni Huang, Krzysztof Polanski, Kerstin B Meyer, Sharad Patel, Rachel Boyd, Celine Gomez, Sam N Barnett, Nikolaos I Panousis, et al. Mapping interindividual dynamics of innate immune response at single-cell resolution. bioRxiv, pp. 2021–09, 2021.


Vidhi Lalchand, Aditya Ravuri, Emma Dann, Natsuhiko Kumasaka, Dinithi Sumanaweera, Rik GH Lindeboom, Shaista Madad, Sarah A Teichmann, and Neil D Lawrence. Modelling technical and biological effects in scrna-seq data with scalable gplvms. arXiv preprint arXiv:2209.06716, 2022a.


Vidhi Lalchand, Aditya Ravuri, and Neil D Lawrence. Generalised gplvm with stochastic variational inference. In International Conference on Artificial Intelligence and Statistics, pp. 7841–7864. PMLR, 2022b.


Neil D Lawrence. Gaussian process models for visualisation of high dimensional data. Advances in Neural Information Processing Systems, 2004.


Romain Lopez, Jeffrey Regier, Michael B Cole, Michael I Jordan, and Nir Yosef. Deep generative modeling for single-cell transcriptomics. Nature methods, 15(12):1053–1058, 2018.


Malte D Luecken and Fabian J Theis. Current best practices in single-cell rna-seq analysis: a tutorial. Molecular systems biology, 15(6):e8746, 2019.


Malte D Luecken, Maren Büttner, Kridsadakorn Chaichoompu, Anna Danese, Marta Interlandi, Michaela F Müller, Daniel C Strobl, Luke Zappia, Martin Dugas, Maria Colomé-Tatché, et al. Benchmarking atlas-level data integration in single-cell genomics. Nature methods, 19(1):41–50, 2022.


Aaron T.L. Lun, Karsten Bach, and John C. Marioni. Pooling across cells to normalize single-cell rna sequencing data with many zero counts. Genome Biology, 17(75), 2016. doi: https://doi.org/10.1186/s13059-016-0947-7.


Leland McInnes, John Healy, and James Melville. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.


Daniel T Montoro, Adam L Haber, Moshe Biton, Vladimir Vinarsky, Brian Lin, Susan E Birket, Feng Yuan, Sijia Chen, Hui Min Leung, Jorge Villoria, et al. A revised airway epithelial hierarchy includes cftr-expressing ionocytes. Nature, 560(7718):319–324, 2018.


Lindsey W Plasschaert, Rapolas Žilionis, Rayman Choo-Wing, Virginia Savova, Judith Knehr, Guglielmo Roma, Allon M Klein, and Aron B Jaffe. A single-cell atlas of the airway epithelium reveals the cftr-rich pulmonary ionocyte. Nature, 560(7718):377–381, 2018.


Emily Stephenson, Gary Reynolds, Rachel A Botting, Fernando J Calero-Nieto, Michael D Morgan, Zewen Kelvin Tuong, Karsten Bach, Waradon Sungnak, Kaylee B Worlock, Masahiro Yoshida, et al. Single-cell multi-omics analysis of the immune response in covid-19. Nature medicine, 27(5):904–916, 2021.


Valentine Svensson, Roser Vento-Tormo, and Sarah A Teichmann. Exponential scaling of single-cell rna-seq in the past decade. Nature protocols, 13(4):599–604, 2018.


Valentine Svensson, Adam Gayoso, Nir Yosef, and Lior Pachter. Interpretable factor models of single-cell rna-seq via variational autoencoders. Bioinformatics, 36(11):3418–3421, 2020.


Amos Tanay and Aviv Regev. Scaling single-cell genomics from phenomenology to mechanism. Nature, 541(7637):331–338, 2017.


Vincent A Traag, Ludo Waltman, and Nees Jan Van Eck. From louvain to leiden: guaranteeing well-connected communities. Scientific reports, 9(1):5233, 2019.


Archit Verma and Barbara E Engelhardt. A robust nonlinear low-dimensional manifold for single cell rna-seq data. BMC bioinformatics, 21(1):1–15, 2020.


F Alexander Wolf, Philipp Angerer, and Fabian J Theis. Scanpy: large-scale single-cell gene expression data analysis. Genome biology, 19:1–5, 2018.


Luke Zappia, Belinda Phipson, and Alicia Oshlack. Splatter: simulation of single-cell rna sequencing data. Genome biology, 18(1):174, 2017.


This paper is available on arXiv under the CC BY-SA 4.0 DEED license.

Authors:

(1) Sarah Zhao, Department of Statistics, Stanford University (smxzhao@stanford.edu);

(2) Aditya Ravuri, Department of Computer Science, University of Cambridge (ar847@cam.ac.uk);

(3) Vidhi Lalchand, Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard (vidrl@mit.edu);

(4) Neil D. Lawrence, Department of Computer Science, University of Cambridge (ndl21@cam.ac.uk).
