Gennady Gorin, PhD – California Institute of Technology – “Stochastic foundations for single-cell RNA sequencing”


Single-cell RNA sequencing, which quantifies cell transcriptomes, has seen widespread adoption, accompanied by a proliferation of analytic methods. However, there has been relatively little systematic investigation of its best practices and their underlying assumptions, leading to challenges and discrepancies in analysis. I motivate a set of generic, principled strategies for modeling the biological and technical stochasticity in sequencing experiments, and use case studies to illustrate their prospects for the discovery and interpretation of biophysical kinetics.

Hosted by:  Dan MacDonald, Gibson Lab

Research links:


Dr. Gennady Gorin is a chemical engineer working at the intersection of bioinformatics, stochastic biophysics, and statistics. He completed his Ph.D. with Lior Pachter at the California Institute of Technology, adapting theory from fluorescence transcriptomics to the unique features of single-cell RNA sequencing. Previously, he completed a B.S./B.A. at Rice University and performed transcriptional modeling research in the Golding laboratory at Baylor College of Medicine. Gennady is transitioning to industrial bioinformatics and is excited about the prospects for rigorous, physics-informed approaches to method development.

All Welcome! Note this event will take place on Zoom.

Click here to be added to our mail list.

For further information about this seminar series, contact

Weiruo Zhang, PhD – Stanford University – “Integrative spatial-omics analysis of cellular architecture mediating lymph node metastasis in head and neck cancer”

Spatial biology is a new frontier that has become accessible through advances in spatial profiling technologies, such as multiplexed in situ imaging spatial proteomics, which can provide single-cell resolution for up to 60 markers. In this talk, I will introduce a computational analysis pipeline that performs integrative analysis of spatial proteomics and single-cell RNA sequencing to identify clinically relevant cellular interactions. The pipeline features (1) CELESTA, an unsupervised machine learning method for cell type identification in multiplexed spatial proteomics data; (2) a geospatial statistical method to identify cell-cell colocalizations; and (3) an integrative coupling of spatial proteomics and single-cell RNA sequencing data that identified cell-cell crosstalk associated with lymph node metastasis in head and neck cancer, which we have validated through mouse model studies.


Research link:

Dr. Zhang is currently a Research Engineer in the Department of Biomedical Data Science and the Center for Cancer Systems Biology, Stanford School of Medicine. Dr. Zhang received her M.S. and Ph.D. in Electrical Engineering, both from Stanford University, with a focus on bioinformatics and developing computational algorithms for metabolomics data analysis. Her current research at Stanford primarily focuses on developing and implementing computational methods to integrate and analyze single-cell and spatial multi-omics data, such as single-cell RNA sequencing, spatial proteomics, and spatial transcriptomics. Her research aims to apply quantitative approaches that bridge multi-omics, imaging, machine learning, and artificial intelligence to decipher the biology of cancer progression and guide treatment responses.


Yongju Lee, PhD – Genentech – “Contextual representation of pathology, immune repertoire by transformer and graph neural network, and transcriptomic contextual embedding via single-cell foundation model”

The graph neural network (GNN) and the transformer are two well-known neural network architectures for obtaining contextual embeddings from biomedical data. However, each involves a trade-off between the amount of data required for training and the representation power of the model. As examples, I will discuss TEA-graph, which employs a GNN to define the contextual pathological features related to cancer patients’ survival, and GRIP, which utilizes a combination of a GNN and a transformer to define the set of immune receptors linked to patients’ survival.

Furthermore, genomics is an interesting and complex type of biomedical data that is rich in contextual information. Vision and language research leverages transformer-based foundation models (models trained through self-supervised learning with datasets ranging from millions to billions, showing powerful performance across a wide range of downstream applications). Similarly, we can now train a large model using ~50M single-cell RNA-seq datasets. Some initial efforts have already shown promising results in understanding genetic mechanisms through perturbation prediction and in silico perturbations. With the contextual gene embedding obtained from the model, we can even transfer the gene embedding to the analysis of bulk RNA-seq datasets. Aligned with these efforts, I would like to share recent progress in obtaining meaningful contextual gene embeddings using the transformer architecture and discuss opportunities for multi-modal training to link transcriptomics with images or text.

Research links

Yongju Lee is a Postdoctoral Fellow at Genentech Research and Early Development, where he has worked under the mentorship of Aviv Regev since spring 2023. He recently earned his Ph.D. from the Department of Electrical and Computer Engineering at Seoul National University, advised by Sunghoon Kwon. His research focuses on tailoring deep learning models for various biomedical data modalities and accelerating scientific and medical discovery by interpreting the deep learning model outcomes. He has developed methods for pathology image, immune repertoire, and spatial omics data. His ongoing research involves establishing a single-cell foundation model and expanding its capabilities to include biomedical images and text data.

All Welcome! Note this event will take place on Zoom:
