Cell identity drives cell-cell communication and tissue architecture and is in turn regulated by cell-extrinsic cues. Cell identity is determined by the combination of intrinsic, developmentally established transcription factor (TF) use and both constitutive and cell communication-dependent TF activities. The presented work introduces two probabilistic models that we developed to advance the understanding of these processes using single-cell and spatial genomic data.
Spatial transcriptomic technologies promise to resolve cellular wiring diagrams of tissues in health and disease, but comprehensive mapping of cell types in situ remains a challenge. Here we present cell2location, a Bayesian model that can resolve fine-grained cell types in spatial transcriptomic data and create comprehensive cellular maps of diverse tissues. Cell2location accounts for technical sources of variation and borrows statistical strength across locations, thereby enabling the integration of single-cell and spatial transcriptomics with higher sensitivity and resolution than existing tools. We assess cell2location in three different tissues and demonstrate improved mapping of fine-grained cell types. In the mouse brain, we discover fine regional astrocyte subtypes across the thalamus and hypothalamus. In the human lymph node, we spatially map a rare pre-germinal center B cell population. In the human gut, we resolve fine immune cell populations in lymphoid follicles. Collectively, our results present cell2location as a versatile analysis tool for comprehensively mapping tissue architectures.
The Python package is available at https://github.com/BayraktarLab/cell2location.
Cell identity and plasticity are regulated by a combinatorial code mediated by transcription factors and the cell communication environment. Systematically dissecting how the regulatory code robustly defines the vast complexity of cell populations across tissues is a long-standing challenge. Measured using the assay for transposase-accessible chromatin with sequencing (ATAC-seq), DNA accessibility provides a readout of intermediate gene regulation steps at single-cell resolution, and technologies that measure both RNA and ATAC provide the evidence needed to build mechanistic models of regulation. Existing methods address one or several subproblems of modelling DNA accessibility. For example, DNA sequence-based deep learning models represent combinatorial interactions and in-vivo TF-DNA recognition preferences. In contrast, GRN models use TF abundance profiles across cells and in-vitro-derived TF-DNA recognition preferences, optionally incorporating ATAC-seq data as a filter. All of these models learn cell-type-specific weights and properties and do not generalize to new TF abundance states such as new cell types. We are therefore missing an end-to-end mechanistic model that represents all steps of the biological process, generalizes to both new DNA sequences and new TF abundance combinations, and can simultaneously characterize the hundreds to thousands of cell states observed in single-cell genomics atlases. Here, we formulate cell2state, a mechanistic end-to-end probabilistic model of TF recruitment to a chromatin locus and the downstream TF effect on DNA accessibility. Cell2state is designed to generalize regulatory predictions to unseen cell types. It A) estimates TF nuclear protein abundance, and models B) how TFs recognize DNA, C) how TF sites in DNA lead to TF recruitment to a chromatin locus, and D) how the activity of DNA-associated TFs affects chromatin accessibility.
To evaluate generalization, we defined the computational problem and developed a workflow for predicting the scATAC-seq readout for previously unseen chromosomes and cell types. We show that cell2state outperforms a state-of-the-art deep learning model (ChromDragoNN) at explaining DNA accessibility differences across cells. Finally, to study cell state plasticity, we developed ways to use cell2state to simulate the possible chromatin states given the TF abundance of source cell types.
Speaker: Vitalii Kleshchevnikov, PhD
Affiliation: Wellcome Sanger Institute
Position: Bioinformatician @ Bayraktar, Stegle, Teichmann group
Host: Daniel MacDonald, Gibson Lab
Date: Monday, February 26, 2024
Time: 10:00AM-11:00AM ET
Meeting ID: 821 6367 6866
Vitalii Kleshchevnikov is driven by a deep interest in three key areas: i) understanding the regulatory code that allows a single genome to specify the full diversity of cell populations and their interactions, ii) formalizing the biology of these processes into mechanistic AI/ML models, and iii) accelerating therapy development to address age-related alterations in these processes. Vitalii did his PhD at the Wellcome Sanger Institute (2018-2023), jointly supervised by Dr Omer Bayraktar, Dr Oliver Stegle and Dr Sarah Teichmann, and will present both published and ongoing work. Prior to his PhD, Vitalii worked on the role of peptide motifs (SLiMs) in intracellular signaling (Dr Evangelia Petsalaki, EMBL-EBI), predicting CRISPR KO mutational outcomes (Dr Leopold Parts, Wellcome Sanger Institute) and profiling protein interactions in accelerated ageing (A*STAR), while completing his MSc and BSc in Kyiv, Ukraine.