Gerber Lab at ICML Workshop on Computational Biology 2023

Gerber Lab at ICML Workshop on Computational Biology 2023

The ICML Workshop on Computational Biology (WCB) highlights how ML approaches can be tailored to making both translational and basic scientific discoveries with biological data, such as genetic sequences, cellular features or protein structures and imaging datasets, among others. It aims to bring together interdisciplinary ML researchers working in areas such as computational genomics; neuroscience; metabolomics; proteomics; bioinformatics; cheminformatics; pathology; radiology; evolutionary biology; population genomics; phenomics; ecology, cancer biology; causality; representation learning and disentanglement to present recent advances and open questions to the machine learning community.

The Gerber Lab had the following two papers accepted:

Gerber GK, Bhattarai SK, Du M, Glickman MS, Bucci V. Discovery of Host-Microbiome Interactions Using Multi-Modal, Sparse, Time-Aware, Bayesian Network-Structured Neural Topic Models. International Conference on Machine Learning Workshop on Computational Biology, 2023.

Uppal G, Urtecho G, Richardson M, Moody T, Wang HH, Gerber GK. MC-SPACE: Microbial communities from spatially associated counts engine. International Conference on Machine Learning Workshop on Computational Biology, 2023.

 

Algorithmic fairness in artificial intelligence for medicine and healthcare: Nature Biomedical Engineering

Algorithmic fairness in artificial intelligence for medicine and healthcare: Nature Biomedical Engineering

In healthcare, the development and deployment of insufficiently fair systems of artificial intelligence (AI) can undermine the delivery of equitable care. Assessments of AI models stratified across subpopulations have revealed inequalities in how patients are diagnosed, treated and billed. In this Perspective, we outline fairness in machine learning through the lens of healthcare, and discuss how algorithmic biases (in data acquisition, genetic variation and intra-observer labelling variability, in particular) arise in clinical workflows and the resulting healthcare disparities. We also review emerging technology for mitigating biases via disentanglement, federated learning and model explainability, and their role in the development of AI-based software as a medical device.

Chen RJ, Wang JJ, Williamson DFK, Chen TY, Lipkova J, Lu MY, Sahai S, Mahmood F. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nat Biomed Eng. 2023 06; 7(6):719-742. PMID: 37380750.

Mahmood Lab develops deep learning model for transforming tissue images: Nature Biomedical Engineering 2022

Mahmood Lab develops deep learning model for transforming tissue images: Nature Biomedical Engineering 2022

Histological artefacts in cryosectioned tissue can hinder rapid diagnostic assessments during surgery. Formalin-fixed and paraffin-embedded (FFPE) tissue provides higher quality slides, but the process for obtaining them is laborious (typically lasting 12–48 h) and hence unsuitable for intra-operative use. Here we report the development and performance of a deep-learning model that improves the quality of cryosectioned whole-slide images by transforming them into the style of whole-slide FFPE tissue within minutes. The model consists of a generative adversarial network incorporating an attention mechanism that rectifies cryosection artefacts and a self-regularization constraint between the cryosectioned and FFPE images for the preservation of clinically relevant features. Transformed FFPE-style images of gliomas and of non-small-cell lung cancers from a dataset independent from that used to train the model improved the rates of accurate tumour subtyping by pathologists.

Ozyoruk, K.B., Can, S., Darbaz, B. et al. A deep-learning model for transforming the style of tissue images from cryosectioned to formalin-fixed and paraffin-embedded. Nat. Biomed. Eng 6, 1407–1419 (2022). https://doi.org/10.1038/s41551-022-00952-9

Mahmood Lab develops self-supervised deep learning algorithm: Nature Biomedical Engineering 2022

Mahmood Lab develops self-supervised deep learning algorithm: Nature Biomedical Engineering 2022

The adoption of digital pathology has enabled the curation of large repositories of gigapixel whole-slide images (WSIs). Computationally identifying WSIs with similar morphologic features within large repositories without requiring supervised training can have significant applications. However, the retrieval speeds of algorithms for searching similar WSIs often scale with the repository size, which limits their clinical and research potential. Here we show that self-supervised deep learning can be leveraged to search for and retrieve WSIs at speeds that are independent of repository size. The algorithm, which we named SISH (for self-supervised image search for histology) and provide as an open-source package, requires only slide-level annotations for training, encodes WSIs into meaningful discrete latent representations and leverages a tree data structure for fast searching followed by an uncertainty-based ranking algorithm for WSI retrieval. We evaluated SISH on multiple tasks (including retrieval tasks based on tissue-patch queries) and on datasets spanning over 22,000 patient cases and 56 disease subtypes. SISH can also be used to aid the diagnosis of rare cancer types for which the number of available WSIs is often insufficient to train supervised deep-learning models.

Chen, C., Lu, M.Y., Williamson, D.F.K. et al. Fast and scalable search of whole-slide images via self-supervised deep learning. Nat. Biomed. Eng 6, 1420–1434 (2022). https://doi.org/10.1038/s41551-022-00929-8

Gerber Lab’s “MDITRE: Scalable and Interpretable Machine Learning for Predicting Host Status from Temporal Microbiome Dynamics” is mSystems Editor’s Pick

Gerber Lab’s “MDITRE: Scalable and Interpretable Machine Learning for Predicting Host Status from Temporal Microbiome Dynamics” is mSystems Editor’s Pick

Longitudinal microbiome data sets are being generated with increasing regularity, and there is broad recognition that these studies are critical for unlocking the mechanisms through which the microbiome impacts human health and disease. However, there is a dearth of computational tools for analyzing microbiome time-series data. To address this gap, we developed an open-source software package, Microbiome Differentiable Interpretable Temporal Rule Engine (MDITRE), which implements a new highly efficient method leveraging deep-learning technologies to derive human-interpretable rules that predict host status from longitudinal microbiome data. Using semi-synthetic and a large compendium of publicly available 16S rRNA amplicon and metagenomics sequencing data sets, we demonstrate that in almost all cases, MDITRE performs on par with or better than popular uninterpretable machine learning methods, and orders-of-magnitude faster than the prior interpretable technique. MDITRE also provides a graphical user interface, which we show through case studies can be used to derive biologically meaningful interpretations linking patterns of microbiome changes over time with host phenotypes. 

Mahmood Lab’s Pan-cancer integrative histology-genomic analysis is featured on cover of Cancer Cell

Mahmood Lab’s Pan-cancer integrative histology-genomic analysis is featured on cover of Cancer Cell

The rapidly emerging field of computational pathology has demonstrated promise in developing objective prognostic models from histology images. However, most prognostic models are either based on histology or genomics alone and do not address how these data sources can be integrated to develop joint image-omic prognostic models. Additionally, identifying explainable morphological and molecular descriptors from these models that govern such prognosis is of interest. We use multimodal deep learning to jointly examine pathology whole-slide images and molecular profile data from 14 cancer types. Our weakly supervised, multimodal deep-learning algorithm is able to fuse these heterogeneous modalities to predict outcomes and discover prognostic features that correlate with poor and favorable outcomes. We present all analyses for morphological and molecular correlates of patient prognosis across the 14 cancer types at both a disease and a patient level in an interactive open-access database to allow for further exploration, biomarker discovery, and feature assessment.

Chen RJ, Lu MY, Williamson DFK, Chen TY, Lipkova J, Noor Z, Shaban M, Shady M, Williams M, Joo B, Mahmood F. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell. 2022 Aug 8;40(8):865-878.e6. doi: 10.1016/j.ccell.2022.07.004. PMID: 35944502.

Gerber lab study showing gut metabolites predict C. diff recurrence

Gerber lab study showing gut metabolites predict C. diff recurrence

Clostridioides difficile infection (CDI) is the most common hospital acquired infection in the USA, with recurrence rates > 15%. Although primary CDI has been extensively linked to gut microbial dysbiosis, less is known about the factors that promote or mitigate recurrence. Using broad metabolomics data and statistics and machine learning models, Jen Dawkins, a HST PhD student and member of the Gerber lab, showed the metabolites in the gut can accurately predict C. difficile recurrence. These findings have implications for development of diagnostic tests and treatments that could ultimately short-circuit the cycle of CDI recurrence, by providing candidate metabolic biomarkers for diagnostics development, as well as offering insights into the complex microbial and metabolic alterations that are protective or permissive for recurrence.

Dawkins JJ, Allegretti JR, Gibson TE, McClure E, Delaney M, Bry L, Gerber GK. Gut metabolites predict Clostridioides difficile recurrence. Microbiome. 2022 Jun 9;10(1):87. doi: 10.1186/s40168-022-01284-1. PMID: 35681218; PMCID: PMC9178838.

Mahmood Lab’s deep learning-enabled assessment of cardiac transplant rejection study is published in Nature Medicine

Mahmood Lab’s deep learning-enabled assessment of cardiac transplant rejection study is published in Nature Medicine

Endomyocardial biopsy (EMB) screening represents the standard of care for detecting allograft rejections after heart transplant. Manual interpretation of EMBs is affected by substantial interobserver and intraobserver variability, which often leads to inappropriate treatment with immunosuppressive drugs, unnecessary follow-up biopsies and poor transplant outcomes. Here we present a deep learning-based artificial intelligence (AI) system for automated assessment of gigapixel whole-slide images obtained from EMBs, which simultaneously addresses detection, subtyping and grading of allograft rejection. To assess model performance, we curated a large dataset from the United States, as well as independent test cohorts from Turkey and Switzerland, which includes large-scale variability across populations, sample preparations and slide scanning instrumentation. The model detects allograft rejection with an area under the receiver operating characteristic curve (AUC) of 0.962; assesses the cellular and antibody-mediated rejection type with AUCs of 0.958 and 0.874, respectively; detects Quilty B lesions, benign mimics of rejection, with an AUC of 0.939; and differentiates between low-grade and high-grade rejections with an AUC of 0.833. In a human reader study, the AI system showed non-inferior performance to conventional assessment and reduced interobserver variability and assessment time. This robust evaluation of cardiac allograft rejection paves the way for clinical trials to establish the efficacy of AI-assisted EMB assessment and its potential for improving heart transplant outcomes.

Lipkova, J., Chen, T.Y., Lu, M.Y. et al. Deep learning-enabled assessment of cardiac allograft rejection from endomyocardial biopsies. Nat Med 28, 575–582 (2022). https://doi.org/10.1038/s41591-022-01709-2

Mahmood Lab’s study on AI-based cancer origin prediction using conventional histology is published in Nature

Mahmood Lab’s study on AI-based cancer origin prediction using conventional histology is published in Nature

Cancer of unknown primary (CUP) origin is an enigmatic group of diagnoses in which the primary anatomical site of tumour origin cannot be determined1,2. This poses a considerable challenge, as modern therapeutics are predominantly specific to the primary tumour3. Recent research has focused on using genomics and transcriptomics to identify the origin of a tumour4–9. However, genomic testing is not always performed and lacks clinical penetration in low-resource settings. Here, to overcome these challenges, we present a deep-learning-based algorithm—Tumour Origin Assessment via Deep Learning (TOAD)—that can provide a differential diagnosis for the origin of the primary tumour using routinely acquired histology slides. We used whole-slide images of tumours with known primary origins to train a model that simultaneously identifies the tumour as primary or metastatic and predicts its site of origin. On our held-out test set of tumours with known primary origins, the model achieved a top-1 accuracy of 0.83 and a top-3 accuracy of 0.96, whereas on our external test set it achieved top-1 and top-3 accuracies of 0.80 and 0.93, respectively. We further curated a dataset of 317 cases of CUP for which a differential diagnosis was assigned. Our model predictions resulted in concordance for 61% of cases and a top-3 agreement of 82%. TOAD can be used as an assistive tool to assign a differential diagnosis to complicated cases of metastatic tumours and CUPs and could be used in conjunction with or in lieu of ancillary tests and extensive diagnostic work-ups to reduce the occurrence of CUP.

Lu, M.Y., Chen, T.Y., Williamson, D.F.K. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021). https://doi.org/10.1038/s41586-021-03512-4

Mahmood Lab’s CLAM method, A Deep-Learning-based Pipeline for Data Efficient and Weakly Supervised Whole-Slide-level Analysis, published in Nature Biomedical Engineering

Mahmood Lab’s CLAM method, A Deep-Learning-based Pipeline for Data Efficient and Weakly Supervised Whole-Slide-level Analysis, published in Nature Biomedical Engineering

Deep-learning methods for computational pathology require either manual annotation of gigapixel whole-slide images (WSIs) or large datasets of WSIs with slide-level labels and typically suffer from poor domain adaptation and interpretability. Here we report an interpretable weakly supervised deep-learning method for data-efficient WSI processing and learning that only requires slide-level labels. The method, which we named clustering-constrained-attention multiple-instance learning (CLAM), uses attention-based learning to identify subregions of high diagnostic value to accurately classify whole slides and instance-level clustering over the identified representative regions to constrain and refine the feature space. By applying CLAM to the subtyping of renal cell carcinoma and non-small-cell lung cancer as well as the detection of lymph node metastasis, we show that it can be used to localize well-known morphological features on WSIs without the need for spatial labels, that it overperforms standard weakly supervised classification algorithms and that it is adaptable to independent test cohorts, smartphone microscopy and varying tissue content.

Lu, M.Y., Williamson, D.F.K., Chen, T.Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat Biomed Eng 5, 555–570 (2021). https://doi.org/10.1038/s41551-020-00682-w