Longitudinal microbiome data sets are being generated with increasing regularity, and there is broad recognition that these studies are critical for unlocking the mechanisms through which the microbiome impacts human health and disease. However, there is a dearth of computational tools for analyzing microbiome time-series data. To address this gap, we developed an open-source software package, Microbiome Differentiable Interpretable Temporal Rule Engine (MDITRE), which implements a new, highly efficient method that leverages deep-learning technologies to derive human-interpretable rules that predict host status from longitudinal microbiome data. Using semi-synthetic data and a large compendium of publicly available 16S rRNA amplicon and metagenomics sequencing data sets, we demonstrate that in almost all cases MDITRE performs on par with or better than popular uninterpretable machine learning methods, and runs orders of magnitude faster than the prior interpretable technique. MDITRE also provides a graphical user interface, which we show through case studies can be used to derive biologically meaningful interpretations linking patterns of microbiome changes over time with host phenotypes.
The rapidly emerging field of computational pathology has demonstrated promise in developing objective prognostic models from histology images. However, most prognostic models are either based on histology or genomics alone and do not address how these data sources can be integrated to develop joint image-omic prognostic models. Additionally, identifying explainable morphological and molecular descriptors from these models that govern such prognosis is of interest. We use multimodal deep learning to jointly examine pathology whole-slide images and molecular profile data from 14 cancer types. Our weakly supervised, multimodal deep-learning algorithm is able to fuse these heterogeneous modalities to predict outcomes and discover prognostic features that correlate with poor and favorable outcomes. We present all analyses for morphological and molecular correlates of patient prognosis across the 14 cancer types at both a disease and a patient level in an interactive open-access database to allow for further exploration, biomarker discovery, and feature assessment.
Chen RJ, Lu MY, Williamson DFK, Chen TY, Lipkova J, Noor Z, Shaban M, Shady M, Williams M, Joo B, Mahmood F. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell. 2022 Aug 8;40(8):865-878.e6. doi: 10.1016/j.ccell.2022.07.004. PMID: 35944502.
Through a $3.3M grant from the Massachusetts Life Sciences Center and in-kind support from Brigham and Women’s Hospital and Mass General Brigham, the BWH Massachusetts Host-Microbiome Center (MHMC) and Division of Computational Pathology will establish a new lab to develop and apply advanced AI/deep learning technologies to microbiome research. Dr. Georg Gerber, Chief of BWH Computational Pathology and co-director of the MHMC, will head the new lab.
The microbiome is inherently complex and dynamic. Multi-omic data characterizing microbes in culture systems, animal models, and human populations can provide unique and complementary insights into these rich host-microbial ecosystems. However, to fully realize the potential of these data, sophisticated computational approaches are needed.
Artificial Intelligence (AI), and in particular Deep Learning (DL), are revolutionizing many fields, such as speech and image recognition. These technologies are also increasingly impacting the biomedical sciences.
The Lab aims to unleash the power of AI and DL technologies for the microbiome field.
Anchored by dedicated large GPU (with Tesla A100 nodes) and CPU compute clusters, the Lab will develop custom AI/DL applications for the microbiome, deploy existing software in a managed and easy-to-use environment, and provide outreach and education to the microbiome community. The Lab will be staffed by principal investigators in the Division of Computational Pathology, as well as an application scientist and network engineers.
A joint initiative between the Brigham and Women’s Hospital (BWH) Division of Computational Pathology and the Massachusetts Host-Microbiome Center (MHMC), the Lab is funded by the Massachusetts Life Sciences Center and Brigham and Women’s Hospital/Mass General Brigham. Industry and academic users will be able to access the Lab through the MHMC’s existing core services model and through collaborations.
Clostridioides difficile infection (CDI) is the most common hospital-acquired infection in the USA, with recurrence rates > 15%. Although primary CDI has been extensively linked to gut microbial dysbiosis, less is known about the factors that promote or mitigate recurrence. Using broad metabolomics data together with statistical and machine learning models, Jen Dawkins, an HST PhD student and member of the Gerber lab, showed that metabolites in the gut can accurately predict C. difficile recurrence. These findings have implications for the development of diagnostic tests and treatments that could ultimately short-circuit the cycle of CDI recurrence, by providing candidate metabolic biomarkers for diagnostics development, as well as offering insights into the complex microbial and metabolic alterations that are protective or permissive for recurrence.
Dawkins JJ, Allegretti JR, Gibson TE, McClure E, Delaney M, Bry L, Gerber GK. Gut metabolites predict Clostridioides difficile recurrence. Microbiome. 2022 Jun 9;10(1):87. doi: 10.1186/s40168-022-01284-1. PMID: 35681218; PMCID: PMC9178838.
Endomyocardial biopsy (EMB) screening represents the standard of care for detecting allograft rejections after heart transplant. Manual interpretation of EMBs is affected by substantial interobserver and intraobserver variability, which often leads to inappropriate treatment with immunosuppressive drugs, unnecessary follow-up biopsies and poor transplant outcomes. Here we present a deep learning-based artificial intelligence (AI) system for automated assessment of gigapixel whole-slide images obtained from EMBs, which simultaneously addresses detection, subtyping and grading of allograft rejection. To assess model performance, we curated a large dataset from the United States, as well as independent test cohorts from Turkey and Switzerland, which includes large-scale variability across populations, sample preparations and slide scanning instrumentation. The model detects allograft rejection with an area under the receiver operating characteristic curve (AUC) of 0.962; assesses the cellular and antibody-mediated rejection type with AUCs of 0.958 and 0.874, respectively; detects Quilty B lesions, benign mimics of rejection, with an AUC of 0.939; and differentiates between low-grade and high-grade rejections with an AUC of 0.833. In a human reader study, the AI system showed non-inferior performance to conventional assessment and reduced interobserver variability and assessment time. This robust evaluation of cardiac allograft rejection paves the way for clinical trials to establish the efficacy of AI-assisted EMB assessment and its potential for improving heart transplant outcomes.
Lipkova, J., Chen, T.Y., Lu, M.Y. et al. Deep learning-enabled assessment of cardiac allograft rejection from endomyocardial biopsies. Nat Med 28, 575–582 (2022). https://doi.org/10.1038/s41591-022-01709-2
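As a reading aid for the AUC values quoted above, here is a minimal sketch of how an area under the ROC curve can be computed from model scores and binary labels. It uses the rank (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The function name and the toy scores are illustrative assumptions, not taken from the study.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): 1 = rejection, 0 = no rejection.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(round(roc_auc(toy_scores, toy_labels), 3))  # 8 of 9 pairs ranked correctly
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; library implementations typically sort the scores instead.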
Grant Abstract: With our increasing ability to measure biological data at scale and the digitalization of health records, computational thinking is becoming ever more important in the biological sciences and healthcare. The research directions proposed in this grant look to build robust machine learning models and tools for computational biology by including principles and analysis from other engineering fields, like control, that have a proven record of incorporating robustness into the systems they have automated. This increased robustness will save resources during the development of these machine learning models. It will also lead to more reliable diagnostics, clinical tools, and machine learning based biological discoveries. We have proposed three future research directions at the intersection of machine learning, control, and computational biology: (a) modeling dynamical systems, (b) robust optimization schemes, and (c) control principles for in vivo modeling of microbial communities. The first proposed research area involves the development of flexible models for performing inference on dynamical systems models with time-series data. Dynamical systems models are able to learn mathematically causal relationships between variables, compared to other models whose parameters may only have correlative relationships. Our flexible models will be differentiable, allowing them to be trained using the same efficient algorithms and hardware that have propelled deep learning models into the spotlight. These differentiable methods will allow us to more easily integrate the uncertainty associated with biological measurements into our models. The second research area looks to develop more robust gradient optimization algorithms, the workhorse for training deep neural networks. Many of the popular algorithms used to train deep neural networks were not explicitly designed to be robust. With more robust optimization techniques, machine learning models trained on disparate data sets at different hospitals or labs will be more reproducible and will require less time for tuning parameters, ultimately saving resources as well. These robust optimization techniques will also aid in the certification of machine learning based tools that will ultimately be deployed in the clinic. The third research area we propose is an approach for the discovery and design of robust microbial communities. Communities of commensal, or engineered, bacteria have long been proposed as alternative therapies for the treatment of gut-related illness (“bugs as drugs”). We propose a top-down approach to identifying putative microbial consortia members from time-series experiments with germ-free mice colonized by complex flora. By beginning the consortia design process in vivo, we hope to overcome the challenge that many other attempts at consortia construction have encountered, where in vitro designed communities do not reproduce their intended properties once transferred into living host organisms. The tools from this work will be built using open access software and all data will be made easily accessible and explorable to the public.
Grant Abstract: Approximately 150 million people annually experience urinary tract infections (UTI), the most common cause of which is uropathogenic Escherichia coli (UPEC). The gut is a known reservoir of UPEC, which typically reside at low abundance, but can traverse the periurethral area to invade the bladder. While the E. coli population within the gut can be diverse, it has been suggested that certain strains have a greater propensity to migrate and cause infection. This may be one driving factor to explain why half of those with an acute infection have a recurrence even after taking antibiotics that clear the first infection from the urinary tract. Being able to detect and track E. coli strains over time would have direct clinical applications for those patients who have frequent recurrences due to gut UPEC carriage. One such clinical application would be early detection and intervention before the onset of infection. Unfortunately, current metagenomic algorithms are not capable of performing strain tracking accurately enough for clinical relevance, especially for low abundance species such as E. coli. A major factor for this lack of accuracy is that all current state-of-the-art metagenomic tools completely ignore temporal dependence between samples. Even if it is known that multiple samples are from the same patient, current tools analyze those samples as if they were independent. Furthermore, many metagenomic tools ignore the sequence quality information that is provided for every nucleobase in every read. We propose to develop a more precise strain tracking algorithm that does take this additional information into account, making the tool host-time-quality aware. Finally, we will pilot and validate our algorithm on a clinically relevant gnotobiotic colonization model. Specifically, humanized germ-free mice will undergo two rounds of E. coli challenges with therapeutic perturbations from antibiotics or mannosides, a small molecule precision antibiotic-sparing therapeutic. We propose the following specific aims: (1) Develop the first purpose-built computational method for tracking bacterial strains in the microbiome over time, and (2) Pilot and validate the method in a gnotobiotic mouse model undergoing UPEC challenges and a therapeutic perturbation. These aims would advance the microbiome field, allowing for the future development of therapeutics and clinical diagnostics.
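To illustrate what using per-base sequence quality can mean in practice, the sketch below converts Phred quality scores to error probabilities (the standard relation is p = 10^(-Q/10)) and uses them to weight how strongly each base of a read supports one candidate strain over another. This is only a hedged illustration, not the proposed algorithm: the likelihood model, the function names, and the toy sequences are assumptions, and the temporal component of the proposal is not modeled here.

```python
import math
from typing import Sequence


def phred_to_error_prob(q: int) -> float:
    """Standard Phred relation: probability that a base call is wrong."""
    return 10.0 ** (-q / 10.0)


def read_log_likelihood(read: str, quals: Sequence[int], reference: str) -> float:
    """Log-likelihood of a read given a candidate strain sequence.

    Hypothetical model: the read is already aligned to `reference` without
    gaps, and a miscalled base is equally likely to be any of the other
    three nucleotides.
    """
    if not (len(read) == len(quals) == len(reference)):
        raise ValueError("read, quals, and reference must have equal length")
    total = 0.0
    for base, q, ref_base in zip(read, quals, reference):
        # Cap at 0.75 (a uniformly random call) so Q=0 does not give log(0).
        err = min(phred_to_error_prob(q), 0.75)
        if base == ref_base:
            total += math.log(1.0 - err)
        else:
            total += math.log(err / 3.0)
    return total


# Toy example: two made-up strains that differ at the third position.
read, quals = "ACGT", [30, 30, 2, 30]
strain_a, strain_b = "ACGT", "ACTT"
support = read_log_likelihood(read, quals, strain_a) - read_log_likelihood(read, quals, strain_b)
print(support)  # small positive value: the distinguishing base has low quality
```

Because the only base that distinguishes the two strains has a low quality score, the read provides only weak evidence for strain A; the same mismatch at Q30 would be far more decisive. A tool that ignores quality scores would treat both situations identically.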
Cancer of unknown primary (CUP) origin is an enigmatic group of diagnoses in which the primary anatomical site of tumour origin cannot be determined [1,2]. This poses a considerable challenge, as modern therapeutics are predominantly specific to the primary tumour [3]. Recent research has focused on using genomics and transcriptomics to identify the origin of a tumour [4–9]. However, genomic testing is not always performed and lacks clinical penetration in low-resource settings. Here, to overcome these challenges, we present a deep-learning-based algorithm, Tumour Origin Assessment via Deep Learning (TOAD), that can provide a differential diagnosis for the origin of the primary tumour using routinely acquired histology slides. We used whole-slide images of tumours with known primary origins to train a model that simultaneously identifies the tumour as primary or metastatic and predicts its site of origin. On our held-out test set of tumours with known primary origins, the model achieved a top-1 accuracy of 0.83 and a top-3 accuracy of 0.96, whereas on our external test set it achieved top-1 and top-3 accuracies of 0.80 and 0.93, respectively. We further curated a dataset of 317 cases of CUP for which a differential diagnosis was assigned. Our model predictions resulted in concordance for 61% of cases and a top-3 agreement of 82%. TOAD can be used as an assistive tool to assign a differential diagnosis to complicated cases of metastatic tumours and CUPs and could be used in conjunction with or in lieu of ancillary tests and extensive diagnostic work-ups to reduce the occurrence of CUP.
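For readers unfamiliar with the top-1 and top-3 accuracies reported above, the following minimal sketch shows how a top-k accuracy is typically computed: a case counts as correct when the true site of origin is among the k highest-scoring predictions. The function name and the toy scores are illustrative assumptions, not data from the study.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of cases whose true class is among the k highest-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest predicted score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example: 4 cases, 4 candidate sites of origin (made-up scores).
toy_scores = [
    [0.60, 0.20, 0.15, 0.05],
    [0.15, 0.50, 0.30, 0.05],
    [0.30, 0.20, 0.40, 0.10],
    [0.10, 0.20, 0.30, 0.40],
]
toy_labels = [0, 2, 1, 3]
print(top_k_accuracy(toy_scores, toy_labels, k=1))  # 0.5
print(top_k_accuracy(toy_scores, toy_labels, k=3))  # 1.0
```

Top-3 accuracy is the more natural measure for a differential diagnosis, where a short ranked list of candidate sites is the expected output rather than a single answer.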
Deep-learning methods for computational pathology require either manual annotation of gigapixel whole-slide images (WSIs) or large datasets of WSIs with slide-level labels and typically suffer from poor domain adaptation and interpretability. Here we report an interpretable weakly supervised deep-learning method for data-efficient WSI processing and learning that only requires slide-level labels. The method, which we named clustering-constrained-attention multiple-instance learning (CLAM), uses attention-based learning to identify subregions of high diagnostic value to accurately classify whole slides and instance-level clustering over the identified representative regions to constrain and refine the feature space. By applying CLAM to the subtyping of renal cell carcinoma and non-small-cell lung cancer as well as the detection of lymph node metastasis, we show that it can be used to localize well-known morphological features on WSIs without the need for spatial labels, that it outperforms standard weakly supervised classification algorithms and that it is adaptable to independent test cohorts, smartphone microscopy and varying tissue content.
Lu, M.Y., Williamson, D.F.K., Chen, T.Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat Biomed Eng 5, 555–570 (2021). https://doi.org/10.1038/s41551-020-00682-w
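The attention-based learning mentioned in the abstract above can be illustrated with a generic attention-pooling step for multiple-instance learning: each patch embedding receives a score, the scores are normalized with a softmax, and the slide-level representation is the attention-weighted sum of the patch embeddings. The sketch below is a simplified, dependency-free illustration under stated assumptions, not the CLAM implementation: it omits the instance-level clustering constraint, and the weights `V` and `w` are fixed hypothetical values that would be learned from slide-level labels in a real model.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    patches: Sequence[Sequence[float]],  # N patch embeddings, each of size D
    V: Sequence[Sequence[float]],        # hidden projection, H rows of size D
    w: Sequence[float],                  # scoring vector of size H
) -> Tuple[List[float], List[float]]:
    """Return (attention weights over patches, slide-level embedding)."""
    scores = []
    for h in patches:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wk * zk for wk, zk in zip(w, hidden)))
    attn = softmax(scores)
    dim = len(patches[0])
    slide = [sum(a * h[d] for a, h in zip(attn, patches)) for d in range(dim)]
    return attn, slide


# Toy example: three 2-dimensional patch embeddings (made-up values).
patches = [[0.1, 0.0], [2.0, 1.5], [0.0, 0.2]]
V = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical weights, learned in practice
w = [1.0, 1.0]                # hypothetical weights, learned in practice
attn, slide = attention_pool(patches, V, w)
print(attn)   # the second patch receives the largest weight
print(slide)  # slide-level embedding dominated by the second patch
```

Because the attention weights sum to one and are tied to individual patches, they can be mapped back onto the slide as a heatmap, which is one way such models highlight regions of high diagnostic value without spatial labels.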
The Gerber lab, in collaboration with the Wang lab at Columbia and the Gibson Lab at BWH, has received a $2.9M grant from the National Science Foundation to develop and apply novel computational and experimental methods to elucidate fundamental rules governing the formation and maintenance of complex microbial ecosystems in the mammalian gut.
Abstract: Microbiomes, or the collections of trillions of bacteria and other micro-organisms living on, within and around us, have enormous impact on human life. For example, they help people digest food, promote the growth of farm animals and crops, and degrade pollutants in the environment. Despite the importance of microbiomes, the processes governing their formation and maintenance remain poorly understood. The mammalian gut is a particularly intriguing system for microbiome studies, since a diverse collection of microbes has evolved that specifically colonizes and functions in that environment. The goal of the project is to derive fundamental rules that describe and predict the dynamic process of microbial colonization of the mammalian gut. To achieve this goal, the team of investigators will develop new computer-based methods to automatically extract predictive and explanatory rules from large microbiome data sets. The team will also develop new experimental tools and generate data sets in mice that measure how microbiomes change over time and across space in the mammalian gut. Overall, the project will further the understanding of the formation of microbiomes in mammals and can provide broader insights into the emergence of other microbial ecosystems, such as those in soil and marine environments. These insights could ultimately help scientists to rationally alter or maintain microbiomes in different environments to benefit human activities. The project will also generate practical resources for the scientific community (computer-based tools and datasets) and provide education on the microbiome to college and elementary school students through courses and hands-on labs.
A wealth of genomic data provides information as to which microbes are present in environments, but little insight into underlying factors that explain or predict complex assemblages of microbial consortia. This project aims to elucidate mechanistic factors that drive the dynamic process of microbial colonization of the mammalian gut. These determinants will be investigated at multiple systems scales, from the level of microbial communities down to the level of individual genes. The project will leverage high-throughput experimental methods developed by the investigators, to generate data characterizing functional genetic selection and spatial organization of microbiota in the mammalian gut. From the Computer Science perspective, the project will develop new computational methods to infer human-interpretable rules and other structured outputs from complex and noisy high-throughput microbiome datasets, using Bayesian and neural-style approaches that incorporate prior biological knowledge while scaling to massive datasets. This project has three main thrusts: 1) Learn microbial community-level rules that quantitatively predict population dynamics of mouse gut colonization and assess these rules across differing ranges of microbial diversity and composition, 2) Elucidate microbial gene-level mechanisms that predict mouse gut colonization dynamics, and 3) Profile microbial spatiotemporal organization and dynamics during gut colonization at the species and gene level to predict microbial community dynamics. The project is expected to establish a set of new computational and experimental tools and principles for understanding the rules of microbial colonization of the gut, with potential applications to other ecosystems including gut microbiota of non-mammalian species as well as complex environmental microbiota.