Dr. Camilo Valdes
Lawrence Livermore National Laboratory
Lecture Information
CASE 241
2025-01-24 14:00:00
Abstract
Deep neural networks are a proven technique for working with high-dimensional data sets because of their ability to draw out meaningful patterns and create abstract low-dimensional representations known as deep embeddings. The embeddings make learning tasks on large inputs easier because they capture the semantics and variance of the data by placing semantically similar inputs close together in the embedding space. One example of such inputs is microbiome abundance profiles, which report which microbes are present, their quantities, and their taxonomic lineages. These microbes are critical influencers of host physiology and impact a range of health conditions relevant to military service, including gastrointestinal health.
In this talk I’ll describe our work developing late-fusion machine learning models for creating low-dimensional embeddings of human-gut abundance profiles. Our fusion model combines custom vision transformer and convolutional architectures and, together with hierarchical feature engineering, is trained on large sets of disease data. The model is fine-tuned on a data set of subjects with traveler's diarrhea, a disease of high military relevance. We profiled 12,190 human microbiome samples, spanning 75 studies, 19 diseases, 31K microbial species, and over 200K microbial strains. The profiles are converted into 2D color image representations using a Hilbert curve visualization, and we train our models for feature selection in a hierarchical feature space, with the aim of identifying a reduced set of informative microbes. The fusion layers are specifically trained to classify profiles by disease status, disease type, and geographical location. Embedding-derived clusters are generated, and their goodness of fit is evaluated on new data sets unseen during training. Military deployment changes the human gut microbiome in unique ways as service members experience stresses. Understanding these changes could provide military clinicians with improved tools for assessing and predicting health, and an embeddings model is an efficient way of classifying health states.
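As a rough illustration of the Hilbert curve step mentioned in the abstract, the sketch below lays a 1D abundance vector onto a square grid so that entries adjacent in the vector stay adjacent in the image. It is a minimal sketch and not the speaker's implementation. The function names `hilbert_d2xy` and `profile_to_image` are hypothetical, the output is a single-channel grid rather than a color image, and the ordering of microbes along the curve (for example by taxonomic lineage) is assumed to be decided beforehand.

```python
def hilbert_d2xy(side, d):
    """Map index d along a Hilbert curve to (x, y) on a side-by-side grid.

    `side` must be a power of two. This is the standard iterative
    distance-to-coordinate conversion.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def profile_to_image(abundances, order):
    """Place a 1D abundance profile on a 2**order by 2**order grid.

    Assumption: `abundances` is already ordered the way the caller wants
    (the source does not state the ordering). Unused cells stay 0.0.
    """
    side = 2 ** order
    if len(abundances) > side * side:
        raise ValueError("profile does not fit on a grid of this order")
    grid = [[0.0] * side for _ in range(side)]
    for d, value in enumerate(abundances):
        x, y = hilbert_d2xy(side, d)
        grid[y][x] = value
    return grid


if __name__ == "__main__":
    # Toy profile with four relative abundances on a 2x2 grid.
    for row in profile_to_image([0.1, 0.2, 0.3, 0.4], order=1):
        print(row)
```

The appeal of a Hilbert ordering over a simple row-by-row layout is locality: neighbors in the 1D profile remain neighbors in 2D, which suits convolutional and patch-based vision models.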
Biography
I’m a computer scientist at Lawrence Livermore National Laboratory (LLNL) working in the Physical and Life Sciences Directorate (PLS), Biosciences and Biotechnology Division (BBTD). At LLNL, my work focuses on developing machine learning models and analytical methods that identify and characterize meaningful features in high-dimensional data sets. My projects include multi-task optimizations for in silico antibody design and molecule microenvironment feature representation, computer vision models for characterizing metagenomics data sets, and wastewater simulations.
Disclaimer
This effort was supported by the Lawrence Livermore National Laboratory, Laboratory Directed Research and Development program. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL Release Number: LLNL-PRES-867369