Multimodal Biomedical AI
A new multimodal biomedical model from Google / DeepMind, called Med-PaLM Multimodal, approaches or exceeds state-of-the-art performance on a wide range of biomedical tasks.
In biology and medicine, we constantly deal with many kinds of data, from clinical text to medical images to genomic sequences, yet most of today's AI models can only process a single data type. This significantly limits their usefulness in the real world. Generalist foundation models open up the possibility of building biomedical AI systems that can handle diverse, complex data. In a new study, researchers from Google / DeepMind report a big step in this direction with Med-PaLM Multimodal (Med-PaLM M): a single generalist model that can process varied biomedical data and handle diverse tasks, with performance that keeps up with, or even surpasses, the best specialized models.
To assess the performance of the model, the authors introduce MultiMedBench, a new multimodal biomedical benchmark “spanning multiple modalities including medical imaging, clinical text and genomics with 14 diverse tasks for training and evaluating generalist biomedical AI systems”. When testing Med-PaLM M on this benchmark, they find that it reaches performance competitive with or exceeding state-of-the-art (SOTA) specialist models on various tasks.
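For intuition, here is a minimal sketch of how such a benchmark could be organized as a task registry. The task names and fields below are illustrative, chosen to match the modalities the paper describes, and are not the authoritative list of all 14 tasks.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkTask:
    name: str        # illustrative task names, not the paper's full list
    modality: str    # "text", "image", "image+text", ...
    task_type: str   # "question_answering", "classification", ...

TASKS = [
    BenchmarkTask("MedQA", "text", "question_answering"),
    BenchmarkTask("VQA-RAD", "image+text", "visual_question_answering"),
    BenchmarkTask("Chest X-ray report generation", "image+text", "report_generation"),
    BenchmarkTask("Chest X-ray classification", "image", "classification"),
]

def tasks_by_modality(modality: str) -> list[BenchmarkTask]:
    """Filter the registry, e.g. to build a per-modality evaluation run."""
    return [t for t in TASKS if t.modality == modality]
```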
Med-PaLM M inherits the architecture and the general-domain knowledge encoded in three previous models: Pathways Language Model (PaLM), Vision Transformer (ViT), and PaLM-E. It is then fine-tuned on MultiMedBench. Depending on the versions of the base models used, Med-PaLM M comes in three sizes: 12B, 84B, and 562B parameters.
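The paper does not ship code, but the PaLM-E-style pattern it builds on is straightforward to sketch: a vision encoder turns an image into tokens, a learned projection maps those tokens into the language model's embedding space, and the language model attends over the combined sequence. The module names and dimensions below are hypothetical placeholders illustrating that pattern, not Med-PaLM M's actual implementation.

```python
import torch
import torch.nn as nn

class MultimodalWrapper(nn.Module):
    """Glue a vision encoder to a language model, PaLM-E style (illustrative)."""

    def __init__(self, vision_encoder: nn.Module, language_model: nn.Module,
                 vision_dim: int = 1024, text_dim: int = 4096):
        super().__init__()
        self.vision_encoder = vision_encoder   # e.g. a ViT producing patch tokens
        self.language_model = language_model   # e.g. a decoder-only transformer
        # Learned projection from the vision token space into the LM token space
        self.project = nn.Linear(vision_dim, text_dim)

    def forward(self, image: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # Assumed encoder shape: (batch, C, H, W) -> (batch, num_patches, vision_dim)
        visual_tokens = self.project(self.vision_encoder(image))
        # Prepend projected visual tokens to the text prompt embeddings,
        # then let the language model attend over the combined sequence.
        combined = torch.cat([visual_tokens, text_embeds], dim=1)
        return self.language_model(combined)
```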
The paper then looks at three evaluation scenarios: generalist capabilities, novel emergent capabilities, and radiology report generation. I’m going to focus here on the first two.
The model performs well on all tasks. It either exceeds the state of the art or performs very close to it, which is impressive for a generalist, multimodal model:
Med-PaLM M is trained in such a way that it can combine its learned knowledge to tackle new tasks, showcasing zero-shot generalization to novel medical concepts and tasks. Key evidence for this is the model’s ability to detect tuberculosis in chest X-ray images, even though it never encountered image presentations of the disease during training:
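To make “zero-shot” concrete: the model is shown an image plus a plain-text question about a disease it was never fine-tuned on, and its free-text answers are scored against labels. The `model.generate` call, the prompt wording, and the dataset interface below are hypothetical placeholders, not the paper’s evaluation code.

```python
def zero_shot_accuracy(model, dataset) -> float:
    """Score yes/no answers on a disease absent from fine-tuning (illustrative)."""
    prompt = ("Question: Does this chest X-ray show evidence of "
              "tuberculosis? Answer yes or no. Answer:")
    correct = 0
    for image, label in dataset:  # label is assumed to be "yes" or "no"
        # Hypothetical API: generate a free-text answer from image + prompt
        answer = model.generate(image=image, text=prompt).strip().lower()
        correct += int(answer.startswith(label))
    return correct / len(dataset)
```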
Med-PaLM M is an impressive advance in biomedical AI, capable of interpreting a variety of medical modalities and even beginning to generalize to concepts it was not trained on. Nevertheless, challenges remain, including scaling models in the face of scarce medical data and the need for extensive multimodal datasets.
From a patient perspective, it hardly matters whether accurate diagnoses are made with generalist models or specialized models. However, the potential of connecting the dots among health issues that have previously been regarded as unconnected is very exciting, and one of the many reasons why I’m watching this space very closely.
Notably, Google / DeepMind have not released the model or its weights, so we have to take this report at face value, without being able to independently verify the results. While I would generally trust Google / DeepMind to report the facts accurately, science advances by everyone being able to check each other’s results. I assume this will happen here sooner or later, but until then, we should keep this in mind when discussing the paper.
CODA
This is a newsletter with two subscription types. You can learn why here.
To stay in touch, here are other ways to find me:
Writing: I write another Substack on digital developments in health, called Digital Epidemiology.