Finally, we instruction-tune Chameleon to create the MedMax-7B model. We show that our model achieves SOTA performance on multiple downstream VQA tasks and beats GPT-4o and LLaVA-Med-1.5 by a large margin.
19.12.2024 18:19
We also found that there is a general lack of support for multimodal biomedical evaluation. To address this, we create a robust evaluation suite consisting of visual question answering, captioning, generation, and visual chat. We make this suite publicly available.
19.12.2024 18:18
Overall, MedMax covers a breadth of skills and knowledge bases that will be useful for a capable biomedical assistant. We illustrate the diversity of our dataset here:
19.12.2024 18:18
We curate high-quality multimodal biomedical data from medical papers and YouTube to support tasks like image captioning, generation, visual chat, multimodal content creation, and report understanding across biomedical domains.
19.12.2024 18:18
Firstly, we create MedMax-Instruct, a synthetic dataset that enables interleaved generation across diverse domains (e.g., radiology, histopathology). In particular, we leverage the knowledge in image-caption datasets, followed by LLM-based data filtering and generation.
19.12.2024 18:17
Despite web-scale training, they underperform in biomedicine due to limited knowledge and user intent understanding.
To solve this, we curate MedMax, a large-scale biomedical vision-language dataset (1.5M instances, 1.7B tokens) for instruction-tuning a mixed-modal model.
19.12.2024 18:17
Mixed-modal (natively multimodal) models are a new class of generative models that can flexibly integrate information across diverse modalities (e.g., Chameleon, Transfusion, Gemini 2.0). Such models can process and generate interleaved sequences of image and text content.
19.12.2024 18:16
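A toy sketch of the interleaved representation such models operate on. This is purely illustrative, not Chameleon's actual tokenizer: the sentinel names, the whitespace "tokenizer," and the integer image codes are all assumptions. The idea it shows is real, though: image patches are mapped to discrete codes and spliced between text tokens in one stream.

```python
# Illustrative sketch: a mixed-modal model treats an interleaved image-text
# sequence as a single token stream. Image content becomes discrete codes
# (as a VQ-style image tokenizer might emit) wrapped in sentinel markers.

BOI, EOI = "<boi>", "<eoi>"  # begin/end-of-image sentinels (names are hypothetical)

def interleave(segments):
    """Flatten alternating ("text", str) / ("image", [codes]) segments
    into one flat token list."""
    tokens = []
    for kind, content in segments:
        if kind == "text":
            tokens.extend(content.split())  # toy whitespace tokenizer
        elif kind == "image":
            tokens.append(BOI)
            tokens.extend(f"<img_{c}>" for c in content)  # discrete image codes
            tokens.append(EOI)
    return tokens

seq = interleave([
    ("text", "Chest X-ray shows"),
    ("image", [17, 402, 9]),  # codes a VQ encoder might produce
    ("text", "no acute findings"),
])
# seq is one flat stream mixing text tokens and image-code tokens
```

Because both modalities live in one token stream, a single autoregressive decoder can emit text, an image, or any interleaving of the two.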
Natively multimodal models unlock new possibilities for AI biomedical assistants, from answering questions about images to generating them for decision-making.
Thrilled to release MedMax, an open state-of-the-art multimodal model designed for diverse biomedical tasks and domains.
(audio on)
🧵⬇️
19.12.2024 18:16
1/Open LLM evals often face data contamination concerns. Private curators (like ScaleAI) have addressed this with private + expert evaluations.
We argue that this shift poses new risks, including financial incentives & eval bias.
w/ @hbxnov.bsky.social
pratyushmaini.github.io/blog/2024/ri... 🧵
27.11.2024 19:05