
Hritik

@hbxnov

CS PhD @UCLA | Prev: Bachelors @IITDelhi, Student Researcher @GoogleDeepMind, Intern @AmazonScience | Multimodal ML, Language models | Cricket 🏏 http://sites.google.com/view/hbansal

25 Followers · 42 Following · 9 Posts · Joined 27.11.2024

Latest posts by Hritik @hbxnov

MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants
Recent advancements in mixed-modal generative models have enabled flexible integration of information across image-text content. These models have opened new avenues for developing unified biomedical ...

Paper: arxiv.org/abs/2412.12661
Website: mint-medmax.github.io
Code: github.com/Hritikbansal...
Demo: huggingface.co/spaces/mint-...

Thanks to the great effort by our entire group at UCLA w/
Daniel Israel, Siyan Zhao, Shufan Li, Tung Nguyen, and Aditya Grover!

19.12.2024 18:19 👍 1 🔁 0 💬 0 📌 0

Finally, we instruction-tune Chameleon to create the MedMax-7B model. We show that our model achieves SOTA performance on multiple downstream VQA tasks and beats GPT-4o and LLaVA-Med-1.5 by a large margin.

19.12.2024 18:19 👍 0 🔁 0 💬 1 📌 0
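A minimal VQA inference sketch for a Chameleon-style checkpoint such as MedMax-7B, using the Chameleon classes in Hugging Face transformers. The hub ID "mint-medmax/medmax-7b" and the prompt format are assumptions for illustration; see the linked repo for the released weights and actual usage.

    import torch
    from PIL import Image
    from transformers import ChameleonForConditionalGeneration, ChameleonProcessor

    model_id = "mint-medmax/medmax-7b"  # hypothetical hub ID, not confirmed
    processor = ChameleonProcessor.from_pretrained(model_id)
    model = ChameleonForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    image = Image.open("chest_xray.png")                      # any local image
    prompt = "Is there evidence of pleural effusion?<image>"  # assumed prompt format

    inputs = processor(text=prompt, images=image, return_tensors="pt").to(
        model.device, dtype=torch.bfloat16
    )
    out = model.generate(**inputs, max_new_tokens=64)
    print(processor.decode(out[0], skip_special_tokens=True))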

We also found that there is a general lack of support for multimodal biomedical evaluation. To address this, we create a robust evaluation suite consisting of visual question answering, captioning, generation, and visual chat. We make this suite publicly available.

19.12.2024 18:18 👍 0 🔁 0 💬 1 📌 0
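To make the four-task structure concrete, here is a hypothetical shape for a unified eval loop; the split names, example fields, and exact-match metric are illustrative stand-ins, not the released suite's actual schema.

    from typing import Callable

    def exact_match(pred: str, gold: str) -> float:
        return float(pred.strip().lower() == gold.strip().lower())

    def evaluate(examples: list[dict], generate: Callable[[dict], str],
                 metric: Callable[[str, str], float] = exact_match) -> float:
        """Average a per-example metric over one task split."""
        scores = [metric(generate(ex), ex["answer"]) for ex in examples]
        return sum(scores) / max(len(scores), 1)

    # e.g., with hypothetical helpers load_split() and model_answer():
    # for task in ["vqa", "captioning", "generation", "visual_chat"]:
    #     print(task, evaluate(load_split(task), model_answer))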

Overall, MedMax covers a breadth of skills and knowledge bases that will be useful for a capable biomedical assistant. We illustrate the diversity of our dataset here:

19.12.2024 18:18 👍 0 🔁 0 💬 1 📌 0

We curate high-quality multimodal biomedical data from medical papers and YouTube to support tasks like image captioning, generation, visual chat, multimodal content creation, and report understanding across biomedical domains.

19.12.2024 18:18 👍 0 🔁 0 💬 1 📌 0
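For illustration, one plausible schema for an interleaved instance built from such sources; every field name below is an assumption, not the dataset's actual format.

    # A hypothetical interleaved instruction-tuning instance.
    instance = {
        "task": "visual_chat",
        "source": "pubmed_figure",  # e.g., a medical paper or a YouTube frame
        "conversation": [
            {"role": "user", "content": [
                {"type": "image", "path": "images/hist_0001.png"},
                {"type": "text", "text": "What tissue type is shown here?"},
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": "Hematoxylin-and-eosin-stained lung tissue ..."},
            ]},
        ],
    }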

First, we create MedMax-Instruct, a synthetic dataset that supports interleaved generation across diverse domains (radiology, histopathology). In particular, we utilize the knowledge in image-caption datasets, followed by LLM-based data filtering and generation.

19.12.2024 18:17 👍 0 🔁 0 💬 1 📌 0
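A sketch of what the LLM-based filtering pass could look like, under assumed interfaces: llm_judge is a hypothetical callable wrapping a strong LLM, and the 1-5 rating scheme and threshold are illustrative.

    def filter_captions(pairs, llm_judge, threshold=4):
        """Keep (image_path, caption) pairs whose judged quality clears the bar."""
        kept = []
        for image_path, caption in pairs:
            prompt = (
                "Rate this biomedical image caption for factuality and clarity "
                f"on a 1-5 scale. Reply with a single digit.\n\nCaption: {caption}"
            )
            score = int(llm_judge(prompt).strip()[0])  # llm_judge: assumed LLM call
            if score >= threshold:
                kept.append((image_path, caption))
        return kept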

Despite web-scale training, these models underperform in biomedicine due to limited domain knowledge and a weak grasp of user intent.
To address this, we curate MedMax 🏅, a large-scale biomedical vision-language dataset (1.5M instances, 1.7B tokens) for instruction-tuning a mixed-modal model.

19.12.2024 18:17 👍 0 🔁 0 💬 1 📌 0
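If the dataset is hosted on the Hugging Face Hub, loading could look like the sketch below; the dataset ID is a placeholder, so check the project page for the released name.

    from datasets import load_dataset

    # "mint-medmax/medmax" is a hypothetical hub ID, not the confirmed one.
    ds = load_dataset("mint-medmax/medmax", split="train", streaming=True)
    for example in ds.take(3):  # peek at a few instances without a full download
        print(sorted(example.keys()))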

Mixed-modal (natively multimodal) models are a new class of generative models that can flexibly integrate information across diverse modalities (e.g., Chameleon, Transfusion, Gemini 2.0). Such models can process and generate interleaved sequences of image 📸 and text ✍️ content.

19.12.2024 18:16 👍 0 🔁 0 💬 1 📌 0
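Schematically, a mixed-modal model quantizes images into discrete codes and splices them into the text token stream, so a single transformer models one flat sequence. The vocabulary sizes and token IDs below are made up for illustration.

    TEXT_VOCAB = 65_536     # assumed text vocabulary size
    IMAGE_CODEBOOK = 8_192  # assumed VQ codebook size for image tokens

    def interleave(text_ids: list[int], image_codes: list[int]) -> list[int]:
        """Splice image codes into the text stream, offset past the text
        vocabulary so the two token spaces do not collide."""
        return text_ids + [TEXT_VOCAB + c for c in image_codes]

    seq = interleave([101, 7, 42], [5, 900, 17])
    # -> [101, 7, 42, 65541, 66436, 65553]: one stream for one transformer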

Natively multimodal models unlock new possibilities for AI biomedical 🥼 assistants, from answering questions about images to generating them for decision-making.

Thrilled to release MedMax: an open state-of-the-art multimodal model designed for diverse biomedical tasks and domains 🩻

(audio on 🔈)

🧵⬇️

19.12.2024 18:16 👍 0 🔁 0 💬 1 📌 0

1/ Open LLM evals often face data contamination concerns. Private curators (like ScaleAI) have addressed this with private + expert evaluations.

We argue that this shift poses new risks, including financial incentives & eval bias.
w/ @hbxnov.bsky.social

📝: pratyushmaini.github.io/blog/2024/ri... 🧵

27.11.2024 19:05 👍 6 🔁 2 💬 1 📌 0