
Hritik

@hbxnov

CS PhD @UCLA | Prev: Bachelors @IITDelhi, Student Researcher @GoogleDeepMind, Intern @AmazonScience | Multimodal ML, Language models | Cricket 🏏 http://sites.google.com/view/hbansal

25 Followers · 42 Following · 9 Posts · Joined 27.11.2024

Latest posts by Hritik @hbxnov

MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants
Recent advancements in mixed-modal generative models have enabled flexible integration of information across image-text content. These models have opened new avenues for developing unified biomedical ...

Paper: arxiv.org/abs/2412.12661
Website: mint-medmax.github.io
Code: github.com/Hritikbansal...
Demo: huggingface.co/spaces/mint-...

Thanks to the great effort by our entire group at UCLA w/
Daniel Israel, Siyan Zhao, Shufan Li, Tung Nguyen, and Aditya Grover!

19.12.2024 18:19 👍 1 🔁 0 💬 0 📌 0

Finally, we instruction-tune Chameleon to create the MedMax-7B model. We show that our model achieves SOTA performance on multiple downstream VQA tasks and beats GPT-4o and LLaVA-Med-1.5 by a large margin.

19.12.2024 18:19 👍 0 🔁 0 💬 1 📌 0
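A minimal VQA inference sketch for a Chameleon-style checkpoint such as MedMax-7B, using the Chameleon classes in Hugging Face transformers. The hub ID "mint-medmax/medmax-7b" and the prompt format are assumptions for illustration; see the linked repo for the released weights and actual usage.

    import torch
    from PIL import Image
    from transformers import ChameleonForConditionalGeneration, ChameleonProcessor

    model_id = "mint-medmax/medmax-7b"  # hypothetical hub ID, not confirmed
    processor = ChameleonProcessor.from_pretrained(model_id)
    model = ChameleonForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    image = Image.open("chest_xray.png")                      # any local image
    prompt = "Is there evidence of pleural effusion?<image>"  # assumed prompt format

    inputs = processor(text=prompt, images=image, return_tensors="pt").to(
        model.device, dtype=torch.bfloat16
    )
    out = model.generate(**inputs, max_new_tokens=64)
    print(processor.decode(out[0], skip_special_tokens=True))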

We also found that there is a general lack of support for multimodal biomedical evaluation. To address this, we create a robust evaluation suite consisting of visual question answering, captioning, generation, and visual chat. We make this suite publicly available.

19.12.2024 18:18 👍 0 🔁 0 💬 1 📌 0
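To make the four-task structure concrete, here is a hypothetical shape for a unified eval loop; the split names, example fields, and exact-match metric are illustrative stand-ins, not the released suite's actual schema.

    from typing import Callable

    def exact_match(pred: str, gold: str) -> float:
        return float(pred.strip().lower() == gold.strip().lower())

    def evaluate(examples: list[dict], generate: Callable[[dict], str],
                 metric: Callable[[str, str], float] = exact_match) -> float:
        """Average a per-example metric over one task split."""
        scores = [metric(generate(ex), ex["answer"]) for ex in examples]
        return sum(scores) / max(len(scores), 1)

    # e.g., with hypothetical helpers load_split() and model_answer():
    # for task in ["vqa", "captioning", "generation", "visual_chat"]:
    #     print(task, evaluate(load_split(task), model_answer))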

Overall, MedMax covers a breadth of skills and knowledge bases that will be useful for a capable biomedical assistant. We illustrate the diversity of our dataset here:

19.12.2024 18:18 👍 0 🔁 0 💬 1 📌 0

We curate high-quality multimodal biomedical data from medical papers and YouTube to support tasks like image captioning, generation, visual chat, multimodal content creation, and report understanding across biomedical domains.

19.12.2024 18:18 👍 0 🔁 0 💬 1 📌 0
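For illustration, one plausible schema for an interleaved instance built from such sources; every field name below is an assumption, not the dataset's actual format.

    # A hypothetical interleaved instruction-tuning instance.
    instance = {
        "task": "visual_chat",
        "source": "pubmed_figure",  # e.g., a medical paper or a YouTube frame
        "conversation": [
            {"role": "user", "content": [
                {"type": "image", "path": "images/hist_0001.png"},
                {"type": "text", "text": "What tissue type is shown here?"},
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": "Hematoxylin-and-eosin-stained lung tissue ..."},
            ]},
        ],
    }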

First, we create MedMax-Instruct, a synthetic dataset that supports interleaved generation across diverse domains (radiology, histopathology). In particular, we utilize the knowledge in image-caption datasets, followed by LLM-based data filtering and generation.

19.12.2024 18:17 👍 0 🔁 0 💬 1 📌 0
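A sketch of what the LLM-based filtering pass could look like, under assumed interfaces: llm_judge is a hypothetical callable wrapping a strong LLM, and the 1-5 rating scheme and threshold are illustrative.

    def filter_captions(pairs, llm_judge, threshold=4):
        """Keep (image_path, caption) pairs whose judged quality clears the bar."""
        kept = []
        for image_path, caption in pairs:
            prompt = (
                "Rate this biomedical image caption for factuality and clarity "
                f"on a 1-5 scale. Reply with a single digit.\n\nCaption: {caption}"
            )
            score = int(llm_judge(prompt).strip()[0])  # llm_judge: assumed LLM call
            if score >= threshold:
                kept.append((image_path, caption))
        return kept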

Despite web-scale training, these models underperform in biomedicine due to limited domain knowledge and a weak grasp of user intent.
To address this, we curate MedMax 🏅, a large-scale biomedical vision-language dataset (1.5M instances, 1.7B tokens) for instruction-tuning a mixed-modal model.

19.12.2024 18:17 👍 0 🔁 0 💬 1 📌 0
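If the dataset is hosted on the Hugging Face Hub, loading could look like the sketch below; the dataset ID is a placeholder, so check the project page for the released name.

    from datasets import load_dataset

    # "mint-medmax/medmax" is a hypothetical hub ID, not the confirmed one.
    ds = load_dataset("mint-medmax/medmax", split="train", streaming=True)
    for example in ds.take(3):  # peek at a few instances without a full download
        print(sorted(example.keys()))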

Mixed-modal (natively multimodal) models are a new class of generative models that can flexibly integrate information across diverse modalities (e.g., Chameleon, Transfusion, Gemini 2.0). Such models can process and generate interleaved sequences of image 📸 and text ✍️ content.

19.12.2024 18:16 👍 0 🔁 0 💬 1 📌 0
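Schematically, a mixed-modal model quantizes images into discrete codes and splices them into the text token stream, so a single transformer models one flat sequence. The vocabulary sizes and token IDs below are made up for illustration.

    TEXT_VOCAB = 65_536     # assumed text vocabulary size
    IMAGE_CODEBOOK = 8_192  # assumed VQ codebook size for image tokens

    def interleave(text_ids: list[int], image_codes: list[int]) -> list[int]:
        """Splice image codes into the text stream, offset past the text
        vocabulary so the two token spaces do not collide."""
        return text_ids + [TEXT_VOCAB + c for c in image_codes]

    seq = interleave([101, 7, 42], [5, 900, 17])
    # -> [101, 7, 42, 65541, 66436, 65553]: one stream for one transformer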

Natively multimodal models unlock new possibilities for AI biomedical 🥼 assistants, from answering questions about images to generating them for decision-making.

Thrilled to release MedMax: an open state-of-the-art multimodal model designed for diverse biomedical tasks and domains 🩻

(audio on 🔈)

🧵⬇️

19.12.2024 18:16 👍 0 🔁 0 💬 1 📌 0

1/ Open LLM evals often face data contamination concerns. Private curators (like ScaleAI) have addressed this with private + expert evaluations.

We argue that this shift poses new risks, including financial incentives & eval bias.
w/ @hbxnov.bsky.social

📝: pratyushmaini.github.io/blog/2024/ri... 🧵

27.11.2024 19:05 👍 6 🔁 2 💬 1 📌 0