
Koustuv Sinha

@koustuvsinha.com

🔬Research Scientist, Meta AI (FAIR). 🎓PhD from McGill University + Mila 🙇‍♂️I study Multimodal LLMs, Vision-Language Alignment, LLM Interpretability & I'm passionate about ML Reproducibility (@reproml.org) 🌎https://koustuvsinha.com/

324 Followers · 434 Following · 17 Posts · Joined 17.11.2024

Latest posts by Koustuv Sinha @koustuvsinha.com

Our team is hiring a postdoc in (mechanistic) interpretability! The ideal candidate will have research experience in interpretability for text and/or image generation models and be excited about open science!

Please consider applying or sharing with colleagues: metacareers.com/jobs/2223953961352324

15.07.2025 20:11 👍 11 🔁 5 💬 0 📌 0

Excited to share the results of my recent internship!

We ask 🤔
What subtle shortcuts are VideoLLMs taking on spatio-temporal questions?

And how can we instead curate shortcut-robust examples at a large scale?

We release: MVPBench

Details 👇🔬

13.06.2025 14:47 👍 16 🔁 5 💬 1 📌 0

The HuggingFace/Nanotron team just shipped an entire pretraining textbook in interactive format. huggingface.co/spaces/nanot...

It's not just great pedagogical support: it also presents a wealth of data and experiments, many for the first time, in a systematic way.

19.02.2025 19:12 👍 39 🔁 9 💬 0 📌 0

Excited to have two papers at #NAACL2025!
The first reveals how human over-reliance can be exacerbated by LLM friendliness. The second presents a novel computational method for concept tracing. Check them out!

arxiv.org/pdf/2407.07950

arxiv.org/pdf/2502.05704

19.02.2025 21:58 👍 27 🔁 6 💬 2 📌 0

Congrats, nice and refreshing papers, especially the word-confusion idea! We need better similarity methods; good to see developments on this front! Curious whether the confusion similarity depends on the label-set size of the classifier?

20.02.2025 12:38 👍 0 🔁 0 💬 0 📌 0

👋 Hello world! We're thrilled to announce the v0.4 release of fairseq2, an open-source library from FAIR powering many projects at Meta. pip install fairseq2 and explore our trainer API, instruction & preference finetuning (up to 70B), and native vLLM integration.

12.02.2025 12:31 👍 4 🔁 2 💬 1 📌 2

Many many congratulations!! 🥳🎉🎉

11.02.2025 01:40 👍 2 🔁 0 💬 1 📌 0

Another factor which makes simple MLPs work is visual token length. If you care about shorter token sequences, you need a better mapper. These days most LLMs are capable of long context, which reduces the need to compress visual tokens.

02.02.2025 05:58 👍 3 🔁 0 💬 1 📌 0

One hypothesis for why simple mappers work: 1. unfreezing the LLM provides enough parameters for mapping, and 2. richer vision representations are closer to the LLM's internal latent space. arxiv.org/abs/2405.07987

02.02.2025 05:58 👍 2 🔁 0 💬 1 📌 0

Good questions! From what I see, some folks still use complex mappers like Perceivers, but often a simple MLP works well enough. The variable that induces the biggest improvement is almost always the alignment data.

02.02.2025 05:58 👍 1 🔁 0 💬 1 📌 0
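For illustration, here is a minimal sketch of the kind of "simple MLP mapper" discussed in the replies above: a LLaVA-style two-layer projection from frozen vision-encoder patch features into the LLM's embedding space. The class name and dimensions (1024-dim vision features, 4096-dim LLM embeddings) are assumptions for the sketch, not any particular model's implementation.

    import torch
    import torch.nn as nn

    class VisionToLLMMapper(nn.Module):
        """Two-layer MLP that projects vision-encoder patch features into the
        LLM's token-embedding space (one 'visual token' per patch)."""
        def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
            super().__init__()
            self.proj = nn.Sequential(
                nn.Linear(vision_dim, llm_dim),
                nn.GELU(),
                nn.Linear(llm_dim, llm_dim),
            )

        def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
            # patch_features: (batch, num_patches, vision_dim)
            # returns:        (batch, num_patches, llm_dim), ready to be
            # concatenated with text token embeddings before the LLM.
            return self.proj(patch_features)

    # Example: 256 visual tokens per image. Note the sequence length is
    # unchanged -- a simple MLP does no compression, which is fine when the
    # LLM handles long contexts (the point made in the replies above).
    mapper = VisionToLLMMapper()
    visual_tokens = mapper(torch.randn(2, 256, 1024))
    print(visual_tokens.shape)  # torch.Size([2, 256, 4096])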

This is actually a cool result: token length as a rough heuristic for model confidence?

31.01.2025 22:26 👍 1 🔁 0 💬 0 📌 0

I am shocked by the death of Felix Hill. He was one of the brightest minds of my generation.

His last blog post on the stress of working in AI is very poignant. Apart from the emptiness of working mostly to make billionaires even richer, there's the intellectual emptiness of 'scale is all you need'.

14.01.2025 12:41 👍 38 🔁 8 💬 0 📌 0

Lots of cool findings in our paper as well as on the website: tsb0601.github.io/metamorph/

Excited to see how the community "MetaMorph"'s existing LLMs!

26.12.2024 20:02 👍 4 🔁 0 💬 0 📌 0

We posted our paper on arxiv recently, sharing this here too: arxiv.org/abs/2412.141... - work led by our amazing intern Peter Tong. Key findings:

- LLMs can be trained to generate visual embeddings!!
- VQA data appears to help a lot in generation!
- Better understanding = better generation!

26.12.2024 20:01 👍 8 🔁 0 💬 1 📌 0
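As a purely illustrative sketch (not MetaMorph's actual recipe), "training an LLM to generate visual embeddings" can be pictured as a regression head over a language backbone's hidden states, fit to target image embeddings with a cosine loss. All module names, dimensions, and the loss choice below are assumptions; the backbone is a toy stand-in for a pretrained LLM.

    import torch
    import torch.nn as nn

    class TinyBackbone(nn.Module):
        """Toy stand-in for an LLM: embeds tokens and returns per-position hidden states."""
        def __init__(self, vocab: int = 1000, dim: int = 512):
            super().__init__()
            self.embed = nn.Embedding(vocab, dim)
            self.layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)

        def forward(self, tokens):
            return self.layer(self.embed(tokens))  # (batch, seq, dim)

    class VisualEmbeddingHead(nn.Module):
        """Regresses a visual embedding (e.g. a vision-encoder feature) from the
        hidden state at a designated position."""
        def __init__(self, dim: int = 512, visual_dim: int = 768):
            super().__init__()
            self.head = nn.Linear(dim, visual_dim)

        def forward(self, hidden):
            return self.head(hidden[:, -1])  # predict from the last position

    backbone, head = TinyBackbone(), VisualEmbeddingHead()
    tokens = torch.randint(0, 1000, (4, 16))  # toy "caption" token ids
    target = torch.randn(4, 768)              # toy target image embeddings
    pred = head(backbone(tokens))
    loss = 1 - nn.functional.cosine_similarity(pred, target).mean()
    loss.backward()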

I wonder if veo-2 would be better at these prompts!

17.12.2024 20:49 👍 3 🔁 0 💬 2 📌 0
[Link preview: MLRC 2025 Machine Learning Reproducibility Challenge]

Co-organized by @randomwalker.bsky.social, @peterhenderson.bsky.social, @in4dmatics.bsky.social, Naila Murray, @adinawilliams.bsky.social, Angela Fan, Mike Rabbat, and Joelle Pineau. Check out our website for the CFP and more details: reproml.org

13.12.2024 19:06 👍 1 🔁 0 💬 0 📌 0

🚨 We are pleased to announce the first in-person event for the Machine Learning Reproducibility Challenge, MLRC 2025! Save the date: August 21st, 2025, at Princeton!

13.12.2024 19:06 👍 10 🔁 1 💬 3 📌 1

Our PRISM alignment paper won a best paper award at #neurips2024!

All credit to @hannahrosekirk.bsky.social, A. Whitefield, P. Röttger, A. M. Bean, K. Margatina, R. Mosquera-Gomez, J. Ciro, @maxbartolo.bsky.social, H. He, B. Vidgen, S. Hale

Catch Hannah tomorrow at neurips.cc/virtual/2024/poster/97804

11.12.2024 16:20 👍 67 🔁 9 💬 2 📌 0

Also, MLRC is now on 🦋 as well - do follow! :) @reproml.org

10.12.2024 16:53 👍 0 🔁 0 💬 0 📌 0
[Link preview: Online Proceedings | MLRC Machine Learning Reproducibility Challenge]

Check out the MLRC 2023 posters at #NeurIPS 2024 this week: reproml.org/proceedings/ - do drop by these posters and say hi!

10.12.2024 16:15 👍 0 🔁 0 💬 1 📌 0

The return of the Autoregressive Image Model: AIMv2 now going multimodal.
Excellent work by @alaaelnouby.bsky.social & team with code and checkpoints already up:

arxiv.org/abs/2411.14402

22.11.2024 09:44 👍 46 🔁 8 💬 1 📌 0

Yes, that imo is one of the most exciting outcomes of this direction - learning a new modality with much less compute. We have some really nice results, can't wait to share them with everyone, stay tuned!

21.11.2024 05:47 👍 1 🔁 0 💬 1 📌 0
[Link preview: Writing a good scientific paper]

For those who missed this post on the-network-that-is-not-to-be-named, I made public my "secrets" for writing a good CVPR paper (or any scientific paper). I've compiled these tips over many years. It's long, but hopefully it helps people write better papers. perceiving-systems.blog/en/post/writ...

20.11.2024 10:18 👍 260 🔁 64 💬 4 📌 8

👋 hello! :)

20.11.2024 21:52 👍 1 🔁 0 💬 0 📌 0

How do LLMs learn to reason from data? Are they ~retrieving the answers from parametric knowledge🦜? In our new preprint, we look at the pretraining data and find evidence against this:

Procedural knowledge in pretraining drives LLM reasoning ⚙️🔒

🧵⬇️

20.11.2024 16:31 👍 854 🔁 137 💬 36 📌 24

When I first read this paper, I instinctively scoffed at the idea. But the more I look at the empirical results, the more I'm convinced this paper highlights something fundamentally amazing. Lots of exciting research in this direction will come very soon!

arxiv.org/abs/2405.07987

20.11.2024 00:29 👍 3 🔁 0 💬 3 📌 1

All the ACL chapters are here now: @aaclmeeting.bsky.social @emnlpmeeting.bsky.social @eaclmeeting.bsky.social @naaclmeeting.bsky.social #NLProc

19.11.2024 03:48 👍 107 🔁 37 💬 1 📌 3

Doing good science is 90% finding a science buddy to constantly talk to about the project.

09.11.2024 22:53 👍 882 🔁 215 💬 22 📌 65

Same here! Let's make a club! 😅

17.11.2024 17:08 👍 0 🔁 0 💬 0 📌 0