Alessandro Sordoni

@murefil

ML Team MSR Montreal. Adjunct Prof UdeM MILA. Modularity & reasoning.

Followers: 89
Following: 131
Posts: 4
Joined: 18.11.2024

Latest posts by Alessandro Sordoni @murefil

Today is the International Day for the Elimination of Violence against Women. According to the UN, more than 50 000 women were killed by a partner or family member in 2023 (news.un.org/en/story/202...). This number is an underestimate, given that only 37 countries reported in 2023.

26.11.2024 06:16 👍 8 🔁 3 💬 0 📌 0

Yeah I suspected that, interesting!

25.11.2024 13:36 👍 0 🔁 0 💬 0 📌 0

distributed learning for LLM?

recently, @primeintellect.bsky.social announced that they have finished their 10B distributed learning run, trained across the world.

what is it exactly?

🧵

25.11.2024 12:02 👍 23 🔁 6 💬 1 📌 2

Instead of averaging outer gradients, would fancier model merging techniques (e.g. TIES) apply here?

25.11.2024 12:50 👍 1 🔁 0 💬 1 📌 0

having better tools for reviewer and AC assignment would definitely help. ultimately, reducing the number of reviews per paper while striving for relevance and quality could increase reviewer / AC engagement and free up their time to do better reviews

24.11.2024 22:23 👍 2 🔁 0 💬 0 📌 0
Informatics: ILCC: Language Processing, Speech Technology, Information Retrieval, Cognition
Study Informatics: ILCC: Language Processing, Speech Technology, Information Retrieval, Cognition at the University of Edinburgh. Our postgraduate degree programmes focus on natural language processin...

Last 5 days to apply for a PhD at #EdinburghNLP!

Deadline: November 25

www.ed.ac.uk/studying/pos...

If you are passionate about:

- adaptive tokenization and memory in foundation models
- modular deep learning
- computational typology

please message me or meet me at #NeurIPS2024!

21.11.2024 13:41 👍 20 🔁 8 💬 0 📌 0
A sparse mask of attention scores based on VerticalAndSlashAttention and a plot of loss vs sparsity ratio for various methods.

Another nano gem from my amazing student Piotr Nawrot!

A repo & notebook on sparse attention for efficient LLM inference: github.com/PiotrNawrot/...

This will also feature in my #NeurIPS 2024 tutorial "Dynamic Sparsity in ML" with André Martins: dynamic-sparsity.github.io Stay tuned!

20.11.2024 12:51 👍 42 🔁 8 💬 2 📌 3

Explore zero-shot routing of parameter-efficient experts with Phatgoose (arxiv.org/abs/2402.05859) and Arrow (arxiv.org/abs/2405.11157), with github.com/microsoft/mttl

👉 github.com/sordonia/pg_mb…

Part of the "Dynamic Sparsity in ML" tutorial at #neurips2024. Feedback welcome, and join for discussions! 😊

21.11.2024 15:47 👍 5 🔁 1 💬 0 📌 0