I’ll be at @neuripsconf.bsky.social all next week! Find me mostly at the @cohere.com booth / DM me to talk code / post-training / life at Cohere 🇨🇦
03.12.2024 15:17
👍 2
🔁 0
💬 0
📌 0
My PhD thesis "Modelling Cross-lingual Transfer For Semantic Parsing" is finally submitted! 🎉🎉🎉
31.01.2024 21:14
👍 2
🔁 1
💬 0
📌 0
TRAM is accepted to #ICLR2024 as a Spotlight! See you in Vienna 🇦🇹! Thanks to @nsaphra.bsky.social, Pradeep Dasigi, Hao Peng and @ai2.bsky.social
Vision experiments, more discussion, and visuals are coming soon in the camera-ready!
16.01.2024 15:36
👍 1
🔁 0
💬 0
📌 1
Really excited about this one and had such a blast working with @siree.sh @abertsch.bsky.social @davidthewid.bsky.social @strubell.bsky.social! Please read our paper and reach out with any questions, we'd love to chat! See y'all in Singapore :)
12.10.2023 15:38
👍 8
🔁 3
💬 1
📌 0
TRAM is part of my intern project with Hao Peng and Pradeep Dasigi at Allen AI with invaluable contributions from @nsaphra.bsky.social
11.10.2023 09:33
👍 0
🔁 0
💬 0
📌 0
TRAM also improves OOD epsilon sharpness (where SAM has little effect), with a stronger correlation between ID and OOD sharpness. This suggests that SAM is only sharpness-aware within the training distribution.
11.10.2023 09:32
👍 0
🔁 0
💬 1
📌 0
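For readers unfamiliar with the metric above: epsilon sharpness is commonly estimated as the largest loss increase over perturbations within an epsilon-ball around the trained weights. A crude Monte-Carlo sketch (a generic illustration with a toy quadratic loss, not the paper's exact evaluation protocol):

```python
import numpy as np

def epsilon_sharpness(loss_fn, w, eps=1e-2, n_samples=200, seed=0):
    """Monte-Carlo estimate of epsilon sharpness: the largest loss
    increase found over random L2 perturbations of radius eps
    (a crude proxy for the max over the whole ball)."""
    rng = np.random.default_rng(seed)
    base = loss_fn(w)
    worst = 0.0
    for _ in range(n_samples):
        d = rng.standard_normal(w.shape)
        d *= eps / (np.linalg.norm(d) + 1e-12)  # project onto the ball surface
        worst = max(worst, loss_fn(w + d) - base)
    return worst

# A sharper (more steeply curved) minimum yields a larger estimate.
flat  = epsilon_sharpness(lambda w: 0.5 * np.sum(w**2),       np.zeros(10))
sharp = epsilon_sharpness(lambda w: 0.5 * np.sum(100 * w**2), np.zeros(10))
```

Random sampling badly underestimates the true maximum in high dimensions; published results typically use a gradient-based inner maximization instead.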
TRAM is a SAM-style optimizer that replaces the fixed rho hyperparameter: it instead adapts the perturbation radius to the trust region in function space. TRAM strengthens the connection between task-specific performance and pre-trained structure for better zero-shot domain transfer and cross-lingual transfer.
11.10.2023 09:32
👍 1
🔁 0
💬 1
📌 0
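As background for the post above: a minimal numpy sketch of the generic two-step SAM update that TRAM builds on, using a toy quadratic loss. The fixed `rho` below is exactly the hyperparameter TRAM replaces with a function-space trust region; that adaptation is not reproduced here.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One generic SAM-style update: ascend to a worst-case point
    within an L2 ball of radius rho, then descend using the
    gradient taken at that perturbed point."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation
    g_adv = grad_fn(w + eps)                     # gradient at perturbed weights
    return w - lr * g_adv

# Toy loss L(w) = 0.5 * ||w||^2, so grad L(w) = w.
grad_fn = lambda w: w
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, grad_fn)  # norm of w shrinks toward zero
```

In practice the two gradient evaluations roughly double the per-step cost relative to plain SGD, which is the usual trade-off for seeking flatter minima.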
🚨 new paper 🚨
Can we train for flat minima with less catastrophic OOD forgetting?
We propose Trust Region Aware Minimization for smoothness in parameters+representations.
TL;DR representations matter just as much!
arxiv.org/abs/2310.03646 w/
@nsaphra.bsky.social Pradeep Dasigi + Hao Peng
11.10.2023 09:31
👍 10
🔁 1
💬 1
📌 2