I’ll be at @neuripsconf.bsky.social all next week! Find me mostly at the @cohere.com booth / DM me to talk code / post-training / life at Cohere 🇨🇦
03.12.2024 15:17
👍 2
🔁 0
💬 0
📌 0
My PhD thesis "Modelling Cross-lingual Transfer For Semantic Parsing" is finally submitted! 🎉🎉🎉
31.01.2024 21:14
👍 2
🔁 1
💬 0
📌 0
TRAM is accepted to #ICLR2024 as a Spotlight! See you in Vienna 🇦🇹! Thanks to @nsaphra.bsky.social, Pradeep Dasigi, Hao Peng and @ai2.bsky.social
Vision experiments, more discussion, and visuals are coming soon in the camera-ready!
16.01.2024 15:36
👍 1
🔁 0
💬 0
📌 1
Really excited about this one and had such a blast working with @siree.sh @abertsch.bsky.social @davidthewid.bsky.social @strubell.bsky.social! Please read our paper and reach out with any questions, we'd love to chat! See y'all in Singapore :)
12.10.2023 15:38
👍 8
🔁 3
💬 1
📌 0
TRAM is part of my intern project with Hao Peng and Pradeep Dasigi at Allen AI with invaluable contributions from @nsaphra.bsky.social
11.10.2023 09:33
👍 0
🔁 0
💬 0
📌 0
TRAM also improves OOD epsilon sharpness (where SAM has little effect), with a stronger correlation between ID and OOD sharpness. This suggests that SAM is only sharpness-aware within the training distribution.
11.10.2023 09:32
👍 0
🔁 0
💬 1
📌 0
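For readers unfamiliar with the metric above: epsilon sharpness is commonly estimated as the largest loss increase over perturbations within an epsilon-ball around the trained weights. A crude Monte-Carlo sketch (a generic illustration with a toy quadratic loss, not the paper's exact evaluation protocol):

```python
import numpy as np

def epsilon_sharpness(loss_fn, w, eps=1e-2, n_samples=200, seed=0):
    """Monte-Carlo estimate of epsilon sharpness: the largest loss
    increase found over random L2 perturbations of radius eps
    (a crude proxy for the max over the whole ball)."""
    rng = np.random.default_rng(seed)
    base = loss_fn(w)
    worst = 0.0
    for _ in range(n_samples):
        d = rng.standard_normal(w.shape)
        d *= eps / (np.linalg.norm(d) + 1e-12)  # project onto the ball surface
        worst = max(worst, loss_fn(w + d) - base)
    return worst

# A sharper (more steeply curved) minimum yields a larger estimate.
flat  = epsilon_sharpness(lambda w: 0.5 * np.sum(w**2),       np.zeros(10))
sharp = epsilon_sharpness(lambda w: 0.5 * np.sum(100 * w**2), np.zeros(10))
```

Random sampling badly underestimates the true maximum in high dimensions; published results typically use a gradient-based inner maximization instead.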
TRAM is a SAM-style optimizer that replaces the fixed rho hyperparameter: it instead adapts the perturbation radius to the trust region in function space. TRAM strengthens the connection between task-specific performance and pre-trained structure for better zero-shot domain transfer and cross-lingual transfer.
11.10.2023 09:32
👍 1
🔁 0
💬 1
📌 0
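As background for the post above: a minimal numpy sketch of the generic two-step SAM update that TRAM builds on, using a toy quadratic loss. The fixed `rho` below is exactly the hyperparameter TRAM replaces with a function-space trust region; that adaptation is not reproduced here.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One generic SAM-style update: ascend to a worst-case point
    within an L2 ball of radius rho, then descend using the
    gradient taken at that perturbed point."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation
    g_adv = grad_fn(w + eps)                     # gradient at perturbed weights
    return w - lr * g_adv

# Toy loss L(w) = 0.5 * ||w||^2, so grad L(w) = w.
grad_fn = lambda w: w
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, grad_fn)  # norm of w shrinks toward zero
```

In practice the two gradient evaluations roughly double the per-step cost relative to plain SGD, which is the usual trade-off for seeking flatter minima.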
🚨 new paper 🚨
Can we train for flat minima with less catastrophic OOD forgetting?
We propose Trust Region Aware Minimization for smoothness in parameters+representations.
TL;DR representations matter just as much!
arxiv.org/abs/2310.03646 w/
@nsaphra.bsky.social Pradeep Dasigi + Hao Peng
11.10.2023 09:31
👍 10
🔁 1
💬 1
📌 2