
Alasdair Paren

@alasdair-p

ML researcher @ University of Oxford

32
Followers
44
Following
2
Posts
20.11.2024
Joined

Latest posts by Alasdair Paren @alasdair-p

www.scientificamerican.com/article/hack...

New article by Deni Bechard at Scientific American covering our work on hijacking multimodal computer agents, published on arXiv earlier this year. A massive effort by Lukas Aichberger, supported by myself, Yarin Gal, Philip Torr (FREng, FRS) & Adel Bibi

04.09.2025 15:32 ๐Ÿ‘ 1 ๐Ÿ” 1 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
eurips.cc A NeurIPS-endorsed conference in Europe held in Copenhagen, Denmark

NeurIPS is endorsing EurIPS, an independently-organized meeting which will offer researchers an opportunity to additionally present NeurIPS work in Europe concurrently with NeurIPS.

Read more in our blog post and on the EurIPS website:
blog.neurips.cc/2025/07/16/n...
eurips.cc

16.07.2025 22:05 ๐Ÿ‘ 124 ๐Ÿ” 38 ๐Ÿ’ฌ 2 ๐Ÿ“Œ 3

Excited to share our paper: "Chain-of-Thought Is Not Explainability"! We unpack a critical misconception in AI: models explaining their steps (CoT) aren't necessarily revealing their true reasoning. Spoiler: the transparency can be an illusion. (1/9) ๐Ÿงต

01.07.2025 15:41 ๐Ÿ‘ 83 ๐Ÿ” 31 ๐Ÿ’ฌ 2 ๐Ÿ“Œ 5
AI is becoming dangerous. Are we ready? (YouTube video by Sabine Hossenfelder)

Not every day you see a paper you worked on featured by a YouTube channel you've watched before :) youtu.be/KY7_ufxh_Rk?...

10.06.2025 17:52 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
Shh, don't say that! Domain Certification in LLMs: a novel framework providing provable adversarial defenses for LLM safety.

Read more: cemde.github.io/Domain-Certi...

Thanks to my amazing collaborators:
- @alasdair-p.bsky.social, Preetham Arvind, @maximek3.bsky.social, Tom Rainforth, @philiptorr.bsky.social, @adelbibi.bsky.social at @ox.ac.uk
- Bernard Ghanem at KAUST
- Thomas Lukasiewicz at @tuwien.at.

(7/7)

04.04.2025 20:11 ๐Ÿ‘ 3 ๐Ÿ” 2 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

โš ๏ธ Beware: Your AI assistant could be hijacked just by encountering a malicious image online!

Our latest research exposes critical security risks in AI assistants. An attacker can hijack them by simply posting an image on social media and waiting for it to be captured. [1/6] ๐Ÿงต

18.03.2025 18:25 ๐Ÿ‘ 8 ๐Ÿ” 8 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 3
Do we NEED International Collaboration for Safe AGI? Insights from Top AI Pioneers | IIA Davos 2025 (YouTube video by Imagination in Action)

A few weeks ago in Davos, Demis Hassabis highlighted the need to develop a "CERN for AGI" to ensure that advances at the frontier remain safe. I fully agree with him: we need this kind of international cooperation. youtu.be/U7t02Q6zfdc?...

19.02.2025 18:10 ๐Ÿ‘ 27 ๐Ÿ” 3 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
Shh, don't say that! Domain Certification in LLMs: Foundation language models, such as Llama, are often deployed in constrained environments. For instance, a customer support bot may utilize a large language model (LLM) as its backbone due to the...

The amazing collaborators: Preetham Arvind, @alasdair-p.bsky.social, Maxime Kayser, Tom Rainforth, Thomas Lukasiewicz, Philip Torr, Adel Bibi.

A @oxfordtvg.bsky.social production.

(6/6)

Link to paper:
openreview.net/forum?id=brD...

14.12.2024 01:18 ๐Ÿ‘ 3 ๐Ÿ” 1 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0