New paper on discretion in AI “alignment” — check out @maartenbuyl.bsky.social’s thread below!
9/n Full paper here: 🔗 arxiv.org/abs/2502.10441. Huge thanks to my amazing team of co-authors: @hadikh.bsky.social, @lucasmpaes.bsky.social, @claudiomv.bsky.social, @caiocvm.bsky.social, and @fcalmon.bsky.social. Work done at @harvard.edu
AI is built to “be helpful” or “avoid harm”, but which principles should it prioritize, and when? We call this alignment discretion. As Asimov's stories show, balancing such principles for AI behavior is tricky. In fact, we find that AI has its own set of priorities. (comic by @xkcd.com) 🧵👇
The standard practice in differential privacy of reporting a single ε at a small, fixed δ is extremely lossy for interpreting the level of privacy protection. For many real-world algorithms (e.g., DP-SGD), we can do much better!
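A minimal sketch (not the paper's method) of why a single (ε, δ) point is lossy: for the Gaussian mechanism, the exact privacy profile δ(ε) has a closed form (Balle & Wang, 2018), and reporting only the ε achieved at, say, δ = 10⁻⁵ throws away the rest of that curve. The noise level and sensitivity below are purely illustrative assumptions.

```python
# Toy illustration: full privacy profile delta(eps) of a Gaussian mechanism,
# versus the single (eps, delta) point usually reported.
from math import exp
from scipy.stats import norm

def gaussian_mechanism_delta(eps: float, sigma: float, sensitivity: float = 1.0) -> float:
    """Exact delta(eps) for the Gaussian mechanism (Balle & Wang, 2018), eps >= 0."""
    a = sensitivity / (2 * sigma)
    b = eps * sigma / sensitivity
    return norm.cdf(a - b) - exp(eps) * norm.cdf(-a - b)

sigma = 4.0  # hypothetical noise multiplier, chosen only for illustration
# Sweep eps to trace the whole trade-off curve, not just one point on it.
for eps in [0.25, 0.5, 1.0, 2.0, 4.0]:
    print(f"eps = {eps:4.2f} -> delta = {gaussian_mechanism_delta(eps, sigma):.2e}")
```

The same mechanism satisfies many (ε, δ) pairs at once; quoting one of them compresses the curve into a single point, which is the interpretability loss the post refers to.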
We show how in the #NeurIPS2024 paper:
arxiv.org/abs/2407.02191
Short summary👇
This is joint work with Felipe Gomez, Georgios Kaissis, @fcalmon.bsky.social, and @carmelatroncoso.bsky.social
Happy to chat about it online, and in 🇨🇦+🇺🇸 over the next two weeks:
- At the #NeurIPS2024 Friday Dec. 13 evening poster session.
- I'll also present it in more detail on Tuesday, Dec. 17 at Harvard.