The CHIL 2026 Doctoral Symposium is back! Apply by March 13th 📅
chil.ahli.cc/submit/docto...
Last year, we welcomed 28 outstanding PhD researchers for mentorship and lightning talks in health AI.
Watch 3 participant talks from 2025 👇 (see 2:48:43)
www.youtube.com/watch?v=YaDo...
26.02.2026 17:50
👍 2
🔁 3
💬 0
📌 0
CS329H: Machine Learning from Human Preferences
7/ Materials:
Course website / syllabus / notes: web.stanford.edu/class/cs329h/
Living Textbook: mlhp.stanford.edu
#AIAlignment #MachineLearning #ResponsibleAI #Stanford #PreferenceLearning
02.02.2026 02:51
👍 0
🔁 0
💬 0
📌 0
6/ Tangent Space Fine-Tuning for Directional Preference Alignment (TS-DPO) — Mete Erdoğan
Multi-objective alignment. Learns separate update directions, then composes them at inference via simple coefficient mixing → smoother control + broad Pareto coverage
github.com/meterdogan07...
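A minimal sketch of the inference-time mixing idea, assuming per-objective parameter deltas have already been learned; compose_weights, the coefficient values, and the objective names are illustrative, not taken from the linked repo:

import torch  # base weights and per-objective deltas are torch tensors

def compose_weights(base_params, objective_deltas, coeffs):
    # base_params: dict name -> tensor, the frozen base model weights
    # objective_deltas: one learned update direction (dict) per objective
    # coeffs: user-chosen mixing coefficients, one per objective
    mixed = {}
    for name, w in base_params.items():
        update = sum(c * d[name] for c, d in zip(coeffs, objective_deltas))
        mixed[name] = w + update
    return mixed

# e.g. 70% helpfulness + 30% conciseness at inference:
# mixed = compose_weights(base, [delta_helpful, delta_concise], [0.7, 0.3])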
02.02.2026 02:51
👍 0
🔁 0
💬 1
📌 0
PA_GP_UCB_CS329.pdf
5/ Gaussian Process Optimization with Predictions for Hypothesis Generation (PA-GP-UCB) — Stela Tong & Xin (Jennifer) Chen
Blends human feedback with AI surrogate predictions, proves an O(√T) cumulative regret bound, and shows much faster convergence than GP-UCB
drive.google.com/file/d/1JpuM...
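One plausible reading of "blends human feedback with AI surrogate predictions": fit the GP to the residual between human scores and the surrogate, so the surrogate acts as a prior mean inside a UCB rule. A minimal sketch under that assumption; the paper's actual PA-GP-UCB may differ, and surrogate, beta, and the candidate set are illustrative:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def pa_gp_ucb_step(X_obs, y_human, X_cand, surrogate, beta=2.0):
    # Model only the residual the surrogate fails to explain,
    # so the surrogate serves as a data-driven prior mean.
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
    gp.fit(X_obs, y_human - surrogate(X_obs))
    mu_res, sigma = gp.predict(X_cand, return_std=True)
    ucb = surrogate(X_cand) + mu_res + beta * sigma  # prediction-adjusted UCB
    return X_cand[np.argmax(ucb)]  # next hypothesis to query the human on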
02.02.2026 02:51
👍 0
🔁 0
💬 1
📌 0
RamiMrad_CS329H_FinalReport.pdf
4/ Learning Code Generation “Vibe” for MCP Servers via DPO — Rami Ratl Mrad
Preference learning for subjective code quality beyond mere correctness. Achieves a 100% AI-judge preference rate and an 85% human preference rate, with better error handling, docs, and structure for MCP servers
drive.google.com/file/d/1g_r_...
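For reference, the standard DPO objective (Rafailov et al., 2023) that this kind of project builds on; the report's exact training setup may differ, and the sequence-level log-prob interface here is an assumption:

import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected,
             beta=0.1):
    # logp_*: policy sequence log-probs; ref_logp_*: frozen reference model's
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    # Push the policy to prefer the "good vibe" completion by a beta-scaled margin
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()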
02.02.2026 02:51
👍 0
🔁 0
💬 1
📌 0
The Collapse of Heterogeneity in Silicon Philosophers
The Collapse of Heterogeneity in Silicon Philosophers - Research on how LLMs systematically collapse philosophical disagreement
3/ Silicon Sampling of Professional Philosophers — Jeremy Shi
Can LLMs authentically simulate expert philosophical reasoning when conditioned on philosopher profiles? Finding: LLMs show inflated demographic→position correlations (up to 3× those in human data).
whusym.github.io/silicon-phil...
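A toy illustration of how the "up to 3×" inflation can be quantified, assuming paired simulated and human survey responses; the variable names are stand-ins, not the study's actual pipeline:

import numpy as np

def inflation_factor(demographic, llm_positions, human_positions):
    # Ratio of |corr(demographic, position)| in simulated vs human data;
    # a value near 3 would match the "up to 3x" finding above.
    r_llm = np.corrcoef(demographic, llm_positions)[0, 1]
    r_human = np.corrcoef(demographic, human_positions)[0, 1]
    return abs(r_llm) / abs(r_human)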
02.02.2026 02:51
👍 1
🔁 0
💬 1
📌 0
Baysian Active Alignment.pdf
2/ Active Alignment with Bayesian General Preference Model (BGPM) — Ahmed Mohsin & Muhammad Umer
Enables active preference elicitation with >35% fewer annotations via uncertainty-guided, information-theoretic query selection.
drive.google.com/file/d/1IU5e...
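A minimal sketch of uncertainty-guided pair selection in that spirit: score each candidate pair by the entropy of the model's predicted preference probability and query the most uncertain one. BGPM's actual information-theoretic criterion is in the attached report; pref_prob is an assumed model interface:

import numpy as np

def bernoulli_entropy(p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def select_query(pairs, pref_prob):
    # pairs: candidate (a, b) comparisons; pref_prob(a, b) -> P(a preferred over b)
    scores = [bernoulli_entropy(pref_prob(a, b)) for a, b in pairs]
    return pairs[int(np.argmax(scores))]  # ask the annotator about this pair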
02.02.2026 02:51
👍 1
🔁 0
💬 1
📌 0
1/ Wonderful student projects from CS329H (Fall ’25), Machine Learning from Human Preferences, at Stanford University! 🚀
Sang Truong, Andy Haupyt, and I introduced students to preference learning + alignment, culminating in final projects. Out of ~50, here are 5 standouts 👇
02.02.2026 02:51
👍 0
🔁 0
💬 1
📌 0
We want your work on fairness, alignment, and/or agentic systems!!
Proud to be co-organizing AFAA @iclr-conf.bsky.social with:
@prakharg.bsky.social @adoubleva.bsky.social @Miriam @Jamelle @Golnoosh @jessicaschrouff.bsky.social @sanmikoyejo.bsky.social
www.afciworkshop.org
#AFAA2026 #ICLR2026
06.01.2026 02:49
👍 4
🔁 1
💬 0
📌 0
This is work with Daniel E. Ho and @sanmikoyejo.bsky.social. We also have a related preprint with Erin Beeghly on the social impacts of this personalization, and how it interacts with group-based preferences and stereotypes: angelina-wang.github.io/files/person...
12.12.2025 20:42
👍 3
🔁 1
💬 0
📌 0
This has huge implications for evaluation:
• Benchmark scores ≠ what end users actually experience
• Some high-risk behaviors, e.g., manipulative patterns, might only surface in personalized interfaces
We argue for more realistic evals that take personalization into account.
12.12.2025 20:42
👍 2
🔁 1
💬 1
📌 0
In fact, even the same MMLU science question can yield different ChatGPT answers for different users, despite using the exact same underlying model. User-level personalization and interaction patterns shape outputs in ways existing evals do not capture.
12.12.2025 20:42
👍 1
🔁 1
💬 1
📌 0
The personalization gap challenges an implicit assumption in AI evaluation: that we can measure capability and safety independently of deployment context. Good opportunity to rethink how we evaluate AI systems.
15.12.2025 01:59
👍 1
🔁 0
💬 0
📌 0
New in @science.org: 20+ AI scholars, incl. @alondra.bsky.social @randomwalker.bsky.social @sanmikoyejo.bsky.social, and others, lay out a playbook for evidence-based AI governance.
Without solid data, we risk both hype and harm. Thread 👇
05.08.2025 16:48
👍 23
🔁 18
💬 2
📌 1
Grateful to win Best Paper at ACL for our work on Fairness through Difference Awareness with my amazing collaborators!! Check out the paper for why we think fairness has gone both too far and, at the same time, not far enough: aclanthology.org/2025.acl-lon...
30.07.2025 15:34
👍 28
🔁 4
💬 0
📌 0
Instead, we should permit differentiating based on the context. Ex: synagogues in America are legally allowed to discriminate by religion when hiring rabbis. Work with Michelle Phan, Daniel E. Ho, @sanmikoyejo.bsky.social arxiv.org/abs/2502.01926
02.06.2025 16:38
👍 1
🔁 1
💬 1
📌 0
Closing the Digital Divide in AI | Stanford HAI
Large language models aren't effective for many languages. Scholars explain what's at stake for the approximately 5 billion people who don't speak English.
Most major LLMs are trained using English data, making them ineffective for the approximately 5 billion people who don't speak English. Here, HAI Faculty Affiliate @sanmikoyejo.bsky.social discusses the risks of this digital divide and how to close it. hai.stanford.edu/news/closing...
20.05.2025 17:43
👍 4
🔁 1
💬 0
📌 0
Collaboration with a bunch of lovely people I am thankful to be able to work with: @hannawallach.bsky.social , @angelinawang.bsky.social , Olawale Salaudeen, Rishi Bommasani, and @sanmikoyejo.bsky.social. 🤗
16.04.2025 16:45
👍 3
🔁 1
💬 0
📌 0
Toward an Evaluation Science for Generative AI Systems
There is an urgent need for a more robust and comprehensive approach to AI evaluation.
🧑‍🔬 Happy our article on Creating a Generative AI Evaluation Science, led by @weidingerlaura.bsky.social & @rajiinio.bsky.social, is now published by the National Academy of Engineering. =) www.nae.edu/338231/Towar...
Describes how to mature eval so systems can be worthy of trust and safely deployed.
16.04.2025 16:43
👍 41
🔁 7
💬 1
📌 0
📣We’re thrilled to announce the first workshop on Technical AI Governance (TAIG) at #ICML2025 this July in Vancouver! Join us (& this stellar list of speakers) in bringing together technical & policy experts to shape the future of AI governance! www.taig-icml.com
01.04.2025 12:23
👍 14
🔁 4
💬 1
📌 4
Could AI help us build a more racially just society? | Sanmi Koyejo
We have an opportunity to build systems that don’t just replicate our current inequities. Will we take it?
AI systems present an opportunity to address society's biases rather than simply replicate them. “However, realizing this potential requires careful attention to both technical and social considerations,” says HAI Faculty Affiliate @sanmikoyejo.bsky.social in his latest op-ed via @theguardian.com: www.theguardian.com/commentisfre...
26.03.2025 15:51
👍 9
🔁 5
💬 0
📌 0
Very excited we were able to get this collaboration working -- congrats and big thanks to the co-authors! @rajiinio.bsky.social @hannawallach.bsky.social @mmitchell.bsky.social @angelinawang.bsky.social Olawale Salaudeen, Rishi Bommasani @sanmikoyejo.bsky.social @williamis.bsky.social
20.03.2025 13:28
👍 5
🔁 1
💬 0
📌 0
3) Institutions and norms are necessary for a long-lasting, rigorous, and trusted evaluation regime. In the long run, nobody trusts actors grading their own homework. Establishing an ecosystem that accounts for expertise and balances incentives is a key marker of robust evaluation in other fields.
20.03.2025 13:28
👍 3
🔁 2
💬 1
📌 0
which challenged concepts of what temperature is and in turn motivated the development of new thermometers. A similar virtuous cycle is needed to refine AI evaluation concepts and measurement methods.
20.03.2025 13:28
👍 2
🔁 1
💬 1
📌 0
2) Metrics and evaluation methods need to be refined over time. This iteration is key to any science. Take the example of measuring temperature: it went through many iterations of building new measurement approaches,
20.03.2025 13:28
👍 2
🔁 1
💬 1
📌 0
Just like the “crashworthiness” of a car indicates aspects of safety in case of an accident, AI evaluation metrics need to link to real-world outcomes.
20.03.2025 13:28
👍 2
🔁 1
💬 1
📌 0
We identify three key lessons in particular.
1) Meaningful metrics: evaluation metrics must connect to AI system behaviour or impact that is of relevance in the real world. They can be abstract or simplified -- but they need to correspond to real-world performance or outcomes in a meaningful way.
20.03.2025 13:28
👍 2
🔁 1
💬 1
📌 0
We pull out key lessons from other fields, such as aerospace, food security, and pharmaceuticals, that have matured from being research disciplines to becoming industries with widely used and trusted products. AI research is going through a similar maturation -- but AI evaluation needs to catch up.
20.03.2025 13:28
👍 2
🔁 1
💬 1
📌 0