PhD Student at Northeastern, working to make LLMs interpretable
The largest workshop on analysing and interpreting neural networks for NLP.
BlackboxNLP will be held at EMNLP 2025 in Suzhou, China
blackboxnlp.github.io
PhD student @ Northeastern University, Clinical NLP
https://hibaahsan.github.io/
she/her
PhD candidate in CS at Northeastern University | NLP + HCI for health | she/her 🏃♀️🧅🌈
CS PhD student at Harvard. Interested in Interpretability 🔍, Visualizations 📊, Human-AI Interaction🧍🤖. All opinions are mine. https://yc015.github.io/
PhD (in progress) @ Northeastern! NLP 🤝 LLMs
she/her
ML researcher, building interpretable models at Guide Labs (guidelabs.bsky.social).
PhD student @LIG | Causal abstraction, interpretability & LLMs
Trying to figure things out about how best we can live together
hacker / CS professor https://www.khoury.northeastern.edu/~arjunguha/
PhD student in Interpretable Machine Learning at @tuberlin.bsky.social & @bifold.berlin
https://web.ml.tu-berlin.de/author/laura-kopf/
machine learning, causal inference, science of llm, ai safety, phd student @bleilab, keen bean
https://www.claudiashi.com/
Helping people is good I guess
Trying to do AI interp and control
Used to do economics
timhua.me
CS Ph.D. Candidate @ Northeastern | Interpretability + Data Science | BS/MS @ Brown
koyenapal.github.io
AI Program Officer at Longview Philanthropy. Own views.
🔸 giving 10% of my lifetime income to effective charities via Giving What We Can
CEO of Coefficient Giving
Trying for human-compatible humans
Superforecaster at Good Judgment. Also forecasting at Swift Centre, Samotsvety, RAND and a hedge fund. Impartial beneficence enthusiast.
blog.jacobtrefethen.com
Managing Director, Coefficient Giving
science!
Raising kids & bread & grant money. Cleaning data & diapers & fish. EA (bed nets, not light cone). Social scientist. typos. twitter.com/ryancbriggs
💎 here to believe true things and do good actions 💎 someone should probably solve AI alignment 💎 enjoying things rules! ☀️ but it's not snowing now
english/toki pona/Japanese
Program Officer on nuclear policy at Longview Philanthropy (http://longview.org). Opinions are my own.
P(A|B) = [P(A)*P(B|A)]/P(B), all the rest is commentary. Click to read Astral Codex Ten, by Scott Alexander, a […] [bridged from astralcodexten.com on the web: https://fed.brid.gy/web/astralcodexten.com ]
Senior Research Scientist at Google DeepMind. AGI Alignment researcher. Views my dog's.
Writing a book on AI+economics+geopolitics for Nation Books.
Covers: The Nation, Jacobin. Bylines: NYT, Nature, Bloomberg, BBC, Guardian, TIME, The Verge, Vox, Thomson Reuters Foundation, + others.
Research @ Open Philanthropy. Formerly economist at GPI / Nuffield College, Oxford.
Interests: development econ, animal welfare, global catastrophic risks
Comms officer @ Open Philanthropy, former Magic pro, webfiction connoisseur. https://aarongertler.net/
👎: suffering | 👍: EA, AI alignment, decoupling, R, cringe, amateur pharmacology + programming | Georgetown '22 (math+econ+phil) | Career status: 🤷♂️
Technical AI Governance Research at MIRI
Views are my own
Computer Science PhD Student @ Stanford | Geopolitics & Technology Fellow @ Harvard Kennedy School/Belfer | Vice Chair EU AI Code of Practice | Views are my own
Building theaidigest.org and forecasting tools @aidigest.bsky.social
https://binksmith.com
We are a research institute investigating the trajectory of AI for the benefit of society.
epoch.ai
policy for v smart things @openai. Past: PhD @HarvardSEAS/@SchmidtFutures/@MIT_CSAIL. Posts my own; on my head be it
METR is a research nonprofit that builds evaluations to empirically test AI systems for capabilities that could threaten catastrophic harm to society.
ai governance @openphil, unsupervised learner
Senior Researcher at Oxford University.
Author — The Precipice: Existential Risk and the Future of Humanity.
tobyord.com
Trying to help the world navigate potentially transformative technologies, currently via AI Governance and Policy at Coefficient Giving. Enjoyer of acoustic guitars, history books, and plant-based foods.
Senior Policy Advisor for AI and Emerging Technology, White House Office of Science and Technology Policy | Strategic Advisor for AI, National Science Foundation
https://hyperdimensional.co
Thinking about thinking machines | University of Cambridge and Leverhulme Centre for the Future of Intelligence | Previously Google DeepMind
What would we need to understand in order to design an amazing future? Ex DeepMind, OpenAI
Social policy synthesizer. www.secondbest.ca
AI grantmaking at Coefficient Giving
Previously 80,000 Hours
lawsen.substack.com
friendly deep sea dweller
Professor of Applied Physics at Stanford | Venture Partner a16z | Research in AI, Neuroscience, Physics
AI systems and models that are engineered to be interpretable and auditable.
www.guidelabs.ai
Thinking about how/why AI works/doesn't, and how to make it go well for us.
Currently: AI Agent Security @ US AI Safety Institute
benjaminedelman.com
PhD student at MIT.
Working on mechanistic interpretability and AI safety.
Never Bullshit
I challenge any and every one who wants to kick my ass to a debate .
https://www.patreon.com/dril
https://www.instagram.com/dril
https://linktr.ee/drilreal
Interpretability researcher at @eleutherai.bsky.social
Postdoc at CBS, Harvard University
(New around here)
CS Prof at Brown University, PI of the GIRAFFE lab, former AI Policy Advisor in the US Senate, co-chair of the ACM Tech Policy Subcommittee on AI and Algorithms.
PhD at MIT CSAIL '23, Harvard '16, former Google APM. Dog mom to Ducki.
Postdoc at MIT. Research: language, the brain, NLP.
jmichaelov.com
Research Fellow @ Kempner Institute, Harvard University
Theory of Deep Learning / Learning of Deep Theory
Postdoc @ Princeton AI Lab
Natural and Artificial Minds
Prev: PhD @ Brown, MIT FutureTech
Website: https://annatsv.github.io/
PhD student @ MIT | Previously PYI @ AI2 | MS'21 BS'19 BA'19 @ UW | zhaofengwu.github.io
Human/AI interaction. ML interpretability. Visualization as design, science, art. Professor at Harvard, and part-time at Google DeepMind.
PhD student doing LLM interpretability with @davidbau.bsky.social and @byron.bsky.social. (they/them) https://sfeucht.github.io
☆ °。⋆ (mechanistic) interpretability + interaction (design) ⋆。° ☆
Master student at ENS Paris-Saclay / aspiring AI safety researcher / improviser
Prev research intern @ EPFL w/ wendlerc.bsky.social and Robert West
MATS Winter 7.0 Scholar w/ neelnanda.bsky.social
https://butanium.github.io
Postdoc at Northeastern and incoming Asst. Prof. at Boston U. Working on NLP, interpretability, causality. Previously: JHU, Meta, AWS
https://mega002.github.io
Gemini Post-Training ⚫️ Research Scientist at Google DeepMind ⚫️ PhD from ETH Zurich
AI Safety Research // Software Engineering
Waiting on a robot body. All opinions are universal and held by both employers and family. ML/NLP professor.
nsaphra.net
Machine learning haruspex
NLP PhD student at Imperial College London and Apple AI/ML Scholar.
Machine learning PhD student @ Blei Lab in Columbia University
Working in mechanistic interpretability, nlp, causal inference, and probabilistic modeling!
Previously at Meta for ~3 years on the Bayesian Modeling & Generative AI teams.
🔗 www.sweta.dev
Machine Learning PhD Student
@ Blei Lab & Columbia University.
Working on probabilistic ML | uncertainty quantification | LLM interpretability.
Excited about everything ML, AI and engineering!
PhD student at Vector Institute / University of Toronto. Building tools to study neural nets and find out what they know. He/him.
www.danieldjohnson.com
Mechanistic interpretability
Creator of https://github.com/amakelov/mandala
prev. Harvard/MIT
machine learning, theoretical computer science, competition math.
Post-doc @ Harvard. PhD UMich. Spent time at FAIR and MSR. ML/NLP/Interpretability
Computer Science PhD student | AI interpretability | Vision + Language | Cognitive Science. Prev. intern @MicrosoftResearch.
https://martinagvilas.github.io/
ml/nlp phding @ usc, currently visiting harvard, scientisting @ startup;
interpretability & training & reasoning
iglee.me
Assistant Professor, University of Copenhagen; interpretability, xAI, factuality, accountability, xAI diagnostics https://apepa.github.io/
Computation & Complexity | AI Interpretability | Meta-theory | Computational Cognitive Science
https://fedeadolfi.github.io
On the job market!
Scruting matrices @ Apollo Research
PhD student at UC Berkeley. NLP for signed languages and LLM interpretability. kayoyin.github.io
🏂🎹🚵♀️🥋
PhD at EPFL with Robert West, Master at ETHZ
Mainly interested in Language Model Interpretability and Model Diffing.
MATS 7.0 Winter 2025 Scholar w/ Neel Nanda
jkminder.ch
PhD student @CMU LTI - working on model #interpretability, student researcher @google; prev predoc @ai2; intern @MSFT
nishantsubramani.github.io
CS PhD Student, Northeastern University - Machine Learning, Interpretability https://ericwtodd.github.io