Master's student at ENS Paris-Saclay, interested in complexity theory, cryptography and quantum computing.
PhD Student at Northeastern, working to make LLMs interpretable
Explainable AI research from the machine learning group of Prof. Klaus-Robert Müller at @tuberlin.bsky.social & @bifold.berlin
#NLProc PhD student @EPFL
#interpretability
PhD @stanfordnlp.bsky.social
Game Designer; Problem Solver; past: OpenAI (Dota), Pro Competitive Programmer, Poker
PhD student @LIG | Causal abstraction, interpretability & LLMs
Machine learning prof at U Toronto. Working on evals and AGI governance.
PhD Student doing XAI for NLP at @ANITI_Toulouse, IRIT, and IRT Saint Exupéry.
🛠️ Interpreto & Xplique library development team member.
https://antoninpoche.github.io/
Assoc. Prof in CS @ Northeastern, NLP/ML, health, etc. He/him.
🛠️ Actionable Interpretability 🔎 @icmlconf.bsky.social 2025 | Bridging the gap between insights and actions ✨ https://actionable-interpretability.github.io
Researching Artificial General Intelligence Safety, via thinking about neuroscience and algorithms, at Astera Institute. https://sjbyrnes.com/agi.html
Human/AI interaction. ML interpretability. Visualization as design, science, art. Professor at Harvard, and part-time at Google DeepMind.
AI safeguards & gov. research. PhD student @MIT_CSAIL (mnr. Public Policy) and Fellow at Harvard Berkman Klein. Fmr. UK AISI. https://stephencasper.com/
I moved here -> https://bsky.app/profile/drib.net <- here moved I
Research scientist in AI alignment at Google DeepMind. Co-founder of Future of Life Institute. Views are my own and do not represent GDM or FLI.
Interpretability, AI ethics, Reinforcement Learning
PhD student at Harvard interested in EconCS and ML / previously Caltech undergrad in math
Group Leader, CBS-NTT "Physics of Intelligence" Program at Harvard
website: https://sites.google.com/view/htanaka/home
https://cfpark00.github.io/
Computational Neuroscience PhD Student
Alignment Stress-Testing Team Lead at Anthropic. Opinions my own. Previously: MIRI, OpenAI, Google, Yelp, Ripple. (he/him/his)
Helping people is good I guess
Trying to do AI interp and control
Used to do economics
timhua.me
Speech | XAI | Fairness in AI
PhD student @fbk-mt.bsky.social
MIT PhD candidate in the VIS group working on interpretability and human-AI alignment
🎓 PhD student @cvisionfreiburg.bsky.social @UniFreiburg
💡 interested in mechanistic interpretability, robustness, AutoML & ML for climate science
https://simonschrodi.github.io/
Research in NLP (mostly LM interpretability & explainability).
Assistant prof at UMD CS + CLIP.
Previously @ai2.bsky.social @uwnlp.bsky.social
Views my own.
sarahwie.github.io
PhD student @ Fraunhofer HHI. Interpretability, incremental NLP, and NLU. https://pkhdipraja.github.io/
Assistant professor of Linguistics and Data Science at Boston University. NLP, computational linguistics, interpretability, social bias and fairness. she/her. https://www.notaphonologist.com/
Associate Professor at Princeton
Machine Learning Researcher
Searching for principles of neural representation | Neuro + AI @ enigmaproject.ai | Stanford | sophiasanborn.com
Visiting Researcher at NASA JPL | Data Science MSc at ETH Zurich
Research Fellow @ Stanford Intelligent Systems Laboratory and Hoover Institution at Stanford University | Focusing on interpretable, safe, and ethical AI/LLM decision-making. Ph.D. from TUM.
The National Deep Inference Fabric, an NSF-funded computational infrastructure to enable research on large-scale Artificial Intelligence.
🔗 NDIF: https://ndif.us
🧰 NNsight API: https://nnsight.net
😸 GitHub: https://github.com/ndif-team/nnsight
PhD student in Interpretable Machine Learning at @tuberlin.bsky.social & @bifold.berlin
https://web.ml.tu-berlin.de/author/laura-kopf/
Stats Postdoc at Columbia, @bleilab.bsky.social
Statistical ML, Generalization, Uncertainty, Empirical Bayes
https://yulisl.github.io/
machine learning, causal inference, science of llm, ai safety, phd student @bleilab, keen bean
https://www.claudiashi.com/
XAI PhD Student & Entrepreneur
We are sqIRL (squirrel), the Interpretable Representation Learning Lab based at IDLab - University of Antwerp & imec.
Research Areas: #RepresentationLearning, #Interpretability, #explainability
#ML #AI #XAI #mechinterp
Website: https://sqirllab.github.io/
PhD student in NLP at Sapienza | Prev: Apple MLR, @colt-upf.bsky.social, HF Bigscience, PiSchool, HumanCentricArt #NLProc
www.santilli.xyz
CS @ TUM | relAI MSc Fellow
Assistant professor at the University of Amsterdam. Previously at Microsoft Research, Partnership on AI.
NLP PhD @ Cambridge Language Technology Lab
paulsbitsandbytes.com
PhD student at MIT.
Working on mechanistic interpretability and AI safety.
Assistant Professor at University of Aberdeen | Postdoc at UCL | PhD at University of Sheffield | mechanistic interpretability & multimodal LLMs | https://www.ruizhe.space
PhD student @ U. Paris-Saclay / Inria, AI for social good, fairness, RecSys, congestion avoidance, optimal transport. ENS PS 2018.
Free software advocate, Linux user, cat owner. Association member @ auro.re and crans.org. Bicycle, bouldering, improv.
solalnathan.com
PhD supervised by Tim Rocktäschel and Ed Grefenstette, part time at Cohere. Language and LLMs. Spent time at FAIR, Google, and NYU (with Brenden Lake). She/her.
Ph.D. in NLP Interpretability from Mila. Previously: independent researcher, freelancer in ML, and Node.js core developer.
🌐 https://www.trdavidson.com
🔬research: deep generative learning; agentic systems; synthetic data
PhD @EPFL on reliable magic
Spent time @MSR, @Google
machine learning & company building
🎓 @NYU @UvA alum
creations with code and networks
Visiting scholar @ UW-Madison & PhD student in machine learning @ QMUL. Interested in interpretability and AI safety.
https://james-oldfield.github.io/
I work on AI safety and AI in cybersecurity
PhD student/research scientist intern at UCL NLP/Google DeepMind (50/50 split). Previously MS at KAIST AI and research engineer at Naver Clova. #NLP #ML 👉 https://soheeyang.github.io/
Interpretability researcher at @eleutherai.bsky.social
PhD Fellow at the CopeNLU Group, University of Copenhagen; working on explainable automatic fact-checking. Prev: NYU Abu Dhabi, IIT Kharagpur.
https://mainuliitkgp.github.io/
Master's Student @NTU_TW | Visiting Student @UVA | Seeking a Fall 2025 CS PhD 🎯
🏠 www.ymtseng.com
Research Engineer @ FAR.AI
taufeeque9.github.io
NLP | Interpretability | PhD student at the Technion
veneco trying to get into interpretability, for both natural and artificial intelligence.
currently a master's student at Université de Montréal.
Human being. Trying to do good. CEO @ Encultured AI. AI Researcher @ UC Berkeley. Listed bday is approximate ;)
Research Scientist @ Google DeepMind. Formerly Robotics, now AI Safety. Has a blog. Views are my own.
Professional reference class tennis player. I like non-fillet frozen fish, packaged medicaments, and other oily seeds.
Research Fellow at Oxford University's Global Priorities Institute.
Working on the philosophy of AI.
dumbest overseer at @anthropic
https://www.akbir.dev
Research scientist at Google DeepMind. All opinions are my own.
https://turntrout.com
AGI safety researcher at Google DeepMind, leading causalincentives.com
Personal website: tomeveritt.se
We are a research institute investigating the trajectory of AI for the benefit of society.
epoch.ai
Making AI safer at Google DeepMind
davidlindner.me
Assistant Professor @ Princeton
Previously: EPFL 🇨🇭, UFMG 🇧🇷
Interests: Computational Social Science, Platforms, GenAI, Moderation
https://github.com/PySpur-Dev/PySpur
PhD Student at UCL // LLMs
M.Sc. Student at MBZUAI. I recently started doing mech interp. I also work on low-resource language research.
Grad Student
carlosmari.com
CS Ph.D. Candidate @ Northeastern | Interpretability + Data Science | BS/MS @ Brown
koyenapal.github.io