PhD Student at Northeastern, working to make LLMs interpretable
The largest workshop on analysing and interpreting neural networks for NLP.
BlackboxNLP will be held at EMNLP 2025 in Suzhou, China
blackboxnlp.github.io
PhD student @ Northeastern University, Clinical NLP
https://hibaahsan.github.io/
she/her
PhD candidate in CS at Northeastern University | NLP + HCI for health | she/her 🏃♀️🧅🌈
CS PhD student at Harvard. Interested in Interpretability 🔍, Visualizations 📊, Human-AI Interaction🧍🤖. All opinions are mine. https://yc015.github.io/
PhD (in progress) @ Northeastern! NLP 🤝 LLMs
she/her
ML researcher, building interpretable models at Guide Labs (guidelabs.bsky.social).
PhD student @LIG | Causal abstraction, interpretability & LLMs
Trying to figure things out about how best we can live together
hacker / CS professor https://www.khoury.northeastern.edu/~arjunguha/
PhD student in Interpretable Machine Learning at @tuberlin.bsky.social & @bifold.berlin
https://web.ml.tu-berlin.de/author/laura-kopf/
machine learning, causal inference, science of llm, ai safety, phd student @bleilab, keen bean
https://www.claudiashi.com/
Helping people is good I guess
Trying to do AI interp and control
Used to do economics
timhua.me
CS Ph.D. Candidate @ Northeastern | Interpretability + Data Science | BS/MS @ Brown
koyenapal.github.io
AI Program Officer at Longview Philanthropy. Own views.
🔸 giving 10% of my lifetime income to effective charities via Giving What We Can
CEO of Coefficient Giving
Trying for human-compatible humans
Superforecaster at Good Judgment. Also forecasting at Swift Centre, Samotsvety, RAND and a hedge fund. Impartial beneficence enthusiast.
blog.jacobtrefethen.com
Managing Director, Coefficient Giving
science!
Raising kids & bread & grant money. Cleaning data & diapers & fish. EA (bed nets, not light cone). Social scientist. typos. twitter.com/ryancbriggs
💎 here to believe true things and do good actions 💎 someone should probably solve AI alignment 💎 enjoying things rules! ☀️ but it's not snowing now
english/toki pona/Japanese
Program Officer on nuclear policy at Longview Philanthropy (http://longview.org). Opinions are my own.
P(A|B) = [P(A)*P(B|A)]/P(B), all the rest is commentary. Click to read Astral Codex Ten, by Scott Alexander, a […] [bridged from astralcodexten.com on the web: https://fed.brid.gy/web/astralcodexten.com ]
Senior Research Scientist at Google DeepMind. AGI Alignment researcher. Views my dog's.
Writing a book on AI+economics+geopolitics for Nation Books.
Covers: The Nation, Jacobin. Bylines: NYT, Nature, Bloomberg, BBC, Guardian, TIME, The Verge, Vox, Thomson Reuters Foundation, + others.
Research @ Open Philanthropy. Formerly economist at GPI / Nuffield College, Oxford.
Interests: development econ, animal welfare, global catastrophic risks
Comms officer @ Open Philanthropy, former Magic pro, webfiction connoisseur. https://aarongertler.net/
👎: suffering | 👍: EA, AI alignment, decoupling, R, cringe, amateur pharmacology + programming | Georgetown '22 (math+econ+phil) | Career status: 🤷♂️
Technical AI Governance Research at MIRI
Views are my own
Computer Science PhD Student @ Stanford | Geopolitics & Technology Fellow @ Harvard Kennedy School/Belfer | Vice Chair EU AI Code of Practice | Views are my own
Building theaidigest.org and forecasting tools @aidigest.bsky.social
https://binksmith.com
We are a research institute investigating the trajectory of AI for the benefit of society.
epoch.ai
policy for v smart things @openai. Past: PhD @HarvardSEAS/@SchmidtFutures/@MIT_CSAIL. Posts my own; on my head be it
METR is a research nonprofit that builds evaluations to empirically test AI systems for capabilities that could threaten catastrophic harm to society.
ai governance @openphil, unsupervised learner
Senior Researcher at Oxford University.
Author — The Precipice: Existential Risk and the Future of Humanity.
tobyord.com
Trying to help the world navigate potentially transformative technologies, currently via AI Governance and Policy at Coefficient Giving. Enjoyer of acoustic guitars, history books, and plant-based foods.
Senior Policy Advisor for AI and Emerging Technology, White House Office of Science and Technology Policy | Strategic Advisor for AI, National Science Foundation
https://hyperdimensional.co
Thinking about thinking machines | University of Cambridge and Leverhulme Centre for the Future of Intelligence | Previously Google DeepMind
What would we need to understand in order to design an amazing future? Ex DeepMind, OpenAI
Social policy synthesizer. www.secondbest.ca
AI grantmaking at Coefficient Giving
Previously 80,000 Hours
lawsen.substack.com
friendly deep sea dweller
Professor of Applied Physics at Stanford | Venture Partner a16z | Research in AI, Neuroscience, Physics
AI systems and models that are engineered to be interpretable and auditable.
www.guidelabs.ai
Thinking about how/why AI works/doesn't, and how to make it go well for us.
Currently: AI Agent Security @ US AI Safety Institute
benjaminedelman.com
PhD student at MIT.
Working on mechanistic interpretability and AI safety.
Never Bullshit
I challenge any and every one who wants to kick my ass to a debate .
https://www.patreon.com/dril
https://www.instagram.com/dril
https://linktr.ee/drilreal
Interpretability researcher at @eleutherai.bsky.social
Postdoc at CBS, Harvard University
(New around here)
CS Prof at Brown University, PI of the GIRAFFE lab, former AI Policy Advisor in the US Senate, co-chair of the ACM Tech Policy Subcommittee on AI and Algorithms.
PhD at MIT CSAIL '23, Harvard '16, former Google APM. Dog mom to Ducki.
Postdoc at MIT. Research: language, the brain, NLP.
jmichaelov.com
Research Fellow @ Kempner Institute, Harvard University
Theory of Deep Learning / Learning of Deep Theory
Postdoc @ Princeton AI Lab
Natural and Artificial Minds
Prev: PhD @ Brown, MIT FutureTech
Website: https://annatsv.github.io/
PhD student @ MIT | Previously PYI @ AI2 | MS'21 BS'19 BA'19 @ UW | zhaofengwu.github.io
Human/AI interaction. ML interpretability. Visualization as design, science, art. Professor at Harvard, and part-time at Google DeepMind.
PhD student doing LLM interpretability with @davidbau.bsky.social and @byron.bsky.social. (they/them) https://sfeucht.github.io
☆ °。⋆ (mechanistic) interpretability + interaction (design) ⋆。° ☆
Master student at ENS Paris-Saclay / aspiring AI safety researcher / improviser
Prev research intern @ EPFL w/ wendlerc.bsky.social and Robert West
MATS Winter 7.0 Scholar w/ neelnanda.bsky.social
https://butanium.github.io
Postdoc at Northeastern and incoming Asst. Prof. at Boston U. Working on NLP, interpretability, causality. Previously: JHU, Meta, AWS
https://mega002.github.io
Gemini Post-Training ⚫️ Research Scientist at Google DeepMind ⚫️ PhD from ETH Zurich
AI Safety Research // Software Engineering
Waiting on a robot body. All opinions are universal and held by both employers and family. ML/NLP professor.
nsaphra.net
Machine learning haruspex
NLP PhD student at Imperial College London and Apple AI/ML Scholar.
Machine learning PhD student @ Blei Lab in Columbia University
Working in mechanistic interpretability, nlp, causal inference, and probabilistic modeling!
Previously at Meta for ~3 years on the Bayesian Modeling & Generative AI teams.
🔗 www.sweta.dev
Machine Learning PhD Student
@ Blei Lab & Columbia University.
Working on probabilistic ML | uncertainty quantification | LLM interpretability.
Excited about everything ML, AI and engineering!
PhD student at Vector Institute / University of Toronto. Building tools to study neural nets and find out what they know. He/him.
www.danieldjohnson.com
Mechanistic interpretability
Creator of https://github.com/amakelov/mandala
prev. Harvard/MIT
machine learning, theoretical computer science, competition math.
Post-doc @ Harvard. PhD UMich. Spent time at FAIR and MSR. ML/NLP/Interpretability
Computer Science PhD student | AI interpretability | Vision + Language | Cognitive Science. Prev. intern @MicrosoftResearch.
https://martinagvilas.github.io/
ml/nlp phding @ usc, currently visiting harvard, scientisting @ startup;
interpretability & training & reasoning
iglee.me
Assistant Professor, University of Copenhagen; interpretability, xAI, factuality, accountability, xAI diagnostics https://apepa.github.io/
Computation & Complexity | AI Interpretability | Meta-theory | Computational Cognitive Science
https://fedeadolfi.github.io
On the job market!
Scruting matrices @ Apollo Research
PhD student at UC Berkeley. NLP for signed languages and LLM interpretability. kayoyin.github.io
🏂🎹🚵♀️🥋
PhD at EPFL with Robert West, Master at ETHZ
Mainly interested in Language Model Interpretability and Model Diffing.
MATS 7.0 Winter 2025 Scholar w/ Neel Nanda
jkminder.ch
PhD student @CMU LTI - working on model #interpretability, student researcher @google; prev predoc @ai2; intern @MSFT
nishantsubramani.github.io
CS PhD Student, Northeastern University - Machine Learning, Interpretability https://ericwtodd.github.io