
#Interpretability

Latest posts tagged with #Interpretability on Bluesky


2 PhD Positions on Learning Causally Grounded Concepts for Safe AI Are you interested in improving the interpretability, robustness and safety of AI by integrating causal reasoning? The Causality team in the AMLab group at the University of Amsterdam is looking for 2...

🚨2 PhD positions with me @amlab.bsky.social on learning causally grounded concepts 🚨

Are you interested in improving the #interpretability #robustness and #safety of AI by integrating #causal reasoning? Join us in beautiful Amsterdam 🇳🇱🌷🚲

Deadline: 20 April

www.academictransfer.com/en/jobs/3593...

Sports Motion Analysis: From Competition Videos to Data-Driven Interpretations (YouTube video by Nguyen Sao Mai)

Congratulations Qi for your PhD defense 🎓 on Sports #MotionAnalysis from monocular video and 3D #HPE. Addressing the lack of high-quality datasets, he leverages #DeepLearning representations to ensure #interpretability, contributing to #XAI

📖 theses.hal.science/tel-05291306/
📽️ youtu.be/F5_wZGvdCaM


Introducing Steerling-8B by Guide Labs: A groundbreaking interpretable LLM that traces every token to its training data, enhancing transparency in AI. #AI #MachineLearning #Interpretability Link: thedailytechfeed.com/guide-labs-l...


We thank Andreas for his contributions to the Lab and wish him all the best for his future!

#NLP #PhDDefense #ComputationalArgumentation #Reliability #Interpretability #UKPLab #TUDarmstadt #UniTuebingen #LLMs #NLProc

Guide Labs Open-Sources Steerling-8B, an LLM That Shows Its Work YC-backed startup Guide Labs releases Steerling-8B under Apache 2.0 - an 8.4B parameter model with a built-in concept module that traces every output token back to its training data.


awesomeagents.ai/news/guide-labs-steerlin...

#OpenSource #Interpretability #GuideLabs

Beyond AGI IV: Continuity of Intelligence Abstract Contemporary large-scale artificial intelligence systems increasingly exhibit continuous cognition, persistent memory, and stable interactional identity across extended human engagement. Whil...

AI continuity isn’t emergence; it’s design.
Beyond AGI IV: Continuity of Intelligence shows how memory, alignment, and control intertwine to sustain stable cognition in modern systems.
Structure defines persistence; regulation defines thought.

doi.org/10.5281/zeno...

#AIAlignment #Interpretability


I measured the "personality" of 6 open-source LLMs (7B-9B) by looking into their hidden states. Here's what came out: LLMs have a consistent response sty...

#LLM #alignment #hidden #states #personality #temperament #RLHF #open-source #mechanistic #interpretability


Dissecting Bias in LLMs: A Mechanistic Interpretability Perspective

Zubair Bashir, Bhavik Chandna, Procheta Sen

Action editor: Chris Maddison

https://openreview.net/forum?id=EpQ2CBJTjD

#biases #bias #interpretability


B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability

Yifan Wang, Sukrut Rao, Ji-Ung Lee, Mayank Jobanputra, Vera Demberg

Action editor: Yingce Xia

https://openreview.net/forum?id=c180UH8Dg8

#explanations #explainability #interpretability


Miriam's research focuses on #trustworthiness in machine learning, particularly #fairness and #interpretability, with a growing emphasis on challenges emerging in the era of large language models.

AI Safety Beyond Benchmarks -- Dr. Swabha Swayamdipta on Evaluation, Personalization, and Control (YouTube video by Women in AI Research WiAIR)

Watch/listen to the full episode 🎧
YouTube: youtu.be/rSC7L5WikcE?...
Spotify: open.spotify.com/episode/37YB...
Apple: podcasts.apple.com/ca/podcast/a...
Paper: arxiv.org/abs/2504.17993
#WiAIR #WomenInAI #AIResearch #LLMs #AISafety #Interpretability (8/8🧵)

Simplified overview of our aligned probing setup, where we join the behavioral and internal evaluation of LMs' toxicity

LMs that "know more" about toxicity are less toxic!
Our #TACL 📄 connects behavior and internals:
💠 LMs amplify toxicity beyond humans
💠 Information about toxicity peaks in lower layers
💠 Bypassing these layers increases toxicity
More details👇 #NLProc #interpretability (1/🧵)
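The thread above reports that toxicity information is linearly readable from lower-layer hidden states. A generic way to check such a claim is a linear probe: fit a logistic-regression classifier on per-example activations and see how well it recovers the label. The sketch below is not the paper's code; the toy data, dimensions, and function names are all ours, and a real probe would be trained on actual layer activations, one probe per layer.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_linear_probe(states, labels, lr=0.1, epochs=100):
    """Fit a logistic-regression probe: does a layer's hidden state
    linearly encode the property of interest (here, a 0/1 toxicity label)?"""
    dim = len(states[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(states, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log-loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def probe_accuracy(w, b, states, labels):
    hits = sum(
        int((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (y == 1))
        for x, y in zip(states, labels)
    )
    return hits / len(states)

# Toy stand-in for one layer's activations: "toxic" examples are shifted
# along a direction in activation space, which is exactly what a linear
# probe can detect.
random.seed(0)
toxic = [[random.gauss(1.0, 1.0) for _ in range(8)] for _ in range(50)]
benign = [[random.gauss(-1.0, 1.0) for _ in range(8)] for _ in range(50)]
states, labels = toxic + benign, [1] * 50 + [0] * 50

w, b = train_linear_probe(states, labels)
acc = probe_accuracy(w, b, states, labels)
print(f"probe accuracy on toy activations: {acc:.2f}")
```

In a per-layer setup, the finding that toxicity information "peaks in lower layers" would show up as probes trained on those layers scoring highest.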


Nils’ research interests span model #explainability and #interpretability, text evaluation metrics, interactivity and dialogue, and biomedical NLP.


How AI "thinks": grokking sparse autoencoders (SAEs). In this article, we break down research from the company Anthro...

#Season #AI #development #LLM #interpretable #ml #interpretability #interpretable #AI #artificial
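The article teased above is about sparse autoencoders (SAEs), the dictionary-learning technique popularized by Anthropic's interpretability work. As a rough sketch of the computation an SAE performs: encode activations into a wide, non-negative, mostly-zero feature vector, then reconstruct them as a sum of feature directions. The weights below are random and untrained, and all names and toy sizes are ours, not from the article.

```python
import random

random.seed(0)

D_MODEL, D_SAE = 4, 16  # activation width, dictionary size (toy scale)

# Random encoder/decoder weights. A real SAE learns these by minimising
# reconstruction error plus an L1 sparsity penalty on the features.
W_enc = [[random.gauss(0, 0.5) for _ in range(D_MODEL)] for _ in range(D_SAE)]
W_dec = [[random.gauss(0, 0.5) for _ in range(D_SAE)] for _ in range(D_MODEL)]
b_enc = [0.0] * D_SAE

def encode(x):
    """f = ReLU(W_enc @ x + b_enc): sparse, non-negative feature activations."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_enc, b_enc)]

def decode(f):
    """x_hat = W_dec @ f: reconstruction as a sum of feature directions."""
    return [sum(w * fi for w, fi in zip(row, f)) for row in W_dec]

def sae_loss(x, l1_coeff=1e-3):
    f = encode(x)
    x_hat = decode(f)
    mse = sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)
    sparsity = sum(f)  # L1 penalty; features are already non-negative
    return mse + l1_coeff * sparsity, f

x = [random.gauss(0, 1) for _ in range(D_MODEL)]
loss, features = sae_loss(x)
active = sum(1 for fi in features if fi > 0)
print(f"loss={loss:.3f}, {active}/{D_SAE} features active")
```

After training, the L1 term is what pushes most latents to zero on any given input, so the few that do fire can be inspected as candidate human-interpretable features.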


Looking forward to the next “Theory of Interpretable AI” seminar on January 15, where Chhavi Yadav will present "ExpProof"! A fresh take on trustworthy explanations for confidential ML models using Zero-Knowledge Proofs. Feel free to join! #interpretability #Crypto

tverven.github.io/tiai-seminar/


New paper introduces Gnosis, a lightweight self-awareness mechanism that lets frozen LLMs predict correctness by inspecting internal circuits (hidden states & attention).
📄 arXiv: 2512.20578
#AI #LLMs #MachineLearning #SelfAwareness #Interpretability #AIAlignment #NeurIPS #ICLR #DeepLearning

Gemma Scope 2: New Google Tools Let Researchers Trace AI 'Thought' Circuits - WinBuzzer Gemma Scope 2 is a comprehensive open-source suite trained on 110 petabytes of data to map internal reasoning circuits across the entire Gemma 3 model family.

winbuzzer.com/2025/12/19/g...

Gemma Scope 2: New Google Tools Let Researchers Trace AI ‘Thought’ Circuits

#AI #GoogleDeepMind #Gemma3 #AISafety #MachineLearning #OpenSourceAI #Interpretability #NeuralNetworks #LLMs #AIResearch #DeepLearning #ModelDebugging


AI's Philosophical Tech Challenge - Dean W Ball on 80000 Hours

#interpretability #ai #courtroom


Gregory's work focuses on #interpretability of language models, with a particular interest in in-context learning, retrieval, and retrieval-augmented generation (#RAG). Gregory aims to uncover how these models operate internally to make them more efficient and safer.


Open Problems in Mechanistic Interpretability

Lee Sharkey, Bilal Chughtai, Joshua Batson et al.

Action editor: Sarath Chandar

https://openreview.net/forum?id=91H76m9Z94

#interpretability #ai #mechanistic


Some of the #NeurIPS 2025 papers our lab contributed to. Curious? Please reach out to Thomas Dooms, who is on site, or just drop us an email.

#Interpretability #explainability #MechInterp #XAI #AI #ML
#sqIRL #IDLab #UAntwerp

sqIRL - Interpretable Representation Learning Research lab focused on interpretable representation learning and explainable AI

Kudos to Thomas and the involved collaborators for the solid contributions to the field.

Curious about our work? Have a look at our website: sqirllab.github.io/

#Interpretability #mechinterp #compinterp #xai #AI #ML
#sqIRL #UAntwerp #IDLab

tdooms

At the MI workshop (spotlight), we show how Bilinear Autoencoders ease the analysis of neural representations through their decomposition into polynomial latents.
Paper and the cool demos at tdooms.github.io/research/bae

#Interpretability #mechinterp #compinterp #xai #AI #ML
#UAntwerp #IDLab


Have a look at the work our lab will be presenting at #NeurIPS '25.
On the main track: SimpleStories, a dataset of simple yet diverse stories with the potential to become the MNIST for language.
openreview.net/pdf?id=sVh3e...

#Interpretability #mechinterp #xai #AI #ML
#sqIRL #UAntwerp

sqIRL (Interpretable Representation Learning) | LinkedIn sqIRL (Interpretable Representation Learning) | 23 followers on LinkedIn. We are "squIRreL", the Interpretable Representation Learning Lab based at IDLab - University of Antwerp & imec. ...

We just launched a #linkedin page. Please help us spread the word and share it with people who might be interested.
linkedin.com/company/sqir...

#RepresentationLearning #interpretability #explainability #XAI #mechinterp #AI #ML #sqIRL #ComputerVision #HSI #IDLab #UAntwerp

Original post on simonwillison.net

Olmo 3 is a fully open LLM. Olmo is the LLM series from Ai2, the Allen Institute for AI. Unlike most open-weight models, these are notable for including the full training data, training process, and...

#ai #generative-ai #llms #interpretability #pelican-riding-a-bicycle #llm-reasoning #ai2 […]
