
Sai Prasanna

@saiprasanna.in

See(k)ing the surreal Causal World Models for Curious Robots @ University of Tübingen/Max Planck Institute for Intelligent Systems 🇩🇪 #reinforcementlearning #robotics #causality #meditation #vegan

2,137 Followers · 686 Following · 287 Posts · Joined 04.07.2023

Latest posts by Sai Prasanna @saiprasanna.in

On the Pitfalls of Heteroscedastic Uncertainty Estimation with Probabilistic Neural Networks Capturing aleatoric uncertainty is a critical part of many machine learning systems. In deep learning, a common approach to this end is to train a neural network to estimate the parameters of a hetero...

arxiv.org/abs/2203.091...

10.09.2025 09:27 👍 0 🔁 0 💬 0 📌 0

Use Beta-NLL for regression when you also predict standard deviations; it's a simple change to the Gaussian NLL that works reliably better.

10.09.2025 09:27 👍 3 🔁 0 💬 1 📌 0
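A rough sketch of the idea from the paper (Seitzer et al., 2022): weight each point's Gaussian NLL by the detached variance raised to a power β, so high-variance points stop dominating the gradient. The function name and PyTorch framing here are mine, not the paper's reference code.

```python
import torch

def beta_nll_loss(mean, var, target, beta=0.5):
    """Beta-NLL: per-point Gaussian NLL weighted by detach(var)**beta.
    beta=0 recovers the usual NLL; beta=1 weights points like plain MSE.
    The paper reports beta=0.5 as a robust default."""
    nll = 0.5 * (torch.log(var) + (target - mean) ** 2 / var)
    if beta > 0:
        # stop-gradient on the weight so it rescales the loss
        # without changing where its minimum lies per point
        nll = nll * var.detach() ** beta
    return nll.mean()
```

Because the weight is detached, the gradient with respect to mean and var is just the ordinary NLL gradient, rescaled per point.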

If open-endedness fundamentally has to be measured subjectively, what factors of the agent make it so, if we fix humans as the final arbiter or evaluator? Does the embodiment, action space, etc. of the agent matter to a human evaluator of open-endedness?

02.08.2025 23:53 👍 1 🔁 0 💬 0 📌 0
A Brief, Incomplete, and Mostly Wrong History of Robotics (An homage to one of my favorite pieces on the internet: A Brief, Incomplete, and Mostly Wrong History of Programming Languages)

🤣 generalrobots.substack.com/p/a-brief-in...

25.06.2025 11:00 👍 1 🔁 0 💬 0 📌 0

But this is just the vibes of Tübingen from a 1.5-day visit. I have lived in Freiburg for 3 years.

27.03.2025 13:55 👍 1 🔁 0 💬 0 📌 0

Freiburg

27.03.2025 13:55 👍 1 🔁 0 💬 1 📌 0

Tübingen

27.03.2025 13:50 👍 1 🔁 0 💬 1 📌 0

Tübingen : Freiburg :: Introvert : Extrovert

27.03.2025 13:50 👍 4 🔁 0 💬 1 📌 0

Had a discussion with a fellow not-so-political Indian colleague doing a PhD in computer science in Europe. He is now thinking twice about his plan to go on an exchange to a US lab.

27.03.2025 09:29 👍 18 🔁 1 💬 0 📌 0
Discworld Rules And LOTR is brain-rot for technologists

contraptions.venkateshrao.com/p/discworld-...

15.03.2025 09:47 👍 9 🔁 1 💬 0 📌 0
Discworld Rules And LOTR is brain-rot for technologists

This might be the most fun I've had writing an essay in a while. Felt some of that old going-nuts-with-an-idea energy flowing.

open.substack.com/pub/contrapt...

08.03.2025 02:53 👍 56 🔁 9 💬 4 📌 2
Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming In this paper we describe a new conceptual framework that connects approximate Dynamic Programming (DP), Model Predictive Control (MPC), and Reinforcement Learning (RL). This framework centers around ...

This week's #PaperILike is "Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming" (Bertsekas 2024).

If you know 1 of {RL, controls} and want to understand the other, this is a good starting point.

PDF: arxiv.org/abs/2406.00592

02.03.2025 16:19 👍 43 🔁 8 💬 0 📌 0

Curious to know which show

02.03.2025 17:22 👍 1 🔁 0 💬 1 📌 0

One strategy, I guess, is to have a stream of good (BS-filtered) and diverse (topics, areas) inputs (books, research papers, what not)

And not get bogged down by the fact that I am too distracted to go deep into one input stream (book or podcast or article or paper) at a time

01.03.2025 22:12 👍 1 🔁 0 💬 0 📌 0

Do any of my fellow fox-brained folks (@vgr.bsky.social) have good strategies for aiding background processing? Intuitively, background processing feels like a more foxy thing to me

@visakanv.com (not sure if you identify as a fox in the fox hedgehog dichotomy though)

01.03.2025 22:06 👍 0 🔁 0 💬 0 📌 0

I guess the trick would be to take actions that make the mind and emotional state fertile enough for the background processing to happen consistently!

01.03.2025 22:02 👍 1 🔁 0 💬 1 📌 0

I realized how I background-process tonnes of information, from work/research and emotional stuff. And it works well: it leads to good research ideas and wise processing of tough situations! But it's so hard to learn to trust this, as conscious thinking for solving problems feels more under my "control"

01.03.2025 22:02 👍 1 🔁 0 💬 2 📌 0

The conditioning gap in latent-space world models comes from how uncertainty can go either into the latent posterior distribution or into the learnt prior (the dynamics model); not conditioning on the future puts the uncertainty incorrectly into the dynamics model.

01.03.2025 21:50 👍 0 🔁 0 💬 0 📌 0

On rethinking, I think the problems could be orthogonal. Clever Hans pertains to teacher forcing during training leading to easy solutions for a lot of the timesteps, skewing the model away from learning the hard timestep, which is the one that matters most at test time.

01.03.2025 21:50 👍 0 🔁 0 💬 1 📌 0

(Shame that argmax.org/blog is down now!! They're a really nice, less-known research group at Volkswagen doing important work on world models.)

Anyways, if these two problems are related, just establishing that would be an amazing paper!

01.03.2025 21:29 👍 0 🔁 0 💬 1 📌 0
A Tale of Gaps - argmax.org With variational auto-encoders (VAEs), it has become popular to approximate Bayesian inference with neural networks. This scales Bayesian inference to large datasets and deep generative models at the ...

Blog web.archive.org/web/20241108...

paper arxiv.org/abs/2101.07046

Applied to world models for pomdps web.archive.org/web/20241009...

01.03.2025 21:29 👍 2 🔁 0 💬 1 📌 0

Conditioning gap: when you train a variational encoder that computes an approximate posterior conditioned only partially (say, on past tokens), the posterior gives a worse lower bound than one conditioned on everything (future tokens too).

01.03.2025 21:29 👍 0 🔁 0 💬 1 📌 0
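In symbols (my notation, not the papers'): for latents \(z\) and a sequence \(x_{1:T}\), the ELBO with an encoder conditioned on some subset \(c \subseteq x_{1:T}\) reads

```latex
\log p(x_{1:T}) \;\ge\; \mathbb{E}_{q(z \mid c)}\!\left[\log p(x_{1:T} \mid z)\right] \;-\; \mathrm{KL}\!\left(q(z \mid c) \,\middle\|\, p(z)\right).
```

Encoders conditioned only on the past, \(q(z \mid x_{1:t})\), are a strict subset of those conditioned on the full sequence, \(q(z \mid x_{1:T})\), so the best bound achievable with partial conditioning can only be looser; that slack is the conditioning gap.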

It reminds me of another problem, and I'm not sure if it's equivalent or if it's some dual problem. It's called the conditioning gap in latent space inference.

01.03.2025 21:29 👍 0 🔁 0 💬 1 📌 0

The fix involves modelling both the forward and backward directions. I haven't grokked it fully, but I learnt about the above problem there. I find these two papers a really nice sequence: a fundamental problem, then a solution!

01.03.2025 21:29 👍 0 🔁 0 💬 1 📌 0
The Belief State Transformer We introduce the "Belief State Transformer", a next-token predictor that takes both a prefix and suffix as inputs, with a novel objective of predicting both the next token for the prefix and the previ...

And there is a new paper that claims to fix this for the transformer architecture!!! They call it the "belief state transformer". Apparently it fixes lots of practical problems arising from the Clever Hans cheat!

arxiv.org/abs/2410.23506

01.03.2025 21:29 👍 1 🔁 0 💬 1 📌 0
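A minimal sketch of my reading of the objective (arXiv:2410.23506): encode the prefix forwards and the suffix backwards, and from the pair of states predict both the prefix's next token and the suffix's previous token. GRUs stand in for the transformer encoders purely to keep this short; all class and function names are mine, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBeliefStateModel(nn.Module):
    """Toy version of the belief-state objective: a forward encoder
    reads the prefix, a backward encoder reads the suffix, and two
    heads jointly predict the next and the previous token."""
    def __init__(self, vocab, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.fwd = nn.GRU(dim, dim, batch_first=True)
        self.bwd = nn.GRU(dim, dim, batch_first=True)
        self.next_head = nn.Linear(2 * dim, vocab)
        self.prev_head = nn.Linear(2 * dim, vocab)

    def loss(self, tokens, i, j):
        """tokens: (batch, seq); prefix = tokens[:, :i], suffix = tokens[:, j:],
        with i < j so the two targets are not leaked to the encoders."""
        _, hf = self.fwd(self.emb(tokens[:, :i]))          # prefix, left-to-right
        _, hb = self.bwd(self.emb(tokens[:, j:].flip(1)))  # suffix, right-to-left
        state = torch.cat([hf[-1], hb[-1]], dim=-1)
        # predict the next token after the prefix AND the previous
        # token before the suffix from the same belief state
        return (F.cross_entropy(self.next_head(state), tokens[:, i]) +
                F.cross_entropy(self.prev_head(state), tokens[:, j - 1]))
```

The key point is that the belief state must carry enough information to predict in both directions, which rules out the attend-to-the-previous-token shortcut.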

Since teacher forcing lets the model learn an easy cheat for most of the easy tokens, the learning dynamics make it hard to find the correct strategy for the first token.

01.03.2025 21:29 👍 0 🔁 0 💬 1 📌 0

But teacher forcing makes it easy to predict all tokens after the first branching token: just attend to the previous token and look up the edge containing it. This strategy doesn't work for the first token, where the branches start.

01.03.2025 21:29 👍 0 🔁 0 💬 1 📌 0

The easiest correct solution for the model is to look at the edge with the goal (since it's a star graph, there is only one such edge), work its way backwards to the start (in its computation), and output the path forwards one token at a time.

01.03.2025 21:29 👍 0 🔁 0 💬 1 📌 0
star graph illustrating the Clever Hans trick

Imagine a task where you give a list of edges of a star graph plus the start and end node, and train a model with teacher forcing to predict the list of tokens on the path from the start to the end.

(edge 1, edge 2, ...) (start, goal) → (start, intermediate 1, intermediate 2, ..., goal)

01.03.2025 21:29 👍 0 🔁 0 💬 1 📌 0
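The setup above can be sketched as a toy data generator (the graph sizes and token format here are my illustrative choices, not necessarily the paper's):

```python
import random

def make_star_instance(num_arms=3, arm_len=4, seed=0):
    """Star graph: `num_arms` disjoint paths of length `arm_len` all
    leaving the same start node. The task: given the shuffled edge
    list and (start, goal), emit the path from start to goal."""
    rng = random.Random(seed)
    start = 0
    labels = iter(range(1, num_arms * arm_len + 1))
    arms = [[start] + [next(labels) for _ in range(arm_len)]
            for _ in range(num_arms)]
    edges = [(arm[i], arm[i + 1]) for arm in arms for i in range(arm_len)]
    rng.shuffle(edges)
    path = rng.choice(arms)                # the supervised target
    prompt = edges + [(start, path[-1])]   # shuffled edges + (start, goal)
    return prompt, path
```

The Clever Hans shortcut is visible in the construction: every non-start node is the source of exactly one edge, so under teacher forcing each target token after the first is a simple lookup from the previous token; only the first step requires reasoning backwards from the goal.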

This failure occurs in distribution, not OOD. And it is apparently general to any model learning next-token prediction, regardless of recurrence (linear or otherwise) or attention!!!

01.03.2025 21:29 👍 0 🔁 0 💬 1 📌 0