@saiprasanna.in
See(k)ing the surreal Causal World Models for Curious Robots @ University of Tübingen/Max Planck Institute for Intelligent Systems 🇩🇪 #reinforcementlearning #robotics #causality #meditation #vegan
Use Beta NLL for regression when you also predict standard deviations, a simple change to NLL that works reliably better.
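A minimal numpy sketch of what I mean by Beta NLL (the function name and the choice beta=0.5 are mine): the Gaussian NLL per point gets weighted by the predicted variance raised to beta, and that weight is treated as a constant (stop-gradient) during training. beta=0 recovers plain NLL; beta=1 weights points roughly like MSE.

```python
import numpy as np

def beta_nll(mean, var, target, beta=0.5):
    """Beta-NLL: per-point Gaussian NLL, re-weighted by var**beta.

    In an autodiff framework, `weight` must be detached / stop-gradiented
    so the re-weighting does not itself get optimized away.
    """
    nll = 0.5 * (np.log(var) + (target - mean) ** 2 / var)
    weight = var ** beta  # treat as constant: .detach() / stop_gradient
    return np.mean(weight * nll)

# Perfect prediction with unit variance gives zero loss.
print(beta_nll(np.zeros(4), np.ones(4), np.zeros(4)))  # 0.0
```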
If open-endedness has to be fundamentally subjectively measured, what factors of the agent make it so, if we fix humans as the final arbiter or evaluator? Does the embodiment/action space etc. of the agent matter for a human evaluator of open-endedness?
But this is just the vibes of Tübingen from a 1.5-day visit. I have lived in Freiburg for 3 years.
Freiburg
Tübingen
Tübingen : Freiburg :: Introvert : Extrovert
Had a discussion with a fellow not-so-political Indian colleague doing a PhD in computer science in Europe. He is now thinking twice about his plan to go for an exchange at a US lab.
This might be the most fun I've had writing an essay in a while. Felt some of that old going-nuts-with-an-idea energy flowing.
open.substack.com/pub/contrapt...
This week's #PaperILike is "Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming" (Bertsekas 2024).
If you know 1 of {RL, controls} and want to understand the other, this is a good starting point.
PDF: arxiv.org/abs/2406.00592
Curious to know which show
One strategy, I guess, is to have a steady stream of good (BS-filtered) and diverse (topics, areas) inputs (books, research papers, what not)
And not get bogged down by the fact that I am too distracted to go deep into one input stream (book or podcast or article or paper) at a time
Do any of my fellow fox-brained folks (@vgr.bsky.social) have good strategies for aiding background processing? Intuitively, background processing feels like the more foxy thing to me.
@visakanv.com (not sure if you identify as a fox in the fox hedgehog dichotomy though)
I guess the trick would be to take actions that make the mind and emotional state fertile for the background processing to happen consistently!
I realized how much I background-process tonnes of information, from work/research to emotional stuff. And it works well: it leads to good research ideas and wise processing of tough situations! But it's so hard to learn to trust this, as conscious thinking for solving problems feels more under my "control".
The conditioning gap in latent-space world models comes from how uncertainty can go into either the latent posterior distribution or the learnt prior (the dynamics model); not conditioning on the future incorrectly pushes the uncertainty into the dynamics model.
On re-thinking, I think the problems could be orthogonal. Clever Hans pertains to teacher forcing during training leading to easy solutions for most of the timesteps, skewing the model away from learning the hard timestep that matters most at test time.
(Shame that argmax.org/blog is down now!! They're a really nice, lesser-known research group at Volkswagen doing important work on world models.)
Anyway, if these two problems are related, just establishing that would be an amazing paper!
Blog web.archive.org/web/20241108...
paper arxiv.org/abs/2101.07046
Applied to world models for pomdps web.archive.org/web/20241009...
Conditioning gap: when you train a variational encoder whose approximate posterior conditions on only part of the sequence (say, past tokens), that posterior achieves a worse lower bound than one that conditions on everything (including future tokens).
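A schematic of the gap in my own notation (not taken from the paper): any posterior that ignores the future is a special case of one conditioned on the full sequence, so its best achievable ELBO can only be worse.

```latex
\max_{q(z \mid x_{<t})} \mathrm{ELBO}(q)
\;\le\;
\max_{q(z \mid x_{1:T})} \mathrm{ELBO}(q)
\;\le\; \log p(x_{1:T}),
\qquad
\mathrm{ELBO}(q) = \mathbb{E}_{q(z)}\!\left[\log \frac{p(x_{1:T}, z)}{q(z)}\right]
```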
It reminds me of another problem, and I'm not sure if it's equivalent or if it's some dual problem. It's called the conditioning gap in latent space inference.
The fix involves modelling both forward and backward directions. I haven't grokked it fully, but that's where I learnt about the above problem. I find these two papers a really nice sequence: a fundamental problem, then a solution!
And there is a new paper that claims to fix this for the transformer architecture!!! They call it the "belief state transformer". Apparently it fixes lots of practical problems arising from the Clever Hans cheat!
arxiv.org/abs/2410.23506
Since teacher forcing lets the model learn an easy cheat for most tokens, the learning dynamics make it hard to find the correct strategy for the first token.
But teacher forcing makes it easy to predict every token after the first branching token: just attend to the previous token and look up the edge containing it. This strategy doesn't work for the first token, where multiple branches leave the start node.
The easiest correct solution for the model is to look at the edge containing the goal (since it's a star graph, there is only one such edge), work its way backwards to the start (in its computation), and then output the path forward one token at a time.
[Image: star graph illustrating the Clever Hans trick]
Imagine a task where you give a list of edges of a star graph plus the start and end nodes, and train a model with teacher forcing to predict the list of tokens on the path from start to end.
(edge 1, edge 2, ...) (start, goal) → (start, intermediate 1, intermediate 2, ..., goal)
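A minimal sketch of generating one such star-graph instance (the function and parameter names are mine, not from the paper): branches radiate from a shared start node, the edge list is shuffled to hide the structure, and the teacher-forced target is the path to a randomly chosen goal. The Clever Hans cheat predicts path[i+1] from path[i] by edge lookup, which only fails at the first step after the start.

```python
import random

def make_star_instance(n_branches=3, branch_len=2, seed=0):
    """Build one star-graph planning example: shuffled edge list,
    a (start, goal) query, and the target path as node tokens."""
    rng = random.Random(seed)
    start, node = 0, 1
    branches = []
    for _ in range(n_branches):
        path = [start]
        for _ in range(branch_len):
            path.append(node)
            node += 1
        branches.append(path)
    # Directed edges along each branch, shuffled to hide the structure.
    edges = [(p[i], p[i + 1]) for p in branches for i in range(len(p) - 1)]
    rng.shuffle(edges)
    goal_path = rng.choice(branches)
    return edges, (start, goal_path[-1]), goal_path

edges, query, path = make_star_instance()
# Teacher-forced target is `path`; every step but the first is a
# trivial edge lookup from the previous token.
```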
This failure occurs in-distribution, not OOD. And apparently it is general for any model learning next-token prediction, regardless of recurrence (linear or otherwise) or attention!!!