Check it out here: arxiv.org/abs/2508.16496
It's dedicated to the late Barry Sealey CBE and to Helen Sealey, whose funding of my earlier postgraduate studies opened the door to a PhD. I'm hugely indebted to them for their kindness and generosity.
My PhD thesis, *On Zero-Shot Reinforcement Learning*, is now on arXiv.
More detail in the paper, on the project page, or in the repo!
Paper: arxiv.org/abs/2506.15446
Project Page: enjeeneer.io/projects/bfm...
Code: github.com/enjeeneer/bf...
with Tom Bewley and Jon Cullen.
We explored different sequence models: Transformers, GRUs, LSTMs, S4D, S5.
To our surprise, we found GRUs to be far and away the most effective, and Transformers to be disappointingly ineffective.
Why? The combined F^T B representation seems unstable for all non-GRU methods.
We run experiments on amended ExORL environments with different types of partial observability. In particular, we explore partially observed states, and partially observed changes in dynamics.
In aggregate, we improve performance across all partially observed settings.
We solve both failure modes by replacing BFMs' standard MLPs with sequence models that condition on trajectories of observations and actions.
We call the resultant family of methods *Behaviour Foundation Models with Memory*.
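As a rough sketch of the idea of conditioning on trajectories rather than single states: a recurrent encoder can summarise a history of (observation, action) pairs into a belief state that stands in for the unobserved true state. Everything below (function name, shapes, parameter layout) is illustrative, not the paper's actual architecture.

```python
import numpy as np

def gru_belief(obs_seq, act_seq, params):
    """Minimal GRU over concatenated (obs, action) steps, returning the
    final hidden state as a 'belief' over the underlying state.
    Hypothetical sketch, not the paper's implementation."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    Wz, Wr, Wh = params["Wz"], params["Wr"], params["Wh"]
    h = np.zeros(Wz.shape[0])
    for o, a in zip(obs_seq, act_seq):
        x = np.concatenate([o, a, h])
        z = sigmoid(Wz @ x)            # update gate
        r = sigmoid(Wr @ x)            # reset gate
        x_r = np.concatenate([o, a, r * h])
        h_tilde = np.tanh(Wh @ x_r)    # candidate hidden state
        h = (1 - z) * h + z * h_tilde  # gated interpolation
    return h
```

The belief state `h` would then replace the raw state input to the BFM's forward and backward networks.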
When Behaviour Foundation Models are fed unreliable observations, rather than states, they fail in two predictable ways.
We call these failure modes *state* misidentification and *task* misidentification.
Each inhibits performance in isolation; together they kill the model.
BFMs are amazing.
Train them on expressive (s, a, s′) data and you'll get the optimal policy for *any* reward function in an env.
But, what if instead of states you have observations, as is almost always the case in practice?
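A toy sketch of the zero-shot step behind that claim: with forward-backward-style features, a task embedding can be inferred from reward-labelled samples and handed to a pre-trained policy. All shapes and names below are illustrative, not the paper's implementation.

```python
import numpy as np

# Hypothetical dimensions for illustration only.
rng = np.random.default_rng(0)
d, n_states = 8, 128
B = rng.normal(size=(n_states, d))   # backward features B(s) for sampled states
reward = rng.normal(size=n_states)   # rewards labelled after data collection

# Task embedding: z = E[r(s) B(s)], a reward-weighted average of backward features
z = (reward[:, None] * B).mean(axis=0)
# A pre-trained policy pi(s, z) conditioned on z then acts for this reward.
```

The point is that no further training is needed at test time: a new reward function only changes `z`, not the policy's weights.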
Excited to share our new @rl-conference.bsky.social paper! 🧵
I turned 30 today. Here are some particularly important moments from the last decade.
enjeeneer.io/posts/2025/0...
I wrote down some of my memories and reflections after the passing of my PhD advisor, Horace Yuen: realizable.substack.com/p/horace-p-y...
time is a flat circle
It all feels a bit hacky though, yeh.
- It's probs not doing pure policy exploration in the classical RL sense. The prior provided by pre-training should reduce the effective search space hugely. I could imagine that small amounts of exploration on top of the reasoning traces provided by the base model could be enough to get signal.
I don't disagree, but a couple of possible explanations:
- Fig 3 could imply that it learns to solve questions that require shorter reasoning chains first, before moving to those that require longer reasoning chains.
A brilliant colleague and wonderful soul Felix Hill recently passed away. This was a shock and in an effort to sort some things out, I wrote them down. Maybe this will help someone else, but at the very least it helped me. Rest in peace, Felix, you will be missed. www.janexwang.com/blog/2025/1/...
Thank you for this Jane, it's beautiful and heart-wrenching. I didn't know Felix well, but my few interactions with him always left me awed by his all-round brilliance. My thoughts are with you and everyone who knew him more closely. ❤️
#NeurIPS2024 wrapped up last week. I put together a curated reading list for #DeepRL and #reinforcementlearning work (it reflects my interests).
Talks and workshops:
third-crowd-c77.notion.site/NeurIPS2024-...
Curated reading list:
fracturedplane.notion.site/NeurIPS2024-...
#Holidayreading
NeurIPS revolves around demonstration. This year's @rl-conference.bsky.social revolved around conversation. I much prefer the latter.
Here's this week's cartoon for @theguardian.com
www.theguardian.com/football/pic...
If you think @AIatMeta's Motivo looks cool in simulation, think how cool it'll be when we make it work in the real world! Stop by our poster today and I'll tell you how we do it.
Poster #6008
West Ballroom A-D
4:30-7:30pm
Demo: metamotivo.metademolab.com
#NeurIPS2024
Try DIAMOND's Counter-Strike world model directly in your browser!
next.journee.ai/xyz-diamond
How long can you stay in distribution? Can you beat @snguyen.bsky.social's 1000 frames?
@eloialonso.bsky.social and I are at NeurIPS! Poster #6306, Friday 11am-2pm, West Ballroom
My bad for messing up the photo!
First #runconference @neuripsconf.bsky.social #NeurIPS2024 was great! Will share tomorrow's deets later today, join us!
@zacharylipton.bsky.social @adamjelley.bsky.social @random-steve.bsky.social
So excited to share our Google DeepMind team's new Nature paper on GenCast, an ML-based probabilistic weather forecasting model: www.nature.com/articles/s41...
It represents a substantial step forward in how we predict weather and assess the risk of extreme events. 🌪️🧵
I'm in Whistler/Vancouver for #NeurIPS2024, and I'll be around all week to chat RL. Swing by our poster on Friday, or hit me up on here and we can find time for a coffee!
Poster #6008
West Ballroom A-D
Friday 13th Dec 4:30-7:30pm
More details: neurips.cc/virtual/2024...
These aren't books, but Michael Nielsen's "Principles of Effective Research" is great (michaelnielsen.org/blog/princip...), as is John Schulman's "Opinionated Guide to ML Research" (joschu.net/blog/opinion...).
I'd be interested to read your own version of this kinda blog, Eugene!