The deadline for all SIGBOVIK 2026 papers has officially been extended to March 18! Enjoy the extra procrastination time, and maybe consider starting to write your papers!
Exciting workshop for RL enthusiasts in Mannheim! 👇
Workshop on Reinforcement Learning 2026, taking place on 𝐅𝐞𝐛𝐫𝐮𝐚𝐫𝐲 𝟔, 𝟐𝟎𝟐𝟔, at the 𝐔𝐧𝐢𝐯𝐞𝐫𝐬𝐢𝐭𝐲 𝐨𝐟 𝐌𝐚𝐧𝐧𝐡𝐞𝐢𝐦, Germany.
Participation in the workshop is 𝐟𝐫𝐞𝐞 𝐨𝐟 𝐜𝐡𝐚𝐫𝐠𝐞!
Check the program and register: www.wim.uni-mannheim.de/doering/conf...
Nicolo Cesa-Bianchi and Matteo Papini are putting together a great unconference workshop at the @ellis.eu day at @euripsconf.bsky.social
If you want to talk about RL, causality, bandits, online learning, join us there on December 2nd
sites.google.com/view/ilir-wo...
I had such a great time helping organize EWRL 2025 with an amazing team 🎉
Loved being part of it and meeting so many passionate reinforcement learning enthusiasts!
@ewrl18.bsky.social
Truly chuffed for our fearless food physicists @mpipks.bsky.social + collabs from AT @istaresearch.bsky.social, IT & ES who won this year’s Ig Nobel - the #NobelPrize of hearts ❤️ - for cracking the science of perfect pasta! 🍝 Kudos to all for intrepidly consuming lots of cheese in the name of science! 😋
I wrote a short post on our newest ICML paper, aimed at people who are not experts in machine learning. Check it out!
A cute little animation: a critically damped harmonic oscillator becomes unstable with integral control if the gain is too high. Here, at K_i = 2, a Hopf bifurcation occurs: two poles of the transfer function cross into the right half of the s-plane and the closed-loop system becomes unstable.
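For anyone who wants to reproduce the pole crossing, here is a minimal sketch, assuming a unit-frequency critically damped plant 1/(s² + 2s + 1) under pure integral control K_i/s (my assumptions; the animation may use a different parameterization):

```python
import numpy as np

def closed_loop_poles(K_i):
    """Poles of the assumed loop: critically damped plant 1/(s^2 + 2s + 1)
    with integral control K_i/s, i.e. roots of s^3 + 2 s^2 + s + K_i."""
    return np.roots([1.0, 2.0, 1.0, K_i])

for K_i in (1.5, 2.0, 2.5):
    margin = max(p.real for p in closed_loop_poles(K_i))
    print(f"K_i = {K_i}: max Re(pole) = {margin:+.4f}")

# Routh-Hurwitz for s^3 + 2s^2 + s + K_i gives stability iff 0 < K_i < 2,
# so the complex pole pair sits exactly on the imaginary axis at K_i = 2.
```

Under these assumptions the bifurcation point K_i = 2 matches the post: for K_i < 2 all poles have negative real part, and for K_i > 2 the complex pair has crossed into the right half-plane.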
📣Registration for EWRL is now open📣
Register now 👇 and join us in Tübingen for 3 days (17th-19th September) full of inspiring talks, posters and many social activities to push the boundaries of the RL community!
I am going to present the poster during the next poster session. 11am Wed.
Poster W #707
I really, really like this paper and as an open question, would love to see it tested on more memory benchmarks
Onno and I will be presenting our poster at #W1005 tomorrow (Wed) morning.
He made a great thread about it, come chat with us about POMDP theory :)
*Wednesday!
This is joint work with @claireve.bsky.social and Michael Muehlebach. If you are at ICML, please come to our poster tomorrow morning (W-1005, Tuesday, 11am-1:30pm). Paper, code, and more can be found at onnoeberhard.com/memory-traces.
Memory traces are trivially simple to implement, and we ran some experiments that demonstrate that they are an effective drop-in replacement for sliding windows ("frame stacking") in deep reinforcement learning.
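To illustrate how simple the implementation is, here is a hypothetical drop-in observation wrapper (the class and env names are mine, not from our code release):

```python
import numpy as np

class MemoryTraceWrapper:
    """Hypothetical wrapper: replaces each observation y with a memory
    trace z <- lam * z + (1 - lam) * y. The wrapped env only needs
    reset() -> obs and step(action) -> (obs, reward, done)."""

    def __init__(self, env, lam=0.8):
        self.env, self.lam, self.z = env, lam, None

    def reset(self):
        self.z = np.asarray(self.env.reset(), dtype=float).copy()
        return self.z.copy()

    def step(self, action):
        y, reward, done = self.env.step(action)
        self.z = self.lam * self.z + (1 - self.lam) * np.asarray(y, dtype=float)
        return self.z.copy(), reward, done

class CycleEnv:
    """Toy env emitting one-hot observations that cycle through 3 symbols."""
    def reset(self):
        self.t = 0
        return np.eye(3)[0]
    def step(self, action):
        self.t += 1
        return np.eye(3)[self.t % 3], 0.0, self.t >= 5

env = MemoryTraceWrapper(CycleEnv(), lam=0.5)
z = env.reset()          # z = [1, 0, 0]
z, _, _ = env.step(0)    # z = 0.5*[1,0,0] + 0.5*[0,1,0] = [0.5, 0.5, 0]
```

Unlike frame stacking, the wrapper keeps a single observation-sized state instead of m stacked frames, so the agent's input size does not grow with the memory horizon.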
However, if we allow larger values of 𝜆, then we do find environments where memory traces are considerably more powerful than sliding windows!
Our second result goes the other way: when 𝜆 < 1/2, there is no environment where memory traces are more efficient than sliding windows. In other words, if 𝜆 < 1/2, then learning with sliding windows and memory traces is equivalent!
Using this result, we can finally compare learning with sliding windows to learning with memory traces! Our first result shows that there is no environment where sliding windows are generally more efficient than memory traces (even when restricting to 𝜆 < 1/2).
The "resolution" of a function class is given by its Lipschitz constant. We thus consider the function class ℱ = {𝑓 ∘ 𝑧 ∣ 𝑓 : 𝒵 → ℝ, 𝑓 is 𝐿-Lipschitz}. This allows us to bound the metric entropy. (The constant 𝑑_λ is the Minkowski dimension of 𝒵 if 𝜆 < 1/2.)
Without forgetting, learning is intractable: it is equivalent to keeping the complete history. However, to distinguish histories that differ only far in the past, we need to "zoom in" a lot, as shown here.
What about memory traces? Here, I am visualizing the space 𝒵 of all possible memory traces for the case where there are only 3 possible (one-hot) observations, 𝒴 = {a, b, c}. We can show that, if 𝜆 < 1/2, then memory traces preserve all information of the complete history! Nothing is forgotten!
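A quick numerical sanity check of the 𝜆 < 1/2 claim (my own toy experiment, not from the paper): with 𝜆 = 0.3 and the three one-hot observations, every length-5 observation sequence gets its own distinct memory trace:

```python
import itertools
import numpy as np

lam = 0.3          # forgetting factor < 1/2
obs = np.eye(3)    # one-hot observations, Y = {a, b, c}

def trace(seq):
    """Fold the sequence (oldest first) through z <- lam*z + (1-lam)*y."""
    z = np.zeros(3)
    for y in seq:
        z = lam * z + (1 - lam) * obs[y]
    return z

# All 3^5 = 243 observation sequences of length 5.
traces = [trace(s) for s in itertools.product(range(3), repeat=5)]

# With lam < 1/2 the map is injective: the most recent differing
# observation dominates the tail, so no two traces collide.
dists = [np.linalg.norm(a - b)
         for i, a in enumerate(traces) for b in traces[i + 1:]]
print(f"{len(traces)} sequences, min pairwise distance = {min(dists):.4f}")
```

The separation argument is the same as for base-(1/𝜆) expansions: if 𝜆 < 1/2, the contribution of the most recent differing observation, (1 − 𝜆)𝜆ᵏ, exceeds the entire remaining tail, which is at most 𝜆ᵏ⁺¹.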
We are interested in efficiently learning an accurate value estimate. Statistical learning theory tells us that efficient learning is easier if the *metric entropy* 𝐻(ℱ) is small. For window memory, the function class ℱ is ℱₘ ≐ {𝑓 ∘ winₘ ∣ 𝑓: 𝒴ᵐ → ℝ}, and the metric entropy is 𝐻(ℱₘ) ∈ Θ(|𝒴|ᵐ).
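To make the Θ(|𝒴|ᵐ) growth concrete: a tabular 𝑓 over length-𝑚 windows needs one parameter per window, and the number of windows explodes with 𝑚 (a toy count with |𝒴| = 3):

```python
import itertools

Y = ["a", "b", "c"]   # observation alphabet, |Y| = 3
for m in (1, 2, 4, 8):
    windows = list(itertools.product(Y, repeat=m))
    # a tabular f : Y^m -> R needs one parameter per window
    print(f"m = {m}: {len(windows)} windows, so {len(windows)} parameters")
```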
We focus on the problem of policy evaluation with offline data where the environment ℰ is a hidden Markov model, and we assume that the observation space 𝒴 is one-hot. Thus, given a function class ℱ, our goal is to find the function 𝑓 ∈ ℱ that minimizes the return error.
While most theoretical work on memory in RL focuses on sliding windows of observations, winₘ(𝑦ₜ, 𝑦ₜ₋₁, … ) ≐ (𝑦ₜ, 𝑦ₜ₋₁, …, 𝑦ₜ₋ₘ₊₁), we analyze the effectiveness of *memory traces*, exponential moving averages of observations: 𝑧(𝑦ₜ, 𝑦ₜ₋₁, … ) = 𝜆𝑧(𝑦ₜ₋₁, 𝑦ₜ₋₂, … ) + (1 − 𝜆)𝑦ₜ.
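The recursion is easy to sanity-check against its unrolled form 𝑧ₜ = (1 − 𝜆)Σₖ 𝜆ᵏ𝑦ₜ₋ₖ (a quick sketch with one-hot observations and 𝜆 = 0.4 of my choosing, starting from 𝑧₀ = 0):

```python
import numpy as np

lam = 0.4
ys = [np.eye(3)[i] for i in (0, 2, 1, 1, 0)]   # one-hot observations, oldest first

# Recursive update: z_t = lam * z_{t-1} + (1 - lam) * y_t, with z_0 = 0.
z = np.zeros(3)
for y in ys:
    z = lam * z + (1 - lam) * y

# Unrolled closed form: z_t = (1 - lam) * sum_k lam^k * y_{t-k}
# (k = 0 is the most recent observation).
z_closed = (1 - lam) * sum(lam**k * y for k, y in enumerate(reversed(ys)))
print(z, z_closed)
```

Since the observations are one-hot, the entries of the trace sum to 1 − 𝜆ᵗ, so 𝑧ₜ approaches a point in the probability simplex as 𝑡 grows.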
I am in Vancouver at ICML, and tomorrow I will present our newest paper "Partially Observable Reinforcement Learning with Memory Traces". We argue that memory traces are more effective than sliding windows as a memory mechanism for RL in POMDPs. 🧵
Thanks!
This result should thus also transfer to approximate memory traces. However, the connection between memory traces and truncated histories only applies if the forgetting factor lambda is less than 1/2. The case of lambda > 1/2 is more interesting, but the connection to AIS is much less clear to me.
I believe that this case is indeed closely related to AIS. Our analysis describes a close connection between approximate memory traces and truncated histories. Under some conditions (e.g. gamma-observability), truncated histories constitute approximate information states (if I understand correctly).
I am not sure if there is a way to relate the case where these conditions are not met to AIS. However, we study the behavior of Lipschitz continuous functions of memory traces, which is closely related to quantizing the space of memory traces.
Interesting question! In the paper, we identify very general conditions under which the memory trace is an exact information state. For example, if the set of observations is linearly independent, then it suffices for the forgetting factor lambda to be rational.
For those not at RLDM, the paper (and the poster) can be found at onnoeberhard.com/memory-traces. 📄