Onno Eberhard
@onnoeberhard.com
PhD Student in Tübingen (MPI-IS & Uni Tü), interested in reinforcement learning. Freedom is a pure idea. https://onnoeberhard.com/
This is joint work with @claireve.bsky.social and Michael Muehlebach. If you are at ICML, please come to our poster tomorrow morning (W-1005, Tuesday, 11am-1:30pm). Paper, code, and more can be found at onnoeberhard.com/memory-traces.
Link: Partially Observable Reinforcement Learning with Memory Traces · Onno Eberhard (onnoeberhard.com)
July 16, 2025 at 1:35 AM
Memory traces are trivially simple to implement, and our experiments demonstrate that they are an effective drop-in replacement for sliding windows ("frame stacking") in deep reinforcement learning.
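To make "drop-in" concrete, here is a minimal sketch of an observation wrapper that maintains a memory trace instead of a stack of recent frames. It assumes a Gym-style env returning numpy observations; the class name, the default 𝜆, and the zero initialization are illustrative choices of mine, not taken from the paper's code.

```python
import numpy as np

class MemoryTraceWrapper:
    """Replaces frame stacking: the agent sees the trace z_t instead of the last m frames.
    Assumes a Gym-style env with reset() -> obs and step(a) -> (obs, reward, done, info)."""

    def __init__(self, env, lam=0.8):
        self.env = env
        self.lam = lam    # forgetting factor lambda
        self.z = None

    def reset(self):
        obs = np.asarray(self.env.reset(), dtype=float)
        self.z = (1 - self.lam) * obs          # trace started from zero: z_0 = (1 - lam) * y_0
        return self.z

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Exponential moving average of observations: z_t = lam * z_{t-1} + (1 - lam) * y_t
        self.z = self.lam * self.z + (1 - self.lam) * np.asarray(obs, dtype=float)
        return self.z, reward, done, info
```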
However, if we allow larger values of 𝜆, then we do find environments where memory traces are considerably more powerful than sliding windows!
Our second result goes the other way: when 𝜆 < 1/2, there is no environment where memory traces are more efficient than sliding windows. In other words, for 𝜆 < 1/2, learning with sliding windows and learning with memory traces are equivalent!
Using this result, we can finally compare learning with sliding windows to learning with memory traces! Our first result shows that there is no environment where sliding windows are generally more efficient than memory traces (even when restricting to 𝜆 < 1/2).
The "resolution" of a function class is given by its Lipschitz constant. We thus consider the function class ℱ = {𝑓 ∘ 𝑧 ∣ 𝑓 : 𝒵 → ℝ, 𝑓 is 𝐿-Lipschitz}. This allows us to bound the metric entropy. (The constant 𝑑_λ is the Minkowski dimension of 𝒵 if 𝜆 < 1/2.)
Without forgetting, learning is intractable: it is equivalent to keeping the complete history. However, to distinguish histories that differ only far in the past, we need to "zoom in" a lot, as shown here.
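To spell out the "zoom in" point with a one-line calculation of my own: if two histories agree everywhere except at lag k, only the lag-k term of the exponential average differs, so their traces are exponentially close, and any value function that tells them apart needs a Lipschitz constant that blows up like 𝜆⁻ᵏ.

```latex
% Two histories differing only at lag k give traces that are (1 - lambda) * lambda^k apart
% (times the distance between the two differing observations):
\[
\lVert z - z' \rVert = (1-\lambda)\,\lambda^{k}\,\lVert y_{t-k} - y'_{t-k} \rVert
\quad\Longrightarrow\quad
L \;\ge\; \frac{\lvert f(z) - f(z') \rvert}{\lVert z - z' \rVert} \;\in\; \Omega\bigl(\lambda^{-k}\bigr)
\ \text{ for any } f \text{ that separates them.}
\]
```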
What about memory traces? Here, I am visualizing the space 𝒵 of all possible memory traces for the case where there are only 3 possible (one-hot) observations, 𝒴 = {a, b, c}. We can show that, if 𝜆 < 1/2, then memory traces preserve all information of the complete history! Nothing is forgotten!
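A small self-contained check of that claim (a sketch under my own conventions: one-hot observations, trace started at zero). The key point is that for 𝜆 < 1/2 the most recent observation always carries weight at least 1 − 𝜆 > 1/2 in 𝑧, while all remaining coordinates share at most weight 𝜆 < 1/2, so the history can be peeled off greedily.

```python
import numpy as np

def encode(history, lam, n_obs):
    """Memory trace of a sequence of observation indices (oldest first), starting from z = 0."""
    z = np.zeros(n_obs)
    for y in history:
        e = np.zeros(n_obs)
        e[y] = 1.0
        z = lam * z + (1 - lam) * e
    return z

def decode(z, lam, steps):
    """Greedy reconstruction: for lam < 1/2 the latest observation is always the argmax of z."""
    z = z.copy()
    history = []
    for _ in range(steps):
        y = int(np.argmax(z))
        history.append(y)
        e = np.zeros_like(z)
        e[y] = 1.0
        z = (z - (1 - lam) * e) / lam    # invert z_t = lam * z_{t-1} + (1 - lam) * e_y
    return history[::-1]                 # back to oldest-first order

hist = [0, 2, 1, 1, 0, 2]                # observations a, c, b, b, a, c
z = encode(hist, lam=0.4, n_obs=3)
assert decode(z, lam=0.4, steps=len(hist)) == hist   # the full history is recovered from z
```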
We are interested in efficiently learning an accurate value estimate. Statistical learning theory tells us that learning is more sample-efficient if the *metric entropy* 𝐻(ℱ) is small. For window memory, the function class ℱ is ℱₘ ≐ {𝑓 ∘ winₘ ∣ 𝑓: 𝒴ᵐ → ℝ}, and the metric entropy is 𝐻(ℱₘ) ∈ Θ(|𝒴|ᵐ).
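A back-of-the-envelope reading of that scaling (mine, not the paper's argument): a function of the length-m window is just a lookup table with |𝒴|ᵐ entries, and if values are bounded by some constant c, covering each entry on an 𝜀-grid gives

```latex
\[
N\bigl(\varepsilon, \mathcal{F}_m\bigr) \le \Bigl(\tfrac{c}{\varepsilon}\Bigr)^{\lvert\mathcal{Y}\rvert^{m}}
\quad\Longrightarrow\quad
H\bigl(\mathcal{F}_m\bigr) = \log N\bigl(\varepsilon, \mathcal{F}_m\bigr)
  \in O\bigl(\lvert\mathcal{Y}\rvert^{m}\bigr) \quad \text{for fixed } \varepsilon.
\]
```

This matches the Θ(|𝒴|ᵐ) rate above; the matching lower bound takes a separate packing argument.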
We focus on the problem of policy evaluation with offline data, where the environment ℰ is a hidden Markov model and the observations in 𝒴 are one-hot vectors. Given a function class ℱ, our goal is to find the function 𝑓 ∈ ℱ that minimizes the return error.
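In symbols, I read the return error roughly as follows (my notation; the paper's formal definition may differ in details such as the norm or the weighting over histories):

```latex
\[
\operatorname{err}(f) = \mathbb{E}\Bigl[\bigl(f(h_t) - G_t\bigr)^{2}\Bigr],
\qquad
G_t = \sum_{k \ge 0} \gamma^{k}\, r_{t+k},
\qquad
\text{goal:}\ \min_{f \in \mathcal{F}} \operatorname{err}(f),
\]
```

where h_t stands for whatever summary of the observation history (sliding window or memory trace) is fed into 𝑓, and the expectation is taken over the offline data.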
While most theoretical work on memory in RL focuses on sliding windows of observations, winₘ(𝑦ₜ, 𝑦ₜ₋₁, … ) ≐ (𝑦ₜ, 𝑦ₜ₋₁, …, 𝑦ₜ₋ₘ₊₁), we analyze the effectiveness of *memory traces*, exponential moving averages of observations: 𝑧(𝑦ₜ, 𝑦ₜ₋₁, … ) = 𝜆𝑧(𝑦ₜ₋₁, 𝑦ₜ₋₂, … ) + (1 − 𝜆)𝑦ₜ.
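In code, the two memory mechanisms compared throughout the thread look roughly like this (a numpy sketch assuming one-hot observation vectors; the function names and the zero initialization of the trace are my choices):

```python
import numpy as np

def sliding_window(history, m):
    """win_m: the m most recent observations (newest first), zero-padded early in an episode."""
    obs_dim = history[-1].shape[0]
    recent = list(reversed(history[-m:]))              # (y_t, y_{t-1}, ..., y_{t-m+1})
    recent += [np.zeros(obs_dim)] * (m - len(recent))  # pad if fewer than m observations so far
    return np.concatenate(recent)

def memory_trace(history, lam):
    """z_t = lam * z_{t-1} + (1 - lam) * y_t, accumulated from z = 0 over the whole history."""
    z = np.zeros_like(history[0], dtype=float)
    for y in history:                                  # oldest to newest
        z = lam * z + (1 - lam) * y
    return z
```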
This result should thus also transfer to approximate memory traces. However, the connection between memory traces and truncated histories only applies if the forgetting factor 𝜆 is less than 1/2. The case of 𝜆 > 1/2 is more interesting, but the connection to AIS (approximate information states) is much less clear to me.
June 13, 2025 at 3:13 PM
I believe that this case is indeed closely related to AIS. Our analysis describes a close connection between approximate memory traces and truncated histories. Under some conditions (e.g., 𝛾-observability), truncated histories constitute approximate information states (if I understand correctly).