Marcel Hussing
@marcelhussing.bsky.social
PhD student at the University of Pennsylvania. Previously an intern at MSR; currently at Meta FAIR. Interested in reliable and replicable reinforcement learning, robotics, and knowledge discovery: https://marcelhussing.github.io/
All posts are my own.
You know what would be funny? If it comes back and the reviews aren't out yet.
November 11, 2025 at 9:09 PM
Not sure there is a single good source. Maybe we should write one @cvoelcker.bsky.social
October 27, 2025 at 3:10 PM
I don't necessarily think it's dull, but one would need a venue where work like that can be published. Only TMLR comes to mind, and even then only to some extent.
October 27, 2025 at 2:57 PM
The cynic in me wants to say "because the paper needs to confuse the reviewer to get accepted," but of course I would never say that.
October 27, 2025 at 2:51 PM
I also think it's not that they don't work, but that there were a lot of entangled problems which have been addressed over the years. I'm convinced that many of these things need to be restudied with our new algorithmic/architectural insights that simply make learning stable.
October 27, 2025 at 2:45 PM
Yea 😂 we spent a lot of time on getting the exponents in the ridge regression bounds small to avoid an explosion down the line, but that worked out only semi-well. 😅 I do think it's probably possible to get much smaller exponents, but I suspect that will require a fundamentally different approach.
October 26, 2025 at 3:26 PM
This should of course say quantizing Q-values 🤦
October 26, 2025 at 2:34 PM
This was a fun theory-practice collaboration with the theory group at Penn.

👩‍🎓👨‍🎓
@ericeaton.bsky.social
@mkearnsphilly.bsky.social
@aaroth.bsky.social
@sikatasengupta.bsky.social
@optimistsinc.bsky.social

(6/6)
October 26, 2025 at 2:16 PM
We also empirically evaluate the algorithms. We first demonstrate that the sample complexity bounds are not representative of average-case performance. Then, we derive insights for deep RL with discrete action spaces.

💡 Quantizing actions leads to agreement across policies! (5/6)
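(Per the correction above, "actions" here should read "Q-values".) A minimal sketch of the insight, with an assumed bin width and rounding scheme rather than the paper's exact procedure:

```python
import numpy as np

def quantized_greedy_action(q_values, bin_width):
    # Snap Q-values to a coarse grid before the argmax: two runs whose
    # Q-estimates differ by less than the bin width will typically
    # select the same greedy action, so the resulting policies agree.
    q_rounded = np.round(np.asarray(q_values) / bin_width) * bin_width
    return int(np.argmax(q_rounded))

# Two runs with slightly different Q-estimates: the raw argmax flips
# between actions 0 and 1, but the quantized choice agrees.
run_a = [1.02, 0.98, 0.55]
run_b = [0.99, 1.01, 0.57]
assert quantized_greedy_action(run_a, 0.25) == quantized_greedy_action(run_b, 0.25)
```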
October 26, 2025 at 2:16 PM
We build two objects near a reference point and apply randomized rounding. In ridge regression, the reference is the convex minimizer. With a Rademacher argument and uniform gradient convergence, this yields replicability. Moreover, the algorithm remains replicable even when it is not fully accurate. (4/6)
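A rough sketch of the rounding step for the ridge case, assuming the simplest coordinate-wise randomly shifted grid (the paper's construction near the convex minimizer is more refined; `grid_width` is an illustrative parameter):

```python
import numpy as np

def replicable_ridge(X, y, lam, grid_width, rng):
    # Standard ridge solution: w = (X^T X + lam * I)^{-1} X^T y.
    d = X.shape[1]
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    # Shared random grid shift: replicability hinges on both runs
    # drawing the SAME shift (same seed). Ridge solutions from two
    # samples concentrate near the population minimizer, so with a
    # coarse enough grid they round to the identical vector whp.
    shift = rng.uniform(0.0, grid_width, size=d)
    return np.round((w_hat - shift) / grid_width) * grid_width + shift

# Same shared randomness, fresh data -> same output with high probability:
# w1 = replicable_ridge(X1, y1, 1.0, 0.1, np.random.default_rng(0))
# w2 = replicable_ridge(X2, y2, 1.0, 0.1, np.random.default_rng(0))
```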
October 26, 2025 at 2:16 PM
In this work, we show that we can get replicability guarantees even in RL with function approximation. The idea is to first ensure replicability of ridge regression and uncentered covariance estimation, then use these tools in common approaches for solving linear MDPs. (3/6)
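To make the covariance piece concrete, a hypothetical sketch in the same shared-randomness style (the function name, the entrywise grid, and `grid_width` are my assumptions, not the paper's algorithm):

```python
import numpy as np

def replicable_uncentered_cov(X, grid_width, rng):
    # Empirical uncentered covariance: (1/n) * X^T X.
    cov_hat = (X.T @ X) / X.shape[0]
    # Shared random shift, symmetrized so the rounded estimate stays a
    # symmetric matrix; as before, both runs must use the same seed.
    s = rng.uniform(0.0, grid_width, size=cov_hat.shape)
    shift = np.triu(s) + np.triu(s, 1).T
    return np.round((cov_hat - shift) / grid_width) * grid_width + shift
```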
October 26, 2025 at 2:16 PM
This work is motivated by the fact that, in deep RL, variation from randomness can lead to drastically different solutions when executing the same algorithm twice. An algorithm is formally replicable if, with high probability, it produces identical outcomes. I.e., run your algorithm twice and get the same policy twice. (2/6)
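For reference, the standard formalization from the replicability literature (notation is mine; the paper's exact definition may differ):

```latex
% An algorithm A is \rho-replicable if, for samples S_1, S_2 drawn
% i.i.d. from the same distribution D and shared internal randomness r,
\Pr_{S_1, S_2 \sim D^n,\ r}\bigl[\, A(S_1; r) = A(S_2; r) \,\bigr] \ge 1 - \rho .
```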
October 26, 2025 at 2:16 PM
A large chunk of CS theory orders authors alphabetically, and some even order randomly. Without any common standard, ideas like these are just gonna disadvantage people.
October 26, 2025 at 1:58 PM
This one is so accurate it hurts my soul
October 23, 2025 at 2:31 AM
arxiv.org/abs/2207.04136
We always wondered how to discover the factored structure if it is not given. It's an intriguing question for which I have a few ideas, but so far too little time.
August 23, 2025 at 2:20 AM