Martin Klissarov
@martinklissarov.bsky.social
research @ Google DeepMind
We cover methods that learn:

(1) directly from experience, (2) through offline datasets, and (3) with foundation models (LLMs).

We present each method through the lens of the fundamental challenges of decision making, namely:

(a) exploration, (b) credit assignment, and (c) transferability
June 27, 2025 at 8:16 PM
In this 80+ page manuscript, we cover the rich, diverse, and decades-old literature studying temporal structure discovery in AI.

When and in what way should we expect these methods to benefit agents? What are the trade-offs involved?
June 27, 2025 at 8:16 PM
Humans constantly leverage temporal structure: we actuate muscles each millisecond, yet our plans can span days, months and even years.

Computers are built on this same principle.

How will AI agents discover and use such structure? What is "good" structure in the first place?
June 27, 2025 at 8:16 PM
As AI agents face increasingly long and complex tasks, decomposing them into subtasks becomes increasingly appealing.

But how do we discover such temporal structure?

Hierarchical RL provides a natural formalism, yet many questions remain open.

Here's our overview of the field🧵
June 27, 2025 at 8:16 PM
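For context, here is a minimal sketch of the options framework (Sutton, Precup & Singh, 1999), the formalism hierarchical RL most often builds on; the `env.step` interface and the state type are assumptions for illustration, not from the manuscript:

```python
import random
from dataclasses import dataclass
from typing import Callable

# A minimal sketch of the options framework: an option is a temporally
# extended action with its own policy and termination condition.
@dataclass
class Option:
    initiation: Callable[[object], bool]    # I(s): can the option start in s?
    policy: Callable[[object], int]         # pi(s): primitive action to take
    termination: Callable[[object], float]  # beta(s): prob. of stopping in s

def run_option(env, state, option, gamma=0.99, max_steps=100):
    """Execute one option until it terminates; return the final state and
    the discounted return accumulated while the option was active."""
    ret, discount = 0.0, 1.0
    for _ in range(max_steps):
        action = option.policy(state)
        state, reward, done = env.step(action)  # assumed env interface
        ret += discount * reward
        discount *= gamma
        if done or random.random() < option.termination(state):
            break
    return state, ret
```

An agent that plans over such options makes decisions every few hundred steps instead of every step, which is exactly the kind of temporal structure the survey examines.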
Finally, we analyze the choice of the LLM used to write code policies. We notice a scaling behaviour wherein only the largest LLM, Llama 3.1 405B, was able to define successful policies on all tasks.

With the advent of thinking models, it would be interesting to further investigate this.
February 4, 2025 at 7:22 PM
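To make the setup concrete, here is a hedged sketch of asking an LLM to write a code policy over skills; `query_llm` is a hypothetical helper, and the prompt and skill names are illustrative rather than MaestroMotif's exact interface:

```python
# Illustrative skill names; the actual skill set is defined by the designer.
SKILLS = ["discoverer", "descender", "ascender", "merchant"]

PROMPT = f"""You control a NetHack agent through the skills {SKILLS}.
Write a Python function `select_skill(obs)` that returns the name of the
skill to activate given the current observation `obs` (a dict)."""

def get_code_policy(query_llm):
    """Ask the LLM for a policy-over-skills as code, then load it."""
    code = query_llm(PROMPT)   # larger models succeed far more often here
    namespace = {}
    exec(code, namespace)      # assumes a trusted sandbox for the sketch
    return namespace["select_skill"]
```

The scaling observation above is about this step: smaller models produce code that fails on some tasks, while the 405B model produced working policies across all of them.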
An interesting discovery we came across: the learned skills naturally emerged in the form of a curriculum. Easier skills are the first to maximize their skill reward, paving the way for more complex skills to be learned.

TL;DR: Hierarchy affords learnability.
February 4, 2025 at 7:22 PM
Evaluation on such complex tasks is only possible thanks to the work of dedicated fans of NetHack, who have been building and upgrading the game since 1987 (it is still an actively maintained repository). This figure shows some of the complexities of NetHack.
February 4, 2025 at 7:22 PM
We highlight the complexity of some of these tasks, which on average take more than a thousand steps to complete. Even methods trained specifically for each task fail to make any progress.
February 4, 2025 at 7:22 PM
Once the skill policies are learned, MaestroMotif can adapt, zero-shot, to new instructions and solve complex tasks simply by re-combining skills, much like motifs in a musical composition. In other words, it writes a different code policy over the same skills to achieve a completely different task.
February 4, 2025 at 7:22 PM
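As an illustrative example (not taken from the paper), a generated policy for a new composite instruction such as "reach a deeper level, then return to the surface" might simply re-combine existing skills; the observation keys here are hypothetical:

```python
def select_skill(obs):
    """Route between pre-trained skills based on the current observation."""
    if obs["depth"] < obs["target_depth"] and not obs["reached_target"]:
        return "descender"   # keep going down until the target level
    if obs["reached_target"] and obs["depth"] > 1:
        return "ascender"    # then climb back toward the surface
    return "discoverer"      # otherwise, explore the current level
```

No retraining is involved: only this routing code changes between tasks.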
MaestroMotif is a scalable and effective algorithm for AI-assisted skill design. It starts from the prior domain knowledge of an agent designer, who defines a set of useful skills, or agents, each described at a high level in natural language.
February 4, 2025 at 7:22 PM
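The designer's input can be as lightweight as a mapping from skill names to descriptions; this is a sketch with paraphrased wording, not quotes from the paper:

```python
# Hypothetical skill specifications: high-level natural language only.
SKILL_DESCRIPTIONS = {
    "discoverer": "Explore the current dungeon level, uncovering the map.",
    "descender":  "Find and take the stairs down to the next level.",
    "ascender":   "Find and take the stairs up to the previous level.",
    "merchant":   "Locate a shop and trade items with the shopkeeper.",
}
```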
MaestroMotif builds on our previous work, Motif, which pioneered learning RL policies from AI feedback. At the time, it set a new state of the art on the open-ended domain of NetHack. With MaestroMotif, we improve on this performance by two orders of magnitude. But how are these gains obtained?
February 4, 2025 at 7:22 PM
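Motif's core idea can be sketched as preference-based reward learning: an LLM compares pairs of observation captions, and a reward model is fit to those preferences with a standard Bradley-Terry / cross-entropy objective. The `reward_model` signature and batch shapes below are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, obs_a, obs_b, llm_prefers_a):
    """Bradley-Terry loss on LLM-annotated preferences over pairs."""
    r_a = reward_model(obs_a)          # (batch,) scalar rewards
    r_b = reward_model(obs_b)
    logits = r_a - r_b                 # P(a preferred) = sigmoid(r_a - r_b)
    target = llm_prefers_a.float()     # 1.0 if the LLM preferred a, else 0.0
    return F.binary_cross_entropy_with_logits(logits, target)
```

The learned reward then trains each skill policy with ordinary RL, no human labels required.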
Can AI agents adapt, zero-shot, to complex multi-step language instructions in open-ended environments?

We present MaestroMotif, a method for skill design that produces highly capable and steerable hierarchical agents.

Paper: arxiv.org/abs/2412.08542
Code: github.com/mklissa/maestromotif
February 4, 2025 at 7:22 PM