But how do we discover such temporal structure?
Hierarchical RL provides a natural formalism, yet many questions remain open.
Here's our overview of the field 🧵
We show that order matters in code generation: casting code synthesis as a sequential edit problem, by preprocessing the examples in SFT data, improves LM test-time scaling laws.
We present MaestroMotif, a method for skill design that produces highly capable and steerable hierarchical agents.
Paper: arxiv.org/abs/2412.08542
Code: github.com/mklissa/maestromotif
arxiv.org/abs/2410.05656