bennorman451.bsky.social
Results That Speak: 🔥 In challenging domains:
Achieves 2–10x higher cumulative reward across bandits, dark treasure rooms, and ray mazes.
Succeeds where existing methods fail to explore effectively or exploit consistently. 4/5
December 12, 2024 at 8:03 PM
The Idea: ✨ First-Explore addresses this by:
1. Training separate policies for exploration (gather info) and exploitation (maximize rewards).
2. Combining them after training to achieve high cumulative rewards. 3/5
December 12, 2024 at 8:03 PM
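To make the two-policy idea concrete, here is a toy sketch on a Bernoulli bandit. Everything in it is an assumption for illustration: `explore_policy`, `exploit_policy`, and `first_explore_then_exploit` are hypothetical, hand-coded stand-ins rather than the learned policies from the paper, and the fixed exploration budget `n_explore` is a simplification.

```python
import random


def pull(arm_means, arm, rng):
    """Bernoulli bandit: return 1 with probability arm_means[arm], else 0."""
    return 1 if rng.random() < arm_means[arm] else 0


def explore_policy(counts):
    """Hypothetical explore policy: try the least-sampled arm to gather info."""
    return min(range(len(counts)), key=lambda a: counts[a])


def exploit_policy(counts, sums):
    """Hypothetical exploit policy: pick the arm with the best observed mean."""
    def observed_mean(a):
        return sums[a] / counts[a] if counts[a] else 0.0
    return max(range(len(counts)), key=observed_mean)


def first_explore_then_exploit(arm_means, n_explore, n_total, seed=0):
    """Run the explore policy for n_explore pulls, then the exploit policy."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)   # pulls per arm
    sums = [0.0] * len(arm_means)   # summed reward per arm
    total = 0
    for t in range(n_total):
        if t < n_explore:
            arm = explore_policy(counts)
        else:
            arm = exploit_policy(counts, sums)
        reward = pull(arm_means, arm, rng)
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts


if __name__ == "__main__":
    # Assumed toy setting: three arms with made-up success probabilities.
    print(first_explore_then_exploit([0.2, 0.5, 0.8], n_explore=30, n_total=200))
```

The point of the sketch is only the separation of roles: one policy is judged on the information it gathers, the other on the reward it collects given that information, and the two are chained at run time.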
🎉 Big news! Our work, "First-Explore, then Exploit," is at #NeurIPS2024!
📍 Catch @jeffclune.com and me today at West Ballroom A-D #6407 from 4:30–7:30, or join virtually: neurips.cc/virtual/2024...
Achieves 2–10x higher cumulative reward (🔥🔥) vs. leading meta-RL algorithms!
👇 Here's the main idea:
December 12, 2024 at 8:03 PM