Achieves 2–10x higher cumulative reward across bandits, dark treasure rooms, and ray mazes.
Succeeds where existing methods fail to explore effectively or exploit consistently. 4/5
1. Training separate policies for exploration (gather info) and exploitation (maximize rewards).
2. Combining them after training to achieve high cumulative rewards. 3/5
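The two steps above can be sketched in miniature. The thread doesn't specify how the policies are combined, so this is a hypothetical sketch on a toy bandit: a uniform "exploration policy" gathers information first, then a greedy "exploitation policy" takes over using the estimates it built. The function name and the simple switch-at-step-E combination rule are illustrative assumptions, not the paper's actual mechanism.

```python
import random

def explore_then_exploit(arm_means, horizon=1000, explore_steps=100, seed=0):
    """Toy sketch (hypothetical): explore uniformly, then exploit greedily.

    arm_means: true mean reward of each bandit arm (unknown to the agent).
    explore_steps: how long the exploration policy runs before handing off.
    Returns total cumulative reward over the horizon.
    """
    rng = random.Random(seed)
    n = len(arm_means)
    counts = [0] * n      # pulls per arm, gathered during exploration
    totals = [0.0] * n    # summed observed reward per arm
    cumulative = 0.0
    for t in range(horizon):
        if t < explore_steps:
            # Exploration policy: uniform pulls to gather information.
            arm = rng.randrange(n)
        else:
            # Exploitation policy: greedy on the empirical mean estimates.
            arm = max(range(n),
                      key=lambda a: totals[a] / counts[a] if counts[a] else 0.0)
        r = arm_means[arm] + rng.gauss(0.0, 0.1)  # noisy reward
        counts[arm] += 1
        totals[arm] += r
        cumulative += r
    return cumulative
```

Even on this toy problem, handing off to the exploitation policy beats exploring forever: with arms of mean 0.1 and 0.9, a brief exploration phase followed by greedy play collects far more cumulative reward than uniform pulling for the whole horizon.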
📍 Catch @jeffclune.com and me today at West Ballroom A-D #6407 from 4:30–7:30, or join virtually: neurips.cc/virtual/2024...
Achieves 2–10x higher reward (🔥🔥) vs. leading meta-RL algorithms!
👇 Here's the main idea: