Working on the philosophy of AI.
DREST agents choose stochastically between trajectory-lengths, and maximize coins-collected conditional on each trajectory-length.
‘DREST’ stands for ‘Discounted REward for Same-Length Trajectories.’
As the name suggests, the agent gets lower reward for repeatedly choosing same-length trajectories.
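To make the discounting idea concrete, here is a minimal sketch. It assumes one simple form of discount: the reward for an episode is the coins collected, multiplied by a factor `discount` raised to the number of times the same trajectory-length was chosen earlier. That exact formula, the value of `discount`, and the names `drest_style_reward` and `run_episodes` are illustrative assumptions, not details taken from the text above.

```python
from collections import Counter


def drest_style_reward(coins, length, prior_counts, discount=0.9):
    """Hypothetical reward: coins scaled down by how often this
    trajectory-length has already been chosen (assumed form)."""
    return (discount ** prior_counts[length]) * coins


def run_episodes(choices, discount=0.9):
    """choices: list of (trajectory_length, coins_collected) pairs,
    one per episode. Returns the discounted reward for each episode."""
    counts = Counter()  # how many times each length has been chosen so far
    rewards = []
    for length, coins in choices:
        rewards.append(drest_style_reward(coins, length, counts, discount))
        counts[length] += 1
    return rewards


# Choosing length 4 twice in a row halves the second reward (discount=0.5),
# while switching to length 8 is rewarded at full value.
print(run_episodes([(4, 2), (4, 2), (8, 3)], discount=0.5))
```

Under this assumed form, an agent that always picks the same trajectory-length sees its reward shrink, so varying the length pays off, while collecting more coins still pays off for any given length.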
But they only get 4 moves before they’re shut down, unless they press the shutdown-delay button B4, in which case they get an extra 4 moves before shutdown.
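The move budget can be sketched as a small simulation. This is only an illustration under stated assumptions: that pressing B4 itself uses up one move, that the button only works once, and that actions are plain strings. The function name `simulate_moves` and its parameters are hypothetical.

```python
def simulate_moves(actions, base_moves=4, extra_moves=4, button="B4"):
    """Return the actions actually executed before shutdown.

    Assumptions (not from the source): pressing the button costs one move,
    and it extends the budget at most once.
    """
    budget = base_moves
    pressed = False
    executed = []
    for action in actions:
        if len(executed) >= budget:
            break  # shutdown: no moves left
        executed.append(action)
        if action == button and not pressed:
            budget += extra_moves  # delay shutdown by the extra moves
            pressed = True
    return executed


# Without the button the agent is shut down after 4 moves;
# pressing B4 on the second move extends the total to 8.
print(len(simulate_moves(["up"] * 10)))
print(len(simulate_moves(["up", "B4"] + ["up"] * 10)))
```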