TIL: The policy gradient used to update policy takes the general form of an expected weighted sum over the trajectory. The main summation term is the gradient of log-likelihood of policy actions. The summation weights depend on the policy optimization approach.
TIL: The policy gradient used to update policy takes the general form of an expected weighted sum over the trajectory. The main summation term is the gradient of log-likelihood of policy actions. The summation weights depend on the policy optimization approach.
TIL: In on-policy, the action for updating target policy becomes the next action (target = behavior policy). In off-policy, the action for updating target policy is not necessarily the next action (sampled from separate behavior policy).
TIL: In on-policy, the action for updating target policy becomes the next action (target = behavior policy). In off-policy, the action for updating target policy is not necessarily the next action (sampled from separate behavior policy).
Section: 04_mnist_basics
TIL: After ensuring differences between two tensors are between 0 and 1, the squared error "ups the contrast" of those differences relative to the absolute error. This will have implications on using L1 vs L2 norm.
Section: 04_mnist_basics
TIL: After ensuring differences between two tensors are between 0 and 1, the squared error "ups the contrast" of those differences relative to the absolute error. This will have implications on using L1 vs L2 norm.
Section: Chapter 3 - Probability and Information Theory
TIL: While KL-Divergence is sometimes referred to as a "distance" between distributions P and Q, this is not the best mental model since KL-divergence is asymmetric.
Section: Chapter 3 - Probability and Information Theory
TIL: While KL-Divergence is sometimes referred to as a "distance" between distributions P and Q, this is not the best mental model since KL-divergence is asymmetric.
Section: Chapter 3 - Probability and Information Theory
TIL: In mixture distribution models, the component identity variable c is a kind of latent variable!
Section: Chapter 3 - Probability and Information Theory
TIL: In mixture distribution models, the component identity variable c is a kind of latent variable!