L2ashobby
@l2ashobby.bsky.social
Learning about machine learning
Reading: OpenAI Spinning Up Part 3
TIL: The policy gradient used to update the policy takes the general form of an expectation of a weighted sum over the trajectory. The main summation term is the gradient of the log-likelihood of the policy's actions; the summation weights depend on the policy-optimization approach.
October 22, 2025 at 3:07 PM
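A minimal NumPy sketch of that general form, under my own assumptions (a softmax policy over 3 actions, and the simplest weight choice, the whole-trajectory return as in basic REINFORCE; other weight choices such as reward-to-go or an advantage estimate give other algorithms):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(3)  # logits of a toy softmax policy (assumed setup)

def policy(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def grad_log_pi(theta, a):
    # gradient of log softmax(theta)[a] w.r.t. theta: onehot(a) - pi(theta)
    g = -policy(theta)
    g[a] += 1.0
    return g

# Sample a toy "trajectory" of actions, then form the weighted sum:
# sum_t [ weight_t * grad log pi(a_t) ], here with weight_t = total return R.
actions = [int(rng.integers(3)) for _ in range(5)]
rewards = [1.0 if a == 2 else 0.0 for a in actions]
R = sum(rewards)
grad_estimate = sum(grad_log_pi(theta, a) for a in actions) * R
print(grad_estimate)
```

The weights are the only part that changes between vanilla policy gradient, reward-to-go, and advantage-based variants; the grad-log-likelihood term stays the same.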
Reading: RL materials (David Silver RL slides, Spinning Up)
TIL: In on-policy learning, the action used to update the target policy is the action actually taken next (target policy = behavior policy). In off-policy learning, the action used in the update is not necessarily the next action taken, since the data comes from a separate behavior policy.
October 20, 2025 at 9:17 PM
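The standard concrete pair here is SARSA (on-policy) vs. Q-learning (off-policy). A small sketch of just the two bootstrap targets, with a made-up Q-table for illustration:

```python
import numpy as np

gamma = 0.9  # discount factor (assumed)

def sarsa_target(Q, r, s_next, a_next):
    # on-policy (SARSA): bootstrap with the action a_next the behavior
    # policy actually takes in s_next (target = behavior)
    return r + gamma * Q[s_next, a_next]

def q_learning_target(Q, r, s_next):
    # off-policy (Q-learning): bootstrap with the greedy action in s_next,
    # which need not be the action the behavior policy takes next
    return r + gamma * Q[s_next].max()

Q = np.array([[0.0, 0.0],
              [0.2, 1.0]])  # toy Q-table
print(sarsa_target(Q, 0.0, s_next=1, a_next=0))  # uses the sampled next action
print(q_learning_target(Q, 0.0, s_next=1))       # uses the max over actions
```

When the behavior policy's sampled next action is not the greedy one, the two targets differ; that gap is exactly the on-policy/off-policy distinction.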
Reading: FastAI Book [https://github.com/fastai/fastbook/]
Section: 04_mnist_basics
TIL: Once the differences between two tensors lie between 0 and 1, the squared error "ups the contrast" of those differences relative to the absolute error. This has implications for choosing between the L1 and L2 norms.
September 9, 2025 at 4:33 PM
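A quick NumPy illustration of the contrast effect (my own toy numbers): squaring values in [0, 1] shrinks small differences far more than large ones, so the largest difference dominates the L2-style error.

```python
import numpy as np

d = np.array([0.1, 0.2, 0.9])  # per-element differences, all in [0, 1]
l1 = np.abs(d)                 # absolute error terms
l2 = d ** 2                    # squared error terms

# relative contribution of each element to the total error
print(l1 / l1.sum())  # under L1, the 0.9 term contributes 0.9/1.2 = 0.75
print(l2 / l2.sum())  # under L2, the 0.9 term contributes 0.81/0.86 ~ 0.94
```

So an L2 loss will push the model hardest on its worst elements, while L1 treats all differences proportionally.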
Reading: Deep Learning [https://www.deeplearningbook.org]
Section: Chapter 3 - Probability and Information Theory
TIL: While KL-Divergence is sometimes referred to as a "distance" between distributions P and Q, this is not the best mental model since KL-divergence is asymmetric.
July 20, 2025 at 4:59 PM
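The asymmetry is easy to see numerically. A minimal sketch with two made-up discrete distributions (a peaked one and a uniform one):

```python
import numpy as np

def kl(p, q):
    # D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x))
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.9, 0.05, 0.05])  # peaked distribution
q = np.array([1/3, 1/3, 1/3])    # uniform distribution

print(kl(p, q), kl(q, p))  # the two directions disagree
```

Because D_KL(P‖Q) ≠ D_KL(Q‖P) in general (and it also fails the triangle inequality), KL divergence is not a metric, so "distance" is a loose analogy at best.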
Reading: Deep Learning [https://www.deeplearningbook.org]
Section: Chapter 3 - Probability and Information Theory
TIL: In mixture distribution models, the component identity variable c is a kind of latent variable!
July 18, 2025 at 5:57 PM
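Ancestral sampling makes the latent variable explicit: to draw x from the mixture, you first draw the unobserved component identity c, then draw x given c. A minimal sketch for an assumed two-component Gaussian mixture:

```python
import numpy as np

rng = np.random.default_rng(2)

weights = np.array([0.3, 0.7])  # P(c): mixture weights (assumed)
means = np.array([-2.0, 3.0])   # component means (assumed)
stds = np.array([0.5, 1.0])     # component std devs (assumed)

def sample(n):
    c = rng.choice(len(weights), size=n, p=weights)  # latent: which component
    x = rng.normal(means[c], stds[c])                # observed: x | c
    return c, x

c, x = sample(5)
print(c, x)  # c is never observed in the data, only x is
```

Only x is observed; c is integrated out in the density p(x) = sum_c P(c) p(x | c), which is exactly what makes it a latent variable.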
hello world!
July 16, 2025 at 3:05 AM