Lightnews — Scholar-powered news

L2ashobby

@l2ashobby.bsky.social

17 followers 44 following 27 posts

Learning about machine learning

L2ashobby

@l2ashobby.bsky.social

Reading: OpenAI Spinning Up Part 3
TIL: The policy gradient used to update the policy takes the general form of an expectation, over trajectories, of a weighted sum across timesteps. Each term in the sum is the gradient of the log-likelihood of the action the policy took. The weights depend on the policy optimization approach (e.g., total return, reward-to-go, or an advantage estimate).
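
A minimal sketch of that weighted sum for a single trajectory, assuming a toy two-action softmax policy whose logits do not depend on the state (a simplification, not something from Spinning Up). The function names and the sample actions and rewards are made up for illustration. Only the weights change between the two estimates.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_prob(logits, action):
    # For a softmax policy: d/d logit_k of log pi(action) = 1[k == action] - pi(k).
    probs = softmax(logits)
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]

def reward_to_go(rewards):
    # Weight for step t: sum of rewards from step t to the end of the trajectory.
    out, running = [], 0.0
    for r in reversed(rewards):
        running += r
        out.append(running)
    return out[::-1]

def policy_gradient_estimate(logits, actions, weights):
    # Single-trajectory estimate: sum over t of weight_t * grad log pi(a_t).
    grad = [0.0] * len(logits)
    for a, w in zip(actions, weights):
        g = grad_log_prob(logits, a)
        grad = [acc + w * gi for acc, gi in zip(grad, g)]
    return grad

# Hypothetical trajectory: three steps, two possible actions.
logits = [0.0, 0.0]
actions = [0, 1, 1]
rewards = [1.0, 0.0, 2.0]

# Same log-likelihood gradients, two different weighting schemes.
grad_total_return = policy_gradient_estimate(logits, actions, [sum(rewards)] * len(actions))
grad_reward_to_go = policy_gradient_estimate(logits, actions, reward_to_go(rewards))
```

Averaging such estimates over many sampled trajectories approximates the expectation.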

October 22, 2025 at 3:07 PM

L2ashobby

@l2ashobby.bsky.social

Reading: RL materials (David Silver RL slides, Spinning Up)
TIL: In on-policy learning, the action used in the update target is the next action actually taken, because the target policy and the behavior policy are the same. In off-policy learning, the action used in the update target is not necessarily the next action taken, because the next action is sampled from a separate behavior policy.

Flow chart diagram showing SARSA (top) vs. Q-learning (bottom). On-policy SARSA uses the next action actually taken to update Q-values. In Q-learning, the next action is instead sampled from a behavior policy, while a target policy picks the best action, and that best action is what updates the Q-values.
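
A rough sketch of the two tabular updates, to make the difference concrete. The state and action names, the Q-values, and the step size and discount are all hypothetical. SARSA needs the next action that was actually taken; Q-learning ignores it and takes the greedy maximum.

```python
import copy

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # On-policy: the target uses the action the agent actually takes next.
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Off-policy: the target uses the greedy action, whatever the agent does next.
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

# Hypothetical two-state table; in s1, "right" looks much better than "left".
Q_sarsa = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 1.0, "right": 5.0}}
Q_qlearn = copy.deepcopy(Q_sarsa)

# Same transition for both; the behavior policy happened to explore "left" in s1.
sarsa_update(Q_sarsa, "s0", "right", 0.0, "s1", "left")
q_learning_update(Q_qlearn, "s0", "right", 0.0, "s1")
```

With the exploratory "left" as the next action, SARSA bootstraps from the lower Q-value, while Q-learning still bootstraps from "right".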

October 20, 2025 at 9:17 PM

L2ashobby

@l2ashobby.bsky.social

Reading: FastAI Book [https://github.com/fastai/fastbook/]
Section: 04_mnist_basics
TIL: When the differences between two tensors lie between 0 and 1, squaring them "ups the contrast" relative to taking absolute values: small differences shrink much more than large ones. This has implications for choosing between the L1 and L2 norm.
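
A small sketch of the contrast effect, using plain Python floats rather than the tensors from the book. The pixel values are made up. A difference of about 0.9 is roughly 9 times a difference of about 0.1 in absolute error, but roughly 81 times in squared error.

```python
def abs_error(a, b):
    return abs(a - b)

def squared_error(a, b):
    return (a - b) ** 2

# Hypothetical pixel values in [0, 1]: one near miss, one big miss.
target = 1.0
near_miss = 0.9   # difference of about 0.1
big_miss = 0.1    # difference of about 0.9

# How many times worse the big miss is than the near miss under each error.
abs_contrast = abs_error(target, big_miss) / abs_error(target, near_miss)
squared_contrast = squared_error(target, big_miss) / squared_error(target, near_miss)
```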

September 9, 2025 at 4:33 PM

L2ashobby

@l2ashobby.bsky.social

Reading: Deep Learning [https://www.deeplearningbook.org]
Section: Chapter 3 - Probability and Information Theory
TIL: While KL divergence is sometimes referred to as a "distance" between distributions P and Q, this is not the best mental model since KL divergence is asymmetric: in general, the divergence from P to Q differs from the divergence from Q to P.
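
A quick sketch of the asymmetry for two discrete distributions. The probabilities are made up, and the helper assumes Q is nonzero wherever P is nonzero.

```python
import math

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.
    # Terms with p_i == 0 contribute nothing; assumes q_i > 0 wherever p_i > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions over two outcomes.
p = [0.9, 0.1]
q = [0.5, 0.5]

kl_pq = kl_divergence(p, q)  # divergence from P to Q
kl_qp = kl_divergence(q, p)  # divergence from Q to P
```

Swapping the arguments changes the result, which a true distance would not allow.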

July 20, 2025 at 4:59 PM

L2ashobby

@l2ashobby.bsky.social

Reading: Deep Learning [https://www.deeplearningbook.org]
Section: Chapter 3 - Probability and Information Theory
TIL: In mixture distribution models, the component identity variable c is a kind of latent variable!
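
A sketch of why c is latent, assuming a two-component Gaussian mixture with made-up weights, means, and standard deviations. Sampling first picks the component c, then draws x from that component. A dataset would normally contain only the x values.

```python
import random

def sample_mixture(weights, means, stds, rng):
    # Step 1: draw the component identity c from the categorical prior P(c).
    c = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Step 2: draw x from the chosen Gaussian component P(x | c).
    x = rng.gauss(means[c], stds[c])
    return c, x

rng = random.Random(0)  # fixed seed so the sketch is repeatable
draws = [sample_mixture([0.3, 0.7], [-2.0, 3.0], [0.5, 1.0], rng) for _ in range(1000)]

observed = [x for _, x in draws]  # what we would actually get to see
hidden = [c for c, _ in draws]    # the latent component identities
```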

July 18, 2025 at 5:57 PM

L2ashobby

@l2ashobby.bsky.social

hello world!

July 16, 2025 at 3:05 AM
