Thông Nguyễn
@machine1235.bsky.social
I like functions. I have trained functions to play Go, generate speech, and more. I also wrote a JAX library to train functions. I want to understand these functions.
We are... their moon. 😅
April 20, 2025 at 3:11 PM
These two reasons are just my speculation. I'd be happy if anyone could prove me wrong, or right.
April 20, 2025 at 3:07 PM
So when the model eventually generates a correct answer and receives a high reward, its internal hidden states already contain information about which past tokens were important in producing that correct final answer, thereby helping to solve the credit assignment problem.
April 20, 2025 at 3:07 PM
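The sparse-reward setup discussed in this thread can be sketched as a toy REINFORCE loop, where a terminal reward is the only learning signal and every token in the trajectory shares the credit. Everything here (the 4-token vocabulary, the reward rule, the learning rate) is a hypothetical illustration, not any lab's actual training code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "policy": one shared softmax over a 4-token vocabulary at every step.
logits = np.zeros(4)
SEQ_LEN = 8        # generate 8 tokens; reward arrives only at the very end
LR = 0.5

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reward(tokens):
    # Sparse terminal reward: 1 if at least 3 of the tokens are token 2.
    return float(np.sum(tokens == 2) >= 3)

for _ in range(500):
    probs = softmax(logits)
    tokens = rng.choice(4, size=SEQ_LEN, p=probs)
    R = reward(tokens)
    # REINFORCE: every token in the trajectory gets credit for the final
    # reward, i.e. grad += ∇ log π(token) * R, summed over the sequence.
    grad = np.zeros_like(logits)
    for t in tokens:
        grad += (np.eye(4)[t] - probs) * R
    logits += LR * grad / SEQ_LEN

print(softmax(logits))  # probability mass concentrates on token 2
```

Even with only a scalar reward at the end, the policy drifts toward the rewarded token, because rewarded trajectories are statistically enriched in it.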
Another reason could be the attention mechanism, which seems to help significantly with the credit assignment problem. During pretraining, LLMs learn to predict the next token, and the attention mechanism is trained to use past tokens to improve the prediction of the current token.
April 20, 2025 at 3:07 PM
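The point about attention can be made concrete with a minimal causal self-attention computation (toy sizes, random weights, purely illustrative). Each row of the attention matrix shows how much each past token contributes to the current position's prediction, which is exactly the kind of soft credit-assignment signal described above:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                       # 5 tokens, 8-dim embeddings (toy sizes)
x = rng.normal(size=(T, d))

# Random projections stand in for learned Q/K/V weight matrices.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

scores = Q @ K.T / np.sqrt(d)
# Causal mask: position t may only attend to tokens at positions <= t.
mask = np.triu(np.ones((T, T), dtype=bool), k=1)
scores[mask] = -np.inf
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)

out = attn @ V
# Row t of `attn` says how strongly each past token influenced the
# representation used to predict token t+1.
print(attn.round(2))
```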
One possible reason for this is that there's no real interaction with an external environment. Every state/action is internal. In other words, the "environment" is essentially the model itself, apart from the final reward. So in a sense, we're already doing model-based RL.
April 20, 2025 at 3:07 PM
Why is this the case? Why can a model so easily learn to generate tens of thousands of tokens of CoT, despite receiving a sparse reward only at the end? And why can it succeed even with the most basic policy gradient algorithm?
April 20, 2025 at 3:07 PM
This is somewhat similar to LLMs using chain-of-thought (CoT) reasoning versus generating a solution in one go. In CoT reasoning, the model breaks down a problem into smaller, sequential steps.

On an unrelated note, can we train latent diffusion models for text? 🤔 4/4
November 29, 2024 at 5:34 AM
Diffusion models, on the other hand, generate an image through multiple supervised steps. We break down the generation process into smaller tasks and guide the model through them.

This leads to much more stable training and better overall results. 3/4
November 29, 2024 at 5:34 AM
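The multi-step idea above can be sketched in a few lines. This is a toy, DDIM-style deterministic sampling loop in which an oracle (the known target) stands in for the trained denoiser; real diffusion models learn that prediction from data:

```python
import numpy as np

rng = np.random.default_rng(0)
STEPS = 10
target = np.array([1.0, -2.0, 0.5])   # stand-in for a clean data sample

def predict_x0(x_t, t):
    # A trained denoiser would estimate the clean sample from x_t;
    # here the known target is an oracle stand-in for that network.
    return target

x = rng.normal(size=3)                # start from pure noise
for t in range(STEPS, 0, -1):
    x0_hat = predict_x0(x, t)
    # One small, supervised step toward the current estimate of the data,
    # instead of jumping from noise to output in a single shot like a GAN.
    x = x + (x0_hat - x) / t

print(x)  # after STEPS small steps, x has been pulled onto the target
```

Each step is an easy, locally supervised sub-task, which is the stability argument made in the post above.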
GANs generate an image in a single step, from random noise to the final output directly. The model has to learn the entire process in one go, using a limited number of layers in the network. 2/4
November 29, 2024 at 5:34 AM
Adding a data point that I am quite familiar with: currently, almost all SoTA audio generation models use GANs.
November 28, 2024 at 4:25 AM
So, what's next? If pre-training is slowing down, maybe post-training is about to become more important than ever!
November 23, 2024 at 5:43 PM
But now, we've exhausted most of the available internet text data. We can't keep scaling in the same way. We can still train models longer by repeating the data, but gains diminish after a few epochs.
November 23, 2024 at 5:43 PM
The wall isn't about LLMs like Gemini, Claude, or ChatGPT stagnating. These models will continue to improve, that's not in question.

The wall is about diminishing returns in scaling LLM pre-training. From GPT-1 to GPT-4, we've benefited from scaling model size and dataset size together.
November 23, 2024 at 5:42 PM

Full-on reinforcement learning, based on the interaction of the agent with the environment!

The LLM is the agent, and the user is the environment. With OpenAI having hundreds of millions of monthly active users, I think they can do RL with real-world interactions to keep improving their models.
November 22, 2024 at 9:49 AM
But I have to say, I agree with all of his points. Long context windows are the future, a future where models can have the full context for a given task and remember all the history, every interaction with the user. Steven explains it far better than I could, and it's well worth a read.
November 21, 2024 at 4:21 PM
Anyway, coming back to the essay: yes, Steven might be a bit biased, given that he works for Google, which provides the Gemini 1.5 models with the largest context window on the market right now (2M tokens). Naturally, he's inclined to advocate for models with long context windows.
November 21, 2024 at 4:21 PM
I found this very interesting: using LLMs (AIs) to assist people in their tasks by copying an expert's workflow. In this case, NotebookLM is a helpful research assistant for writers. People should adopt this approach for their AI products!
November 21, 2024 at 4:21 PM