What if the news appears in the context upstream of the *same* FT data?
🚨 Contextual Shadowing happens!
Prefixing the news during FT *catastrophically* reduces learning!
10/n
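Roughly, the comparison looks like this (an illustrative sketch, not the paper's code; the helper name and the toy fact are made up):

```python
# A minimal sketch of the two fine-tuning conditions being compared:
# the same FT example, with or without the news document prepended as context.

def build_ft_example(news: str, ft_text: str, prefix_news: bool) -> str:
    """Return one fine-tuning sequence, optionally prefixed with the news."""
    return f"{news}\n\n{ft_text}" if prefix_news else ft_text

news = "Acme Corp appointed Jane Doe as CEO."            # hypothetical fact
ft_text = "Q: Who is the CEO of Acme Corp?\nA: Jane Doe."

plain = build_ft_example(news, ft_text, prefix_news=False)
shadowed = build_ft_example(news, ft_text, prefix_news=True)
# "Contextual shadowing" = fine-tuning on `shadowed` yields far weaker
# weight-level learning of the fact than fine-tuning on `plain`.
```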
Training on synthetic Q/A pairs really boosts knowledge integration!
7/n
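A minimal sketch of the augmentation idea, assuming a placeholder `generate(prompt) -> str` text-generation function (not a specific API):

```python
from typing import Callable, List, Tuple

def synthesize_qa_pairs(news: str,
                        generate: Callable[[str], str],
                        n: int = 5) -> List[Tuple[str, str]]:
    """Rephrase one news document into n question/answer fine-tuning pairs."""
    pairs = []
    for i in range(n):
        prompt = (
            f"Document:\n{news}\n\n"
            f"Write question {i + 1} about a fact stated in the document, "
            "then its answer on the next line."
        )
        question, _, answer = generate(prompt).partition("\n")
        pairs.append((question.strip(), answer.strip()))
    return pairs

# Each pair becomes one FT example, e.g. f"Q: {q}\nA: {a}", giving the model
# many paraphrased views of the same fact instead of one raw document.
```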
We call this the FT-ICL gap.
5/n
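One way to read the gap (a hedged sketch; `accuracy` is a hypothetical scoring helper, not the paper's code):

```python
def ft_icl_gap(base_model, finetuned_model, news, questions, accuracy):
    """Accuracy with the news in context minus accuracy after fine-tuning on it."""
    acc_icl = accuracy(base_model, questions, context=news)      # news in the prompt
    acc_ft = accuracy(finetuned_model, questions, context=None)  # news only in weights
    return acc_icl - acc_ft
```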
What happens to an LLM’s internal representations in the large context limit?
We find that LLMs form “in-context representations” to match the structure of the task given in context!
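One generic way to peek at this (a sketch using Hugging Face transformers and a small GPT-2, not the paper's exact probing setup; the toy context is illustrative):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Build a long context that imposes some task structure, then inspect
# the last-layer hidden states of the tokens in that context.
context = "monday tuesday wednesday thursday friday " * 50  # illustrative

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

inputs = tok(context, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

hidden = out.hidden_states[-1][0]           # (seq_len, d_model)
# Project token representations to 2-D; the claim is that in the
# large-context limit their geometry reorganizes to mirror the in-context task.
_, _, v = torch.pca_lowrank(hidden, q=2)
coords = hidden @ v                          # (seq_len, 2) for plotting
```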