Core Francisco Park
@corefpark.bsky.social
⚠️⚠️ But here comes drama!!!

What if the news appears in the context upstream of the *same* FT data?

🚨 Contextual Shadowing happens!

Prefixing the news during FT *catastrophically* reduces learning!

10/n
May 21, 2025 at 12:07 AM
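[A minimal sketch of the two data setups contrasted in the post above, assuming a standard causal-LM fine-tuning pipeline. The document strings, the tokenizer choice, and whether the prefix is masked from the loss are illustrative assumptions, not the paper's exact recipe.]

```python
# Illustrative sketch of plain FT vs. "contextual shadowing" FT examples.
# Strings and tokenizer are placeholders, not the paper's data.
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

news_text = "<news article containing the new fact>"
ft_text = "<fine-tuning document restating that fact>"

# Plain fine-tuning example: loss on every token of the document.
plain_ids = tokenizer(ft_text, return_tensors="pt")["input_ids"]
plain_labels = plain_ids.clone()

# Shadowed example: the same document, but with the news prepended in-context.
# Here the prefix is masked out of the loss (label -100) so only the downstream
# document is trained on -- whether to mask the prefix is itself an assumption.
prefix_ids = tokenizer(news_text + "\n\n", return_tensors="pt")["input_ids"]
doc_ids = tokenizer(ft_text, return_tensors="pt")["input_ids"]
shadowed_ids = torch.cat([prefix_ids, doc_ids], dim=1)
shadowed_labels = torch.cat(
    [torch.full_like(prefix_ids, -100), doc_ids.clone()], dim=1
)

# The thread's finding: training on (shadowed_ids, shadowed_labels) teaches the
# model far less of the fact than training on (plain_ids, plain_labels).
```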
Among these protocols, Self-QA especially stood out, largely mitigating the FT-ICL gap and integrating the given knowledge into the model!

Training on synthetic Q/A pairs really boosts knowledge integration!

7/n
May 21, 2025 at 12:07 AM
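[A hypothetical sketch of what a Self-QA-style data protocol can look like: a generator model is prompted to write Q/A pairs about the new document, and those pairs, rather than the raw text, become the fine-tuning examples. The prompt wording and the Q:/A: parsing format are assumptions for illustration, not the paper's exact pipeline.]

```python
# Hypothetical Self-QA data protocol: generate Q/A pairs about the document,
# then fine-tune on the pairs instead of (or alongside) the raw text.

def make_selfqa_prompt(document: str, n_pairs: int = 5) -> str:
    """Prompt asking a generator model to quiz itself on the document."""
    return (
        f"Read the document and write {n_pairs} question/answer pairs "
        f"covering its key facts.\n\nDocument:\n{document}\n\nQ/A pairs:"
    )

def parse_qa_pairs(generation: str) -> list[tuple[str, str]]:
    """Parse alternating 'Q: ...' / 'A: ...' lines from the generator output."""
    pairs, question = [], None
    for line in generation.splitlines():
        line = line.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            pairs.append((question, line[2:].strip()))
            question = None
    return pairs

def to_ft_examples(pairs: list[tuple[str, str]]) -> list[str]:
    """Each synthetic Q/A pair becomes one fine-tuning example."""
    return [f"Question: {q}\nAnswer: {a}" for q, a in pairs]
```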
As expected, naïve fine-tuning on the raw facts isn’t enough to integrate knowledge across domains or model sizes up to 32B.

We call this the FT-ICL gap.

5/n
May 21, 2025 at 12:07 AM
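[One way to make the FT-ICL gap concrete, sketched under assumptions (placeholder model names and a hypothetical fine-tuned checkpoint path, a single probe question, gold-answer log-probability as the metric): score the same question once with the document in context and once against a fine-tuned model with no context.]

```python
# Sketch of measuring an FT-ICL gap on one probe question.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def answer_logprob(model, tokenizer, prompt: str, answer: str) -> float:
    """Total log-probability the model assigns to `answer` following `prompt`."""
    enc = tokenizer(prompt + answer, return_tensors="pt")
    prompt_len = tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
    with torch.no_grad():
        logits = model(**enc).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = enc["input_ids"][:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp[:, prompt_len - 1:].sum().item()  # answer tokens only

tok = AutoTokenizer.from_pretrained("gpt2")
base = AutoModelForCausalLM.from_pretrained("gpt2")      # scored with the doc in context (ICL)
ft = AutoModelForCausalLM.from_pretrained("path/to/ft")  # hypothetical checkpoint fine-tuned on the doc

doc = "<the new document>"
question = "Q: <probe question about the document>\nA:"
answer = " <gold answer>"

icl_score = answer_logprob(base, tok, doc + "\n\n" + question, answer)
ft_score = answer_logprob(ft, tok, question, answer)
print("FT-ICL gap (gold-answer log-prob):", icl_score - ft_score)
```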
New paper! “In-Context Learning of Representations”

What happens to an LLM’s internal representations in the large context limit?

We find that LLMs form “in-context representations” to match the structure of the task given in context!
January 5, 2025 at 4:02 PM
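[The headline claim is about internal representations, so here is a rough sketch of the kind of probe that claim suggests, under assumptions (GPT-2 as a stand-in model, a toy repeated-token context, one arbitrary intermediate layer, PCA as the lens): collect per-token hidden states from a long context and project them to see whether their geometry follows the structure given in context.]

```python
# Illustrative probe of "in-context representations": per-token hidden states
# from a long context, projected to 2-D. Model and context are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.decomposition import PCA

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# A toy "task given in context": tokens that follow some latent structure.
context = "apple bird apple car bird apple car bird " * 64

inputs = tok(context, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

layer = out.hidden_states[6]     # hidden states at one intermediate layer
reps = layer[0].numpy()          # (seq_len, hidden_dim)
coords = PCA(n_components=2).fit_transform(reps)
print(coords.shape)              # 2-D projection of each token's representation
```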