Alex Wettig
awettig.bsky.social
PhD@Princeton trying to make sense of language models and their training data
Modern pre-training relies on crawling the web to collect trillions of tokens

We craft careful descriptions of topic and format categories and prompt an LLM to structure this loose collection of web pages
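
As a rough illustration of this kind of prompting, here is a minimal sketch that assembles category descriptions into a classification prompt and parses a numeric reply. Everything in it is an assumption made for illustration: the category names and descriptions, the prompt wording, and the helpers `build_prompt` and `parse_choice` are hypothetical and are not the actual taxonomy or prompts. The LLM call itself is left out.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Category:
    name: str
    description: str


# Hypothetical placeholder categories, NOT the real taxonomy.
TOPICS = [
    Category("Science", "Pages mainly about natural or formal sciences."),
    Category("Food", "Pages mainly about cooking, recipes, or dining."),
]
FORMATS = [
    Category("Tutorial", "Step-by-step instructions for a task."),
    Category("News article", "Reporting on a recent event."),
]


def build_prompt(page_text: str, categories: Sequence[Category],
                 axis: str, max_chars: int = 2000) -> str:
    """Assemble a prompt listing numbered categories, then the page text."""
    lines = [f"Classify the web page below by {axis}.", "Categories:"]
    for i, cat in enumerate(categories, start=1):
        lines.append(f"{i}. {cat.name}: {cat.description}")
    # Truncate long pages so the prompt stays within a fixed budget.
    lines += ["", "Web page:", page_text[:max_chars], "",
              "Answer with the number of the single best category."]
    return "\n".join(lines)


def parse_choice(reply: str, categories: Sequence[Category]) -> Optional[str]:
    """Map the first integer in the reply to a category name, else None."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    index = int(match.group())
    if 1 <= index <= len(categories):
        return categories[index - 1].name
    return None


if __name__ == "__main__":
    prompt = build_prompt("How to bake sourdough bread at home...", FORMATS, "format")
    print(prompt)  # send this string to an LLM of your choice
    print(parse_choice("1", FORMATS))
```

Running the same page through one prompt per axis (topic, then format) would give each page two labels, which is one simple way to organize a crawl along both dimensions.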

🔍 Explore our domains and see examples at weborganizer.allen.ai
February 18, 2025 at 12:31 PM