wololooo.bsky.social
@wololooo.bsky.social
Reposted
Blog post to read today: a strong argument that excessive focus on the first tokens is not something learned from the data distribution (as in, the model should naturally "care" about the start of the text to grasp the rest) but a fundamental feature of the attention graph. publish.obsidian.md/the-tensor-t...
August 24, 2025 at 4:33 PM
Reposted
In a horrifying abuse of modern computers, here's a shell one-liner you can run on macOS to calculate the number of days between two dates in the most complex way imaginable - using uv, Deno, and the Pydantic AI Python sandbox MCP server: simonwillison.net/2025/Apr/18/...
April 18, 2025 at 4:54 AM
Reposted
This is a neat new variant on RAG - no vectors, not even full-text search, instead showing the model a header hierarchy and giving it a tool to read the relevant sections

My notes here: simonwillison.net/2024/Dec/6/r...
December 6, 2024 at 3:04 AM
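
For readers who want a feel for the idea in the post above, here is a minimal sketch in Python. It assumes the documents are Markdown with `#`-style headers. The function names `build_outline`, `render_outline`, and `read_section` are hypothetical and not taken from the linked notes. The model would be shown the rendered outline and given `read_section` as a tool; the model call itself is left out.

```python
import re

# Matches Markdown ATX headers such as "## Install".
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def build_outline(markdown_text):
    """Return a list of (level, title) tuples, one per header line."""
    outline = []
    for line in markdown_text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            outline.append((len(match.group(1)), match.group(2).strip()))
    return outline


def render_outline(outline):
    """Render the outline as indented text to place in the model's prompt."""
    return "\n".join("  " * (level - 1) + title for level, title in outline)


def read_section(markdown_text, title):
    """Hypothetical tool: return the text under the header named `title`.

    The section runs until the next header of the same or a higher level,
    so nested subsections are included.
    """
    collected = []
    section_level = None
    for line in markdown_text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            level = len(match.group(1))
            if section_level is not None and level <= section_level:
                break  # reached a sibling or parent header
            if section_level is None and match.group(2).strip() == title:
                section_level = level
                continue  # skip the header line itself
        if section_level is not None:
            collected.append(line)
    return "\n".join(collected).strip()


doc = (
    "# Guide\n\nIntro.\n\n"
    "## Install\n\nRun the installer.\n\n"
    "## Usage\n\nCall the tool.\n\n"
    "### Flags\n\nUse -v.\n"
)
print(render_outline(build_outline(doc)))
print(read_section(doc, "Install"))
```

This sketch treats any line starting with `#` as a header, so a comment inside a fenced code block would be misread as one; a real implementation would need a proper Markdown parser.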
Reposted
i wrote a custom llm sampler for llama-3.1-8b so it could only say words that are in the bible

github.com/vgel/biblica...
December 6, 2024 at 3:22 AM
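
One common way to restrict what a model can say is to mask the logits before sampling, so that only tokens from an allowed set can be drawn. The sketch below shows that general technique in plain Python. It is not necessarily how the linked repository works, and `constrained_sample` is a hypothetical name. It also assumes one token per word, whereas a real LLM uses subword tokens, so a word-level constraint needs extra bookkeeping.

```python
import math
import random


def constrained_sample(logits, allowed_ids, rng=random):
    """Sample a token id from `logits`, restricted to `allowed_ids`.

    Disallowed tokens are dropped before the softmax, which is equivalent
    to setting their logits to negative infinity.
    """
    candidates = [i for i in range(len(logits)) if i in allowed_ids]
    if not candidates:
        raise ValueError("no allowed tokens to sample from")
    # Subtract the max logit for numerical stability before exponentiating.
    max_logit = max(logits[i] for i in candidates)
    weights = [math.exp(logits[i] - max_logit) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy vocabulary (an assumption for illustration): "gpu" is not allowed,
# even though the model assigns it the highest logit.
vocab = ["the", "lord", "gpu", "said"]
allowed = {0, 1, 3}
logits = [1.0, 0.5, 9.0, 0.2]

rng = random.Random(0)
samples = [constrained_sample(logits, allowed, rng) for _ in range(200)]
print(sorted({vocab[i] for i in samples}))
```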
Reposted
I always dreamed of a model that simultaneously

1. optimizes NLL of raw pixel data,
2. generates competitive high-res. natural images,
3. is practical.

But it seemed too good to be true. Until today!

Our new JetFormer model (arxiv.org/abs/2411.19722) ticks all of these boxes.

🧵
December 2, 2024 at 5:19 PM
Reposted
In a strange turn of events, George Hotz has written an article advocating for expedited nuclear war, and has also deleted all of his tweets
November 30, 2024 at 5:03 AM