Brian Kent
briankent.bsky.social
ML/AI at Sennder || ex Turi, Apple

Longer posts at https://crosstab.io/.
Reposted by Brian Kent
I'm sadly not at #ACL2025, but the work on tokenization seems to continue to explode. Here are the tokenization-related papers I could find, in no particular order. Let me know if I missed any.
July 30, 2025 at 2:03 PM
LiteLLM does an impressive job tracking meter prices for a wide variety of LLMs, but their documentation is a bit thin about how to use that info. Here's a short example of how I use a CustomLogger class to track costs across multiple LLM calls.

www.crosstab.io/articles/lit...
How to track LLM costs with LiteLLM – Brian Patrick Kent
A more complete code example for LLM cost visibility with LiteLLM.
www.crosstab.io
June 29, 2025 at 7:14 PM
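A rough sketch of the cost-tracking pattern described in the post above, not the code from the linked article. It assumes LiteLLM calls a success hook with a `kwargs` dict that carries a `response_cost` entry. The `CostTracker` class is a hypothetical stand-in that does not import LiteLLM, and the costs fed to it are made up.

```python
class CostTracker:
    """Accumulates cost across calls.

    Hypothetical stand-in: in real use this would presumably subclass
    LiteLLM's CustomLogger and be registered as a callback.
    """

    def __init__(self):
        self.total_cost = 0.0
        self.num_calls = 0

    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        # Assumption: the per-call cost arrives under "response_cost".
        # Treat a missing or None value as zero cost.
        cost = kwargs.get("response_cost") or 0.0
        self.total_cost += cost
        self.num_calls += 1


tracker = CostTracker()

# Simulate three completed calls with made-up costs (one has no cost info).
for fake_cost in (0.25, 0.5, None):
    tracker.log_success_event({"response_cost": fake_cost}, None, None, None)

print(f"{tracker.num_calls} calls, total cost ${tracker.total_cost:.2f}")
```

Keeping the running total on a single object is what makes it possible to read the cost of a whole multi-call workflow at the end, rather than per request.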
Reposted by Brian Kent
Anyone got a good alternative to Pocket as a read later / stash a copy of an article tool?
June 13, 2025 at 11:13 PM
Reposted by Brian Kent
I promise I'll shut up about AI soon, but since so many asked, I wrote down my agentic flow and also why I'm all of a sudden writing Go. lucumr.pocoo.org/2025/6/12/ag...
Agentic Coding Recommendations
Current recommendations of agentic coding.
lucumr.pocoo.org
June 12, 2025 at 9:11 AM
Claude 3.7 Sonnet followed my text-to-SQL instructions flawlessly, but Claude Sonnet 4 just can't seem to get it right.

www.crosstab.io/articles/cla...
A Claude Sonnet 4 regression – Brian Patrick Kent
Claude Sonnet 4 seems to be a clear step backward in its ability to follow instructions regarding the format of code output.
www.crosstab.io
June 6, 2025 at 7:03 AM
Reposted by Brian Kent
Unfortunately, the housing theory of everything is correct and you can't unsee it once you see it:
worksinprogress.co/issue/the-ho...
The housing theory of everything - Works in Progress Magazine
Western housing shortages drive inequality, climate change, low productivity growth, obesity, and even falling fertility rates.
worksinprogress.co
May 29, 2025 at 11:18 PM
Is it just me or does Claude 4 Sonnet seem super overeager with code in the chat UI?

I just want to know how some API's output is structured and Claude is giving me hundreds of lines of fuzzy deduplication, error trapping, the whole works.
May 27, 2025 at 6:52 AM
Randomly came across this reddit post about a new document processing leaderboard.

So far, structured data extraction from documents is the killer app for VLMs, but public benchmarks and leaderboards have been non-existent. Excited to see that changing.

www.reddit.com/r/MachineLea...
[P] Introducing the Intelligent Document Processing (IDP) Leaderboard – A Unified Benchmark for OCR, KIE, VQA, Table Extraction, and More
www.reddit.com
May 22, 2025 at 12:31 PM
DSPy has a lot going for it but obfuscating how prompts are constructed creates problems. Beware the footguns!

www.crosstab.io/articles/dsp...
A DSPy footgun – Brian Patrick Kent
Variable names in DSPy signatures must have semantic meaning for your LLM.
www.crosstab.io
May 21, 2025 at 7:34 PM
Reposted by Brian Kent
The now viral, incorrect meme that LLMs are just next token predictors is causing so much confusion
May 21, 2025 at 12:12 AM
I extended @ramikrispin.bsky.social's excellent work to use Claude Sonnet 3.7 to translate natural language data queries into runnable SQL.

Along the way, I showed that Claude can do this even with English questions against a non-English dataset.

www.crosstab.io/articles/llm...
May 19, 2025 at 1:10 PM
Reposted by Brian Kent
The “Paper Skygest” is a total validation of the bluesky thesis. Anyone can build a useful, tunable feed. It’s a bit sparse right now but it’ll be amazing once it takes off fully.
May 7, 2025 at 3:32 PM
What exactly passes for a foundation model these days?
May 6, 2025 at 2:28 PM
Reposted by Brian Kent
The brilliant Cosma Shalizi writing about LLMs is always worth reading:

www.programmablemutter.com/p/on-feral-l...
On Feral Library Card Catalogs, or, Aware of All Internet Traditions
A Guest Post by Cosma Shalizi
www.programmablemutter.com
April 17, 2025 at 4:43 PM
Reposted by Brian Kent
if you're a PhD student or postdoc working at the interface of personality psychology and CS/ML (construed broadly on both sides), and are interested in doing a full-time, remote, 3 - 6 month internship/residency at MidJourney, please DM me some kind of resume or CV-like thing
April 15, 2025 at 7:04 PM
Reposted by Brian Kent
Highly recommended

A video of Pre-Training GPT-4.5 by OpenAI (46 minutes)

www.youtube.com/watch?v=6nJZ...
Pre-Training GPT-4.5
YouTube video by OpenAI
www.youtube.com
April 11, 2025 at 6:06 PM
Reposted by Brian Kent
It turns out to be hard to evaluate natural language with natural language. What should we take away from the conundrum of LLM evaluation? www.argmin.net/p/evaluation...
Evaluation or Valuation
The infinite regress of evaluating large language models
www.argmin.net
April 10, 2025 at 3:03 PM
I don't get it, why does Meta prohibit people in the EU from using Llama 4 models?

www.llama.com/llama4/use-p...
Llama 4 Acceptable Use Policy
www.llama.com
April 9, 2025 at 11:36 AM
Kicking the blog back into gear...

www.crosstab.io/articles/202...
Things I read while the algae grew in my fur - April 2025 – Brian Patrick Kent
Interesting things I’ve read over the past month.
www.crosstab.io
April 8, 2025 at 3:54 PM
Reposted by Brian Kent
Meta just dropped Llama 4 on a weekend! Two new open weight models (Scout and Maverick) and a preview of a model called Behemoth - Scout has a 10 million token context

Best information right now appears to be this blog post: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation
We’re introducing Llama 4 Scout and Llama 4 Maverick, the first open-weight natively multimodal models with unprecedented context support and our first...
ai.meta.com
April 5, 2025 at 7:54 PM
Do we have a term yet for the *types* of tokens that come out of modern tokenizers?

For example, if we use word-level tokenization on the phrase "rose is a rose is a rose is a rose" we have 10 tokens but only 3 ____.

What's the term in the blank?

Example from plato.stanford.edu/entries/type...
Types and Tokens (Stanford Encyclopedia of Philosophy)
plato.stanford.edu
April 5, 2025 at 10:35 AM
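A minimal sketch of the count in the post above, assuming plain whitespace splitting stands in for word-level tokenization. The variable names are illustrative only.

```python
from collections import Counter

phrase = "rose is a rose is a rose is a rose"

# Word-level tokenization: split on whitespace.
tokens = phrase.split()

# Map each distinct token string to how many times it occurs.
counts = Counter(tokens)

num_tokens = len(tokens)    # every occurrence counts
num_distinct = len(counts)  # each distinct string counts once

print(num_tokens, num_distinct)  # 10 3
print(dict(counts))
```

The gap between the two numbers is the distinction the post is asking about: occurrences versus the distinct items they are occurrences of.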
I know there are more important things going on in the world, but I really like VS Code's Interactive Python mode. I feel like I get most of the benefits of a Jupyter notebook but still within a plain old .py script.
April 4, 2025 at 8:47 AM
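For anyone who hasn't tried it, a small sketch of what such a script can look like, assuming the `# %%` cell markers that VS Code recognizes in Python files when the Interactive window is available. The data is made up.

```python
# %%
# A "# %%" comment starts a new cell that can be run on its own.
import statistics

values = [3, 1, 4, 1, 5, 9, 2, 6]  # made-up sample data

# %%
# Run just this cell to inspect intermediate results, notebook-style.
mean_value = statistics.mean(values)
median_value = statistics.median(values)
print(mean_value, median_value)

# %%
# The markers are ordinary comments, so the file still runs top to bottom
# as a plain script with the regular Python interpreter.
```

Because the cell markers are just comments, the file diffs cleanly in version control and imports like any other module.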
Reposted by Brian Kent
Here's the table of contents for my lengthy new piece on how I use LLMs to help me write code https://simonwillison.net/2025/Mar/11/using-llms-for-code/
March 11, 2025 at 2:15 PM
I hate to feed the LLM hype, but here's a fun use case I worked through last weekend: Claude 3.7 Sonnet as (beginning) piano teacher.

www.crosstab.io/articles/llm...
LLM as piano teacher – Brian Patrick Kent
Here’s an AI use case I haven’t seen come up elsewhere: LLM as (beginner) piano teacher.
www.crosstab.io
March 13, 2025 at 9:17 AM