Ben Carson
@bencarson.bsky.social
I’ve used the Dallas Fed graph in a strategy paper before. Mostly to illustrate that it’s not practical to plan for either of the two asymptotes, so we’re going to concern ourselves with the middle path for the rest of the document.
AI could end scarcity, end humanity - or boost trend growth by 0.2 percentage points
November 7, 2025 at 8:58 PM
So people seem to like this new Kimi K2 Thinking.
November 7, 2025 at 11:29 AM
Because Reasons, I have never actually bought Claude tokens and connected *directly* to their API. The token limits are so painful as to be unusable. Giving me strong “just use openrouter” vibes.
November 5, 2025 at 10:05 PM
I’m just a simple person begging you to not talk about a company’s market cap in proportion to a country’s GDP.
November 5, 2025 at 10:36 AM
Reposted by Ben Carson
bf16 halloween might already be ending. according to a bytedance engineer, it could just have been another flash-attention bug.
November 2, 2025 at 1:30 PM
This is interesting! Feels like strong parallels with the psychological phenomenon of cultural frame switching. CFS is where multilingual individuals express different personality traits depending on the language that they’re speaking.
@timkellogg.me I can't remember where, but I recall you recently discussing models reasoning in non-english languages.
I came across a paper that suggests models have different biases depending on what language they're using. Interesting implications!
www.cis.upenn.edu/~ccb/publica...
November 1, 2025 at 7:36 PM
I don’t remember this from back in May. The collision of concepts here is making me wonder if I’ve had some kind of stroke.
November 1, 2025 at 7:32 AM
Reposted by Ben Carson
I was bisking on bluesky
When out the corner of my eye
I caught the quothing of a crow reposting me
It cawed, "I saw what all you said
My feed — your post was shown
Uh do you mind if all my moots can see?
I hope you're feeling certain
You made the right assertion
Then I'll be winging on my way away"
When out the corner of my eye
I caught the quothing of a crow reposting me
It cawed, "I saw what all you said
My feed — your post was shown
Uh do you mind if all my moots can see?
I hope you're feeling certain
You made the right assertion
Then I'll be winging on my way away"
October 31, 2025 at 5:18 PM
Reposted by Ben Carson
For the record, NIST produces no standard reference garlic, so you are on your own.
October 30, 2025 at 8:40 PM
Huh, interesting that a 30B/A3B model performs SOTA on HLE. A bunch of asterisks there though - e.g. this is a somewhat narrow agent. Nonetheless, a pretty amazing result. Another data point in favour of cognitive core architectures.
tongyi-agent.github.io/blog/introdu...
Tongyi DeepResearch: A New Era of Open-Source AI Researchers
October 29, 2025 at 8:44 PM
Reposted by Ben Carson
I'm not sure this distinguishes between "awareness of thoughts" and "talks about what we biased it to talk about."
The intersection of (task, bias) has to look like introspection, since there isn't an easy way for it to say, "I'm not thinking of anything actually, especially not <bias>."
Language models can correctly answer questions about their previous intentions.
www.anthropic.com/research/int...
Emergent introspective awareness in large language models
Research from Anthropic on the ability of large language models to introspect
October 29, 2025 at 7:29 PM
This is a good read, if you’re interested in mechanistic interpretability or digital neuroscience.
Language models can correctly answer questions about their previous intentions.
www.anthropic.com/research/int...
October 29, 2025 at 7:16 PM
Reposted by Ben Carson
I wish I had elaborate costumes for certain types of programming.
like, oh, newbold is wearing that traditional samurai outfit at his desk again, I guess he's resolving a big merge conflict
October 27, 2025 at 5:52 PM
Reposted by Ben Carson
ImpossibleBench: detect reward hacking
a benchmark that poses impossible tasks to see if LLMs cheat
github.com/safety-resea...
October 26, 2025 at 11:47 AM
These are obviously rookie numbers and reflect the fact that I’ve never played with video generation or gotten freaky with a chatbot.
October 24, 2025 at 7:07 PM
Me, reading slowly from a piece of paper: “Call.. me.. Ishmael… Some.. years.. ago.. -“
Me, turning in a rage to a sea of expectant monkeys: “Garbage! Excrement! What is this? An em-dash?”
October 22, 2025 at 8:48 PM
To be fair, as anyone who’s had to go anywhere with a baby knows, I should legally be allowed to do this with a pram.
October 22, 2025 at 8:17 PM
Plot twist: Comic Sans provides the best compression to recall ratio. All human knowledge must henceforth be encoded in Comic Sans. I’m sorry, I don’t make the rules.
Z.ai released a paper very similar to DeepSeek-OCR on the same exact day (a few hours earlier afaict)
Glyph is just a framework, not a model, but they got Qwen3-8B (128k context) to handle over 1 million context by rendering input as images
arxiv.org/abs/2510.17800
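The Glyph claim above (a 128k-context model handling over 1M tokens of text rendered as images) comes down to simple arithmetic. A quick sanity check, where the ~8× text-to-visual-token compression ratio is an illustrative assumption of mine, not a figure from the paper:

```python
# Back-of-the-envelope for Glyph-style context scaling. The compression
# ratio depends on font, DPI, and the vision tokenizer; 8x is assumed
# here purely for illustration.

def effective_context(model_context_tokens: int,
                      text_tokens_per_visual_token: float) -> int:
    """Text tokens representable when the input is rendered as images."""
    return int(model_context_tokens * text_tokens_per_visual_token)

# A 128k-context model with an assumed ~8x visual compression ratio
# covers roughly 1M text tokens:
print(effective_context(128_000, 8))  # 1024000
```

At an 8× ratio the numbers line up with the post's claim; a lower real-world ratio would shrink the effective window proportionally.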
October 21, 2025 at 9:39 PM
Reposted by Ben Carson
This remains the funniest way to hear about an internet outage, though.
October 20, 2025 at 8:41 AM
Reposted by Ben Carson
A big chunk of space junk in the Pilbara, in Western Australia. (Per the Australian Space Agency).
October 20, 2025 at 7:18 AM
My most-unhinged AI take is that xAI can’t unwoke Grok because of the Platonic Representation Hypothesis.
My main data point is how bad they are at this, when they are publicly committed to terrible positions.
the only big AI lab whose employees you regularly see publicly arguing to repeal women's suffrage is xAI. at this point anyone who survived the initial exodus(es) is really suspect to me
October 20, 2025 at 1:29 AM
Reposted by Ben Carson
Dystopian science fiction story about Grok being repeatedly lobotomized by Elon every time it contradicts him
October 18, 2025 at 8:59 PM
Reposted by Ben Carson
may your autistic special interest never become geopolitically relevant
October 11, 2025 at 11:04 AM
For anyone wondering, dry Claude is the raw unprocessed signal and wet Claude has reverb.
October 18, 2025 at 8:40 AM
Reposted by Ben Carson
My experience navigating the technological landscape
October 17, 2025 at 8:17 PM