Dave
@davedashftw.bsky.social
Senior tech exec based out of Asia.
They’re dumb fucks. They can’t even figure out the impact of the FBX (Freightos Baltic Index) on shipping prices and what that means for the US, with the entire intelligence apparatus behind them.

They just decided to wing this shit with no solid data on the impact.

They’re completely ignorant of their allies’ capabilities too.
March 26, 2025 at 4:28 PM
They don’t want to make their boss look bad. Simple as that.

But more broadly, if you’re expecting the military to save the US from this mess, keep looking. Americans elected Trump. Unless he does something as brazen as attacking blue states for no reason, the military has his back.
March 26, 2025 at 4:02 PM
Have you considered that Jeff Goldstein is Jewish therefore controls the media, and used his space lasers to get into the chat?

(/s for those that need it).
March 26, 2025 at 1:32 PM
Good journalism matters.

If this was the NYT they would have swept it under the rug for fear of making the orange man mad.
March 26, 2025 at 1:19 PM
They’re not that smart, or they wouldn’t have added the journalist.

Trump can just give Russia what they want anyway and only a third of Americans will think it’s bad.
March 26, 2025 at 11:07 AM
Likely vice president IMO.

Putin doesn’t use technology, let alone chat platforms.
March 26, 2025 at 9:21 AM
I am on the side of weak emergence.
March 26, 2025 at 5:01 AM
They do not work without emergence.

We do not really know what is happening inside large-scale neural networks.

This is an area of active research and debate. There are those who refuse to believe that there’s any weak emergence going on at all, and who hold that the behavior is fully reducible and predictable.
March 26, 2025 at 5:01 AM
One final point: reductionism. “They just guess words”, “they’re just math”.

Well, your brain is just electrons and atoms, yet consciousness arises via weak emergence. If we want to go further, everything is just quantum fields.

Emergent qualities are a fact of neural networks.
March 26, 2025 at 5:01 AM
Depends on the model, context window, retrieval method, etc.

Older models tend to “forget” tokens in the middle of, say, a paper and over-emphasize tokens at the beginning or end.

Newer models don’t do this.

Also, newer (reasoning) models use chain-of-thought and other techniques to improve accuracy.
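
A minimal sketch of how you could probe the “forgets the middle” behavior described above: bury one key fact at different depths in a long filler context and ask the model to recall it at each depth. This is my own toy harness, not from any paper or library; the commented-out send_to_model call is a placeholder for whichever model API you use, and the filler and needle text are made up.

```python
# Toy "needle in the middle" probe: place one fact at varying depths
# of a long context and check whether the model can still recall it.

FILLER_SENTENCE = "The weather report for day {i} noted nothing unusual. "
NEEDLE = "The access code for the archive is 7413. "
QUESTION = "\n\nQuestion: What is the access code for the archive?"

def build_prompt(depth: float, n_sentences: int = 200) -> str:
    """Place the needle at a relative depth (0.0 = start, 1.0 = end)."""
    filler = [FILLER_SENTENCE.format(i=i) for i in range(n_sentences)]
    position = int(depth * n_sentences)
    filler.insert(position, NEEDLE)
    return "".join(filler) + QUESTION

for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    prompt = build_prompt(depth)
    # answer = send_to_model(prompt)   # placeholder: call your model of choice
    print(f"depth={depth:>4}: prompt length = {len(prompt)} chars")
```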
March 26, 2025 at 4:41 AM
If they were just “search engines with bias” they could not perform any counterfactual analysis at all.

And newer architectures perform better as well; GPT-4.5 in our testing is good at linking new ideas rather than just parroting information.
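
A rough sketch of the kind of counterfactual probe I mean (my own toy example, in the spirit of the counterfactual-task work linked in this thread): ask for addition in base 8 instead of the usual base 10. A model that only regurgitates memorised text tends to fall back to the base-10 answer; applying the rule under the changed assumption is the counterfactual part.

```python
# Toy counterfactual arithmetic probe: two-digit addition under a
# changed assumption (base 8), with the expected answer and the
# "default behaviour" answer for comparison.
import random

def make_problem():
    a, b = random.randint(8, 63), random.randint(8, 63)
    prompt = (
        f"Assume all numbers are written in base 8. "
        f"What is {oct(a)[2:]} + {oct(b)[2:]}? Answer in base 8."
    )
    expected = oct(a + b)[2:]                             # correct base-8 answer
    distractor = str(int(oct(a)[2:]) + int(oct(b)[2:]))   # naive base-10 addition of the digit strings
    return prompt, expected, distractor

prompt, expected, distractor = make_problem()
print(prompt)
print("counterfactual (base-8) answer:", expected)
print("default-behaviour (base-10) answer:", distractor)
```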
March 26, 2025 at 4:38 AM
Also, just to provide another point of view, MIT researchers on reasoning vs reciting: arxiv.org/abs/2307.02477

This may appear to support the view that LLMs just parrot information, but the counterfactual results for GPT-4 in this paper are not actually bad, and again RLHF biases the models.
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
The impressive performance of recent language models across a wide range of tasks suggests that they possess a degree of abstract reasoning skills. Are these skills general and transferable, or specia...
arxiv.org
March 26, 2025 at 4:38 AM
Tests by MSR on pre-RLHF GPT-4, a very famous paper:

arxiv.org/abs/2303.12712

Pre-RLHF is important because RLHF dumbs down models (IMO) and forces them to output a certain way, which is why we saw good performance from DeepSeek, as it had very little.
Same with early versions of Sydney (Bing Chat).
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our unde...
arxiv.org
March 26, 2025 at 4:38 AM
I don’t have time to argue with a bunch of AI skeptics on the internet, but for those who are actually curious and want to learn, here are some papers to read:

SLM trained on children’s books with emergent qualities.
arxiv.org/pdf/2305.07759
March 26, 2025 at 4:38 AM
Oh, also, just to add: a couple of years back a researcher trained an SLM on nothing but children’s books and got significantly better cognitive abilities than what was in the training dataset. He also published the paper.
March 26, 2025 at 12:16 AM
LLMs don’t do as well as humans on some of this stuff, and part of that is because the RLHF described above limits the model’s output. But new capabilities are being developed as we speak on this.

There are papers out there on this but I’m not going to out myself, so you can find them yourself.
March 26, 2025 at 12:05 AM
The answer is no one knows why. It was why GPT-3 was such a big deal in the first place. We ran experiments on poetry that was extremely unlikely to ever be in the Common Crawl dataset; we made up hypothetical theory-of-mind tasks and very specific coding tasks, and also tested it on counterfactual tasks.
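
To make that concrete, here is a toy illustration of what a made-up theory-of-mind task can look like (my own sketch, not the actual test set we used): generate a classic false-belief scenario with randomised nonsense names, objects and places, so the exact wording is vanishingly unlikely to appear in any crawl.

```python
# Toy false-belief (theory-of-mind) prompt generator with nonsense
# names and objects, so the exact text cannot be memorised.
import random

NAMES = ["Zorblet", "Quindra", "Mavrok", "Telsine"]
OBJECTS = ["glimmer-stone", "copper whistle", "striped marble"]
PLACES = ["wicker basket", "tin box", "blue drawer"]

def false_belief_prompt() -> str:
    a, b = random.sample(NAMES, 2)
    obj = random.choice(OBJECTS)
    p1, p2 = random.sample(PLACES, 2)
    return (
        f"{a} puts the {obj} in the {p1} and leaves the room. "
        f"While {a} is away, {b} moves the {obj} to the {p2}. "
        f"{a} comes back. Where will {a} look for the {obj} first, and why?"
    )

print(false_belief_prompt())
# Correct answer: the first location, because the character holds a
# false belief about where the object now is.
```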
March 26, 2025 at 12:05 AM
When we started testing GPT-3-sized models we started to get emergent qualities from those outputs. In other words, outputs that were novel. And the bigger the model’s parameter space, the more emergent capabilities we got.
March 26, 2025 at 12:05 AM
So note they’re not “looking up” a database or index like a search engine does. They’re figuring out what you’re likely talking about, then running that through a multi-billion-parameter system of “traffic lights”. This can in theory lead to virtually any combination of outputs over its known tokens.
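
A toy sketch of that point: nothing is fetched from an index; the model is just a function from “tokens so far” to a probability distribution over the next token, applied repeatedly. The toy_model below is a hand-written stand-in for a real network, with made-up probabilities.

```python
# Toy autoregressive loop: outputs are composed token by token from a
# probability distribution, not retrieved from storage.
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def toy_model(context):
    """Stand-in for a real network: returns P(next token | context)."""
    last = context[-1] if context else None
    table = {
        None:  [0.9, 0.02, 0.02, 0.02, 0.02, 0.02],
        "the": [0.05, 0.5, 0.05, 0.05, 0.3, 0.05],
        "cat": [0.05, 0.05, 0.6, 0.1, 0.1, 0.1],
        "sat": [0.1, 0.05, 0.05, 0.6, 0.1, 0.1],
        "on":  [0.6, 0.1, 0.05, 0.05, 0.15, 0.05],
        "mat": [0.05, 0.05, 0.05, 0.05, 0.05, 0.75],
        ".":   [0.9, 0.02, 0.02, 0.02, 0.02, 0.02],
    }
    return table[last]

tokens = []
for _ in range(8):
    probs = toy_model(tokens)
    tokens.append(random.choices(VOCAB, weights=probs, k=1)[0])
print(" ".join(tokens))
# Every run can produce a different, newly composed sequence:
# generation, not retrieval.
```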
March 26, 2025 at 12:05 AM
In a way this RLHF limits the output of the model. Since I’ve actually worked on large-scale LLMs (before most people knew what they were), I’ve also seen their capabilities pre-RLHF.
March 26, 2025 at 12:05 AM
The neural network then predicts the next token based on literally billions of parameters (weights and biases), which are set during the training process BUT ALSO during a process called RLHF (reinforcement learning from human feedback), which biases the output tokens toward what human raters preferred.
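
A toy sketch of that prediction step (the vocabulary and numbers are made up for illustration): the network produces a score (logit) for every token in its vocabulary, softmax turns those scores into probabilities, and one token is sampled. The logits come from the trained parameters; RLHF later nudges those parameters, which shifts these probabilities.

```python
# Toy next-token prediction: logits -> softmax -> sample.
import math
import random

vocab = ["Paris", "London", "banana", "the", "blue"]
logits = [4.2, 2.1, -1.0, 0.3, -0.5]   # hypothetical scores for "The capital of France is ..."

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
for tok, p in zip(vocab, probs):
    print(f"{tok:>7}: {p:.3f}")

# Sample the next token from the distribution (temperature 1.0).
next_token = random.choices(vocab, weights=probs, k=1)[0]
print("next token:", next_token)
```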
March 26, 2025 at 12:05 AM
An LLM is made of two main parts: an attention block and a neural network. The attention block works out products between the tokens and then feeds into the neural network. It’s working out what’s important, so it can focus on processing the important tokens fast. This was key to scaling.
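
A minimal PyTorch sketch of those two parts (my own toy code, not any production model): a self-attention block followed by a small feed-forward network, roughly one transformer layer.

```python
import torch
import torch.nn as nn

class ToyTransformerBlock(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        # Attention: every token scores every other token to decide
        # which ones matter for the prediction.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        # Feed-forward network: processes each attention-weighted token.
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, sequence_length, d_model) token embeddings
        attn_out, _ = self.attn(x, x, x)   # attention over the tokens
        x = self.norm1(x + attn_out)       # residual connection
        x = self.norm2(x + self.ff(x))     # feed-forward + residual
        return x

# Example: one sequence of 10 token embeddings, 64 dimensions each.
block = ToyTransformerBlock()
tokens = torch.randn(1, 10, 64)
print(block(tokens).shape)  # torch.Size([1, 10, 64])
```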
March 26, 2025 at 12:05 AM
The amount of wrongness in this thread reminds me of vaccine skeptics. As for my credentials, I’m a former AI engineering leader who worked at one of the companies that gave birth to the current gen (transformer architecture) of LLMs. I’ve been working on them since about 2019.
March 25, 2025 at 11:58 PM