leventov.bsky.social
@leventov.bsky.social
The reason for my skepticism is that I'm not sure xAI would give away Grok 3 and push it on x.com so aggressively if it cost an arm and a leg to run, as GPT-4.5's pricing suggests a model that size would
March 1, 2025 at 5:32 PM
I know there's no official info, of course. I'm following these rumors pretty closely, too. The compute FLOPs they've had could have been spent on a ~2T model, no? I think Elon said they also used a ton of synthetically generated data, and many rollouts to find good solutions for RL.
March 1, 2025 at 5:26 PM
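For intuition, a back-of-the-envelope Chinchilla-style estimate (training compute C ≈ 6·N·D for N parameters and D tokens); the FLOP budget and token count below are illustrative assumptions, not reported xAI numbers:

```python
# Rough training-compute arithmetic: C ~ 6 * N * D.
# All inputs are illustrative assumptions, not reported xAI figures.

flops_budget = 5e26   # assumed total pretraining FLOPs
tokens = 40e12        # assumed training tokens (incl. synthetic data)

params = flops_budget / (6 * tokens)
print(f"implied dense model size: {params / 1e12:.1f}T params")
# ~2.1T params: a budget like this more plausibly buys a ~2T model
# trained on a lot of data than a 10T one.
```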
Source for Grok 3 being 10T? I'm very skeptical of that. Maybe they scaled training data substantially, but parameters not *that* much.
March 1, 2025 at 5:42 AM
Perhaps this laziness is an intentional nudge towards using reasoning models (which are not yet available, though - I mean reasoners based on 4.5)
February 27, 2025 at 8:54 PM
We need an uncertainty knob similar to temperature
February 25, 2025 at 2:30 PM
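For context, temperature today only reshapes the token-sampling distribution; a minimal sketch of that mechanism, with the `uncertainty` parameter at the end being purely hypothetical:

```python
import numpy as np

def sample_token(logits: np.ndarray, temperature: float = 1.0) -> int:
    # Standard temperature knob: divide logits by T before softmax.
    # Higher T flattens the distribution, lower T sharpens it.
    z = logits / temperature
    z = z - z.max()  # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return int(np.random.choice(len(logits), p=probs))

# A hypothetical "uncertainty" knob would live next to temperature in
# the request, steering how much the model hedges its claims rather
# than how randomly it samples, e.g. (not a real API):
# client.chat(msgs, temperature=0.7, uncertainty=0.9)
```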
They promised to open-source the previous gen after releasing the next one. So we will know
February 21, 2025 at 3:54 AM
And then, if anything goes wrong or unexpected happens during the "wet" phase, a clueless wannabe would like to pull up a VLM with a camera and ask the model for "debug instructions". You can't do that "on Google".
February 6, 2025 at 8:04 PM
You cannot get precise and detailed instructions for making a bomb or poison or other very dangerous stuff from things you can buy legitimately in just "a few clicks on Google". At a very minimum, it's days of research, including how to gaslight vendors, how to prepare things, etc.
February 6, 2025 at 8:02 PM
Bad take. Censorship of "recipes for ruin" is good. A blanket deontological rule "censorship is bad" doesn't work.
February 6, 2025 at 7:41 PM
I guess in AI agent(cy) engineering, the equivalent transition will be towards method design and decomposition: dialogue? multi-role debate? argument tree? the data model the model is operating on top of? reward design for RL post-training/fine-tuning?
February 5, 2025 at 1:07 AM
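As one illustration, a minimal sketch of the multi-role-debate option from that list; `llm` is a placeholder for whatever completion call you use, and the role prompts are arbitrary assumptions:

```python
def llm(system: str, prompt: str) -> str:
    """Stand-in for any chat-completion call; wire up a real client here."""
    raise NotImplementedError

def debate(question: str, rounds: int = 2) -> str:
    # The engineering effort goes into the structure (roles, turn
    # order, judging), not into massaging one monolithic prompt.
    transcript = f"Question: {question}\n"
    for _ in range(rounds):
        transcript += "Pro: " + llm("Argue in favor.", transcript) + "\n"
        transcript += "Con: " + llm("Argue against.", transcript) + "\n"
    return llm("Judge the debate and give a final answer.", transcript)
```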
With DL and end-to-end training in CV, loss design became a more important skill than heuristic bricolage
February 5, 2025 at 1:01 AM
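To make "loss design" concrete, a minimal PyTorch-flavored sketch of composing a detection loss from weighted terms; the terms and weight are illustrative, not any particular paper's recipe:

```python
import torch
import torch.nn.functional as F

def detection_loss(pred_boxes, pred_logits, gt_boxes, gt_labels,
                   box_weight=5.0):
    # End-to-end CV shifts the craft into terms like these: deciding
    # what to penalize and how to weight it, instead of hand-tuning
    # SIFT/RAG-style pipeline stages.
    cls_loss = F.cross_entropy(pred_logits, gt_labels)
    box_loss = F.l1_loss(pred_boxes, gt_boxes)
    return cls_loss + box_weight * box_loss
```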
(take not mine) the current AI/agent(cy) engineering is much like pre-DL computer vision, when people tried to massage the problem around a few fairly rigid algos like SIFT. There was also a RAG back then: scikit-image.org/docs/stable/... and it also didn't work very well
Region Adjacency Graphs (RAGs) — skimage 0.25.1 documentation
scikit-image.org
February 5, 2025 at 12:59 AM
Wrong, this is still liberal hysteria
February 3, 2025 at 8:49 PM
I have yet to regret shooting a request to undermind. It always finds something interesting. My requests are always of the form "who has done research of roughly this shape" (where I'm sure that someone has, but it's hard to find via Scholar)
February 3, 2025 at 1:31 PM
FWIW in my impression, none of the services in this category (Perplexity, You.com, etc.) live up to the "deep" label except undermind.ai so far. Didn't try PaperQA though.
February 3, 2025 at 6:38 AM
Google's deep research is a total flop. I paid for a subscription to try it. Tried it on maybe 10 requests across a very broad range, from technical to cultural to philosophical. It spat out bland, often outright wrong slop every. single. time. Idk why you keep praising it
February 3, 2025 at 6:35 AM
Zero information. "Consistently candid" Sama will say whatever the specific audience likes. I'm sure that in different rooms he says the opposite of this
February 1, 2025 at 4:10 PM
Claude is very hit-or-miss for Perplexity-like questions. Same for everything else: ChatGPT, Gemini, exa.ai, You.com.

Meta-search across all of them may be helpful, even if not fully automatable yet: if LLMs knew good answers to these searches, they wouldn't be so hit-or-miss to begin with.
January 20, 2025 at 6:45 PM
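A minimal sketch of such meta-search, assuming thin hypothetical wrappers around each service; it just fans one query out and keeps answers that at least two backends agree on:

```python
from collections import Counter
from typing import Callable

def meta_search(question: str,
                backends: list[Callable[[str], str]]) -> list[str]:
    # Fan the same query out to several hit-or-miss services and keep
    # only answers at least two of them agree on; the manual version
    # of this is just pasting the query into each tab.
    answers = [ask(question) for ask in backends]
    counts = Counter(answers)
    return [a for a, n in counts.items() if n >= 2]

# backends would be thin wrappers over each service's API, e.g.
# meta_search("...", [ask_chatgpt, ask_claude, ask_exa])  # hypothetical
```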
Maybe no hard distinction. It's a continuum between Sonnet and "reasoning" models.
January 8, 2025 at 5:32 AM
Does the book argue for expected-utility/value-based decision-making? "Radical Uncertainty" by @profjohnkay.bsky.social argues directly against that
December 26, 2024 at 10:03 AM
I think o1/o3 should be better at this (I don't use them at the moment), but breaking the flow and waiting would be weird. An o1-capable coder with access to the context that constantly does some analysis in the background and makes insightful suggestions for me from time to time would be best
December 24, 2024 at 5:44 AM