leventov.bsky.social
@leventov.bsky.social
The reason for my skepticism is that I'm not sure xAI would give away Grok 3 and push it on x.com so aggressively if it cost an arm and a leg to run, as GPT-4.5's pricing suggests a model that size would
March 1, 2025 at 5:32 PM
I know there's no official info, of course. I'm following these rumors pretty closely, too. The compute FLOPs they've had could have been spent on a ~2T model, no? I think Elon said they also used a ton of synthetically generated data, and many rollouts to find good solutions for RL.
March 1, 2025 at 5:26 PM
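For intuition, a back-of-the-envelope Chinchilla-style estimate (training compute C ≈ 6·N·D for N parameters and D tokens); the FLOP budget and token count below are illustrative assumptions, not reported xAI numbers:

```python
# Rough training-compute arithmetic: C ~ 6 * N * D.
# All inputs are illustrative assumptions, not reported xAI figures.

flops_budget = 5e26   # assumed total pretraining FLOPs
tokens = 40e12        # assumed training tokens (incl. synthetic data)

params = flops_budget / (6 * tokens)
print(f"implied dense model size: {params / 1e12:.1f}T params")
# ~2.1T params: a budget like this more plausibly buys a ~2T model
# trained on a lot of data than a 10T one.
```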
Source for Grok 3 being 10T? I'm very skeptical of that. Maybe they scaled training data substantially, but parameters not *that* much.
March 1, 2025 at 5:42 AM
Perhaps this laziness is an intentional nudge towards using reasoning models (which are not yet available, though - I mean reasoners based on 4.5)
February 27, 2025 at 8:54 PM
We need an uncertainty knob similar to temperature
February 25, 2025 at 2:30 PM
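For context, temperature today only reshapes the token-sampling distribution; a minimal sketch of that mechanism, with the `uncertainty` parameter at the end being purely hypothetical:

```python
import numpy as np

def sample_token(logits: np.ndarray, temperature: float = 1.0) -> int:
    # Standard temperature knob: divide logits by T before softmax.
    # Higher T flattens the distribution, lower T sharpens it.
    z = logits / temperature
    z = z - z.max()  # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return int(np.random.choice(len(logits), p=probs))

# A hypothetical "uncertainty" knob would live next to temperature in
# the request, steering how much the model hedges its claims rather
# than how randomly it samples, e.g. (not a real API):
# client.chat(msgs, temperature=0.7, uncertainty=0.9)
```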
They promised to open-source the previous gen after releasing the next one. So we will know
February 21, 2025 at 3:54 AM
And then, if anything goes wrong or unexpected happens during the "wet" phase, a clueless wannabe would like to pull up a VLM with a camera and ask the model for "debug instructions". You can't do that "on Google".
February 6, 2025 at 8:04 PM
You cannot get precise and detailed instructions for making a bomb or poison or other very dangerous stuff from things you can buy legitimately in just "a few clicks on Google". At a very minimum, it's days of research, including how to gaslight vendors, how to prepare things, etc.
February 6, 2025 at 8:02 PM
Bad take. Censorship of "recipes for ruin" is good. A blanket deontological rule "censorship is bad" doesn't work.
February 6, 2025 at 7:41 PM
I guess in AI agent(cy) engineering, the equivalent transition will be towards method design and decomposition: dialogue? multi-role debate? argument tree? the data model the model is operating on top of? reward design for RL post-training/fine-tuning?
February 5, 2025 at 1:07 AM
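As one illustration, a minimal sketch of the multi-role-debate option from that list; `llm` is a placeholder for whatever completion call you use, and the role prompts are arbitrary assumptions:

```python
def llm(system: str, prompt: str) -> str:
    """Stand-in for any chat-completion call; wire up a real client here."""
    raise NotImplementedError

def debate(question: str, rounds: int = 2) -> str:
    # The engineering effort goes into the structure (roles, turn
    # order, judging), not into massaging one monolithic prompt.
    transcript = f"Question: {question}\n"
    for _ in range(rounds):
        transcript += "Pro: " + llm("Argue in favor.", transcript) + "\n"
        transcript += "Con: " + llm("Argue against.", transcript) + "\n"
    return llm("Judge the debate and give a final answer.", transcript)
```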
With DL and end-to-end training in CV, loss design became a more important skill than heuristic bricolage
February 5, 2025 at 1:01 AM
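To make "loss design" concrete, a minimal PyTorch-flavored sketch of composing a detection loss from weighted terms; the terms and weight are illustrative, not any particular paper's recipe:

```python
import torch
import torch.nn.functional as F

def detection_loss(pred_boxes, pred_logits, gt_boxes, gt_labels,
                   box_weight=5.0):
    # End-to-end CV shifts the craft into terms like these: deciding
    # what to penalize and how to weight it, instead of hand-tuning
    # SIFT/RAG-style pipeline stages.
    cls_loss = F.cross_entropy(pred_logits, gt_labels)
    box_loss = F.l1_loss(pred_boxes, gt_boxes)
    return cls_loss + box_weight * box_loss
```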
(take not mine) the current AI/agent(cy) engineering is much like pre-DL computer vision, when people tried to massage the problem around a few fairly rigid algos like SIFT. There was also a RAG back then: scikit-image.org/docs/stable/... and it also didn't work very well
Region Adjacency Graphs (RAGs) — skimage 0.25.1 documentation
scikit-image.org
February 5, 2025 at 12:59 AM
Wrong, this is still liberal hysteria
February 3, 2025 at 8:49 PM
I have yet to regret shooting a request to undermind. It always finds something interesting. My requests are always of the form "who has done research of roughly this shape" (where I'm sure that someone has, but it's hard to find via Scholar)
February 3, 2025 at 1:31 PM
FWIW in my impression, none of the services in this category (Perplexity, You.com, etc.) live up to the "deep" label except undermind.ai so far. Didn't try PaperQA though.
February 3, 2025 at 6:38 AM
Google's deep research is a total flop. I paid for a subscription to try it. Tried it on maybe 10 requests across a very broad range, from technical to cultural to philosophical. It spat out bland, often outright wrong slop every. single. time. Idk why you keep praising it
February 3, 2025 at 6:35 AM
Zero information. "Consistently candid" Sama will say whatever the specific audience likes. I'm sure that in different rooms he says the opposite of this
February 1, 2025 at 4:10 PM
Claude is very hit-or-miss for Perplexity-like questions. Same for everything else: ChatGPT, Gemini, exa.ai, You.com.

Meta-search across all of them may be helpful, even if not fully automatable yet: if LLMs knew good answers to these searches, they wouldn't be so hit-or-miss to begin with.
January 20, 2025 at 6:45 PM
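A minimal sketch of such meta-search, assuming thin hypothetical wrappers around each service; it just fans one query out and keeps answers that at least two backends agree on:

```python
from collections import Counter
from typing import Callable

def meta_search(question: str,
                backends: list[Callable[[str], str]]) -> list[str]:
    # Fan the same query out to several hit-or-miss services and keep
    # only answers at least two of them agree on; the manual version
    # of this is just pasting the query into each tab.
    answers = [ask(question) for ask in backends]
    counts = Counter(answers)
    return [a for a, n in counts.items() if n >= 2]

# backends would be thin wrappers over each service's API, e.g.
# meta_search("...", [ask_chatgpt, ask_claude, ask_exa])  # hypothetical
```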
Maybe no hard distinction. It's a continuum between Sonnet and "reasoning" models.
January 8, 2025 at 5:32 AM
Does the book argue for expected-utility/value-based decision-making? "Radical Uncertainty" by @profjohnkay.bsky.social argues directly against that
December 26, 2024 at 10:03 AM
I think o1/o3 should be better at this (I don't use them at the moment), but breaking the flow and waiting would be weird. An o1-capable coder with access to the context that constantly does some analysis in the background and makes insightful suggestions for me from time to time would be best
December 24, 2024 at 5:44 AM