— ERNIE 4.5: beats GPT 4.5 at 1% of the price.
— Reasoning model X1: beats DeepSeek R1 at 50% of the price.
China continues to build intelligence too cheap to meter. The AI price war is on.
Google Gemini really cooked with this one.
This is next gen photo editing.
"Make the steak vegetarian"
"Make the bridge go away"
"Make the keyboard more colorful"
And my favorite
"Give the OpenAI logo more personality"
"Make the steak vegetarian"
"Make the bridge go away"
"Make the keyboard more colorful"
And my favorite
"Give the OpenAI logo more personality"
Nature reported that reasoning LLMs found errors in 1% of the 10,000 research papers they analyzed, with a 35% false-positive rate, at $0.15-$1 per paper.
The Anthropic founder’s vision of “a country of geniuses in a data center” is happening.
LADDER:
— Generate variants of the problem
— Solve them, verify the answers, and use GRPO (DeepSeek's RL algorithm) to learn from what checks out
TTRL:
— Run that same generate/solve/verify/learn loop whenever you hit a new problem, at test time
A new form of test-time compute scaling!
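Roughly, the loop looks like this. This is a minimal sketch, not the paper's code: the callables (generate_variants, solve, verify, grpo_update) are placeholders you would supply, and the GRPO update is abstracted into a single call.

```python
# Hypothetical sketch of the LADDER / TTRL loop described above.
# generate_variants, solve, verify and grpo_update are placeholders,
# not the paper's actual API.

def ladder_step(model, problem, generate_variants, solve, verify, grpo_update,
                n_variants=16):
    """One LADDER iteration: generate variants of the problem, solve and
    verify them, then reinforce with a GRPO-style policy update."""
    rollouts = []
    for variant in generate_variants(model, problem, n_variants):
        answer = solve(model, variant)                    # sample a solution
        reward = 1.0 if verify(variant, answer) else 0.0  # programmatic check
        rollouts.append((variant, answer, reward))
    return grpo_update(model, rollouts)                   # returns the updated model


def ttrl_answer(model, new_problem, generate_variants, solve, verify, grpo_update,
                steps=3):
    """TTRL: run the same generate/solve/verify/learn loop at test time,
    on the problem you were just handed, before producing the final answer."""
    for _ in range(steps):
        model = ladder_step(model, new_problem,
                            generate_variants, solve, verify, grpo_update)
    return solve(model, new_problem)
```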
SortBenchmark measures how fast, cheaply, and efficiently distributed systems can sort large datasets.
— How fast? 134s
— How cheap? $97
— How many in 1 minute? 370B numbers
— How much energy? ~59 kJ, about the energy of a 15-minute walk
Every software engineer should know this.
Revenue (/day): $562k
Cost (/day): $87k
Revenue (/yr): ~$205M
All this while charging $2.19/M tokens for R1, ~25x less than OpenAI o1.
If this were in the US, it would be a >$10B company.
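The yearly figure is just the daily number annualized. A quick sanity check of the arithmetic, assuming the disclosed daily figures hold steady (the margin interpretation is mine):

```python
# Back-of-the-envelope check of the figures above (illustrative only;
# assumes the disclosed daily numbers hold steady all year).
revenue_per_day = 562_000   # USD
cost_per_day = 87_000       # USD

revenue_per_year = revenue_per_day * 365
margin = (revenue_per_day - cost_per_day) / revenue_per_day

print(f"annualized revenue ~ ${revenue_per_year / 1e6:.0f}M")  # ~ $205M
print(f"implied margin ~ {margin:.0%}")                        # ~ 85%
```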
Fork a repo.
Select a folder.
Ask it anything.
It even shows you what percentage of the context window each folder takes up.
Here it visualizes yt-dlp's (the YouTube downloader) flow.
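As for those context-window percentages: I don't know how the tool computes them, but conceptually it's just token counting per folder. A rough sketch, where the encoding choice, the 1M-token window, and the file filter are all my assumptions rather than the tool's implementation:

```python
# Rough sketch of "what % of the context window each folder takes".
# Not the tool's actual implementation; encoding, window size and the
# file extensions are assumptions for illustration.
from collections import defaultdict
from pathlib import Path

import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")
CONTEXT_WINDOW = 1_000_000  # tokens

def folder_usage(repo_root: str, extensions=(".py", ".md")) -> dict[str, float]:
    """Return, per folder, the share of the context window its files consume."""
    root = Path(repo_root)
    tokens_per_folder: dict[str, int] = defaultdict(int)
    for path in root.rglob("*"):
        if path.suffix not in extensions or not path.is_file():
            continue
        text = path.read_text(errors="ignore")
        folder = str(path.parent.relative_to(root))
        tokens_per_folder[folder] += len(ENC.encode(text, disallowed_special=()))
    return {f: 100 * n / CONTEXT_WINDOW for f, n in tokens_per_folder.items()}

for folder, pct in sorted(folder_usage(".").items(), key=lambda kv: -kv[1]):
    print(f"{pct:5.1f}%  {folder}")
```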
The winner was OpenAI.
It had the most detailed, highest-quality, and most accurate answer, but you do pay $200/mo for it.
Excellence is boring. It's making the same boring "correct" choice over and over again. You win by being consistent for longer.
Our short attention spans tend to forget that.
The model was NOT contaminated with this data, and the 50-submission limit was used.
We will likely see superhuman coding models this year.
I'm surprised more people don't know about it. Brendan Bycroft made this beautiful interactive visualization that walks through exactly how every weight inside an LLM is used.
Here's a link:
Perfect needle-in-the-haystack scores are easy: the attention mechanism can match the exact words. Require even one hop of reasoning and performance degrades quickly.
This is why guaranteeing correctness for agents is hard.
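A toy illustration of the difference (the wording and numbers are entirely made up): the retrieval probe shares words with the needle, so finding it is essentially string matching, while the one-hop probe forces the model to chain two facts that never co-occur with the question's phrasing.

```python
# Toy needle-in-a-haystack probe vs. a 1-hop variant (entirely made up).
filler = "The sky was a pleasant shade of blue that afternoon. " * 2000

# Retrieval case: the question lexically matches the needle.
needle = "The secret passcode is 7421."
retrieval_prompt = filler + needle + filler + "\nQuestion: What is the secret passcode?"

# 1-hop case: the answer requires chaining two facts; no single sentence
# in the context matches the question's wording.
fact_a = "Alice's badge number is the same as the secret passcode."
fact_b = "The secret passcode is 7421."
one_hop_prompt = (filler + fact_a + filler + fact_b + filler
                  + "\nQuestion: What is Alice's badge number?")
```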
Gemini 2 Flash's $0.40/M tokens and 1M-token context mean you can now parse ~6,000 pages of PDFs at near-perfect quality for $1.
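The arithmetic behind that, assuming a parsed page comes out to roughly 400 output tokens (my assumption, not a published figure):

```python
# Rough arithmetic for the pages-per-dollar claim above.
price_per_m_output_tokens = 0.40   # USD, Gemini 2 Flash price quoted above
tokens_per_page = 400              # assumption: output tokens per parsed PDF page

tokens_per_dollar = 1_000_000 / price_per_m_output_tokens   # 2.5M tokens
pages_per_dollar = tokens_per_dollar / tokens_per_page
print(f"~{pages_per_dollar:,.0f} pages per $1")              # ~6,250
```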
Deep research generates ~10-page reports in ~15 mins by scouring hundreds of websites. This could replace a lot of human work. I tried both so you don't have to.
The verdict: OpenAI is faster and higher quality, despite costing more.
Price per million tokens (cached input, input, output):
Gemini 2 Flash Lite: $0.01875, $0.075, $0.30
Gemini 2 Flash: $0.025, $0.10, $0.40
GPT 4o-mini: $0.075, $0.15, $0.60
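To make those numbers concrete, here is what a single hypothetical call with 10k input tokens and 1k output tokens would cost under each price list (no cache hits assumed):

```python
# Cost of one example call under the prices listed above (no cached input).
prices = {  # model: (input $/M tokens, output $/M tokens)
    "Gemini 2 Flash Lite": (0.075, 0.30),
    "Gemini 2 Flash":      (0.10, 0.40),
    "GPT 4o-mini":         (0.15, 0.60),
}
input_tokens, output_tokens = 10_000, 1_000
for model, (p_in, p_out) in prices.items():
    cost = input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out
    print(f"{model}: ${cost:.5f}")   # a fraction of a cent per call
```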
"write a table of the 25 most important chip architectures over time, and come up with 12 columns to compare them on"
"write a table of the 25 most important chip architectures over time, and come up with 12 columns to compare them on"
– Readers: free 200+ page book covering pre-training, generative models, prompting and alignment
– Programmers: Karpathy’s neural networks zero to hero playlist including implementing GPT-2 from scratch
If India wants to build its own foundation model, it should digitize all its records, hope that adds up to 1T+ tokens, keep the data locked to its own models, and then have the best models in Indic languages.
Misses some of the awesome analysis in the system card, but pretty nicely covers where we are.
Cheaper, better models.
— one of the highest-quality non-reasoning LLMs
— super fast (150+ tok/s)
— 1M-token context window
The API price isn't out yet, but it was previously $0.075/$0.30 per M input/output tokens. Big move from Google.