Lightnews — Scholar-powered news

Daniel Mewes

@dmewes.com

Poetiq's methodology on top of Gemini 3 & GPT 5.1 exceeds average human performance on ARC-AGI-2!
This is huge.

Only caveat is that they evaluated on the public set - it might have been used in post-training of Gemini 3? Looking forward to seeing private eval results! poetiq.ai/posts/arcagi...

Traversing the Frontier of Superintelligence

Poetiq is proud to announce a major milestone in AI reasoning. We have established a new state-of-the-art (SOTA) on the ARC-AGI-1 & 2 benchmarks, significantly advancing both the performance and the e...

poetiq.ai

November 21, 2025 at 5:56 PM

Daniel Mewes

@dmewes.com

There have been a few works that showed signs of LLM introspection since I wrote my article. Though the consensus still seems to be that it's quite unreliable. amongai.com/2024/12/24/l...

October 29, 2025 at 7:01 PM

Reposted by Daniel Mewes

Simon Willison

@simonwillison.net

This was a tough but necessary decision - I posted my own notes on this here, from the perspective of a current PSF board member simonwillison.net/2025/Oct/27/...

October 27, 2025 at 8:34 PM

Daniel Mewes

@dmewes.com

I have to admit the mechanism behind cable impedance and impedance matching never really clicked for me, despite being a licensed radio amateur for 20 years.
...until I watched this video by AlphaPhoenix - it's such an amazing visualization of what's going on in a cable! youtu.be/RkAF3X6cJa4?...

What does "impedance matching" actually look like? (electricity waves)

YouTube video by AlphaPhoenix

youtu.be

October 24, 2025 at 3:22 PM

Daniel Mewes

@dmewes.com

Claude Haiku 4.5 outperforms Sonnet 4 in some coding benchmarks (such as SWE-bench Verified). This is exciting, since it's 1/3 the price.

However, in my actual use, I've found it to be a bit underwhelming compared to Sonnet 4 (not to mention Sonnet 4.5).

What has been your experience with Haiku?

October 23, 2025 at 5:35 PM

Daniel Mewes

@dmewes.com

This is so true.
I think almost nobody is talking about symbolic GOFAI these days, so I'm not concerned about that.
But all of ML being re-branded to AI lately, while AI has simultaneously been made synonymous with generative AI in many places, has led to so much confusion.

Ethan Mollick @emollick.bsky.social · Oct 22

The fallout from the fact that data science/classical machine learning & generative AI are both called "AI" has been remarkably broad & persistent

Policy addresses the wrong harms, companies have been confused about who should lead efforts, hiring is misguided, academic discussion is often muddled.

October 22, 2025 at 5:51 PM

Daniel Mewes

@dmewes.com

The age of single-serving, disposable (simple) software is here. I now frequently have LLMs write software for me that I use exactly once.
Often to convert some data or create visualizations, but also one-off new features to add to some open-source application that I only want to use once.

October 8, 2025 at 12:07 AM

Reposted by Daniel Mewes

thebes

@vgel.me

new blog post! why do LLMs freak out over the seahorse emoji? i put llama-3.3-70b through its paces with the logit lens to find out, and explain what the logit lens (everyone's favorite underrated interpretability tool) is in the process.

link in reply!

October 5, 2025 at 2:36 PM

Daniel Mewes

@dmewes.com

Today we're releasing Sculptor, an AI coding tool that combines async agents with the ability to collaborate with agents locally.

It also comes with built-in verifiers to automatically check the quality of AI-written code. More to come! imbue.com/sculptor-ann...

A picture of @kanjun.bsky.social making an excited gesture, with the label "developers, developers, developers" written over it.

September 30, 2025 at 5:04 PM

Daniel Mewes

@dmewes.com

I used AI vibe coding tools to port Anthropic's API client to Common Lisp: github.com/danielmewes/...
It was a very quick and fun mini project that taught me a thing or two about Lisp. Use it at your own risk.

GitHub - danielmewes/anthropic-sdk-cl-port: An AI-written port of the Anthropic client SDK to Common Lisp.

An AI-written port of the Anthropic client SDK to Common Lisp. - danielmewes/anthropic-sdk-cl-port

github.com

September 22, 2025 at 4:42 PM

Daniel Mewes

@dmewes.com

I feel like each of Anthropic's three postmortems is missing some key explanation step in their root cause descriptions. www.anthropic.com/engineering/... 🧵

A postmortem of three recent issues

This is a technical report on three bugs that intermittently degraded responses from Claude. Below we explain what happened, why it took time to fix, and what we're changing.

www.anthropic.com

September 18, 2025 at 5:47 PM

Daniel Mewes

@dmewes.com

My Hulu / Disney+ subscription will be preempted indefinitely.

September 18, 2025 at 5:14 AM

Reposted by Daniel Mewes

Ethan Mollick

@emollick.bsky.social

As AI systems keep getting better at very hard problems while getting more opaque, the way that we work with AI is shifting from being collaborators who shape the process to being supplicants who receive the output.

I discussed what that means. www.oneusefulthing.org/p/on-working...

On Working with Wizards

Verifying magic on the jagged frontier

www.oneusefulthing.org

September 11, 2025 at 8:55 PM

Reposted by Daniel Mewes

Sung Kim

@sungkim.bsky.social

Meta trained a special “aggregator” model that learns how to combine and reconcile different answers into a more accurate final one, instead of relying on simple majority voting or reward model ranking on multiple model answers.

September 9, 2025 at 2:03 PM

Reposted by Daniel Mewes

Ethan Mollick

@emollick.bsky.social

The funny thing about the prediction that AI would be writing 90% of all code by now is that the prediction's failure distracts from the fact that AI adoption in code writing is actually extremely high, it was already over 30% in December, 2024 according to one measure, with large economic impact.

September 3, 2025 at 4:19 PM

Daniel Mewes

@dmewes.com

I remember seeing this graph and thinking it was about the influence of training data on LLM responses as well. Quite misleading.

Ethan Mollick @emollick.bsky.social · Sep 2

This chart is everywhere and is being horribly misinterpreted.

This is not where the training data for AI comes from, it is a study done by a SEO firm that claims to show how often sites come up at least once in THE WEB SEARCH FUNCTION of certain AI agents when they do a web search for more info.

September 2, 2025 at 3:52 AM

Daniel Mewes

@dmewes.com

I think we're starting to see diminishing returns from LLM pre- and post-training. The limitations of today's LLMs are unlikely to just disappear with the next bigger model.
This is not all bad: we can start focusing on how to work around those limitations and how to put current LLMs to work.

August 29, 2025 at 4:25 PM

Daniel Mewes

@dmewes.com

Just experienced phantom braking for the first time in my Kia with HDA2 (Highway Drive Assist) and it made me sentimental about the exciting times of early Tesla Autopilot 😢

August 29, 2025 at 4:00 PM

Daniel Mewes

@dmewes.com

Thanks to the genius of a well-thought-out tariff policy, manufacturing trucks (for the US market) in Mexico now incurs lower tariffs than manufacturing them domestically. 🤦 www.reuters.com/business/aut...

August 27, 2025 at 4:21 AM

Reposted by Daniel Mewes

sakanaai.bsky.social

@sakanaai.bsky.social

What if we could evolve AI models like organisms, letting them compete, mate, and combine their strengths to produce ever-fitter offspring?

Excited to share our new paper, “Competition and Attraction Improve Model Fusion” presented at GECCO 2025 (runner-up for best paper)!

arxiv.org/abs/2508.16204

Competition and Attraction Improve Model Fusion

Model merging is a powerful technique for integrating the specialized knowledge of multiple machine learning models into a single model. However, existing methods require manually partitioning model parameters into fixed groups for merging, which restricts the exploration of potential combinations and limits performance. To overcome these limitations, we propose M2N2, an evolutionary algorithm with three key features: 1/ dynamic adjustment of merging boundaries to progressively explore a broader range of parameter combinations; 2/ a diversity preservation mechanism inspired by the competition for resources in nature, to maintain a population of diverse, high-performing models that are particularly well-suited for merging; and 3/ a heuristic-based attraction metric to identify the most promising pairs of models for fusion. Our experimental results demonstrate, for the first time, that model merging can be used to evolve models entirely from scratch. Specifically, we apply M2N2 to evolve MNIST classifiers from scratch and achieve performance comparable to CMA-ES, while being computationally more efficient. Furthermore, M2N2 scales to merge specialized language and image generation models, achieving state-of-the-art performance. Notably, it preserves crucial model capabilities beyond those explicitly optimized by the fitness function, highlighting its robustness and versatility.

August 25, 2025 at 2:48 AM

Daniel Mewes

@dmewes.com

Study finds that polarization and echo chambers on social media may be inherent to the medium and not due to algorithms. www.science.org/content/arti...

Don’t blame the algorithm: Polarization may be inherent in social media

In simulations, AI-generated users of stripped-down social media without content algorithms still split into polarized echo chambers

www.science.org

August 24, 2025 at 5:11 PM

Reposted by Daniel Mewes

Dulany, pumpkinhead truther 🎃

@dulanyw.bsky.social

Impressive result: 99.9% on AIME 2025 with both open models and 85% fewer tokens!

The trick? Measuring confidence in the reasoning traces as they are being generated, then ending low confidence traces early.

jiaweizzhao.github.io/deepconf/

Deep Think with Confidence

Deep Think with Confidence (DeepConf): A simple yet powerful method that significantly improves both reasoning efficiency and performance at test time by leveraging model-internal confidence signals.

jiaweizzhao.github.io

August 24, 2025 at 1:28 PM

Reposted by Daniel Mewes

Ethan Mollick

@emollick.bsky.social

It seems like there is not enough of a policy response to the existence of self-driving cars. With 57M miles of data, Waymo’s autonomous vehicles experience 85% fewer serious injuries & 79% fewer injuries overall than cars with human drivers

2.4 million are injured & 40k killed in US accidents a year

August 23, 2025 at 1:45 PM

Daniel Mewes

@dmewes.com

One thing I've learned is that when Americans call something "European style", it's 100% not like anything I've ever seen or eaten in Europe. I think it's just an attribute used to mean "a bit different from how we usually do it in America"?

August 20, 2025 at 3:01 AM

Reposted by Daniel Mewes

Eugene Vinitsky 🍒

@eugenevinitsky.bsky.social

This is clearly where this is all going

App (represented as globe) supported by stack of turtles. In order the turtles are named LLM, LLM Judge, LLM Judge Judge, LLM Supreme Court

August 17, 2025 at 10:00 PM
