Lightnews — Scholar-powered news

jm

@lomz.me

in Claude Code, which models can read a skill file with 3 simple criteria to follow a specific code style?

Sonnet 4.5 can for sure. But haiku 4.5 cannot consistently.

Python service writing evaluation

- Haiku averaged 88.9% across all checks
- Sonnet achieved 100.0% across all checks

December 27, 2025 at 1:18 AM
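A minimal sketch of how an "averaged across all checks" figure like the ones in the post above could be computed. The post does not say how many runs or checks were involved; the 3 criteria × 3 runs layout, the criterion names, and the `CheckResult` and `pass_rate` helpers are all assumptions for illustration. The one thing the numbers do support is that 88.9% is consistent with 8 of 9 checks passing.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    run_id: int       # which evaluation run produced this result
    criterion: str    # which style criterion was checked
    passed: bool      # whether the generated code satisfied it


def pass_rate(results: list[CheckResult]) -> float:
    """Return the percentage of checks that passed."""
    if not results:
        raise ValueError("no results to score")
    return 100.0 * sum(r.passed for r in results) / len(results)


# Hypothetical layout: 3 criteria checked on each of 3 runs (9 checks total).
criteria = ["criterion_a", "criterion_b", "criterion_c"]

sonnet = [CheckResult(run, c, True) for run in range(3) for c in criteria]

haiku = [CheckResult(run, c, True) for run in range(3) for c in criteria]
haiku[4] = CheckResult(1, "criterion_b", False)  # one assumed failure

print(f"Haiku:  {pass_rate(haiku):.1f}%")   # 8 of 9 checks
print(f"Sonnet: {pass_rate(sonnet):.1f}%")  # 9 of 9 checks
```

With so few checks, a single failure moves the average by about 11 points, so a larger number of runs would be needed before treating the gap as a stable measurement.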

jm

@lomz.me

One day into building my eval driven python skills for claude code and I am already running into instruction following issues (haiku 4.5).

There's only 3 instructions and it can't follow them consistently.

My whole skill can fit in a post, so I'll attach it in the replies.

December 24, 2025 at 1:39 PM

jm

@lomz.me

"it's a gamechanger, trust me bro" in AI coding circles is like a sickness, and @scottspence.dev writing a real evaluation for his svelte claude code plugin is my medicine. Thank you for this!

I'm making python plugins using this strategy because data driven is the way!

github.com/jack-michaud...

GitHub - jack-michaud/faire

Contribute to jack-michaud/faire development by creating an account on GitHub.

github.com

December 23, 2025 at 1:00 PM

Reposted by jm

jm

@lomz.me

why did i join a company that has prod on us-east-1

September 18, 2023 at 7:11 PM

Reposted by jm

jm

@lomz.me

i have been informed that yesterday's aws outage did not affect us, our code is just prone to crashing

September 19, 2023 at 12:26 PM

jm

@lomz.me

"i've been bone bottomed" after dying in the beginning of #silksong will never not be funny

September 6, 2025 at 2:00 PM

Reposted by jm

Dan Goldstein

@dggoldst.bsky.social

New from Microsoft Research & Harvard Business School colleagues

"Shifting Work Patterns with Generative AI" by Eleanor Wiske Dillon, Sonia Jaffe, Nicole Immorlica, Christopher T. Stanton

arxiv.org/abs/2504.11436

April 19, 2025 at 10:13 PM

Reposted by jm

Neel Bhandari

@neelbhandari.bsky.social

1/🚨 𝗡𝗲𝘄 𝗽𝗮𝗽𝗲𝗿 𝗮𝗹𝗲𝗿𝘁 🚨
RAG systems excel on academic benchmarks - but are they robust to variations in linguistic style?

We find RAG systems are brittle. Small shifts in phrasing trigger cascading errors, driven by the complexity of the RAG pipeline 🧵

April 17, 2025 at 7:55 PM

jm

@lomz.me

When a new language model conquers a benchmark, it's because that benchmark exposed an "adversarial case" in the model's function - then they stuff in a trillion tokens of data augmentation to fix that case.

But this highlights something fundamental about today's language model architectures...

jm @lomz.me · Dec 24

arxiv.org/abs/2406.12843

"Our results suggest that building robust AI systems is challenging even with extremely superhuman systems in some of the most tractable settings, and highlight two key gaps: efficient generalization in defenses, and diversity in training."

Can Go AIs be adversarially robust?

Prior work found that superhuman Go AIs can be defeated by simple adversarial strategies, especially "cyclic" attacks. In this paper, we study whether adding natural countermeasures can achieve robust...

arxiv.org

December 24, 2024 at 7:56 PM

jm

@lomz.me

arxiv.org/abs/2406.12843

"Our results suggest that building robust AI systems is challenging even with extremely superhuman systems in some of the most tractable settings, and highlight two key gaps: efficient generalization in defenses, and diversity in training."

Can Go AIs be adversarially robust?

Prior work found that superhuman Go AIs can be defeated by simple adversarial strategies, especially "cyclic" attacks. In this paper, we study whether adding natural countermeasures can achieve robust...

arxiv.org

December 24, 2024 at 7:48 PM

jm

@lomz.me

Still waiting for the GSM-Symbolic benchmarks on o3... if you can pay $1000 for a model that gets something right between 80-99% of the time, is that still valuable?

December 20, 2024 at 7:57 PM

Reposted by jm

Lee Hurley

@hleehurley.com

"Wrong people listened to on puberty blockers ban".

The National in Scotland continuing to do some good work.

Wrong people listened to on puberty blockers ban
The National (Scotland) · 17 Dec 2024 · Steph Paton

Trans people are spoken about but not to by the mainstream press
In indefinitely extending the ban on puberty blockers for trans young people, Labour have enacted the first piece of anti-LGBT legislation to come from Westminster in 36 years.

And like the ban on teaching of LGBTQ+ identities enacted in 1988, it is grounded in little more than a moral panic that harms young people.

Health Secretary Wes Streeting claims that the decision to extend the Conservative Party’s ban on something which is a reversible and safe means to give young people the time and space to decide what is right for them was taken after receiving expert advice. But the Government’s own consultation process tells another story.

If, as the Government’s proponents are so quick to say, we must take the heat out of the so-called transgender debate, the route to doing so must surely be through expert opinion and by listening to the experiences of young trans people themselves.

December 17, 2024 at 9:44 AM

Reposted by jm

fofr

@fofr.ai

I've trained a new handwriting flux lora.
It can do many styles.

Prompt it with HWRIT keyword, give it some short text, a handwriting style and some ink and paper types.

More examples and download links in 🧵

December 14, 2024 at 1:39 PM

Reposted by jm

Vicki

@vickiboykis.com

Want to clear up some misconceptions, pydantic is actually short for Pytholomew Daniel Ticonderoga, the inventor and patent-holder of the first disposable pencil that also forced you to write only on paper with dotted lines

December 3, 2024 at 12:13 AM

Reposted by jm

seven “swans a swimming” rasmussen

@toomanyspectra.bsky.social

can I code fast? no. but can I code well? also no. but does my code work? alas, no

November 30, 2024 at 9:39 PM

Reposted by jm

Tiziano Piccardi

@tiziano.bsky.social

New paper: Do social media algorithms shape affective polarization?

We ran a field experiment on X/Twitter (N=1,256) using LLMs to rerank content in real-time, adjusting exposure to polarizing posts. Result: Algorithmic ranking impacts feelings toward the political outgroup! 🧵⬇️

November 25, 2024 at 8:32 PM

Reposted by jm

Itai Yanai

@itaiyanai.bsky.social

Doing good science is 90% finding a science buddy to constantly talk to about the project.

November 9, 2024 at 10:53 PM

Reposted by jm

Jon Bois

@jonbois.bsky.social

posting "Gentle reminder that it's okay to unplug today and take care of yourself. Pass it on." immediately before commencing a 19-hour, 277-post meltdown

November 5, 2024 at 3:05 PM

jm

@lomz.me

I miss DJ Filthy K

pixelatedboat aka “mr bluesky” @pixelatedboat.bsky.social · Sep 21

A notable cultural shift over the last 40 years is that DJs used to have names like DJ Funky Paul and now DJs all have names like DJ Adult Circumcision

September 21, 2023 at 11:20 AM

Reposted by jm

pixelatedboat aka “mr bluesky”

@pixelatedboat.bsky.social

A notable cultural shift over the last 40 years is that DJs used to have names like DJ Funky Paul and now DJs all have names like DJ Adult Circumcision

September 21, 2023 at 10:24 AM

Reposted by jm

Nome

@nome.bsky.social

"Create a problem to solve for pay," this dog was the first tech bro.

September 20, 2023 at 11:57 PM

jm

@lomz.me

wondering how possible it is to automate a system to add people to a mute list. we could make a handle to @ which allows suggestions to happen. then there's a language model that is really good at classifying toxic content called wormgpt that could be used to check submissions. fairly possible

September 21, 2023 at 1:51 AM
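The mute-list idea in the post above (submissions arrive by mentioning a handle, a classifier checks them, accepted targets go on the list) could be sketched roughly as follows. Everything here is hypothetical: the `Submission` type, the handles, the threshold, and especially `toxicity_score`, which is a keyword placeholder standing in for whatever language model would actually do the classification. It makes no network calls and does not use any real Bluesky API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    reporter: str   # account that mentioned the (hypothetical) suggestion handle
    target: str     # account being suggested for the mute list
    evidence: str   # text of the post offered as evidence


# Placeholder vocabulary; a real system would call a trained classifier instead.
PLACEHOLDER_TERMS = {"idiot", "trash"}


def toxicity_score(text: str) -> float:
    """Stand-in scorer: fraction of words that match the placeholder terms."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(w in PLACEHOLDER_TERMS for w in words) / len(words)


def review(submissions: list[Submission], threshold: float = 0.2) -> set[str]:
    """Return the targets whose evidence scores at or above the threshold."""
    mute_list: set[str] = set()
    for sub in submissions:
        if toxicity_score(sub.evidence) >= threshold:
            mute_list.add(sub.target)
    return mute_list


queue = [
    Submission("@alice.example", "@troll.example", "you are trash, idiot"),
    Submission("@bob.example", "@friendly.example", "thanks for sharing this paper"),
]

print(sorted(review(queue)))
```

Keeping the scorer behind a single function means the placeholder can be swapped for a real model without touching the review loop; a production version would also need safeguards against bad-faith submissions.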