Hrishi
@olickel.com
Previously CTO, Greywing (YC W21). Building something new at the moment.

Writes at https://olickel.com
What do I mean by agentic model? Very simply put, it's the ability to hold macro instructions in view across an increasing number of turns, and to use primary tools (read, write, edit, shell) consistently without getting lost. An added bonus is the ability to learn from mistakes further up the chain!
July 13, 2025 at 6:09 PM
Editing multiple files, reading new context, maintaining agentic state (not forgetting where you were), and not dropping instructions. This is a repo with included prompts, notes, and plans - lots of things to mistake for context and be poisoned by.

The output was >1M tokens, and it wasn't an easy task.
July 13, 2025 at 6:09 PM
Definitely going to make one! I'm looking forward to re-reading the report and making one about what we learned
June 2, 2025 at 2:44 PM
This looks really interesting btw, haven't tried yet

bsky.app/profile/jps...
Justin Schroeder (@jpschroeder.com)
I wrote a library to scratch this itch: https://jsonreader.formkit.com
June 1, 2025 at 4:15 PM
Oh and NL vs Code outputs.

At least 70% of the problems I've seen can be fixed by improving one of these areas.

bsky.app/profile/dan...
June 1, 2025 at 4:15 PM
Full paper and results - also can we normalise releasing preprints in Notion? So much easier to read, annotate and understand!
x.com/StellaLisy/...
May 29, 2025 at 6:27 AM
GRPO clips update size based on token probability: lower-probability tokens can move less than higher-probability ones. This means that even with random rewards (especially so), models push further into what was already in-distribution. For Qwen-Math, that's code - it thinks better in code, and therefore gets better overall.
May 29, 2025 at 6:27 AM
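A minimal numeric sketch of that clipping asymmetry (the epsilon value and helper are illustrative, not from the thread):

```python
# Sketch of a GRPO/PPO-style ratio clip, assuming eps = 0.2.
# The per-token update uses min(r * A, clip(r, 1 - eps, 1 + eps) * A)
# with r = p_new / p_old, so the *relative* change is capped. In
# absolute terms, low-probability tokens can therefore move far less.

def max_unclipped_prob(p_old: float, eps: float = 0.2) -> float:
    """Largest p_new before the ratio p_new / p_old hits the clip bound."""
    return min(1.0, p_old * (1.0 + eps))

for p_old in (0.9, 0.5, 0.05):
    p_new = max_unclipped_prob(p_old)
    print(f"p_old={p_old:.2f} -> max p_new={p_new:.3f} (gain {p_new - p_old:+.3f})")

# p_old=0.90 -> max p_new=1.000 (gain +0.100)
# p_old=0.50 -> max p_new=0.600 (gain +0.100)
# p_old=0.05 -> max p_new=0.060 (gain +0.010)
```

Repeated updates under this cap concentrate probability mass on tokens that were already likely, regardless of whether the reward was meaningful.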
3. Find new ways of measuring reward signals from NL reasoning, instead of switching to code or structured output for measurement (which can corrupt results).

Finally, about the random rewards improving benchmark performance: It's the clipping term.
May 29, 2025 at 6:27 AM
Actionable takeaways for us:
1. Test code-as-reasoning pathways (no code interpreter; interleaved in the thinking itself rather than as a tool call or output) - rough sketch below.
2. Measure model aptitude and performance when thinking with code vs without.
May 29, 2025 at 6:27 AM
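A rough sketch of what testing takeaways 1 and 2 together could look like. The prompt wording and the `complete` callable are hypothetical stand-ins, not an existing API:

```python
# Hypothetical harness for code-as-reasoning: the model is asked to think
# in code inline, but nothing is executed. `complete` stands in for
# whatever LLM client you use.

CODE_REASONING_PROMPT = """Reason through the problem below.
Whenever a step is easier to express as code, write a short Python
snippet inline in your reasoning instead of prose. Do not call tools;
no code will be executed. Finish with: ANSWER: <your answer>.

Problem: {problem}"""

NL_REASONING_PROMPT = """Reason through the problem below step by step
in plain prose. Finish with: ANSWER: <your answer>.

Problem: {problem}"""

def compare_pathways(problem: str, complete) -> dict[str, str]:
    """Run the same problem down both pathways so aptitude can be compared."""
    return {
        "code_as_reasoning": complete(CODE_REASONING_PROMPT.format(problem=problem)),
        "nl_reasoning": complete(NL_REASONING_PROMPT.format(problem=problem)),
    }
```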
Honestly it's relevant to almost all work - most agentic flows have 10-20 transitions (sometimes more) per loop.

Most flows today treat NL as reasoning, code as execution, and structured data as an extraction method. There might be problems with this approach.
May 29, 2025 at 6:27 AM
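For concreteness, a sketch of that common division of labor; `llm`, `sandbox`, and the schema are hypothetical stand-ins:

```python
# Hypothetical agent step illustrating the pattern described above:
# NL for reasoning, code for execution, structured data for extraction.
# Each boundary crossing is a chance to lose reasoning quality.

import json

def agent_step(task: str, llm, sandbox) -> dict:
    # 1. NL as reasoning: the model thinks in prose.
    plan = llm(f"Think step by step about how to solve: {task}")

    # 2. Code as execution: a separate call emits code, run in a sandbox.
    code = llm(f"Write Python that carries out this plan:\n{plan}")
    result = sandbox(code)

    # 3. Structured data as extraction: a final call squeezes the
    #    outcome into JSON.
    extracted = llm(
        f"Given this result:\n{result}\n"
        'Return JSON: {"answer": ..., "confidence": ...}'
    )
    return json.loads(extracted)
```

With 10-20 such transitions per loop, any quality drop at the NL/code boundary compounds quickly.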
Trying different problems on multiple models, I see a distinct difference in answer and reasoning quality between code and NL.

This is heavily relevant to the work we're doing, which involves transitioning between NL reasoning and code boundaries repeatedly.
May 29, 2025 at 6:27 AM
Testing this locally surprised me too. Something is definitely happening here - and it's also apparent when testing Opus vs Sonnet 4. Models reason very, VERY differently when using code vs natural language, displaying very different aptitudes when working through the same problem.
May 29, 2025 at 6:27 AM
Now for the schemas, I agree with this assessment. Opus is the best for describing data - it has a way of being methodical that the other models (or tools) don't really have. They all managed to load the data properly, which is still a big leap.
May 24, 2025 at 6:46 PM
Here are the databases they came up with (Claude Code made this image).
May 24, 2025 at 6:46 PM