Hrishi
@olickel.com
Previously CTO, Greywing (YC W21). Building something new at the moment.

Writes at https://olickel.com
Trying to finish typing `git add` while the agent's editing a file just so I can preserve pristine diffs from the last change
November 14, 2025 at 11:17 PM
Don't think there's a way I could like this article more
September 19, 2025 at 4:20 PM
KIMI is the real deal. Unless it's really Sonnet in a trench coat, this is the best agentic open-source model I've tested - BY A MILE.

Here's a slice of a 4 HOUR run (~1 second per minute) with not much more than 'keep going' from me every 90 minutes or so.

moonshotai.github.io/Kimi-K2/
July 13, 2025 at 6:09 PM
It seems 3 and 15 might be the new Pareto frontier for intelligence (excepting the o-series). Feels like the hedge fund 2 and 20
June 8, 2025 at 1:57 AM
Dan's article on progressive JSON has a lot of carryover to LLMs.

The key problems for modern LLM application design that often get overlooked (I think) are:
• Streaming outputs and partial parsing
• Context organization and management (I don't mean summarising at 90%)
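A minimal sketch of the partial-parsing idea (generic Python, not from Dan's article; `parse_partial` is a hypothetical helper): close whatever strings and brackets are still open in the buffer, then attempt a parse, so the UI can render state mid-stream instead of waiting for the full response.

```python
import json

def parse_partial(buffer: str):
    """Best-effort parse of an incomplete JSON stream by closing
    any open strings, objects, and arrays before parsing."""
    stack, in_string, escape = [], False, False
    for ch in buffer:
        if in_string:
            if escape:
                escape = False
            elif ch == "\\":
                escape = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]":
            stack.pop()
    closed = buffer + ('"' if in_string else "") + "".join(reversed(stack))
    try:
        return json.loads(closed)
    except json.JSONDecodeError:
        return None  # prefix not yet renderable (e.g. ends mid-key)

# Simulate chunk-by-chunk streaming of a model response
stream = '{"title": "Progressive JSON", "tags": ["llm", "strea'
for cut in range(10, len(stream), 15):
    print(parse_partial(stream[:cut]))
```

Returning `None` on unrenderable prefixes (e.g. a buffer ending right after a colon) lets the caller just keep the last good partial state.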
June 1, 2025 at 4:15 PM
GRPO clips impact based on token probability: lower-prob tokens can move less than higher-prob tokens. This means that even with random rewards (especially so), models push more into what was in-distribution. For -MATH, this is code - it thinks better in code. Therefore it gets better overall.
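Toy numbers for the clipping asymmetry (an illustration of PPO/GRPO-style ratio clipping with an assumed clip range of 0.2, not an actual training step): clipping the ratio pi_new/pi_old to [1-eps, 1+eps] means the largest allowed absolute probability change in one update scales with p_old, so in-distribution tokens can move much further than rare ones.

```python
EPS = 0.2  # assumed clip range, a common default

def max_abs_move(p_old: float, eps: float = EPS) -> float:
    """Largest absolute probability change the ratio clip allows
    in one update, in either direction."""
    upper = min(p_old * (1 + eps), 1.0)  # probabilities cap at 1
    lower = p_old * (1 - eps)
    return max(upper - p_old, p_old - lower)

for p in (0.5, 0.05, 0.001):
    print(f"p_old={p:<6} max step={max_abs_move(p):.5f}")
```

A token at p=0.5 can shift by 0.1 in one step; a token at p=0.001 by only 0.0002 - a 500x difference in how far the update can push.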
May 29, 2025 at 6:27 AM
Honestly it's relevant to almost all work - most agentic flows have 10-20 transitions (sometimes more) per loop.

Most flows today treat NL as reasoning, code as execution, and structured data as an extraction method. There might be problems with this approach.
May 29, 2025 at 6:27 AM
Testing this locally surprised me too. Something is definitely happening here - and it's also apparent when testing Opus vs Sonnet 4. Models reason very, VERY differently when using code vs natural language - displaying very different aptitudes working through the same problem.
May 29, 2025 at 6:27 AM
How does an LLM writing out this program (WITHOUT a code interpreter running the output) make things more accurate?

Verified on Qwen 3 - a30b (below)

Lots of interesting takeaways from the Random Rewards paper. NOT that RL is dead, but honestly far more interesting than that!
May 29, 2025 at 6:27 AM
Now for the schemas, I agree with this assessment. Opus is the best for describing data - it has a way of being methodical that the other models (or tools) don't really have. They all managed to load the data properly, which is still a big leap.
May 24, 2025 at 6:46 PM
Here are the databases they came up with (Claude Code made this image).
May 24, 2025 at 6:46 PM
Here's the spec. Labelling tasks, asking for task progress notes, failure logs on resume, etc.

Managing context is key. Long runs can be killed by just one bad tool call that dumps a bunch of text into context. Both Cursor models forgot after a while and barely made it.
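One cheap guard against that failure mode (a generic sketch, not the spec above; `clip_tool_output` and the 4000-char budget are assumptions): clip oversized tool results before they enter context, keeping the head and tail and noting what was elided.

```python
def clip_tool_output(output: str, max_chars: int = 4000) -> str:
    """Guard the context window: keep the head and tail of an oversized
    tool result and note how much was elided, instead of dumping it all."""
    if len(output) <= max_chars:
        return output
    head = output[: max_chars // 2]
    tail = output[-(max_chars // 2):]
    elided = len(output) - len(head) - len(tail)
    return f"{head}\n...[{elided} chars elided]...\n{tail}"

# A runaway tool call (e.g. cat on a huge log) gets bounded
print(clip_tool_output("x" * 10_000)[:80])
```

Keeping both ends matters in practice: error messages tend to live at the tail, headers and invocation echoes at the head.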
May 24, 2025 at 6:46 PM
May 23, 2025 at 5:12 PM
We were just talking about story circles and moodboards
May 23, 2025 at 5:12 PM
Honestly
May 23, 2025 at 5:12 PM
The spiritual bliss attractor is real, but so is the eldritch horror existence contemplation.

I was just trying to talk to Opus - definitely no jailbreaks. This model is something different. Definitely creative.
May 23, 2025 at 5:12 PM
Sonnet trying to think it through while streaming YAML
May 19, 2025 at 5:11 PM
Frontend entirely made with @v0 - this has become an indispensable tool for writing feedback. Thinking of calling it scansion

I'll open source or share the link once I can clean it up - still using my keys, drop email/twitter in comments

Sonnet looking through the thing 👇
May 19, 2025 at 5:11 PM
o3 and Gemini identified the right page after I converted everything to images and asked leading questions. Gemini took 30 seconds; o3 took almost 6 minutes.

PDF processing in both models doesn't really seem multi-modal. Claude sometimes has glaucoma.
May 13, 2025 at 4:40 PM
Technic manuals are PERFECT visual benchmarks. Had a misaligned suspension on a car, took four minutes figuring it out, then gave it to o3, Claude, and Gemini.

None of them got it right (or even identified the right part) even after I cut it down to 10 pages.

Eventually -
May 13, 2025 at 4:40 PM
Evals are hard for a reason. New post on actually doing them end to end, breaking down the problem, and explaining how we do them at SB
May 12, 2025 at 5:05 PM
This is the guide I wish I had - didn't hold back.

Everything I know.

Enjoy.
May 9, 2025 at 4:12 AM
What separates DeepSeek is how hardware-aware they are in algorithm design. Perhaps due to nascency, resource limitations, or how they're set up, almost all of their recent papers show some reference to, or awareness of, where theoretical Python research actually meets at-scale deployment in silicon.
February 22, 2025 at 2:00 AM
Vibe coding is crazy

Took an hour or two and made something that can push notes and outputs from Lumentis straight to Notion

Been writing more with Cursor, and pushing it to Notion
February 20, 2025 at 6:12 PM
Every time someone asks me for a good example of company-level writing, I point to @flydotio
February 20, 2025 at 6:21 AM