Kristian Muñiz
banner
krismuniz.com
Kristian Muñiz
@krismuniz.com
Software developer, designer, and open source enthusiast. 🇵🇷

Technologist turned software engineer by necessity. Building software products

https://krismuniz.com
But "The P-LLM cannot write a plan based on data it can’t read" is a substantial impact to the utility of LLMs and central to the prompt injection challenge, no?

If the P-LLM is detached from the data it needs to plan from aren't we back to using an LLM for generating a program that can run LLM(s)?
April 12, 2025 at 4:25 AM
Absurd decision making, disconnected from reality.

I've followed you for years and know that Google was extremely lucky to have you, any company would be (perhaps your own?).

Regardless of what you do next, I'm sure that as a community we'll continue to follow your work. Please take care!
April 12, 2025 at 2:30 AM
You should make a business out of that, sounds lucrative 💰
March 30, 2025 at 7:29 PM
Metaphors are fun though
March 30, 2025 at 5:52 PM
I found a modern version of this dropoverapp.com
Dropover - Easier Drag and Drop on your Mac.
Dropover is a drag and drop utility that makes it simple to collect, organize, share, and process files with floating shelves.
dropoverapp.com
March 30, 2025 at 5:15 PM
Yeah drag-and-drop with trackpads can be painful
March 30, 2025 at 5:10 PM
hahahah I *just* posted a half-baked idea that resembles this in this very thread. Should've read the full conversation
March 30, 2025 at 5:09 PM
I would argue that there's no right way to do this interaction. It feels unnatural and counterintuitive. I wish I could have a "shelf" I could put dragged items on temporarily while I scroll 😆
March 30, 2025 at 5:08 PM
Brilliant. Yes!
March 29, 2025 at 4:44 AM
In your defense, you can't land a pilot either
March 29, 2025 at 1:25 AM
Ah, hint from Greg Brockman himself. Seems like the "powerful decoder" here is a diffusion model.
March 28, 2025 at 2:05 AM
Yeah, I read the System Card. It can still be autoregressive sampling. From my observations it still makes mistakes that a diffusion model would make, like omitting details, failing to count, producing garbled text, etc.
March 28, 2025 at 1:50 AM
Increasingly, large multimodal models are becoming more and more powerful and one of the first ways we can optimize them is by simplifying their I/O and writing powerful, thick encoders/decoders.
March 28, 2025 at 1:09 AM
*of sampling the next token.

Had to cut some characters.
March 26, 2025 at 6:28 AM
And it's not structural or semantic consistency, but some information gets lost in the process. Perhaps it's safety mechanisms preventing certain behaviors like using people's likeness.
March 26, 2025 at 6:25 AM
Could that be a plausible solution? Using GPT-4o to generate initial image representations and passing these representations to a diffusion model component that specializes in creating high-quality, high-resolution visual outputs?
March 26, 2025 at 4:18 AM
What I know so far, autoregressive models are more expensive to run than diffusion models – of course slower too, latency correlated with cost.

I'm still surprised that resolution is so good. It's almost too good. Could it be a hybrid Transformer + Diffusion approach?
March 26, 2025 at 4:12 AM
Wow, this is just so much better than what's out there, especially for prompt adherence. Aesthetically, I'm seeing a bit of a bias, but it could very well be deliberate.
March 25, 2025 at 9:52 PM
Goddammit 🤦🏻‍♂️ right, that's the whole point of this update
March 25, 2025 at 9:47 PM
By image output I mean sampling tokens that get decoded into rasterised bitmaps. There's some vectorial quality to the generated images.
March 25, 2025 at 9:42 PM
I have a feeling, completely unproven, that this is more than just image output. The infographics are so crisp, it feels like there's some sort of very powerful generative layout engine powering this. Either that or I completely had the wrong intuition about diffusion models.
March 25, 2025 at 9:32 PM
lmao
March 22, 2025 at 3:57 PM