sidhusmart.bsky.social
@sidhusmart.bsky.social
Flow Matching for Generative Modeling: https://buff.ly/3T4bb9W
Latent Consistency Models: https://buff.ly/3COzdmh
December 2, 2024 at 12:20 PM
What’s your crazy idea for AI-generated images? 🚀 What other techniques or tools have you come across?
December 2, 2024 at 12:20 PM
What's exciting is that every leap forward isn't just about improving efficiency. It's about knocking down barriers so that our collective imagination can go further. Blockade Labs lets developers generate 360-degree environments, and Decart OASIS, a fully AI-generated game, is already here!
December 2, 2024 at 12:20 PM
Latent Consistency Models (LCM): While FM rewrites the rules, LCM focuses on optimizing the existing technique. By “teaching” a faster student model to mimic a diffusion model, LCM drastically reduces the time needed for inference. We go from dozens or even hundreds of denoising steps to just 1-4.
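To make the "teach a student to mimic the teacher" idea concrete, here's a toy sketch. Everything in it is an assumption for illustration: the "teacher" is just 100 small Euler steps of a simple linear ODE, and the "student" is a single weight fitted by least squares. Real LCMs train a neural network with a consistency objective in latent space, so treat this as the flavor of distillation, not the paper's method.

```python
def teacher_sample(x, steps=100):
    """Stand-in for a slow sampler: many small Euler steps of dx/dt = -x on [0, 1]."""
    h = 1.0 / steps
    for _ in range(steps):
        x -= h * x
    return x

def distill(inputs):
    """Fit a one-parameter student y = w * x to the teacher's outputs (least squares)."""
    outputs = [teacher_sample(x) for x in inputs]
    return sum(x * y for x, y in zip(inputs, outputs)) / sum(x * x for x in inputs)

# Hypothetical training inputs, chosen arbitrarily for the toy.
w = distill([-2.0, -1.0, 0.5, 1.5, 3.0])

def student_sample(x):
    """One multiplication instead of a hundred steps."""
    return w * x
```

Because this toy teacher is a linear map, the one-step student can match it almost exactly. A real image model is nowhere near that simple, which is why LCM typically still uses a handful of steps.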
December 2, 2024 at 12:20 PM
Flow Matching (FM): A reimagining of how image generation could work. Instead of step-by-step denoising, FM learns a continuous flow that carries noise to an image. It's a more general approach that's faster, more flexible, and requires fewer steps to generate an image. It's a new ground-up approach to image generation.
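Here's a tiny 1-D sketch of what sampling along a flow looks like. It assumes a toy setup that is not from the paper: noise is N(0, 1), the "data" is N(mu, sigma^2), and the two are connected by straight paths, so the velocity field has a closed form. In a real model a neural network would be trained to predict that velocity instead.

```python
def velocity(x, t, mu, sigma):
    """Closed-form velocity for straight paths x_t = (1 - t) * x0 + t * x1,
    with x0 ~ N(0, 1) and x1 ~ N(mu, sigma^2) drawn independently (toy assumption)."""
    var_t = (1 - t) ** 2 + (t * sigma) ** 2      # variance of x_t
    slope = (t * sigma ** 2 - (1 - t)) / var_t   # Cov(x1 - x0, x_t) / Var(x_t)
    return mu + slope * (x - t * mu)

def sample(x0, mu, sigma, steps):
    """Turn a noise value x0 into a sample by Euler-integrating dx/dt = velocity."""
    x, h = x0, 1.0 / steps
    for k in range(steps):
        x += h * velocity(x, k * h, mu, sigma)
    return x

# Many steps land close to the exact answer mu + sigma * x0;
# a handful of steps gives a rougher but much cheaper result.
fine = sample(0.5, mu=3.0, sigma=2.0, steps=1000)
coarse = sample(0.5, mu=3.0, sigma=2.0, steps=4)
```

The number of steps is just a knob on the integrator, which is part of why flow-based samplers can trade quality for speed so easily.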
December 2, 2024 at 12:20 PM
But these challenges are exactly what motivate brilliant minds to find solutions, which is what I came across in my recent paper quest. Researchers have come up with ways to speed up image generation, often with completely different approaches:
December 2, 2024 at 12:20 PM
Idea: What if you could have a real-time world generated as you navigate it, right from within your AR/VR glasses?
Roadblock: Stable Diffusion takes a few seconds to generate an image because it has to go through multiple denoising steps (up to 100).
December 2, 2024 at 12:20 PM
Here's a link to the SoM paper for those who are interested in reading it: arxiv.org/pdf/2310.11441
If you're interested in signing up for my course, use this link to get a discount: uplimit.com/go/ai-produc...
November 21, 2024 at 5:26 PM
8/ P.S. I would also highly recommend a trip to Albania. It’s stunning, underexplored, and full of hidden gems. 🌍
November 21, 2024 at 5:26 PM
7/ It’s also exactly the kind of thing I teach in my course, Building AI Products. We explore practical techniques for text, images, and other modalities, and how to turn them into real-world applications.
If you're interested, our next batch starts in December!
November 21, 2024 at 5:26 PM
6/ I found this pretty cool because visual prompting is a whole new dimension compared to text-based prompting techniques like role-play or chain-of-thought. It opens doors to exciting possibilities in AI-powered tools.
November 21, 2024 at 5:26 PM
5/ SoM prompting is brilliant because it transforms how AI interprets visual data. Essentially, you give the model cues about what to focus on, which can boost accuracy dramatically. In the paper, the researchers suggested automating this with the Segment Anything Model (SAM).
November 21, 2024 at 5:26 PM
4/ In the second attempt, I added four small markers to the image, highlighting key spots like cars and mountains. GPT-4o nailed it!
Not only did it correctly identify the location as Albania, but it also looked at number plates and suggested the exact area I was heading to!
November 21, 2024 at 5:26 PM
3/ First, I uploaded the photo as it was.
The AI’s response was vague and unhelpful. No clear idea where the picture was taken.
November 21, 2024 at 5:26 PM
2/ I came across an interesting research paper that describes a technique called Set-of-Mark (SoM) prompting, designed for large multimodal models (LMMs) like GPT-4o.
SoM is a way to guide AI’s “vision” by adding small visual markers to an image. Curious, I decided to test it out.
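If you want to try this yourself, here's a minimal sketch of the planning step: deciding where the numbered marks go and writing a prompt that refers to them. The bounding boxes, function names, and prompt wording are all hypothetical and hand-picked for illustration. Actually drawing the marks on the image is left to whatever imaging library you prefer.

```python
def plan_marks(regions):
    """Return one numbered mark per region, placed at the region's center.
    `regions` maps a name to a bounding box (left, top, right, bottom) in pixels."""
    marks = []
    for number, (name, box) in enumerate(regions.items(), start=1):
        left, top, right, bottom = box
        center = ((left + right) // 2, (top + bottom) // 2)
        marks.append({"label": str(number), "region": name, "xy": center})
    return marks

def build_prompt(marks, question):
    """Compose a text prompt that points the model at the marks on the image."""
    labels = ", ".join(m["label"] for m in marks)
    return (f"The image has numbered marks ({labels}). {question} "
            "Refer to the marks in your answer.")

# Hypothetical boxes, not taken from the actual photo.
regions = {"car": (120, 400, 320, 520), "mountain": (0, 50, 640, 300)}
marks = plan_marks(regions)
prompt = build_prompt(marks, "Where was this photo taken?")
```

Hand-picking regions is fine for a quick experiment with a few markers. For anything larger, a segmentation model could propose the regions automatically.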
November 21, 2024 at 5:26 PM