Roopal Garg
roopalgarg.bsky.social
Multimodal, multilingual research at Google DeepMind for Gemini post-training.
#NLProc #Multimodal
Pinned
📢 Excited to unveil our latest research, ImageInWords (IIW)! 🚀 We're pushing the boundaries of image descriptions with a new seeded, sequential, human-in-the-loop approach that produces SoTA, articulate, hyper-detailed descriptions.

arXiv: arxiv.org/abs/2405.02793
#NLProc #ComputerVision #Multimodal
ImageInWords: Unlocking Hyper-Detailed Image Descriptions
Despite the longstanding adage "an image is worth a thousand words," generating accurate hyper-detailed image descriptions remains unsolved. Trained on short web-scraped image text, vision-language mo...
arxiv.org
Reposted by Roopal Garg
🥁 Introducing Gemini 2.5, our most intelligent model with impressive capabilities in advanced reasoning and coding.

Now integrating thinking capabilities, 2.5 Pro Experimental is our most performant Gemini model yet. It’s #1 on the LM Arena leaderboard. 🥇
March 25, 2025 at 5:25 PM
Folks working on one or more of the following:

🖼️ Image Descriptions to improve Image-Text alignment
AND/OR
💬 Multi/Cross-Lingual image-text understanding/generation
AND/OR
🌏 Geo-Cultural representation and learning

Please DM if you are willing to discuss the current state, challenges, and future work.
November 25, 2024 at 6:57 AM
Reposted by Roopal Garg
New starter pack! go.bsky.app/GZ4hZzu
October 28, 2024 at 9:43 AM
We had a great experience presenting our work on ImageInWords to the community at #EMNLP2024. Thank you, everyone, for stopping by 🙏! Looking forward to future work and to seeing image descriptions as a foundational multimodal task! @emnlpmeeting.bsky.social @deep-mind.bsky.social #NLProc #Multimodal
November 23, 2024 at 10:53 PM
Reposted by Roopal Garg
hello new followers! we’re actively hiring on our generative media team in Mountain View: boards.greenhouse.io/deepmind/job...

we work on image, video, audio, etc… come work with us if you’re interested! apply asap :)
Research Engineer, GenMedia
Mountain View, California, US
boards.greenhouse.io
November 22, 2024 at 6:08 AM
📢 Excited to unveil our latest research, ImageInWords (IIW)! 🚀 We're pushing the boundaries of image descriptions with a new seeded, sequential, human-in-the-loop approach that produces SoTA, articulate, hyper-detailed descriptions.

arXiv: arxiv.org/abs/2405.02793
#NLProc #ComputerVision #Multimodal
ImageInWords: Unlocking Hyper-Detailed Image Descriptions
Despite the longstanding adage "an image is worth a thousand words," generating accurate hyper-detailed image descriptions remains unsolved. Trained on short web-scraped image text, vision-language mo...
arxiv.org
November 21, 2024 at 12:26 AM