Niladri Shekhar Dutt
@niladridutt.bsky.social
Research Intern @adobe.com | PhD @ucl.ac.uk | @ellis.eu | ex-Nvidia, Berkeley | Interested in generative modelling in vision and graphics + reasoning (LLMs)

https://niladridutt.com/
🧵9/10 We evaluate quantitatively on the Adobe5k dataset and conduct user studies with expert and novice users. Our evaluations show that MonetGPT outperforms open-source alternatives and performs comparably to Google Photos AutoEnhance (closed-source).
May 27, 2025 at 3:13 PM
🧵8/10 Photo editing is subjective 🎨. Our framework adapts to user preferences through natural-language tags like ‘vibrant’ or ‘retro vibe’, producing personalized and stylistically distinct retouching plans from the same input image.
May 27, 2025 at 3:13 PM
🧵7/10 Our puzzle-based training with a 'reasoning as a pathway' approach allows MonetGPT to generate detailed justifications for each edit, delivering truly explainable image retouching.
May 27, 2025 at 3:13 PM
🧵6/10 🧩 Puzzle C builds planning capabilities. The model learns to generate a complete, multi-step retouching plan to enhance a photo, structuring its reasoning as a sequence of discrete issues and solutions for clarity and control.
May 27, 2025 at 3:13 PM
🧵5/10 🧩 Puzzle B imparts aesthetic judgement. By ranking professionally edited photos against altered versions, the MLLM learns to recognize the visual characteristics of an optimally adjusted image for any given operation, building an internal aesthetic model.
May 27, 2025 at 3:13 PM
🧵4/10 🧩 Puzzle A builds an understanding of individual operations. The MLLM learns to map visual changes in before/after images to a specific tool and its precise parameter value, effectively learning the semantics of our procedural library.
May 27, 2025 at 3:13 PM
🧵1/10 Excited to share our #SIGGRAPH paper "MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills" 🌟
We explore how to make MLLMs operation-aware by having them solve visual puzzles, and we propose a procedural framework for image retouching.
#MLLM
May 27, 2025 at 3:13 PM
Hi Londoners

Join us on April 15 for an evening on Gen AI for 3D at UCL! We have an amazing list of keynote speakers and lightning talks. Register at londongenai.github.io

Very excited to co-organize this with Michael and Preddy!
April 10, 2025 at 9:58 AM
Who will tell the Silicon Valley tech bros that it wasn't them alone?
February 7, 2025 at 8:34 PM
Finding the perfect banner image is so easy now.
Just generated mine with ChatGPT/DALL-E 3!
November 16, 2024 at 11:48 PM