Lightnews — Scholar-powered news

Russ Salakhutdinov

@rsalakhu.bsky.social

VP of Research, GenAI @ Meta (Multimodal LLMs, AI Agents), UPMC Professor of Computer Science at CMU, ex-Director of AI research at @Apple, co-founder Perceptual Machines (acquired by Apple)

Russ Salakhutdinov

@rsalakhu.bsky.social

Our approach shows strong generalization and versatility in generating accurate prompts for objects, styles and images across multiple T2I models, including Stable Diffusion, DALL-E, and Midjourney. It also enables easy editing and multi-concept prompt generation.

April 28, 2025 at 10:52 PM

Russ Salakhutdinov

@rsalakhu.bsky.social

Prompt engineering for personalized image generation is labor-intensive and requires model-specific tuning, which limits generalization.

PRISM uses VLMs and iterative in-context learning to automatically generate effective, human-readable prompts using only black-box access to image generation models.

April 28, 2025 at 10:52 PM
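
The iterative loop described above can be pictured roughly as follows. This is a minimal sketch, not PRISM's actual code: the names `refine_prompt`, `propose`, `generate`, and `score` are hypothetical, and the post does not say how candidate prompts are scored or how many rounds are run. The image model is treated as an opaque callable, matching the black-box setting.

```python
from typing import Callable, List, Tuple

# (prompt, score) pairs from earlier rounds, fed back to the proposer as context.
History = List[Tuple[str, float]]


def refine_prompt(
    propose: Callable[[History], str],   # hypothetical VLM call: history -> new prompt
    generate: Callable[[str], object],   # hypothetical black-box T2I call: prompt -> image
    score: Callable[[object], float],    # hypothetical judge: image -> similarity to target
    iterations: int = 5,                 # assumed budget, not taken from the post
) -> Tuple[str, float]:
    """Return the best (prompt, score) pair found over a fixed number of rounds."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    history: History = []
    for _ in range(iterations):
        prompt = propose(history)        # proposer sees all earlier attempts in context
        image = generate(prompt)         # only inputs and outputs of the image model are used
        history.append((prompt, score(image)))
    return max(history, key=lambda item: item[1])
```

Because the loop only calls `generate` as a function, the same sketch would apply to any image model that accepts a text prompt.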

Russ Salakhutdinov

@rsalakhu.bsky.social

With small perturbations (less than 5% of total web page pixels), attackers can execute targeted adversarial goals with success rates of up to 67%.

We also find that inference-time compute, which is often used to improve model performance, can introduce new vulnerabilities and harm robustness.

February 19, 2025 at 10:18 PM

Russ Salakhutdinov

@rsalakhu.bsky.social

4/4 Llama 3.1 70B agents successfully complete 16.7% of tasks on 150k websites. Agents trained on human-annotated data from Mind2Web and WebLINX struggle to generalize to real-world websites. Adding synthetic data significantly improves generalization.

With B Trabucco, G Sigurdsson, R Piramuthu

February 12, 2025 at 2:22 AM

Russ Salakhutdinov

@rsalakhu.bsky.social

3/4 Language models perform competitively with human annotators, achieving:
- 97% accuracy in detecting and filtering harmful content
- 89% success rate in generating feasible tasks
- 82% accuracy in judging successful task completions

February 12, 2025 at 2:21 AM

Russ Salakhutdinov

@rsalakhu.bsky.social

2/4 The pipeline follows a three-step process:
- LLM generates tasks for 150k websites
- LLM agents complete these tasks and produce trajectories
- LLM reviews the trajectories and evaluates their success

February 12, 2025 at 2:21 AM
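
The three steps above can be sketched as a simple loop. This is an illustrative outline under stated assumptions, not the authors' implementation: `Trajectory`, `run_pipeline`, `propose_task`, `run_agent`, and `judge` are hypothetical names, and each callable stands in for an LLM call whose prompts and interfaces the thread does not describe.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Trajectory:
    website: str
    task: str
    steps: List[str]   # the agent's recorded actions
    success: bool      # verdict from the reviewing LLM


def run_pipeline(
    websites: Iterable[str],
    propose_task: Callable[[str], str],            # step 1: LLM writes a task for a site
    run_agent: Callable[[str, str], List[str]],    # step 2: LLM agent attempts the task
    judge: Callable[[str, str, List[str]], bool],  # step 3: LLM reviews the trajectory
) -> List[Trajectory]:
    """Generate a task per website, attempt it, and label the outcome."""
    results: List[Trajectory] = []
    for site in websites:
        task = propose_task(site)
        steps = run_agent(site, task)
        results.append(Trajectory(site, task, steps, judge(site, task, steps)))
    return results
```

Keeping the three stages as separate callables means each one can be swapped or evaluated on its own.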

Russ Salakhutdinov

@rsalakhu.bsky.social

3/3 Joint work with Tiffani Min, Yue Wu, Jimin Sun, Max Kaufmann, Fahim Tajwar, Yonatan Bisk

February 10, 2025 at 10:30 PM

Russ Salakhutdinov

@rsalakhu.bsky.social

2/3 Offline-collected state transitions are evaluated using PRMs to determine optimal intervention timing, creating labeled trajectories for training the helper model.

This minimizes costly intervention calls during training while leveraging PRMs to enhance robustness to off-policy data.

February 10, 2025 at 10:29 PM
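
One simple way to picture the labeling step is to score each logged state with the PRM and flag the states whose score falls below a cutoff. The sketch below is a deliberate simplification: the thresholding rule, the 0.5 cutoff, and the names `label_interventions` and `prm_score` are assumptions for illustration, since the post does not say how intervention timing is actually computed.

```python
from typing import Callable, List, Sequence, TypeVar

State = TypeVar("State")


def label_interventions(
    states: Sequence[State],
    prm_score: Callable[[State], float],  # hypothetical PRM: state -> estimated progress
    threshold: float = 0.5,               # assumed cutoff, not taken from the post
) -> List[int]:
    """Label each offline state 1 (request help) or 0 (continue unaided)."""
    return [1 if prm_score(state) < threshold else 0 for state in states]
```

The labels come from logged states alone, so no intervention calls are made while building the training set, which is the cost saving the post describes.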

Reposted by Russ Salakhutdinov

Paul Vicol

@paulvicol.bsky.social

🌲 Ruslan Salakhutdinov (@rsalakhu.bsky.social) from CMU (@scsatcmu.bsky.social) opened the workshop with a talk on Tree Search for Language Model Agents.

Timestamp 36:20 in neurips.cc/virtual/2024...

📎 arxiv.org/abs/2407.01476

#NeurIPS2024 #AdaptiveFoundationModels

December 19, 2024 at 4:59 AM

Russ Salakhutdinov

@rsalakhu.bsky.social

2/2 Our findings show that even when unlearning a single fact, current methods either fail to properly unlearn with high recall or end up unlearning many other irrelevant facts.

Paper: arxiv.org/abs/2410.15153
Code+Dataset: github.com/wrh14/deep_u...

Joint work with R Wu, C Yadav, K Chaudhuri.

December 3, 2024 at 2:43 PM
