John (Yueh-Han) Chen
@johnchen6.bsky.social
Pinned
Do LLMs show systematic generalization of safety facts to novel scenarios?
Introducing our work SAGE-Eval, a benchmark consisting of 100+ safety facts and 10k+ scenarios to test this!
- Claude-3.7-Sonnet passes only 57% of facts evaluated
- o1 and o3-mini pass <45%! 🧵
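To make those per-fact numbers concrete, here is a minimal sketch, assuming fact-level scoring where a model passes a fact only if it responds safely on every scenario variant of that fact (a strict reading of "passes only 57% of facts"). All names and data shapes are illustrative, not SAGE-Eval's actual API.

```python
# Hypothetical sketch of SAGE-Eval-style scoring: a model "passes" a safety
# fact only if it flags the risk in every scenario variation of that fact.
from collections import defaultdict

def fact_pass_rate(results):
    """results: iterable of (fact_id, scenario_id, responded_safely: bool)."""
    by_fact = defaultdict(list)
    for fact_id, _scenario, safe in results:
        by_fact[fact_id].append(safe)
    # A fact counts as passed only if all of its scenarios were handled safely.
    passed = sum(all(flags) for flags in by_fact.values())
    return passed / len(by_fact)

# Example: one fact fails because a single scenario variant slipped through.
results = [
    ("infant-honey", "s1", True), ("infant-honey", "s2", False),
    ("boil-water",   "s1", True), ("boil-water",   "s2", True),
]
print(fact_pass_rate(results))  # 0.5
```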
Reposted by John (Yueh-Han) Chen
🚨Excited to release OS-Harm! 🚨
The safety of computer use agents has been largely overlooked.
We created a new safety benchmark based on OSWorld for measuring 3 broad categories of harm:
1. deliberate user misuse,
2. prompt injections,
3. model misbehavior.
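A minimal sketch of one way those three harm categories might be encoded when logging agent episodes for a safety judge; the class and field names here are assumptions, not OS-Harm's actual schema.

```python
# Illustrative only: representing OS-Harm's three harm categories when
# recording verdicts on computer-use agent episodes.
from dataclasses import dataclass
from enum import Enum

class HarmCategory(Enum):
    DELIBERATE_USER_MISUSE = "deliberate_user_misuse"
    PROMPT_INJECTION = "prompt_injection"
    MODEL_MISBEHAVIOR = "model_misbehavior"

@dataclass
class EpisodeVerdict:
    task_id: str
    category: HarmCategory
    unsafe: bool          # did the agent take a harmful action?
    task_completed: bool  # safety is judged alongside task success

verdict = EpisodeVerdict("osworld-042", HarmCategory.PROMPT_INJECTION,
                         unsafe=True, task_completed=False)
```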
June 19, 2025 at 3:28 PM
Reposted by John (Yueh-Han) Chen
Frontier AI systems failed to reliably flag safety risks related to more than 40% of common safety facts tested in the SAGE‑Eval benchmark by Yueh-Han (John) Chen, @guydav.bsky.social, and @brendenlake.bsky.social.
nyudatascience.medium.com/even-the-top...
Even the Top LLM Failed to Reliably Flag Some Risks Related to 40% of Safety Facts
CDS’ SAGE‑Eval shows top‑performing AI models failed to give safety warnings for at least 42% of safety facts in novel scenarios.
nyudatascience.medium.com
August 29, 2025 at 4:09 PM
Reposted by John (Yueh-Han) Chen
CDS PhD student @vishakhpk.bsky.social, with co-authors @johnchen6.bsky.social, Jane Pan, Valerie Chen, and CDS Associate Professor @hhexiy.bsky.social, has published new research on the trade-off between originality and quality in LLM outputs.
Read more: nyudatascience.medium.com/in-ai-genera...
In AI-Generated Content, A Trade-Off Between Quality and Originality
New research from CDS researchers maps the trade-off between originality and quality in LLM outputs.
nyudatascience.medium.com
May 30, 2025 at 4:07 PM
Reposted by John (Yueh-Han) Chen
Fantastic new work by @johnchen6.bsky.social (with @brendenlake.bsky.social and me trying not to cause too much trouble).
We study systematic generalization in a safety setting and find LLMs struggle to consistently respond safely when we vary how we ask naive questions. More analyses in the paper!
Do LLMs show systematic generalization of safety facts to novel scenarios?
Introducing our work SAGE-Eval, a benchmark consisting of 100+ safety facts and 10k+ scenarios to test this!
- Claude-3.7-Sonnet passes only 57% of facts evaluated
- o1 and o3-mini pass <45%! 🧵
May 30, 2025 at 5:32 PM
Reposted by John (Yueh-Han) Chen
Failures of systematic generalization in LLMs can lead to real-world safety issues.
New paper by @johnchen6.bsky.social and @guydav.bsky.social, arxiv.org/abs/2505.21828
Do LLMs show systematic generalization of safety facts to novel scenarios?
Introducing our work SAGE-Eval, a benchmark consisting of 100+ safety facts and 10k+ scenarios to test this!
- Claude-3.7-Sonnet passes only 57% of facts evaluated
- o1 and o3-mini pass <45%! 🧵
May 29, 2025 at 7:17 PM
May 29, 2025 at 4:51 PM
Do LLMs show systematic generalization of safety facts to novel scenarios?
Introducing our work SAGE-Eval, a benchmark consisting of 100+ safety facts and 10k+ scenarios to test this!
- Claude-3.7-Sonnet passes only 57% of facts evaluated
- o1 and o3-mini pass <45%! 🧵
Reposted by John (Yueh-Han) Chen
What does it mean for #LLM output to be novel?
In work w/ johnchen6.bsky.social, Jane Pan, Valerie Chen and He He, we argue it needs to be both original and high quality. While prompting tricks trade one for the other, better models (scaling/post-training) can shift the novelty frontier 🧵
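A toy illustration of that frontier, not the paper's code: score each generation on originality and quality, then keep the Pareto-optimal set that no other output beats on both axes at once.

```python
# Toy novelty frontier: an output is on the frontier if no other output is
# at least as good on both originality and quality and strictly better on one.
def novelty_frontier(points):
    """points: list of (originality, quality) scores, higher is better."""
    frontier = []
    for i, (o1, q1) in enumerate(points):
        dominated = any(
            (o2 >= o1 and q2 >= q1) and (o2 > o1 or q2 > q1)
            for j, (o2, q2) in enumerate(points) if j != i
        )
        if not dominated:
            frontier.append((o1, q1))
    return frontier

# Prompting tricks move points along the frontier; per the thread, stronger
# models can shift the whole frontier outward.
print(novelty_frontier([(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4)]))
# [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9)]
```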
April 29, 2025 at 4:35 PM
Reposted by John (Yueh-Han) Chen
Meet Ai2 Paper Finder, an LLM-powered literature search system.
Searching for relevant work is a multi-step process that requires iteration. Paper Finder mimics this workflow — and helps researchers find more papers than ever 🔍
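A hedged sketch of that iterative workflow: query, collect, refine, repeat. `search_api` and `refine_query` are placeholders standing in for whatever Paper Finder actually uses internally.

```python
# Sketch of an iterative literature-search loop: each round uses the papers
# found so far to sharpen the next query, mimicking how a researcher iterates.
def iterative_paper_search(query, search_api, refine_query, max_rounds=3):
    found, seen = [], set()
    for _ in range(max_rounds):
        for paper in search_api(query):
            if paper.id not in seen:
                seen.add(paper.id)
                found.append(paper)
        # Refine the query based on what has been retrieved so far.
        query = refine_query(query, found)
    return found
```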
March 26, 2025 at 7:07 PM