Interested in AI safety and interpretability
Previously: Anthropic, AI2, Google, Meta, UNC Chapel Hill
- A Visiting Scientist at @schmidtsciences.bsky.social, supporting AI safety & interpretability
- A Visiting Researcher at Stanford NLP Group, working with @cgpotts.bsky.social
So grateful to keep working in this fascinating area—and to start supporting others too :)
Bootstrapping makes model comparisons easy!
Here's a new blog/colab with code for:
- Bootstrapped p-values and confidence intervals
- Combining variance from BOTH sample size and random seed (e.g., prompts)
- Handling grouped test data
Link ⬇️
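(For readers who just want the gist before opening the colab: below is a minimal paired-bootstrap sketch, assuming you have per-example 0/1 correctness arrays for two models on the same test set. The function and variable names are illustrative, not taken from the linked notebook.)

```python
# Minimal paired-bootstrap sketch: estimate a confidence interval and a
# two-sided p-value for the accuracy difference between two models,
# resampling test examples with replacement.
import numpy as np

def paired_bootstrap(scores_a, scores_b, n_boot=10_000, seed=0):
    """scores_a, scores_b: per-example scores (e.g., 0/1 correctness) on the same test set."""
    rng = np.random.default_rng(seed)
    scores_a, scores_b = np.asarray(scores_a), np.asarray(scores_b)
    n = len(scores_a)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample example indices with replacement
        diffs[i] = scores_a[idx].mean() - scores_b[idx].mean()
    observed = scores_a.mean() - scores_b.mean()
    ci = np.percentile(diffs, [2.5, 97.5])     # 95% bootstrap CI for the difference
    # Two-sided p-value: how often the resampled difference falls on either side of zero
    p_value = 2 * min((diffs <= 0).mean(), (diffs >= 0).mean())
    return observed, ci, p_value

# Usage with synthetic correctness labels (hypothetical data, for illustration only)
rng = np.random.default_rng(1)
model_a = rng.binomial(1, 0.78, size=500)
model_b = rng.binomial(1, 0.74, size=500)
print(paired_bootstrap(model_a, model_b))
```

The blog/colab additionally shows how to fold in seed/prompt variance and grouped test data; the sketch above covers only the basic example-resampling case.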
We present UnLOK-VQA, a benchmark to evaluate unlearning in vision-and-language models, where both images and text may encode sensitive or private information.
www.aisi.gov.uk/grants
1️⃣ Accepting persuasion when it helps
2️⃣ Resisting persuasion when it hurts (e.g. misinformation)
arxiv.org/abs/2410.14596
🧵 1/4
Blog post below 👇
alignment.anthropic.com/2025/recomme...
shorturl.at/TaUh9 #NLProc #ACL2025NLP
- DM me if you are interested in emergency reviewer/AC roles from March 18th to 26th
- Self-nominate for positions here (review period is March 1 through March 20): docs.google.com/forms/d/e/1F...
You can find me at the Wed 11am poster session, Hall A-C #4503, talking about linguistic calibration of LLMs via multi-agent communication games.