Hongli Zhan ✈️ ICML
hongli-zhan.bsky.social
http://honglizhan.github.io
PhD Candidate 🤘@UTAustin | previously @IBMResearch @sjtu1896 | NLP for social good
I'll be at #ICML to present SPRI next week! Come by our poster on Tuesday, July 15, 4:30pm, and let’s catch up on LLM alignment! 😃

🚀TL;DR: We introduce Situated-PRInciples (SPRI), a framework that automatically generates input-specific principles to align responses — with minimal human effort.

🧵
July 8, 2025 at 3:05 PM
I’m excited to share that our paper has been accepted at #ICML2025! 🎉🥳🎊

This work was done during my internship at IBM Research, and it wouldn’t have been possible without a top-notch team and my amazing advisor 👏
May 2, 2025 at 9:27 PM
[4/5] In addition, when applying SPRI to generate SFT data for alignment, we observe substantial improvement on TruthfulQA.
February 6, 2025 at 10:44 PM
[3/5] We tested SPRI on 3 tasks: generating 1) cognitive reappraisals, 2) instance-specific rubrics for LLM-as-a-judge, and 3) SFT data for alignment.

SPRI works especially well on tasks that require complex principles, performing on par with expert-guided methods.
February 6, 2025 at 10:44 PM
[2/5] Short for Situated-PRInciples, SPRI involves 2 stages: 1) synthesizing context-situated principles, and 2) crafting principle-guided responses.

In each stage, a base model and a critic model are used to create principles and responses from scratch through critique-refine.
February 6, 2025 at 10:44 PM
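For readers who want a concrete picture of the two stages in [2/5], here is a rough Python sketch. It is not the authors' implementation: the function names, the prompt wording, the `(feedback, ok)` critic return value, and the `max_rounds` cap are all assumptions made for illustration. The only parts taken from the post are the two stages and the use of a base model plus a critic model in a critique-refine loop.

```python
from typing import Callable, Tuple

# Hypothetical interfaces (assumptions, not from the SPRI paper or code):
Generate = Callable[[str], str]                    # base model: prompt -> text
Critique = Callable[[str, str], Tuple[str, bool]]  # critic: (context, draft) -> (feedback, good_enough)


def critique_refine(context: str, seed_prompt: str, base: Generate,
                    critic: Critique, max_rounds: int = 3) -> str:
    """Draft with the base model, then revise until the critic accepts or rounds run out."""
    draft = base(seed_prompt)
    for _ in range(max_rounds):
        feedback, ok = critic(context, draft)
        if ok:
            break
        # Ask the base model to revise its draft using the critic's feedback.
        draft = base(
            f"{seed_prompt}\n\nPrevious draft:\n{draft}\n\n"
            f"Feedback:\n{feedback}\n\nRevise the draft."
        )
    return draft


def spri_sketch(user_input: str, base: Generate, critic: Critique,
                max_rounds: int = 3) -> Tuple[str, str]:
    """Return (principles, response) for a single input."""
    # Stage 1: synthesize principles situated in this specific input.
    principles = critique_refine(
        user_input,
        f"Write principles for responding to:\n{user_input}",
        base, critic, max_rounds,
    )
    # Stage 2: craft a response guided by those principles.
    response = critique_refine(
        f"{user_input}\n\nPrinciples:\n{principles}",
        f"Respond to the input below, following the principles.\n\n"
        f"Input:\n{user_input}\n\nPrinciples:\n{principles}",
        base, critic, max_rounds,
    )
    return principles, response
```

In practice `base` and `critic` would wrap calls to whatever LLMs are used; passing them in as plain callables keeps the sketch independent of any particular API.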
Constitutional AI works great for aligning LLMs, but the principles can be too generic to apply.

Can we guide responses with context-situated principles instead?

Introducing SPRI, a system that produces principles tailored to each query, with minimal to no human effort.

arxiv.org/pdf/2502.03397
February 6, 2025 at 10:43 PM