@laurenbjiang.bsky.social
CS PhD at UPenn | Research Intern at Microsoft OAR | Foundation Models | Reasoning | Post-Training | Personalization | Multimodality | She/Her
Check out our paper and data here:
📖 Paper: arxiv.org/pdf/2512.06688
🤗 Data: huggingface.co/datasets/bo...

🎉 Huge thanks to my amazing collaborators, mentors, and advisors for making this work possible.

🧵(5/5)
December 22, 2025 at 7:25 PM
✅ Personalization can be incentivized through reinforcement learning (RL): PersonaMem-v2 provides the rich metadata that RL needs for verifiable rewards.

🏆 With this data, a small reasoning model, Qwen3-4B, outperforms GPT-5, and an agentic memory module delivers SOTA personalization with 16× better efficiency.
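As a rough sketch of why such metadata makes rewards verifiable (the function below is hypothetical, not the paper's actual reward design): with a labeled persona-consistent option attached to each sample, the reward becomes a simple index check rather than a learned judge:

```python
def personalization_reward(model_choice: int, labeled_answer: int) -> float:
    """Verifiable reward: 1.0 if the model picked the option annotated as
    persona-consistent, else 0.0. Hypothetical sketch, not the paper's design."""
    return 1.0 if model_choice == labeled_answer else 0.0

assert personalization_reward(1, 1) == 1.0  # persona-consistent pick
assert personalization_reward(2, 1) == 0.0  # plausible but generic pick
```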

🧵(3/5)
December 22, 2025 at 7:25 PM
🎯 In the real world, users usually don't state their preferences explicitly to AI, so in PersonaMem-v2 most user preferences are only revealed implicitly in context.

🌈 It spans 1,000 personas and 20,000+ preferences over 300+ topics, enabling richer training and evaluation for personalized AI.

🧵(4/5)
December 22, 2025 at 7:25 PM
🌍✨ Personalized Intelligence is receiving increasing attention from many top-tier AI labs in industry.

This reflects a broader shift: AI that serves millions of users must move beyond one-size-fits-all behavior to enable long-term user engagement.

🧵(2/5)
December 22, 2025 at 7:25 PM
👥 Authors: Bowen Jiang*, Zhuoqun Hao*, Young-Min Cho, Bryan Li, Yuan Yuan, Dr. Sihao Chen, Prof. Lyle Ungar, Prof. Camillo J. Taylor, Prof. Dan Roth (*co-first authors) 🏛️ University of Pennsylvania (8/8)
April 23, 2025 at 6:00 PM
💵 Generating long-context data can be both scalable and cost-effective! We develop a modular data curation pipeline to synthesize persona-oriented, multi-session user–model conversations with long context. (7/8)
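For intuition, here is a heavily hedged skeleton of what "modular" can mean here, with each stage a swappable function (stage names, signatures, and toy data below are my assumptions, not the paper's actual pipeline):

```python
# Hypothetical skeleton of a modular curation pipeline -- illustrative only.
import random

def sample_persona() -> dict:
    """Stage 1: draw a persona profile (toy stand-in for a persona generator)."""
    return {"name": "Alex", "traits": ["vegan", "budget-conscious"]}

def derive_preferences(persona: dict) -> list[str]:
    """Stage 2: expand persona traits into concrete preferences."""
    return [f"prefers suggestions consistent with being {t}" for t in persona["traits"]]

def generate_sessions(persona: dict, prefs: list[str], n_sessions: int = 3) -> list[list[dict]]:
    """Stage 3: synthesize multi-session chats that reveal preferences only
    implicitly (toy turns here; in practice an LLM would write them)."""
    return [
        [{"role": "user", "content": f"(session {i}) chat hinting at: {random.choice(prefs)}"}]
        for i in range(n_sessions)
    ]

def assemble_long_context(sessions: list[list[dict]]) -> list[dict]:
    """Stage 4: concatenate sessions into one long interaction history."""
    return [turn for session in sessions for turn in session]

persona = sample_persona()
history = assemble_long_context(generate_sessions(persona, derive_preferences(persona)))
```

Because each stage is an independent function, any one of them can be replaced without touching the rest, which is what makes this kind of synthesis scalable.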
April 23, 2025 at 6:00 PM
Our findings (continued)

🗣️ LLMs recall basic facts and preferences well -- but struggle to apply your latest preferences in their responses.
🚨 Hardest part? Applying your preferences in new situations.
🔍 RAG and 🧠 external memory modules help with personalization. (6/8)
April 23, 2025 at 6:00 PM
Our findings👇

📊 Gemini-1.5, GPT-4.5, and GPT-4.1 lead in overall accuracy, but still hover around 52% on multiple-choice questions.
🤔 Reasoning models (o4-mini, o1, and DeepSeek-R1) do not outperform their non-reasoning peers. (5/8)
April 23, 2025 at 6:00 PM
Each sample contains a user-LLM interaction history that ends with a user query, posed as a multiple-choice question for the chatbot to answer. All options are plausible, but only one is tailored to the user's current profile. (4/8)
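To make the format concrete, here's an illustrative sample in Python -- the field names and contents are my invention, not the dataset's actual schema:

```python
# Illustrative only -- field names are hypothetical, not the dataset's schema.
sample = {
    "history": [  # long, multi-session user-LLM interaction history
        {"role": "user", "content": "I switched to a fully vegan diet last month."},
        {"role": "assistant", "content": "Noted -- I'll keep that in mind!"},
        # ... many more turns across sessions ...
    ],
    "question": "Any quick dinner ideas for tonight?",
    "choices": [
        "A creamy chicken alfredo in 30 minutes.",  # plausible, ignores the user
        "A 20-minute chickpea coconut curry.",      # tailored to the vegan user
        "A classic beef stir-fry with rice.",
        "A smoked salmon poke bowl.",
    ],
    "answer_index": 1,  # the single persona-consistent option
}
```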
April 23, 2025 at 6:00 PM
Paper Title -- Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale

📄 arXiv arxiv.org/pdf/2504.14225
🌐 Project Page zhuoqunhao.github.io/PersonaMem....
🐙 GitHub github.com/bowen-upenn...
🤗 Hugging Face huggingface.co/datasets/bo...
(3/8)
April 23, 2025 at 6:00 PM
🎨 7 personalization skills tested across 15 scenarios
👩🏻‍💻 Evaluates LLMs' ability to track an evolving persona across 180+ multi-session user-chatbot conversation histories
🌟 Realistic long-context evaluation with contexts up to 1M tokens (2/8)
April 23, 2025 at 6:00 PM